Six Things That Go Wrong With Discussions About Testing

Talking about software testing is not easy. It’s not natural! Testing is a “meta” activity. It’s not just a task, but a task that generates new tasks (by finding bugs that should be fixed or finding new risks that must be examined). It’s a task that can never be “completed” yet must get “done.”

Confusion about testing leads to ineffective conversations that focus on unimportant issues while ignoring the things that matter. Here are some specific ways that testing conversations fail:

  1. When people care about how many test cases they have instead of what their testing actually does. The number of test cases (e.g. 500, 257, 39345) tells nothing to anyone about “how much testing” you are doing. The reason that developers don’t brag about how many files they created today while developing their product is that everyone knows that it’s silly to count files, or keystrokes, or anything like that. For the same reasons, it is silly to count test cases. The same test activity can be represented as one test case or one million test cases. What if a tester writes software that automatically creates 100,000 variations of a single test case? Is that really “100,000” test cases, or is it one big test case, or is it no test case at all? The next time someone gives you a test case count, practice saying to yourself “that tells me nothing at all.” Then ask a question about what the tests actually do: What do they cover? What bugs can they detect? What risks are they motivated by?
  2. When people speak of a test as an object rather than an event. A test is not a physical object, although physical things such as documentation, data, and code can be a part of tests. A test is a performance; an activity; it’s something that you do. By speaking of a test as an object rather than a performance, you skip right over the most important part of a test: the attention, motivation, integrity, and skill of the tester. No two different testers ever perform the “same test” in the “same way” in all the ways that matter. Technically, you can’t take a test case and give it to someone else without changing the resulting test in some way (just as no quarterback or baseball player will execute the same play in the same way twice) although the changes don’t necessarily matter.
  3. When people can’t describe their test strategy. Test strategy is the set of ideas that guide your choices about what tests to design and what tests to perform in any given situation. Test strategy could also be called the reasoning behind the actions that comprise each test. Test strategy is the answer to questions such as “why are these tests worth doing?” “why not do different tests instead?” “what could we change if we wanted to test more deeply?” “what would we change if we wanted to test more quickly?” “why are we doing testing this way?”The ability to design and discuss test strategy is a hallmark of professional testing. Otherwise, testing would just be a matter of habit and intuition.
  4. When people talk as if automation does testing instead of humans. If developers spoke of development the way that so many people speak of testing, they would say that their compiler created their product, and that all they do is operate the compiler. They would say that the product was created “automatically” rather than by particular people who worked hard and smart to write the code. And management would become obsessed with “automating development” by getting ever better tools instead of hiring and training excellent developers. A better way to speak about testing is the same way we speak about development: it’s something that people do, not tools. Tools help, but tools do not do testing.There is no such thing as an automated test. The most a tool can do is operate a product according to a script and check for specific output according to a script. That would not be a test, but rather a fact check about the product. Tools can do fact checking very well. But testing is more than fact checking because testers must use technical judgment and ingenuity to create the checks and evaluate them and maintain and improve them. The name for that entire human process (supported by tools) is testing. When you focus on “automated tests” you usually defocus from the skills, judgment, problem-solving, and motivation that actually controls the quality of the testing. And then you are not dealing with the important factors that control the quality of testing.
  5. When people talk as if there is only one kind of test coverage. There are many ways you can cover the product when you test it. Each method of assessing coverage is different and has its own dynamics. No one way of talking about it (e.g. code coverage) gives you enough of the story. Just as one example, if you test a page that provides search results for a query, you have covered the functionality represented by the kind of query that you just did (function coverage), and you have covered it with the particular data set of items that existed at that time (data coverage). If you change the query to invoke a different kind of search, you will get new functional coverage. If you change the data set, you will get new data coverage. Either way, you may find a new bug with that new coverage. Functions interact with data; therefore good testing involves covering not just one or the other but also with both together in different combinations.
  6. When people talk as if testing is a static task that is easily formalized. Testing is a learning task; it is fundamentally about learning. If you tell me you are testing, but not learning anything, I say you are not testing at all. And the nature of any true learning is that you can’t know what you will discover next– it is an exploratory enterprise.It’s the same way with many things we do in life, from driving a car to managing a company. There are indeed things that we can predict will happen and patterns we might use to organize our actions, but none of that means you can sleepwalk through it by putting your head down and following a script. To test is to continually question what you are doing and seeing.

    The process of professional testing is not design test cases and then follow the test cases. No responsible tester works this way. Responsible testing is a constant process of investigation and experiment design. This may involve designing procedures and automation that systematically collects data about the product, but all of that must be done with the understanding that we respond to the situation in front of us as it unfolds. We deviate frequently from procedures we establish because software is complicated and surprising; and because the organization has shifting needs; and because we learn of better ways to test as we go.

Through these and other failures in testing conversations, people persist in the belief that good testing is just a matter of writing ever more “test cases” (regardless of what they do); automating them (regardless of what automation can’t do); passing them from one untrained tester to another; all the while fetishizing the files and scripts themselves instead of looking at what the testers are doing with them from day to day.

Reinventing Testing: What is Integration Testing? (part 2)

These thoughts have become better because of these specific commenters on part 1: Jeff Nyman, James Huggett, Sean McErlean, Liza Ivinskaia, Jokin Aspiazu, Maxim Mikhailov, Anita Gujarathi, Mike Talks, Amit Wertheimer, Simon Morley, Dimitar Dimitrov, John Stevenson. Additionally, thank you Michael Bolton and thanks to the student whose productive confusion helped me discover a blindspot in my work, Anita Gujarathi.

Integration testing is a term I don’t use much– not because it doesn’t matter, but because it is so fundamental that it is already baked into many of the other working concepts and techniques of testing. Still, in the past week, I decided to upgrade my ability to quickly explain integration, integration risk, and integration testing. This is part of a process I recommend for all serious testers. I call it: reinventing testing. Each of us may reinvent testing concepts for ourselves, and engage in vigorous debates about them (see the comments on part 1, which is now the most commented of any post I have ever done).

For those of you interested in getting to a common language for testing, this is what I believe is the best way we have available to us. As each of us works to clarify his own thinking, a de facto consensus about reasonable testing ontology will form over time, community by community.

So here we go…

There several kinds of testing that involve or overlap with or may even be synonymous with integration testing, including: regression testing, system testing, field testing, interoperability testing, compatibility testing, platform testing, and risk-based testing. Most testing, in fact, no matter what it’s called, is also integration testing.

Here is my definition of integration testing, based on my own analysis, conversations with RST instructors (mainly Michael Bolton), and stimulated by the many commenters from part 1. All of my assertions and definitions are true within the Rapid Software Testing methodology namespace, which means that you don’t have to agree with me unless you claim to be using RST.

What is integration testing?

Integration testing is:
1. Testing motivated by potential risk related to integration.
2. Tests designed specifically to assess risk related to integration.


1. “Motivated by” and “designed specifically to” overlap but are not the same. For instance, if you know that a dangerous criminal is on the loose in your neighborhood you may behave in a generally cautious or vigilant way even if you don’t know where the criminal is or what he looks like. But if you know what he looks like, what he is wearing, how he behaves or where he is, you can take more specific measures to find him or avoid him. Similarly, a newly integrated product may create a situation where any kind of testing may be worth doing, even if that testing is not specifically aimed at uncovering integration bugs, as such; OR you can perform tests aimed at exposing just the sort of bugs that integration typically causes, such as by performing operations that maximize the interaction of components.

The phrase “integration testing” may therefore represent ANY testing performed specifically in an “integration context”, or applying a specific “integration test technique” in ANY context.

This is a special case of the difference between risk-based test management and risk-based test design. The former assigns resources to places where there is potential risk but does not dictate the testing to be performed; whereas the latter crafts specific tests to examine the product for specific kinds of problems.

2. “Potential risk” is not the same as “risk.” Risk is the danger of something bad happening, and it can be viewed from at least three perspectives: probability of a bad event occurring, the impact of that event if it occurs, and our uncertainty about either of those things. A potential risk is a risk about which there is substantial uncertainty (in other words, you don’t know how likely the bug is to be in the product or you don’t know how bad it could be if it were present). The main point of testing is to eliminate uncertainty about risk, so this often begins with guessing about potential risk (in other words, making wild guesses, educated guesses, or highly informed analyses about where bugs are likely to be).

Example: I am testing something for the first time. I don’t know how it will deal with stressful input, but stress often causes failure, so that’s a potential risk. If I were to perform stress testing, I would learn a lot about how the product really handles stress, and the potential risk would be transformed into a high risk (if I found serious bugs related to stress) or a low risk (if the product handled stress in a consistently graceful way).

What is integration?

General definition from the Oxford English Dictionary: “The making up or composition of a whole by adding together or combining the separate parts or elements; combination into an integral whole: a making whole or entire.”

Based on this, we can make a simple technical definition related to products:

Integration is:
v. the process of constructing a product from parts.
n. a product constructed from parts.

Now, based on General Systems Theory, we make these assertions:

An integration, in some way and to some degree:

  1. Is composed of parts:
  • …that come from differing sources.
  • …that were produced for differing purposes.
  • …that were produced at different times.
  • …that have differing attributes.
  1. Creates or represents an internal environment for its parts:
  • …in which its parts interact among themselves.
  • …in which its parts depend on each other.
  • …in which its parts interact with or depend on an external environment.
  • …in which these things are not visible from the outside.
  1. Possesses attributes relative to its parts:
  • …that depend on them.
  • …that differ from them.

Therefore, you might not be able to discern everything you want to know about an integration just by looking at its parts.

This is why integration risk exists. In complex or important systems, integration testing will be critically important, especially after changes have been made.

It may be possible to gain enough knowledge about an integration to characterize the risk (or to speak more plainly: it may be possible to find all the important integration bugs) without doing integration testing. You might be able to do it with unit testing. However, that process, although possible in some cases, might be impractical. This is the case partly because the parts may have been produced by different people with different assumptions, because it is difficult to simulate the environment of an integration prior to actual integration, or because unit testing tends to focus on what the units CAN do and not on what they ACTUALLY NEED to do. (If you unit test a calculator, that’s a lot of work. But if that calculator will only ever be asked to add numbers under 50, you don’t need to do all that work.)

Integration testing, although in some senses being complex, may actually simplify your testing since some parts mask the behavior of other parts and maybe all you need to care about is the final outputs.


1. “In some way and to some degree” means that these assertions are to be interpreted heuristically. In any specific situation, these assertions are highly likely to apply in some interesting or important way, but might not. An obvious example is where I wrote above that the “parts interact with each other.” The stricter truth is that the parts within an integration probably do not EACH directly interact with ALL the other ones, and probably do not interact to the same degree and in the same ways. To think of it heuristically, interpret it as a gentle warning such as  “if you integrate something, make it your business to know how the parts might interact or depend on each other, because that knowledge is probably important.”

By using the phrase “in some way and to some degree” as a blanket qualifier, I can simplify the rest of the text, since I don’t have to embed other qualifiers.

2. “Constructing from parts” does not necessarily mean that the parts pre-existed the product, or have a separate existence outside the product, or are unchanged by the process of integration. It just means that we can think productively about pieces of the product and how they interact with other pieces.

3. A product may possess attributes that none of its parts possess, or that differ from them in unanticipated or unknown ways. A simple example is the stability of a tripod, which is not found in any of its individual legs, but in all the legs working together.

4. Disintegration also creates integration risk. When you takes things away, or take things apart, you end up with a new integration, and that is subject to the much the same risk as putting them together.

5. The attributes of a product and all its behaviors obviously depend largely on the parts that comprise it, but also on other factors such as the state of those parts, the configurations and states of external and internal environments, and the underlying rules by which those things operate (ultimately, physics, but more immediately, the communication and processing protocols of the computing environment).

6. Environment refers to the outside of some object (an object being a product or a part of a product), comprising factors that may interact with that object. A particular environment might be internal in some respects or external in other respects, at the same time.

  • An internal environment is an environment controlled by the product and accessible only to its parts. It is inside the product, but from the point vantage point of some of parts, it’s outside of them. For instance, to a spark plug the inside of an engine cylinder is an environment, but since it is not outside the car as a whole, it’s an internal environment. Technology often consists of deeply nested environments.
  • An external environment is an environment inhabited but not controlled by the product.
  • Control is not an all-or-nothing thing. There are different levels and types of control. For this reason it is not always possible to strictly identify the exact scope of a product or its various and possibly overlapping environments. This fact is much of what makes testing– and especially security testing– such a challenging problem. A lot of malicious hacking is based on the discovery that something that the developers thought was outside the product is sometimes inside it.

7. An interaction occurs when one thing influences another thing. (A “thing” can be a part, an environment, a whole product, or anything else.)

8. A dependency occurs when one thing requires another thing to perform an action or possess an attribute (or not to) in order for the first thing to behave in a certain way or fulfill a certain requirement. See connascence and coupling.

9. Integration is not all or nothing– there are differing degrees and kinds. A product may be accidentally integrated, in that it works using parts that no one realizes that it has. It may be loosely integrated, such as a gecko that can jettison its tail, or a browser with a plugin. It may be tightly integrated, such as when we take the code from one product and add it to another product in different places, editing as we go. (Or when you digest food.) It may preserve the existing interfaces of its parts or violate them or re-design them or eliminate them. The integration definition and assertions, above, form a heuristic pattern– a sort of lens– by which we can make better sense of the product and how it might fail. Different people may identify different things as parts, environments or products. That’s okay. We are free to move the lens around and try out different perspectives, too.

Example of an Integration Problem


This diagram shows a classic integration bug: dueling dependencies. In the top two panels, two components are happy to work within their own environments. Neither is aware of the other while they work on, let’s say, separate computers.

But when they are installed together on the same machine, it may turn out that each depends on factors that exclude the other. Even though the components themselves don’t clash (the blue A box and the blue B boxes don’t overlap). Often such dependencies are poorly documented, and may be entirely unknown to the developer before integration time.

It is possible to discover this through unit testing… but so much easier and probably cheaper just to try to integrate sooner rather than later and test in that context.


Re-Inventing Testing: What is Integration Testing? (Part 1)

(Thank you, Anne-Marie Charrett, for reviewing my work and helping with this post.)

One of the reasons I obsessively coach other testers is that they help me test my own expertise. Here is a particularly nice case of that, while working with a particularly bright and resilient student, Anita Gujrathi, (whose full name I am using here with her permission).

The topic was integration testing. I chose it from a list of skills Anita made for herself. It stood out because integration testing is one of those labels that everyone uses, yet few can define. Part of what I do with testers is help them become aware of things that they might think they know, yet may have only a vague intuition about. Once we identify those things, we can study and deepen that knowledge together.

Here is the start of our conversation (with minor edits for grammar and punctuation, and commentary in brackets):

What do you mean by integration testing?
[As I ask her this question I am simultaneously asking myself the same question. This is part of a process known as transpection. Also, I am not looking for “one right answer” but rather am exploring and exercising her thought processes, which is called the Socratic Method.]

Integration test is the test conducted when we are integrating two or more systems.
[This is not a wrong answer, but it is shallow, so I will press for more details.

By shallow, I mean that leaves out a lot of detail and nuances. A shallow answer may be fine in a lot of situations, but in coaching it is a black box that I must open.]

What do you mean by integrated?

That means kind of joining two systems such that they give and take data.
[This is a good answer but again it is shallow. She said “kind of” which I take as a signal that she may be not quite sure what words to use. I am wondering if she understands the technical aspects of how components are joined together during integration. For instance, when two systems share an operating space, they may have conflicting dependencies which may be discovered only in certain situations. I want to push for a more detailed answer in order to see what she knows about that sort of thing.]

What does it mean to join two systems?
[This process is called “driving to detail” or “drilling down”. I just keep asking for more depth in the answer by picking key ideas and asking what they mean. Sometimes I do this by asking for an example.]

For example, there is an application called WorldMate which processes the itineraries of the travellers and generates an XML file, and there is another application which creates the trip in its own format to track the travellers using that XML.
[Students will frequently give me an example when they don’t know how to explain a concept. They are usually hoping I will “get it” and thus release them from having to explain anything more. Examples are helpful, of course, but I’m not going to let her off the hook. I want to know how well she understands the concept of joining systems.

The interesting thing about this example is that it illustrates a weak form of integration– so weak that if she doesn’t understand the concept of integration well enough, I might be able to convince her that no integration is illustrated here.

What makes her example a case of weak integration is that the only point of contact between the two programs is a file that uses a standardized format. No other dependencies or mode of interaction is mentioned. This is exactly what designers do when they want to minimize interaction between components and eliminate risks due to integration.]

I still don’t know what it means to join two systems.
[This is because an example is not an explanation, and can never be an explanation. If someone asks what a flower is and you hold up a rose, they still know nothing about what a flower is, because you could hold up a rose in response to a hundred other such questions: what is a plant? what is a living thing? what is botany? what is a cell? what is red? what is carbon? what is a proton? what is your favorite thing? what is advertising? what is danger? Each time the rose is an answer to some specific aspect of the question, but not all aspects, but how do you know what the example of a rose actually refers to? Without an explanation, you are just guessing.]

I am coming to that. So, here we are joining WorldMate (which is third-party application) to my product so that when a traveller books a ticket from a service and receives the itinerary confirmation email, it then goes to WorldMate which generates XML to give it to my product. Thus, we have joined or created the communication between WorldMate and my application.
[It’s nice that Anita asserts herself, here. She sounds confident.

What she refers to is indeed communication, although not a very interesting form of communication in the context of integration risk. It’s not the sort of communication that necessarily requires integration testing, because the whole point of using XML structures is to cleanly separate two systems so that you don’t have to do anything special or difficult to integrate them.]

I still don’t see the answer to my question. I could just as easily say the two systems are not joined. But rather independent. What does join really mean?
[I am pretending not to see the answer in order to pressure her for more clarity. I won’t use this tactic as a coach unless I feel that the student is reasonably confident.]

Okay, basically when I say join I mean that we are creating the communication between the two systems
[This is the beginning of a good answer, but her example shows only a weak sort of communication.]

I don’t see any communication here. One creates an XML, the other reads it. Neither knows about the other.
[It was wrong of me to say I don’t see any communication. I should have said it was simplistic communication. What I was trying to do is provoke her to argue with me, but I regret saying it so strongly.]

It is a one-way communication.
[I agree it is one-way. That’s part of why I say it is a weak form of integration.]

Is Google integrated with Bing?
[One major tactic of the Socratic method is to find examples that seem to fit the student’s idea and yet refute what they were trying to prove. I am trying to test what Anita thinks is the difference between two things that are integrated and two things that are simply “nearby.”]

Ah no?

According to you, they are! Because I can Google something, then I can take the output and feed it to Bing, and Bing will do a search on that. I can Google for a business name and then paste the name into Bing and learn about the business. The example you gave is just an example of two independent programs that happen to deal with the same file.


So, if I test the two independent programs, haven’t I done all the testing that needs to be done? How is integration testing anything more or different or special?

At this point, Anita seems confused. This would be a good time to switch into lecture mode and help her get clarity. Or I could send her away to research the matter. But what I realized in that moment is that I was not satisfied with my own ideas about integration. When I asked myself “what would I say if I were her?” my answers sounded not much deeper than hers. I decided I needed to do some offline thinking about integration testing.

Lots of things in out world are slightly integrated. Some things are very integrated. This seems intuitively obvious, but what exactly is that difference? I’ve thought it through and I have answers now. Before I blog about it, what do you think?

Exploratory Testing 3.0

[Authors’ note: Others have already made the point we make here: that exploratory testing ought to be called testing. In fact, Michael said that about tests in 2009, and James wrote a blog post in 2010 that seems to say that about testers. Aaron Hodder said it quite directly in 2011, and so did Paul Gerrard. While we have long understood and taught that all testing is exploratory (here’s an example of what James told one student, last year), we have not been ready to make the rhetorical leap away from pushing the term “exploratory testing.” Even now, we are not claiming you should NOT use the term, only that it’s time to begin assuming that testing means exploratory testing, instead of assuming that it means scripted testing that also has exploration in it to some degree.]

[Second author’s note: Some people start reading this with a narrow view of what we mean by the word “script.” We are not referring to text! By “script” we are speaking of any control system or factor that influences your testing and lies outside of your realm of choice (even temporarily). This includes text instructions, but also any form of instructions, or even biases that are not instructions.]

By James Bach and Michael Bolton

In the beginning, there was testing. No one distinguished between exploratory and scripted testing. Jerry Weinberg’s 1961 chapter about testing in his book, Computer Programming Fundamentals, depicted testing as inherently exploratory and expressed caution about formalizing it. He wrote, “It is, of course, difficult to have the machine check how well the program matches the intent of the programmer without giving a great deal of information about that intent. If we had some simple way of presenting that kind of information to the machine for checking, we might just as well have the machine do the coding. Let us not forget that complex logical operations occur through a combination of simple instructions executed by the computer and not by the computer logically deducing or inferring what is desired.”

Jerry understood the division between human work and machine work. But, then the formalizers came and confused everyone. The formalizers—starting officially in 1972 with the publication of the first testing book, Program Test Methods—focused on the forms of testing, rather than its essences. By forms, we mean words, pictures, strings of bits, data files, tables, flowcharts and other explicit forms of modeling. These are things that we can see, read, point to, move from place to place, count, store, retrieve, etc. It is tempting to look at these artifacts and say “Lo! There be testing!” But testing is not in any artifact. Testing, at the intersection of human thought processes and activities, makes use of artifacts. Artifacts of testing without the humans are like state of the art medical clinics without doctors or nurses: at best nearly useless, at worst, a danger to the innocents who try to make use of them.

We don’t blame the innovators. At that time, they were dealing with shiny new conjectures. The sky was their oyster! But formalization and mechanization soon escaped the lab. Reckless talk about “test factories” and poorly designed IEEE standards followed. Soon all “respectable” talk about testing was script-oriented. Informal testing was equated to unprofessional testing. The role of thinking, feeling, communicating humans became displaced.

James joined the fray in 1987 and tried to make sense of all this. He discovered, just by watching testing in progress, that “ad hoc” testing worked well for finding bugs and highly scripted testing did not. (Note: We don’t mean to make this discovery sound easy. It wasn’t. We do mean to say that the non-obvious truths about testing are in evidence all around us, when we put aside folklore and look carefully at how people work each day.) He began writing and speaking about his experiences. A few years into his work as a test manager, mostly while testing compilers and other developer tools, he discovered that Cem Kaner had coined a term—”exploratory testing”—to represent the opposite of scripted testing. In that original passage, just a few pages long, Cem didn’t define the term and barely described it, but he was the first to talk directly about designing tests while performing them.

Thus emerged what we, here, call ET 1.0.

(See The History of Definitions of ET for a chronological guide to our terminology.)

ET 1.0: Rebellion

Testing with and without a script are different experiences. At first, we were mostly drawn to the quality of ideas that emerged from unscripted testing. When we did ET, we found more bugs and better bugs. It just felt like better testing. We hadn’t yet discovered why this was so. Thus, the first iteration of exploratory testing (ET) as rhetoric and theory focused on escaping the straitjacket of the script and making space for that “better testing”. We were facing the attitude that “Ad hoc testing is uncontrolled and unmanageable; something you shouldn’t do.” We were pushing against that idea, and in that context ET was a special activity. So, the crusaders for ET treated it as a technique and advocated using that technique. “Put aside your scripts and look at the product! Interact with it! Find bugs!”

Most of the world still thinks of ET in this way: as a technique and a distinct activity. But we were wrong about characterizing it that way. Doing so, we now realize, marginalizes and misrepresents it. It was okay as a start, but thinking that way leads to a dead end. Many people today, even people who have written books about ET, seem to be happy with that view.

This era of ET 1.0 began to fade in 1995. At that time, there were just a handful of people in the industry actively trying to develop exploratory testing into a discipline, despite the fact that all testers unconsciously or informally pursued it, and always have. For these few people, it was not enough to leave ET in the darkness.

ET 1.5: Explication

Through the late ‘90s, a small community of testers beginning in North America (who eventually grew into the worldwide Context-Driven community, with some jumping over into the Agile testing community) was also struggling with understanding the skills and thought processes that constitute testing work in general. To do that, they pursued two major threads of investigation. One was Jerry Weinberg’s humanist approach to software engineering, combining systems thinking with family psychology. The other was Cem Kaner’s advocacy of cognitive science and Popperian critical rationalism. This work would soon cause us to refactor our notions of scripted and exploratory testing. Why? Because our understanding of the deep structures of testing itself was evolving fast.

When James joined ST Labs in 1995, he was for the first time fully engaged in developing a vision and methodology for software testing. This was when he and Cem began their fifteen-year collaboration. This was when Rapid Software Testing methodology first formed. One of the first big innovations on that path was the introduction of guideword heuristics as one practical way of joining real-time tester thinking with a comprehensive underlying model of the testing process. Lists of test techniques or documentation templates had been around for a long time, but as we developed vocabulary and cognitive models for skilled software testing in general, we started to see exploratory testing in a new light. We began to compare and contrast the important structures of scripted and exploratory testing and the relationships between them, instead of seeing them as activities that merely felt different.

In 1996, James created the first testing class called “Exploratory Testing.”  He had been exposed to design patterns thinking and had tried to incorporate that into the class. He identified testing competencies.

Note: During this period, James distinguished between exploratory and ad hoc testing—a distinction we no longer make. ET is an ad hoc process, in the dictionary sense: ad hoc means “to this; to the purpose”. He was really trying to distinguish between skilled and unskilled testing, and today we know better ways to do that. We now recognize unskilled ad hoc testing as ET, just as unskilled cooking is cooking, and unskilled dancing is dancing. The value of the label “exploratory testing” is simply that it is more descriptive of an activity that is, among other things, ad hoc.

In 1999, James was commissioned to define a formalized process of ET for Microsoft. The idea of a “formal ad hoc process” seemed paradoxical, however, and this set up a conflict which would be resolved via a series of constructive debates between James and Cem. Those debates would lead to we here will call ET 2.0.

There was also progress on making ET more friendly to project management. In 2000, inspired by the work for Microsoft, James and Jon Bach developed “Session-Based Test Management” for a group at Hewlett-Packard. In a sense this was a generalized form of the Microsoft process, with the goal of creating a higher level of accountability around informal exploratory work. SBTM was intended to help defend exploratory work from compulsive formalizers who were used to modeling testing in terms of test cases. In one sense, SBTM was quite successful in helping people to recognize that exploratory work was entirely manageable. SBTM helped to transform attitudes from “don’t do that” to “okay, blocks of ET time are things just like test cases are things.”

By 2000, most of the testing world seemed to have heard something about exploratory testing. We were beginning to make the world safe for better testing.

ET 2.0: Integration

The era of ET 2.0 has been a long one, based on a key insight: the exploratory-scripted continuum. This is a sliding bar on which testing ranges from completely exploratory to completely scripted. All testing work falls somewhere on this scale. Having recognized this, we stopped speaking of exploratory testing as a technique, but rather as an approach that applies to techniques (or as Cem likes to say, a “style” of testing).

We could think of testing that way because, unlike ten years earlier, we now had a rich idea of the skills and elements of testing. It was no longer some “creative and mystical” act that some people are born knowing how to do “intuitively”. We saw testing as involving specific structures, models, and cognitive processes other than exploring, so we felt we could separate exploring from testing in a useful way. Much of what we had called exploratory testing in the early 90’s we now began to call “freestyle exploratory testing.”

By 2006, we settled into a simple definition of ET, simultaneous learning, test design, and test execution. To help push the field forward, James and Cem convened a meeting called the Exploratory Testing Research Summit in January 2006. (The participants were James Bach, Jonathan Bach, Scott Barber, Michael Bolton, Elisabeth Hendrickson, Cem Kaner, Mike Kelly, Jonathan Kohl, James Lyndsay, and Rob Sabourin.) As we prepared for that, we made a disturbing discovery: every single participant in the summit agreed with the definition of ET, but few of us agreed on what the definition actually meant. This is a phenomenon we had no name for at the time, but is now called shallow agreement in the CDT community. To combat shallow agreement and promote better understanding of ET, some of us decided to adopt a more evocative and descriptive definition of it, proposed originally by Cem and later edited by several others: “a style of testing that emphasizes the freedom and responsibility of the individual tester to continually optimize the quality of his work by treating test design, test execution, test result interpretation, and learning as mutually supporting activities that continue in parallel throughout the course of the project.” Independently of each other, Jon Bach and Michael had suggested the “freedom and responsibility” part to that definition.

And so we had come to a specific and nuanced idea of exploration and its role in testing. Exploration can mean many things: searching a space, being creative, working without a map, doing things no one has done before, confronting complexity, acting spontaneously, etc. With the advent of the continuum concept (which James’ brother Jon actually called the “tester freedom scale”) and the discussions at the ExTRS peer conference, we realized most of those different notions of exploration are already central to testing, in general. What the adjective “exploratory” added, and how it contrasted with “scripted,” was the dimension of agency. In other words: self-directedness.

The full implications of the new definition became clear in the years that followed, and James and Michael taught and consulted in Rapid Software Testing methodology. We now recognize that by “exploratory testing”, we had been trying to refer to rich, competent testing that is self-directed. In other words, in all respects other than agency, skilled exploratory testing is not distinguishable from skilled scripted testing. Only agency matters, not documentation, nor deliberation, nor elapsed time, nor tools, nor conscious intent. You can be doing scripted testing without any scrap of paper nearby (scripted testing does not require that you follow a literal script). You can be doing scripted testing that has not been in any way pre-planned (someone else may be telling you what to do in real-time as they think of ideas). You can be doing scripted testing at a moment’s notice (someone might have just handed you a script, or you might have just developed one yourself). You can be doing scripted testing with or without tools (tools make testing different, but not necessarily more scripted). You can be doing scripted testing even unconsciously (perhaps you feel you are making free choices, but your models and habits have made an invisible prison for you). The essence of scripted testing is that the tester is not in control, but rather is being controlled by some other agent or process. This one simple, vital idea took us years to apprehend!

In those years we worked further on our notions of the special skills of exploratory testing. James and Jon Bach created the Exploratory Skills and Tactics reference sheet to bring specificity and detail to answer the question “what specifically is exploratory about exploratory testing?”

In 2007, another big slow leap was about to happen. It started small: inspired in part by a book called The Shape of Actions, James began distinguishing between processes that required human judgment and wisdom and those which did not. He called them “sapient” vs. “non-sapient.” This represented a new frontier for us: systematic study and development of tacit knowledge.

In 2009, Michael followed that up by distinguishing between testing and checking. Testing cannot be automated, but checking can be completely automated. Checking is embedded within testing. At first, James objected that, since there was already a concept of sapient testing, the distinction was unnecessary. To him, checking was simply non-sapient testing. But after a few years of applying these ideas in our consulting and training, we came to realize (as neither of us did at first) that checking and testing was a better way to think and speak than sapience and non-sapience. This is because “non-sapience” sounds like “stupid” and therefore it sounded like we were condemning checking by calling it non-sapient.

Do you notice how fine distinctions of language and thought can take years to work out? These ideas are the tools we need to sort out our practical decisions. Yet much like new drugs on the market, it can sometimes take a lot of experience to understand not only benefits, but also potentially harmful side effects of our ideas and terms. That may explain why those of us who’ve been working in the craft a long time are not always patient with colleagues or clients who shrug and tell us that “it’s just semantics.” It is our experience that semantics like these mean the difference between clear communication that motivates action and discipline, and fragile folklore that gets displaced by the next swarm of buzzwords to capture the fancy of management.

ET 3.0: Normalization

In 2011, sociologist Harry Collins began to change everything for us. It started when Michael read Tacit and Explicit Knowledge. We were quickly hooked on Harry’s clear writing and brilliant insight. He had spent many years studying scientists in action, and his ideas about the way science works fit perfectly with what we see in the testing field.

By studying the work of Harry and his colleagues, we learned how to talk about the difference between tacit and explicit knowledge, which allows us to recognize what can and cannot be encoded in a script or other artifacts. He distinguished between behaviour (the observable, describable aspects of an activity) and actions (behaviours with intention) (which had inspired James’ distinction between sapient and non-sapient testing). He untangled the differences between mimeomorphic actions (actions that we want to copy and to perform in the same way every time) and polimorphic actions (actions that we must vary in order to deal with social conditions); in doing that, he helped to identify the extents and limits of automation’s power. He wrote a book (with Trevor Pinch) about how scientific knowledge is constructed; another (with Rob Evans) about expertise; yet another about how scientists decide to evaluate a specific experimental result.

Harry’s work helped lend structure to other ideas that we had gathered along the way.

  • McLuhan’s ideas about media and tools
  • Karl Weick’s work on sensemaking
  • Venkatesh Rao’s notions of tempo which in turn pointed us towards James C. Scott’s notion of legibility
  • The realization (brought to our attention by an innocent question from a tester at Barclays Bank) that the “exploratory-scripted continuum” is actually the “formality continuum.” In other words, to formalize an activity means to make it more scripted.
  • The realization of the important difference between spontaneous and deliberative testing, which is the degree of reflection that the tester is exercising. (This is not the same as exploratory vs. scripted, which is about the degree of agency.)
  • The concept of “responsible tester” (defined as a tester who takes full, personal, responsibility for the quality of his work).
  • The advent of the vital distinction between checking and testing, which replaced need to talk about “sapience” in our rhetoric of testing.
  • The subsequent redefinition of the term “testing” within the Rapid Software Testing namespace to make these things more explicit (see below).

About That Last Bullet Point

ET 3.0 as a term is a bit paradoxical because what we are working toward, within the Rapid Software Testing methodology, is nothing less than the deprecation of the term “exploratory testing.”

Yes, we are retiring that term, after 22 years. Why?

Because we now define all testing as exploratory.  Our definition of testing is now this:

“Testing is the process of evaluating a product by learning about it through exploration and experimentation, which includes: questioning, study, modeling, observation and inference, output checking, etc.”

Where does scripted testing fit, then?  By “script” we are speaking of any control system or factor that influences your testing and lies outside of your realm of choice (even temporarily). This does not refer only to specific instructions you are given and that you must follow. Your biases script you. Your ignorance scripts you. Your organization’s culture scripts you. The choices you make and never revisit script you.

By defining testing to be exploratory, scripting becomes a guest in the house of our craft; a potentially useful but foreign element to testing, one that is interesting to talk about and apply as a tactic in specific situations. An excellent tester should not be complacent or dismissive about scripting, any more than a lumberjack can be complacent or dismissive about heavy equipment. This stuff can help you or ruin you, but no serious professional can ignore it.

Are you doing testing? Then you are already doing exploratory testing. Are you doing scripted testing? If you’re doing it responsibly, you are doing exploratory testing with scripting (and perhaps with checking).  If you’re only doing “scripted testing,” then you are just doing unmotivated checking, and we would say that you are not really testing. You are trying to behave like a machine, not a responsible tester.

ET 3.0, in a sentence, is the demotion of scripting to a technique, and the promotion of exploratory testing to, simply, testing.

History of Definitions of ET

History of the term “Exploratory Testing” as applied to software testing within the Rapid Software Testing methodology space.

For a discussion of the some of the social and philosophical issues surrounding this chronology, see Exploratory Testing 3.0.

1988 First known use of the term, defined variously as “quick tests”; “whatever comes to mind”; “guerrilla raids” – Cem Kaner, Testing Computer Software (There is explanatory text for different styles of ET in the 1988 edition of Testing Computer Software. Cem says that some of the text was actually written in 1983.)
1990 “Organic Quality Assurance”, James Bach’s first talk on agile testing filmed by Apple Computer, which discussed exploratory testing without using the words agile or exploratory.
1993 June: “Persistence of Ad Hoc Testing” talk given at ICST conference by James Bach. Beginning of James’ abortive attempt to rehabilitate the term “ad hoc.”
1995 February: First appearance of “exploratory testing” on Usenet in message by Cem Kaner.
1995 Exploratory testing means learning, planning, and testing all at the same time. – James Bach (Market Driven Software Testing class)
1996 Simultaneous exploring, planning, and testing. – James Bach (Exploratory Testing class v1.0)
1999 An interactive process of concurrent product exploration, test design, and test execution. – James Bach (Exploratory Testing class v2.0)
2001(post WHET #1) The Bach View

Any testing to the extent that the tester actively controls the design of the tests as those tests are performed and uses information gained while testing to design new and better tests.

The Kaner View

Any testing to the extent that the tester actively controls the design of the tests as those tests are performed, uses information gained while testing to design new and better tests, and where the following conditions apply:

  • The tester is not required to use or follow any particular test materials or procedures.
  • The tester is not required to produce materials or procedures that enable test re-use by another tester or management review of the details of the work done.

– Resolution between Bach and Kaner following WHET #1 and BBST class at Satisfice Tech Center.

(To account for both of views, James started speaking of the “scripted/exploratory continuum” which has greatly helped in explaining ET to factory-style testers)

2003-2006 Simultaneous learning, test design, and test execution – Bach, Kaner
2006-2015 An approach to software testing that emphasizes the personal freedom and responsibility of each tester to continually optimize the value of his work by treating learning, test design and test execution as mutually supportive activities that run in parallel throughout the project. – (Bach/Bolton edit of Kaner suggestion)
2015 Exploratory testing is now a deprecated term within Rapid Software Testing methodology. See testing, instead. (In other words, all testing is exploratory to some degree. The definition of testing in the RST space is now: Evaluating a product by learning about it through exploration and experimentation, including to some degree: questioning, study, modeling, observation, inference, etc.)


“Are you listening? Say something!”

[Note: J. Michael Hammond suggests that I note right at the top of this post that the dictionary definition of listen does not restrict the word to the mere noticing of sounds. In the Oxford English Dictionary, as well as others, an extended and more active definition of listening is also given. It’s that extended definition of listening I am writing about here.]

I’m tired of hearing the simplistic advice about how to listen one must not talk. That’s not what listening means. I listen by reacting. As an extravert, I react partly by talking. Talking is how I chew on what you’ve told me. If I don’t chew on what you say, I will choke or get tummy aches and nightmares. You don’t want me to have nightmares, do you? Until you interrupt me to say otherwise, I charitably assume you don’t.

Below is an alternative theory of listening; one that does not require passivity. I will show how this theory is consistent the “don’t talk” advice if you consider that being quiet while other people speak is one heuristic of good listening, rather than the definition or foundation of it. I am tempted to say that listening requires talking, but that is not quite true. This is my proposal of a universal truth of listening: Listening requires you to change.

To Listen is to Change

  1. I propose that to listen is to react coherently and charitably to incoming information. That is how I would define listening.
  2. To react is to change. The reactions of listening may involve a change of mood, attention, concept, or even a physical action.

Notice that I said “coherently and charitably” and not “constructively” or “agreeably.” I think I can be listening to a criminal who demands ransom even if I am not constructive in my response to him. Reacting coherently is not the same as accepting someone’s view of the world. If I don’t agree with you or do what you want me to, that is not proof of my poor listening. “Coherently” refers to a way of making sense of something by interpreting it such that it does not contradict anything important that you also believe is true and important about the world. “Charitably” refers to making sense of something in a way most likely to fit the intent of the speaker.

Also, notice that coherence does not require understanding. I would not be a bad listener, necessarily, if I didn’t understand the intent or implications of what was told to me. Understanding is too high a burden to require for listening. Coherence and charitability already imply a reasonable attempt to understand, and that is the important part.

Poor listening would be the inability or refusal to do the following:

  • take in data at a reasonable pace. (“reasonable pace” is subject to disagreement)
  • make sense of data that is reasonably sensible in that context, including empathizing with it. (“reasonably sensible” is subject to disagreement)
  • reason appropriately about the data. (“reason appropriately” is subject to disagreement)
  • take appropriate responsibility for one’s feelings about the data (“appropriate responsibility” is subject to disagreement)
  • make a coherent response. (“coherent response” is subject to disagreement)
  • comprehend the reasonable purposes and nature of the interaction (“reasonable purposes and nature” is subject to disagreement)

Although all these elements are subject to disagreement, you might not choose to actively dispute them in a given situation, because maybe you feel that the disagreement is not very important. (As an example, I originally wrote “dispute” in the text above, which I think is fine, but during review, after hearing me read the above, Michael Bolton suggested changing “dispute” to “disagreement” and that seemed okay, too, so I made the change. In making his suggestion, he did not need to explain or defend his preference, because he’s earned a lot of trust with me and I felt listened to.)

I was recently told, in an argument, that I was not listening. I didn’t bother to reply to the man that I also felt he wasn’t listening to me. For the record, I think I was listening well enough, and what the man wanted from me was not listening– he wanted compliance to his world view, which was the very matter of dispute! Clearly he wasn’t getting the reaction he wanted, and the word he used for that was listening. Meanwhile, I had reacted to his statements with arguments against them. To me, this is close to the essence of listening.

If you really believe someone isn’t listening, it’s unlikely that it will help to say that, unless you have a strong personal relationship. When my wife tells me I’m not listening, that’s a very special case. She’s weaker than me and crucial to my health and happiness, therefore I will use every tool at my disposal to make myself easy for her to talk to. I generally do the same for children, dogs, people who seem mentally unstable, fire, and dangerous things, but not for most colleagues. I do get crossed up sometimes. Absolutely. Especially on Twitter. Sometimes I assume a colleague feels powerful, and respond to him that way, only later to discover he was afraid of me.

(This happened again just the other day on Twitter. Which is why it is unlikely you will see me teach in Finland any time soon! I am bitten by such a mistake a few times a year, at least. For me this is not a reason to be softer with my colleagues. Then again, it may be. I struggle with the pros and cons. There is no simple answer. I regularly receive counsel from my most trusted colleagues on this point.)

A Sign of Being Listened to is the Change that Happens

Introspect for a moment. How do you know that your computer is listening to you? At this moment, as I am typing, the letters I want to see are appearing on the screen as I press the keys. WordPress is talking back to me. WordPress is changing, and its changes seem coherent and reasonable to me. My purposes are apparently being served. The computer is listening. Consider what happens when you don’t see a response from your computer. How many times have you clicked “save” or “print” or “calculate” or “paste” and suffered that sinking feeling as the forest noises go completely silent and your screen goes glassy and gets that faraway grayed out look of the damned? You feel out of control. You want to shout at your screen “Come back! I’ve changed my mind! Undo! Cancel!” How do you feel then? You don’t say to yourself “what a good listener my computer is!”

Why is this so? It’s because you are involved in a cybernetic control loop with your computer. Without frequent feedback from your system you lose your control over it. You don’t know what it needs or what to do about it. It may be listening to something, but when nothing changes in a manner that seems to relate to your input, you suspect it is not listening to you.

Based just on this example I conjecture that we feel listened to when a system responds to our utterances and actions in a harmonious manner that honors our purposes. I further conjecture that the advice to maintain attentive silence in order to listen better is a special case of change in such a way as to foster harmony and supportiveness.

Can we think of a situation where listening to someone means shouting loudly over them? I can. I was recently in a situation where a quiet colleague was trying to get students to return to her tutorial after a break. The hallway was too noisy and few people could hear her. I noticed that, so I repeated her words very loudly that her students might hear. I would argue that I listened and responded harmoniously in support of her needs. I didn’t ask her if she felt that I listened to her. She knows I did. I could tell by her smile.

If my wife cries “brake!” when I’m driving, I hit the brake. The physical action of my foot on the brake is her evidence that I listened, not attentive silence or passivity.

It may be a small change or a large change, but for the person communicating with you to feel listened to, they must see good evidence of an appropriate change (or change process) in you.

Let me tell you about being a father of a strong-minded son. I have been in numerous arguments with my boy. I have learned how to get my point across: plant the idea, argue for a while, and then let go of it. I discovered it doesn’t matter if he seems to reject the idea. In fact, I’ve come to believe he cannot reject any idea of mine unless it is genuinely wrong for him. I know he’s listening because he argues with me. And if he gets upset, that means he must be taking it quite seriously. Then I wait. And I invariably see a response in the days that follow (I mean not a single instance of this not happening comes to mind right now).

One of the tragedies of fatherhood is that many fathers can’t tell when their children are listening because they need to see too specific a response too quickly. Some listening is a long process. I know that my son needs to chew on difficult ideas in order to process them. This is how to think about the listening process. True listening implies digestion and incubation. The mental metabolism is subtle, complicated, and absolutely vital.

Let People Chew on Your Ideas

Listening is not primarily about taking information into yourself, any more than eating is about taking food into yourself. With eating the real point is digestion. And for good listening you need to digest, too. Part of digestion is chewing, and for humans part of listening is reacting to the raw data for the purposes of testing understanding and contrasting the incoming data with other data they have. Listening well about any complicated thing requires testing. Does this apply to your spouse and children, too? Yes! But perhaps it applies differently to them than to a colleague at work, and certainly differently than testing-as-listening to politician or a telemarketer.

Why does this matter so much? Because if we uncritically accept ideas we risk falling prey to shallow agreement, which is the appearance of agreement despite an unrecognized deep disagreement. I don’t want to find out in the middle of a critical moment on a project that your definition of testing, or role, or collaboration, or curiosity doesn’t match mine. I want to have conversations about the meanings of words well before that. Therefore I test my understanding. Too many in the Agile culture seem to confuse a vacant smile with philosophical and practical comprehension. I was told recently that for an Agile tester, “collaboration” may be more important than testing skill. That is probably the stupidest thing I have heard all year. By “stupid” I mean willfully refusing to use one’s mind. I was talking to a smart man who would not use his smarts in that moment, because, by his argument, the better tester is the one who agrees to do anything for anyone, not the one who knows how to find important bugs quickly. In other words, any unskilled day laborer off the street, desperate for work, is apparently a better tester than me. Yeah… Right…

In addition to the idea digestion process, listening also has a critical social element. As I said above, whether or not you are listening is, practically speaking, always a matter of potential dispute. That’s the way of it. Listening practices and instances are all tied up in socially constructed rituals and heuristics. And these rituals are all about making ourselves open to reasonable change in response to each other. Listening is about the maintenance of social order as well as maintaining specific social relationships. This is the source of all that advice about listening by keeping attentively quiet while someone else speaks. What that misses is that the speaker also has a duty to perform in the social system. The speaker cannot blather on in ignorance or indifference to the idea processing practices of his audience. When I teach, I ask my students to interrupt me, and I strive to reward them for doing so. When I get up to speak, I know I must skillfully use visual materials, volume control, rhythm, and other rhetorical flourishes in order to package what I’m communicating into a more digestible form.

Unlike many teachers, I don’t interpret silence as listening. Silence is easy. If an activity can be done better and cheaper by a corpse or an inanimate object, I don’t consider it automatically worth doing as a living human.

I strongly disagree with Paul Klipp when he writes: “Then stop thinking about talking and pretend, if it’s not obvious to you yet, that the person who is talking is as good at thinking as you are. You may suddenly have a good idea, or you may have information that the person speaking doesn’t. That’s not a good enough reason to interrupt them when they are thinking.” Paul implies that interrupting a speaker is an expression of dominance or subversion. Yes, it can be, but it is not necessarily so, and I wish someone trained in Anthropology would avoid such an uncharitable oversimplification. Some interruptions are harmful and some are helpful. In fact, I would say that every social act is both harmful and helpful in some way. We must use our judgment to know what to say, how to say it and when. Stating favorite heuristics as if they were “best practices” is patronizing and unnecessary.

One Heuristic of Listening: Stop Talking

Where I agree with Paul and others like him is that one way of improving the harmony of communication and that feeling of being coherently and charitably responded to is to talk less. I’m more likely to use that in a situation where I’m dealing with someone whom I suspect is feeling weak, and whom I want to encourage to speak to me. However, another heuristic I use in that situation is to speak more. I do this when I want to create a rhetorical framework to help the person get his idea across. This has the side effect of taking pressure of someone who may not want to speak at all. I say this based on the vivid personal experience of my first date with the one who would become my wife. I estimate I spoke many thousands of words that evening. She said about a dozen. I found out later that’s just what she was looking for. How do I know? After two dates we got married. We’ve been married 23 years, so far. I also have many vivid experiences of difficult conversations that required me to sit next to her in silence for as long as 10 minutes until she was ready to speak. Both the “talk more” and “talk less” heuristics are useful for having a conversation.

What does this have to do with testing?

My view of listening can be annoying to people for exactly the same reason that testing is annoying to people. A developer may want me to accept his product without “judgment.” Sorry, man. That is not the tester’s way. A tester who doesn’t subject your product to criticism is, in fact, not taking it seriously. You should not feel honored by that, but rather insulted. Testing is how I honor strong, good products. And arguing with you may be how I honor your ideas.

Listening, I claim, is itself a testing process. It must be, because testing is how we come to comprehend anything deeply. Testing is a practice that enables deep learning and deeply trusting what we know.

Are You Listening to Me?

Then feel free to respond. Even if you disagree, you could well have been listening. I might be able to tell from your response, if that matters to you.

If you want to challenge this post, try reading it carefully… I will understand if you skip parts, or see one thing and want to argue with that. Go ahead. That might be okay. If I feel that there is critical information that you are missing, I will suggest that you read the post again. I don’t require that people read or listen to me thoroughly before responding. I ask only that you make a reasonable and charitable effort to make sense of this.


A Test is a Performance

Testing is a performance, not an artifact.

Artifacts may be produced before, during, or after the act of testing. Whatever they are, they are not tests. They may be test instructions, test results, or test tools. They cannot be tests.

Note: I am speaking a) authoritatively about how we use terms in Rapid Testing Methodology, b) non-authoritatively of my best knowledge of how testing is thought of more broadly within the Context-Driven school, and c) of my belief about how anyone, anywhere should think of testing if they want a clean and powerful way to talk about it.

I may informally say “I created a test.” What I mean by that is that I designed an experience, or I made a plan for a testing event. That plan itself is not the test, anymore than a picture of a car is a car. Therefore, strictly speaking, the only way to create a test is to perform a test. As Michael Bolton likes to say, there’s a world of difference between sheet music and a musical performance, even though we might commonly refer to either one as “music.” Consider these sentences: “The music at the symphony last night was amazing.” vs. “Oh no, I left the music on my desk at home.”

We don’t always have to speak strictly, but we should know how and know why we might want to.

Why can’t a test be an artifact?

Because artifacts don’t think or learn in the full human sense of that word, that’s why, and thinking is central to the test process. So to claim that an artifact is a test is like wearing a sock puppet on your hand and claiming that it’s a little creature talking to you. That would be no more than you talking to yourself, obviously, and if you removed yourself from that equation the puppet wouldn’t be a little creature, would it? It would be a decorated sock lying on the floor. The testing value of an artifact can be delivered only in concert with an appropriately skilled and motivated tester.

With procedures or code you can create a check. See here for a detailed look at the difference between checking and testing. Checking is part of testing, of course. Anyone who runs checks that fail knows that the next step is figuring out what the failures mean. A tester must also evaluate whether the checks are working properly and whether there are enough of them, or too many, or the wrong kind. All of that is part of the performance of testing.

When a “check engine” light goes on in your car, or any strange alert, you can’t know until you go to a mechanic whether that represents a big problem or a little problem. The check is not testing. The testing is more than the check itself.

But I’ve seen people follow test scripts and only do what the test document tells them to do!

Have you really witnessed that? I think the most you could possibly have witnessed is…


a tester who appeared to do “only” what the test document tells him, while constantly and perhaps unconsciously adjusting and reacting to what’s happening with the system under test. (Such a tester may find bugs, but does so by contributing interpretation, judgment, and analysis; by performing.)


a tester who necessarily missed a lot of bugs that he could have found, either because the test instructions were far too complex, or far too vague, or there was far too little of it (because that documentation is darn expensive) and the tester failed to perform as a tester to compensate.

In either case, the explicitly written or coded “test” artifact can only be an inanimate sock, or a sock puppet animated by the tester. You can choose to suffer without a tester, or to cover up the presence of the tester. Reality will assert itself either way.

What danger could there be in speaking informally about writing “tests?”

It’s not necessarily dangerous to speak informally. However, a possible danger is that non-testing managers and clients of our work will think of testers as “test case writers” instead of as people who perform the skilled process of testing. This may cause them to treat testers as fungible commodities producing “tests” that are comprised solely of explicit rules. Such a theory of testing– which is what we call the Factory school of testing thought– leads to expensive artifacts that uncover few bugs. Their value is mainly in that they look impressive to ignorant people.

If you are talking to people who fully understand that testing is a performance, it is fine to speak informally. Just be on your guard when you hear people say “Where are your tests?” “Have you written any tests?” or “Should you automate those tests?” (I would rather hear “How do you test this?” “Where are you focusing you testing?” or “Are you using tools to help your testing?”)

Thanks to Michael Bolton and Aleksander Simic for reviewing and improving this post.


Justifying Real Acceptance Testing

This post is not about the sort of testing people talk about when nearing a release and deciding whether it’s done. I have another word for that. I call it “testing,” or sometimes final testing or release testing. Many projects perform that testing in such a perfunctory way that it is better described as checking, according to the distinction between testing and checking I have previously written of on this blog. As Michael Bolton points out, that checking may better be described as rejection checking since a “fail” supposedly establishes a basis for saying the product is not done, whereas no amount of “passes” can show that it is done.

Acceptance testing can be defined in various ways. This post is about what I consider real acceptance testing, which I define as testing by a potential acceptor (a customer), performed for the purpose of informing a decision to accept (to purchase or rely upon) a product.

Do we need acceptance testing?

Whenever a business decides to purchase and rely upon a component or service, there is a danger that the product will fail and the business will suffer. One approach to dealing with that problem is to adopt the herd solution: follow the thickest part of the swarm; choose a popular product that is advertised or reputed to do what you want it to do and you will probably be okay. I have done that with smartphones, ruggedized laptops, file-sharing services, etc. with good results, though sometimes I am disappointed.

My business is small. I am nimble compared to almost every other company in the world. My acceptance testing usually takes the form of getting a trial subscription to service, or downloading the “basic” version of a product. Then I do some work with it and see how I feel. In this way I learned to love Dropbox, despite its troubling security situation (I can’t lock up my Dropbox files), or the fact that there is a significant chance it will corrupt very large files. (I no longer trust it with anything over half of a gig).

But what if I were advising a large company about whether to adopt a service or product that it will rely upon across dozens or hundreds or thousands of employees? What if the product has been customized or custom built specifically for them? That’s when acceptance testing becomes important.

Doesn’t the Service Level Agreement guarantee that the product will work?

There are a couple of problems with relying on vendor promises. First, the vendor probably isn’t promising total satisfaction. The service “levels” in the contract are probably narrowly and specifically drawn. That means if you don’t think of everything that matters and put that in the contract, it’s not covered. Testing is a process that helps reveal the dimensions of the service that matter.

Second, there’s an issue with timing. By the time you discover a problem with the vendor’s product, you may be already relying on it. You may already have deployed it widely. It may be too late to back out or switch to a different solution. Perhaps your company negotiated remedies in that case, but there are practical limitations to any remedy. If your vendor is very small, they may not be able to afford to fix their product quickly. If you vendor is very large, they may be able to afford to drag their feet on the fixes.

Acceptance testing protects you and makes the vendor take quality more seriously.

Acceptance testing should never be handled by the vendor. I was once hired by a vendor to do penetration testing on their product in order to appease a customer. But the vendor had no incentive to help me succeed in my assignment, nor to faithfully report the vulnerabilities I discovered. It would have been far better if the customer had hired me.

Only the accepting party has the incentive to test well. Acceptance testing should not be pre-agreed or pre-planned in any detail– otherwise the vendor will be sure that the product passes those specific tests. It should be unpredictable, so that the vendor has an incentive to make the product truly meet its requirements in a general sense. It should be adaptive (exploratory) so that any weakness you find can be examined and exploited.

The vendor wants your money. If your company is large enough, and the vendor is hungry, they will move mountains to make the product work well if they know you are paying attention. Acceptance testing, done creatively by skilled testers on a mission, keeps the vendor on its toes.

By explicitly testing in advance of your decision to accept the product, you have a fighting chance to avoid the disaster of discovering too late that the product is a lemon.

My management doesn’t think acceptance testing matters. What do I do?

1. Make the argument, above.
2. Communicate with management, formally, about this. (In writing, so that there is a record.)
It is up to management to make decisions about business risk. They may feel the risk is not worth worrying about. In that case, you must wait and watch. People are usually more persuaded by vivid experiences, rather than abstract principles, so:
1. Collect specific examples of the problems you are talking about. What bugs have you experienced in vendor products?
2. Collect news reports about bugs in products of your vendors (or other vendors) that have been disruptive.
3. In the event you get to do even a little acceptance testing, make a record of the problems you find and be ready to remind management of that history.


To The New Tester

About once a week, I get an email like one of these:

Hi James,

I’m from Hyderabad, India. I’m working as a Testing Engineer and doing Manual Testing from the Last 1 Year. I want to know what are things I need to follow to become a good tester. I didn’t have any programming background. I want to learn automation testing for career growth, I want to improve my test case preparing skills and documentation skills. Please suggest me in this regard.

Hi, James,

I am looking for software testing jobs. Actually I’m fresher, I don’t know much about testing but I interested too. Can you please suggest me which type of testing is good?(like manual or automation or etc..) I have got some ideas about manual testing and DB-PL/SQL.

Respected Sir,

I am a Big Fan Of You.

BTech Graduate In Electrical & Electronics. Done Software testing course in QTP and Selenium. I got job as Mobile Application Tester. I am the only tester in this company. iOS and Android Apps. Now i have 3 months experience. I am mainly doing Manual testing. I dont know , whether i am going in right path.

Can you help me please? (Advice)

This blog post shall be my standard answer to these emails…

Dear New Tester/Testing Hopeful,

Thank you for choosing testing. We need smart people in our craft, and I hope you are one of them. I know you must have something going for you because, unlike most young testers, you reached out to me. That’s not easy.

Here are some general ideas:

  • Read my blog (
  • Read the materials on my website (
  • Consider reading either of of my books: Lessons Learned in Software Testing, or Secrets of a Buccaneer-Scholar.
  • Read the materials and blog at
  • Get on Twitter and watch the conversations among the Context-Driven community. Participate in those discussions.
  • Join It’s a quiet, moderated email forum for Context-Driven testers.
  • I offer free coaching, too, but you have to read this blog post first.
  • Practice testing things that aren’t secret, and then post your test results online so that others can see them.
  • Consider taking the RTI Online class.
  • Consider taking the BBST class.
  • Participate in sessions conducted by
  • Consider attending the Let’s Test conference or the CAST conference.
  • If you want to get good at using tools while testing, try learning Python. Personally, I use Perl, but I hear Python is good.

Here are some books to read:

  • Introduction to General Systems Thinking, by Gerald M. Weinberg
  • Quality Software Management, Vol. 1: Systems Thinking, by Gerald M. Weinberg
  • Tacit and Explicit Knowledge, by Harry Collins
  • The Black Swan, by Nassim Taleb
  • Testing Computer Software, by Cem Kaner, remains a good classic testing book.

What about certification?

  • Don’t get certified. There are no respectable commercial testing certifications.
  • If you are forced to get certified for some reason, do not take it seriously. It’s not an achievement, it’s just a conveyor belt that extracts your money and gives you nothing you couldn’t get for the price of a Google search.
  • True certification remains this: the respect of respectable people.

What kinds of testers are there?

  • Automated? Manual? There is no such thing as manual or automated testing. It’s all just testing. Testing is often supported by tools that attempt to simulate user interaction with the system. This is what people call “test automation” even though it is only automating a crude approximation of one aspect of testing. If you have the ambition to be a one-man test team, it is extremely valuable to learn how to make your own tools.
  • Exploratory? Scripted? There is no such thing as an exploratory or scripted tester. All good testing is exploratory to some degree and scripted to some degree.
  • Tester. This is a testing generalist who can contribute to any test team. Sometimes called a QA analyst, QA engineer, or test engineer. I prefer the simplicity of “tester.”
  • Omega Tester. The omega tester (which I sometimes call a test jumper, after the analogy of a paratrooper) is one who can do anything. An omega tester is equipped to be the only tester in a project team, if necessary. Omega testers can lead testing, or work with a team of other testers. I am an omega tester. I aspire to be a good one.
  • Performance Tester. The performance tester understands the mathematics and dynamics of the performance of large-scale systems. They use tools that create high loads and measure the performance envelope of systems as they scale up. Performance testers often don’t think of themselves as testers.
  • Usability Tester. The usability tester is a bit mythical. I have met only two dedicated usability testers in my entire career, but I have seen more of them from a distance. A usability tester specializes in studying how users feel about using and learning a product.
  • Security Tester. Security testers also often don’t think of themselves as testers. Security is an exciting, specialized form of testing that requires the mastery of a great many facts about a great many technologies.
  • Testing Toolsmith. A testing toolsmith is a programmer dedicated to writing and maintaining tools that help testers. This is what a lot of people would call an “automated tester” but you better not use that term around me.
  • “SDET” Software Development Engineer in Test. This means a full on programmer who does testing using his programming skills.

I want to help everyone, but at the same time, I have my own work to do. So, I limit my attention to a special kind of student. I work with people who study and practice and want to test better and understand testing more deeply than others. If that describes you, then instead of writing to me with a question, write to me with examples of your work.

Good luck!

— James

P.S. See also what Allen Johnson has to say about this.


A Public Service Announcement About Exploratory Testing

[Updated: I revamped and added some more examples to the list.]

I got this message from Oliver Vilson, today:

Oliver V.: hi James. Just had a chat with Helena_JM. She reminded me something… don’t know if you’ve written blog about it or pushed it into  RST.. One Test lead from another company mentioned me he has problems with his testers. Some of them are saying that they don’t have to do test plans, since your teaching seems to align that…
James Bach: Any more details?
Oliver V.: rough translation from test team lead : “ET seems to have reputation as “excuse for shitty testing”. People can’t explain what they did and why. If you ask them for test plan or explanation, all you get is “but Bach said…”.

I have, from time to time, heard rumors that some people cite my writings and teachings as an excuse to do bad testing. I think it would help to do a public service announcement…

Attention Testers and Managers of Testers

If a tester claims he is justified in doing bad work because of something I’ve published or said, please email me at, or Skype me, and I will help you stop that silliness.

I teach skilled software testing for people who intend to do an excellent job. That process is necessarily exploratory in nature. It also necessarily will have some scripted elements– partly due to the nature of thinking and partly due to the requirements of excellent intellectual work.

I do not teach evasiveness or obscurantism. I do not ever tell a tester that he can get away with refusing to explain his test process. Explaining testing is an important part of being a professional.

Why People Get Confused

I reinvented software testing for myself, from first principles. So, I teach from a very different set of premises. This is necessary, because common ideas about testing are so idiotic. But it does result in confusion when my ideas are taken out of context and “mixed in” to the idiocy. Consider: “I won’t create a detailed test plan document” is a perfectly ordinary and potentially reasonable thing to say in RST. It is a statement about things made explicit in a document, not a statement about lack of planning. Yet Factory School methodology confuses documents for content. If you say that to one of them, it may be mistaken for a refusal to apply appropriate rigor to your work.

Here are some examples of how someone might misapply my teachings:

  1. Rapid Software Testing methodology (RST) is not the same thing as exploratory testing. ET is very simple. Anyone can do ET, just as anyone can look at a painting. But there’s a huge difference between a skilled appraisal of a painting by an expert and a bored glance by a schoolkid. RST is a methodology for doing testing (including scripted and exploratory testing) well. Therefore, anyone doing ET badly is not doing my methodology.
  2. In RST, a plan is not a document, it’s a set of ideas. Therefore, I say you don’t need to have a test plan template, or any sort of written test plan document in order to have a good test plan. I often document my test ideas, though, in different ways, when that helps. Therefore, the lack of a test plan (a guiding set of ideas) probably represents an immature and possibly inadequate test process, but the lack of a test plan document is not necessarily a problem.
  3. In RST, a test is not a document, it’s a performance. Therefore the lack of documented tests is not necessarily a problem, but poor testing (which can be determined by direct observation by a skilled tester or test manager, just as poor carpentry or poor doctoring can be detected) is a problem.
  4. In RST, we have no templates for reporting. But reporting is crucial. Reporting skills are crucial. Accountability is crucial. Credibility is crucial. We teach the art of telling a testing story. Therefore, anyone who declines to explain himself when asked about his testing is not practicing RST. I disavow such testers. (However, just because explaining oneself is an important part of testing doesn’t mean a manager can insist on arbitrarily voluminous documentation or arbitrary metrics. I suspect that, in some cases, managers who complain about testers refusing to document or explain themselves are really just obsessed with a specific method of documentation and refusing to accept other viable solutions to the same problem.)
  5. In RST we say that testing cannot be automated, and that tools can become an obsession. This leads some to think I am against tools. No, I am against bad work. Unfortunately, some tools, such as expensive HP/Mercury tools, are often used to wastefully automate weak fact checking at he expense of good testing. Yes., tools and the technical skills to create and apply them play an important role in great testing. It’s not automating testing when I use tools, because testing is whatever testers do, not what tools do. Therefore a tester who refuses to learn and use tools in general is not practicing RST.
  6. In RST we distinguish between checking and testing. This allows us to distinguish between a test process that is appropriately thoughtful and deep, and one (based solely on checking) that would be reckless and shallow. But when we criticize a checking-only test strategy, some people get confused and think we are criticizing the presence of checking rather than the lack of testing. Therefore, a tester who refuses to design or perform checks that are actually economical and helpful is not doing RST.
  7. In RST, we ban unscientific, abusive attempts at using metrics to control the test process. But when some people hear us attack, say, the counting of test cases, they assume that means we don’t believe in even the concept or principle of measurement. Instead, we support using inquiry-focused metrics (which inspire questions rather than dictating decisions), we promote active skepticism about numbers applied to social systems, and we promote the development of observation, reasoning, and social skills that limit the need for quantification. Therefore any tester who simply refuses to consider using metrics of any kind is not doing RST.
  8. Some people hear about the freedom of exploratory testing, and they confuse that with irresponsibility. But that’s silly. If you drive a car, you are free to run over pedestrians or smash into buildings– except you don’t, because you are responsible! Also, it’s against the law. Freedom is not the same thing as having a right. Therefore, anyone who accepts the freedom of exploratory testing and cannot or will not manage that testing appropriately is an incompetent or irresponsible tester.

Benjamin Mitchell and the Trap of False Hypocrisy

One of the puzzles of intellectual life is how to criticize something you admire without sounding like you don’t admire it. Benjamin Mitchell has given an insightful talk about social dynamics in Agile projects. You should see it. I enjoyed it, but I also felt pricked by several missed opportunities where he could have done an even deeper analysis. This post is about one example of that.

Benjamin offers an example of feedback he got about feedback he gave to a member of his team:

“Your feedback to the team member was poor because:
it did not focus on any positive actions, and
it didn’t use any examples”

Benjamin immediately noticed that this statement appears to violate itself. Obviously, it doesn’t focus on positive actions and it doesn’t use any examples. To Benjamin this demonstrates hypocrisy and a sort of incompetence and he got his reviewer (who uttered the statement) to agree with him about that. “It’s incompetent in the sense that it has a theory of effectiveness that it violates,” Benjamin says. From his tone, he clearly doesn’t see this as the product of anything sinister, but more as an indicator of how hard it is to deeply walk our talk. Let’s try harder not to be hypocrites, I think he’s saying.

Except this is not an example of hypocrisy.

In this case, the mistake lies with Benjamin, and then with the reviewer for not explaining and defending himself when challenged.

It’s worth dwelling on this because methodologists, especially serious professional ones like Benjamin and me, are partly in the business of listening to people who have trouble saying what they mean (a population that includes all of humanity), then helping them say it better. He and I need to be very very good at what social scientists call “verbal protocol analysis.” So, let’s learn from this incident.

In order to demonstrate my point, I’d like to see if you agree to two principles:

  1. Context Principle: Everything that we ever do, we do in some particular situation, and that context has a large impact on what, how, and why we do things. For instance, I’m writing this in the situation of a quiet afternoon on Orcas Island, purely by choice, and not because I’m paid or forced to write it by a shadowy client with a sinister agenda.
  2. Enoughness Principle: Anything we do that is good or bad could have been even better, or even worse. Although it makes sense to try to do good work, that comes at a cost, and therefore in practice we stop at whatever we consider to be “good enough” and not necessarily the best we can do.

Assuming you accept those principles, see what happens when I slightly reword the offending comment:

In that situation, your feedback to the team member was poor compared to what you could easily have achieved because:
it did not focus on any positive actions, and
it didn’t use any examples”

Having added the words, what happens if Benjamin tells me that this statement doesn’t focus on positive actions and doesn’t cite an example? I reply like this:

“That’s a reasonable observation, but I think it’s out of place here. My advice pertains to giving feedback to people who feel frightened or threatened or may not have the requisite skills to comprehend the feedback or in a situation where I am not seen as a credible reviewer. And my advice pertains to situations where you want to invest in giving vivid, powerful advice– advice that teaches. However, in this case, I felt it was good enough (not perfect but good at a reasonable investment of my time) to ignore the positive (because, Benjamin, you already know you’re good, and you know that I know that you are good– so you don’t need me to give you a swig of brandy before telling you the “bad news”) and I thought that investing in careful phrasing of a vivid example might actually sound patronizing to you, because you already know what I’m talking about, man.”

In other words, with the added words in bold face, it becomes a little clearer that the situation of him advising his client, and us advising him, are different in important ways.

Imagine that Benjamin spots a misspelled word in my post. Does he need to give me an example of how to spell it? Does he need to speak about the potential benefits of good spelling? Does he need to praise my use of commas before broaching the subject of spelling? No. He just needs to point and say “that’s spelled wrong.” He can do that without being a hypocrite, don’t you think?

(Of course, if the situations are not different and the quality of the comment made to Benjamin is clearly not good enough, then it is fair to raise the issue that the feedback does not meet its own implied standard.)

Finally: I added those bolded words, but if I’m in a community that assumes them, I don’t need to add them. They are there whether I say them or not. We don’t need to make explicit that which is already a part of our culture. Perhaps the person who offered this feedback to Benjamin was assuming that he understood that advice is situational, and that a summary form of feedback is better in this case than a lengthy ritual of finding something to praise about Benjamin and then citing at least three examples.

…unless Benjamin is a frightened student… which he isn’t. Look at him in that video. He exudes self-confidence. That man is a responsible adult. He can take a punch.

Who’s the Real Monster?

“Best practice” thinking itself causes these misunderstandings. Many people seek to memorize protocols such as “how to give feedback… always do this… step 1: always say something nice step 2: always focus on solutions not problems… etc.” instead of understanding the underlying dynamics of communication and relationships. Then when they slip and accidentally behave in an insightful and effective way instead of following their silly scripts, their friends accuse them of being hypocrites.

When the explicit parts of our procedures are at war with the tacit parts, we chronically fall into such traps.

There is a silver lining here: it’s good to be a hypocrite if you are preaching the wrong things. Watch yourself. The next time you fail in your discipline to do X, seriously consider if your discipline is actually wrong, and your “failure” is actually success of some kind.

This is why when I talk about procedures, I speak of heuristics (which are fallible) and skills (which are plastic) and context (which varies). There are no best practices.

I’m going to wrap this up with some positive feedback, because he doesn’t know me very well, yet. Benjamin, I appreciate how, in your work, you question what you are told and reflect on your own thought processes in a spirit of both humility and confidence. YOU don’t seem infected by “best practice” folklore. Thank you for that.



Testing and Checking Refined

This post is co-authored with Michael Bolton. We have spent hours arguing about nearly every sentence. We also thank Iain McCowatt for his rapid review and comments.

Testing and tool use are two things that have characterized humanity from its beginnings. (Not the only two things, of course, but certainly two of the several characterizing things.) But while testing is cerebral and largely intangible, tool use is out in the open. Tools encroach into every process they touch and tools change those processes. Hence, for at least a hundred or a thousand centuries the more philosophical among our kind have wondered “Did I do that or did the tool do that? Am I a warrior or just spear throwing platform? Am I a farmer or a plow pusher?” As Marshall McLuhan said “We shape our tools, and thereafter our tools shape us.”

This evolution can be an insidious process that challenges how we label ourselves and things around us. We may witness how industrialization changes cabinet craftsmen into cabinet factories, and that may tempt us to speak of the changing role of the cabinet maker, but the cabinet factory worker is certainly not a mutated cabinet craftsman. The cabinet craftsmen are still out there– fewer of them, true– nowhere near a factory, turning out expensive and well-made cabinets. The skilled cabineteer (I’m almost motivated enough to Google whether there is a special word for cabinet expert) is still in demand, to solve problems IKEA can’t solve. This situation exists in the fields of science and medicine, too. It exists everywhere: what are the implications of the evolution of tools on skilled human work? Anyone who seeks excellence in his craft must struggle with the appropriate role of tools.

Therefore, let’s not be surprised that testing, today, is a process that involves tools in many ways, and that this challenges the idea of a tester.

This has always been a problem– I’ve been working with and arguing over this since 1987, and the literature of it goes back at least to 1961– but something new has happened: large-scale mobile and distributed computing. Yes, this is new. I see this is the greatest challenge to testing as we know it since the advent of micro-computers. Why exactly is it a challenge? Because in addition to the complexity of products and platforms which has been growing steadily for decades, there now exists a vast marketplace for software products that are expected to be distributed and updated instantly.

We want to test a product very quickly. How do we do that? It’s tempting to say “Let’s make tools do it!” This puts enormous pressure on skilled software testers and those who craft tools for testers to use. Meanwhile, people who aren’t skilled software testers have visions of the industrialization of testing similar to those early cabinet factories. Yes, there have always been these pressures, to some degree. Now the drumbeat for “continuous deployment” has opened another front in that war.

We believe that skilled cognitive work is not factory work. That’s why it’s more important than ever to understand what testing is and how tools can support it.

Checking vs. Testing

For this reason, in the Rapid Software Testing methodology, we distinguish between aspects of the testing process that machines can do versus those that only skilled humans can do. We have done this linguistically by adapting the ordinary English word “checking” to refer to what tools can do. This is exactly parallel with the long established convention of distinguishing between “programming” and “compiling.” Programming is what human programmers do. Compiling is what a particular tool does for the programmer, even though what a compiler does might appear to be, technically, exactly what programmers do. Come to think of it, no one speaks of automated programming or manual programming. There is programming, and there is lots of other stuff done by tools. Once a tool is created to do that stuff, it is never called programming again.

Now that Michael and I have had over three years experience working with this distinction, we have sharpened our language even further, with updated definitions and a new distinction between human checking and machine checking.

First let’s look at testing and checking. Here are our proposed new definitions, which soon will replace the ones we’ve used for years (subject to review and comment by colleagues):

Testing is the process of evaluating a product by learning about it through exploration and experimentation, which includes to some degree: questioning, study, modeling, observation, inference, etc.

(A test is an instance of testing.)

Checking is the process of making evaluations by applying algorithmic decision rules to specific observations of a product.

(A check is an instance of checking.)

Explanatory notes:

  • “evaluating” means making a value judgment; is it good? is it bad? pass? fail? how good? how bad? Anything like that.
  • “evaluations” as a noun refers to the product of the evaluation, which in the context of checking is going to be an artifact of some kind; a string of bits.
  • “learning” is the process of developing one’s mind. Only humans can learn in the fullest sense of the term as we are using it here, because we are referring to tacit as well as explicit knowledge.
  • “exploration” implies that testing is inherently exploratory. All testing is exploratory to some degree, but may also be structured by scripted elements.
  • “experimentation” implies interaction with a subject and observation of it as it is operating, but we are also referring to “thought experiments” that involve purely hypothetical interaction. By referring to experimentation, we are not denying or rejecting other kinds of learning; we are merely trying to express that experimentation is a practice that characterizes testing. It also implies that testing is congruent with science.
  • the list of words in the testing definition are not exhaustive of everything that might be involved in testing, but represent the mental processes we think are most vital and characteristic.
  • “algorithmic” means that it can be expressed explicitly in a way that a tool could perform.
  • “observations” is intended to encompass the entire process of observing, and not just the outcome.
  • “specific observations” means that the observation process results in a string of bits (otherwise, the algorithmic decision rules could not operate on them).

There are certain implications of these definitions:

  • Testing encompasses checking (if checking exists at all), whereas checking cannot encompass testing.
  • Testing can exist without checking. A test can exist without a check. But checking is a very popular and important part of ordinary testing, even very informal testing.
  • Checking is a process that can, in principle be performed by a tool instead of a human, whereas testing can only be supported by tools. Nevertheless, tools can be used for much more than checking.
  • We are not saying that a check MUST be automated. But the defining feature of a check is that it can be COMPLETELY automated, whereas testing is intrinsically a human activity.
  • Testing is an open-ended investigation– think “Sherlock Holmes”– whereas checking is short for “fact checking” and focuses on specific facts and rules related to those facts.
  • Checking is not the same as confirming. Checks are often used in a confirmatory way (most typically during regression testing), but we can also imagine them used for disconfirmation or for speculative exploration (i.e. a set of automatically generated checks that randomly stomp through a vast space, looking for anything different).
  • One common problem in our industry is that checking is confused with testing. Our purpose here is to reduce that confusion.
  • A check is describable; a test might not be (that’s because, unlike a check, a test involves tacit knowledge).
  • An assertion, in the Computer Science sense, is a kind of check. But not all checks are assertions, and even in the case of assertions, there may be code before the assertion which is part of the check, but not part of the assertion.
  • These definitions are not moral judgments. We’re not saying that checking is an inherently bad thing to do. On the contrary, checking may be very important to do. We are asserting that for checking to be considered good, it must happen in the context of a competent testing process. Checking is a tactic of testing.

Whither Sapience?

If you follow our work, you know that we have made a big deal about sapience. A sapient process is one that requires an appropriately skilled human to perform. However, in several years of practicing with that label, we have found that it is nearly impossible to avoid giving the impression that a non-sapient process (i.e. one that does not require a human but could involve a very talented and skilled human nonetheless) is a stupid process for stupid people. That’s because the word sapience sounds like intelligence. Some of our colleagues have taken strong exception to our discussion of non-sapient processes based on that misunderstanding. We therefore feel it’s time to offer this particular term of art its gold watch and wish it well in its retirement.

Human Checking vs. Machine Checking

Although sapience is problematic as a label, we still need to distinguish between what humans can do and what tools can do. Hence, in addition to the basic distinction between checking and testing, we also distinguish between human checking and machine checking. This may seem a bit confusing at first, because checking is, by definition, something that can be done by machines. You could be forgiven for thinking that human checking is just the same as machine checking. But it isn’t. It can’t be.

In human checking, humans are attempting to follow an explicit algorithmic process. In the case of tools, however, the tools aren’t just following that process, they embody it. Humans cannot embody such an algorithm. Here’s a thought experiment to prove it: tell any human to follow a set of instructions. Get him to agree. Now watch what happens if you make it impossible for him ever to complete the instructions. He will not just sit there until he dies of thirst or exposure. He will stop himself and change or exit the process. And that’s when you know for sure that this human– all along– was embodying more than just the process he agreed to follow and tried to follow. There’s no getting around this if we are talking about people with ordinary, or even minimal cognitive capability. Whatever procedure humans appear to be following, they are always doing something else, too. Humans are constantly interpreting and adjusting their actions in ways that tools cannot. This is inevitable.

Humans can perform motivated actions; tools can only exhibit programmed behaviour (see Harry Collins and Martin Kusch’s brilliant book The Shape of Actions, for a full explanation of why this is so). The bottom line is: you can define a check easily enough, but a human will perform at least a little more during that check– and also less in some ways– than a tool programmed to execute the same algorithm.

Please understand, a robust role for tools in testing must be embraced. As we work toward a future of skilled, powerful, and efficient testing, this requires a careful attention to both the human side and the mechanical side of the testing equation. Tools can help us in many ways far beyond the automation of checks. But in this, they necessarily play a supporting role to skilled humans; and the unskilled use of tools may have terrible consequences.

You might also wonder why we don’t just call human checking “testing.” Well, we do. Bear in mind that all this is happening within the sphere of testing. Human checking is part of testing. However, we think when a human is explicitly trying to restrict his thinking to the confines of a check– even though he will fail to do that completely– it’s now a specific and restricted tactic of testing and not the whole activity of testing. It deserves a label of its own within testing.

With all of this in mind, and with the goal of clearing confusion, sharpening our perception, and promoting collaboration, recall our definition of checking:

Checking is the process of making evaluations by applying algorithmic decision rules to specific observations of a product.

From that, we have identified three kinds of checking:

Human checking is an attempted checking process wherein humans collect the observations and apply the rules without the mediation of tools.

Machine checking is a checking process wherein tools collect the observations and apply the rules without the mediation of humans.

Human/machine checking is an attempted checking process wherein both humans and tools interact to collect the observations and apply the rules.

In order to explain this thoroughly, we will need to talk about specific examples. Look for those in an upcoming post.

Meanwhile, we invite you to comment on this.

UPDATE APRIL 10th: As a result of intense discussions at the SWET5 peer conference, I have updated the diagram of checking and testing. Notice that testing is now sitting outside the box, since it is describing the whole thing, a description of testing is inside of it. Human checking is characterized by a cloud, because its boundary with non-checking aspects of testing is not always clearly discernible. Machine checking is characterized by a precise dashed line, because although its boundary is clear, it is an optional activity. Technically, human checking is also optional, but it would be a strange test process indeed that didn’t include at least some human checking. I thank the attendees of SWET5 for helping me with this: Rikard Edgren, Martin Jansson, Henrik Andersson, Michael Albrecht, Simon Morley, and Micke Ulander.

Mechanical or Magical? Noah Says “Neither.”

As I was having dinner with Noah Höjeberg tonight, he said an interesting thing. “Some people think testing is mechanical, and that’s bad enough. But a lot of people seem to think the alternative to mechanical is magical.”

(Noah is the new test strategist at Scila AB, in Stockholm. Interesting guy. I’ve played a lot of testing dice with him, in the past. I meant to do the Art Show game with him, too, but we got so much into our conversation that I completely forgot.)

Mechanical and magical are false opposites. In Rapid Testing, we pursue another path: Heuristical. In other words, skilled testing, achieved through systematic study and the deliberate application of heuristics. This is neither a mechanical, algorithmic process, not is it magical, mystical. We can show it, talk about it, etc. And yet it cannot be automated.

How I Invented Sympathetic Testing

I did not invent sympathetic testing. Anyone who says I claim to have invented it will have only read the title of this post, but nothing further. Now you know.

I may have been the first in my circle to recognize one specific benefit of sympathetic testing. But if so, that is a minor technical point. Because I know that my recognition came directly from a point of view championed by Cem Kaner. From where I stand, it came largely from him.

Okay, James, this is a bit confusing. Why are you talking about this, today?

Ajay Balamurugadas came to me today asking if I came up with the idea of “sympathetic testing.” This is an idea he’s found in Mike Kelly’s work, which derives partly from my work (on tours). It’s also in the section of Lessons Learned in Software Testing that I wrote (and which was edited by Cem Kaner and Bret Pettichord). He had already asked Cem Kaner, who said he got it from my brother [correction: Ajay tells me he saw Cem say that on a video]. But if so, I know exactly where Jon got it: from me, during a project we worked on around 1997. That’s where he was working for me on a court case where I had to do an analysis of “good enough” quality, using a method strongly influenced by Cem’s thinking about cognitive biases.

I’m talking about this today because the provenance of ideas can be confusing when people work closely together. Who cares who invented it, or named it? You might be surprised. Credit is important to people who live by their reputations in a world of ideas. Reputation is money, to put it bluntly. To achieve high income and control over your work as a tester, one way– and the only way I know, actually– is to build your public reputation. Moreover, a lot of our motivation is the respect of our peers. Therefore, this matters.

This isn’t really about “sympathetic testing”, then. Still, what IS sympathetic testing?

Sympathetic testing, as I think of it, is sometimes slightly confused with the closely related ideas of “happy path” or “positive testing.” Sympathetic testing means testing while affirming (rather than challenging) each assumption, resource, or service. But if that’s all it meant, then it would be the same as positive testing. For me, sympathetic testing means more. It means asking “what is wonderful about this product?” rather than “what is broken about this product?” I can do positive testing all day and still be focused on bugs. With sympathetic testing, I may find bugs, but that is not my focus or purpose– my main purpose is learning; building a rich model in my mind.

The insight I had, when working with my brother (way back before he was a recognized test expert in his own right) was that sympathetic testing made me better at unsympathetic testing. Searching happily prepares me to search aggressively.

Okay, then what’s really troubling you?

What bugs me is that Cem deserves more credit for sympathetic testing. But if he claims it himself, he probably thinks he will anger me, and if he gives me full credit, it would anger him: because Cem and I aren’t on speaking terms, at the moment. (I hope that’s temporary.)

I’m hereby offering him credit. Cem and I collaborated for roughly 16 years. We had probably hundreds of deep conversations about testing. Among those conversations, he lectured me on mental models and biases. We once had a fairly bitter argument about what a model is. He won that argument, thankfully, and I have been comfortable with the outcome ever since. He introduced me to Cognitive Science and pushed me deeper into Epistemology. It was that sensibility that led to me discovering a lot of things: among them the power of sympathetic testing. I’m certain that the first person I ran to with my discovery was Cem, but not only that, this is exactly the sort of thing that he was famous for encouraging: to look at things in a sympathetic way. I can’t remember the conversation itself or what he said. I bet he said something like “That’s just what I’ve been trying to tell you!”

Therefore, “sympathetic testing” is part of our joint work. Between about 1997 and 2007, it’s a good bet that I didn’t have a significant thought about testing which wasn’t also sifted and tested by Cem. We didn’t always agree, but we were extremely close. It’s for that reason that he put my name on his BBST class, even though I didn’t do any direct work on it. He was recognizing my influence on him. I’m doing that now.

I’m my own man. I have reinvented testing for myself (just as I recommend that others do). And yet a few people have deeply influenced me in that process. The three people who have had the most influence on my are my father, Jerry Weinberg, and Cem Kaner. I sometimes joke that I can sell out and offer baloney certifications to gullible testers, just like the ISTQB does– but only after those three men have died and I’m no longer seeking their respect.

[Postscript: Why did I title this piece “How I Invented Sympathetic Testing”? Because it was the best way I could think of to attract the attention of people who might be upset that I would make such a claim. Then maybe they would read this post, and feel a lot better.]

What Exploratory Practitioners Are Called

An exploratory DOCTOR is known as… a “doctor.”
An exploratory WELDER is known as… a “welder.”
An exploratory PILOT is known as… a “pilot.”
An exploratory WRITER is known as… a “writer.”
An exploratory SCIENTIST is known as… a “scientist.”
An exploratory TRUCK DRIVER is known as… a “truck driver.”

A non-exploratory doctor is known as… “irresponsible.”
A non-exploratory welder is known as… “irresponsible.”
A non-exploratory pilot is known as… “killed in a plane crash.”
A non-exploratory writer known as… “a plagiarist.”
A non-exploratory scientist is known as… “a tobacco company scientist.” (also “Creationist”)
A non-exploratory truck driver is known as… “lost.”

Can you spot the pattern here?

There is no such thing as an “exploratory tester” except inasmuch as a good tester obviously can and will do exploration as a basic part of his work.