Exploratory Testing 3.0

[Authors’ note: Others have already made the point we make here: that exploratory testing ought to be called testing. In fact, Michael said that about tests in 2009, and James wrote a blog post in 2010 that seems to say that about testers. Aaron Hodder said it quite directly in 2011, and so did Paul Gerrard. While we have long understood and taught that all testing is exploratory (here’s an example of what James told one student, last year), we have not been ready to make the rhetorical leap away from pushing the term “exploratory testing.” Even now, we are not claiming you should NOT use the term, only that it’s time to begin assuming that testing means exploratory testing, instead of assuming that it means scripted testing that also has exploration in it to some degree.]

[Second author’s note: Some people start reading this with a narrow view of what we mean by the word “script.” We are not referring to text! By “script” we are speaking of any control system or factor that influences your testing and lies outside of your realm of choice (even temporarily). This includes text instructions, but also any form of instructions, or even biases that are not instructions.]

By James Bach and Michael Bolton

In the beginning, there was testing. No one distinguished between exploratory and scripted testing. Jerry Weinberg’s 1961 chapter about testing in his book, Computer Programming Fundamentals, depicted testing as inherently exploratory and expressed caution about formalizing it. He wrote, “It is, of course, difficult to have the machine check how well the program matches the intent of the programmer without giving a great deal of information about that intent. If we had some simple way of presenting that kind of information to the machine for checking, we might just as well have the machine do the coding. Let us not forget that complex logical operations occur through a combination of simple instructions executed by the computer and not by the computer logically deducing or inferring what is desired.”

Jerry understood the division between human work and machine work. But, then the formalizers came and confused everyone. The formalizers—starting officially in 1972 with the publication of the first testing book, Program Test Methods—focused on the forms of testing, rather than its essences. By forms, we mean words, pictures, strings of bits, data files, tables, flowcharts and other explicit forms of modeling. These are things that we can see, read, point to, move from place to place, count, store, retrieve, etc. It is tempting to look at these artifacts and say “Lo! There be testing!” But testing is not in any artifact. Testing, at the intersection of human thought processes and activities, makes use of artifacts. Artifacts of testing without the humans are like state of the art medical clinics without doctors or nurses: at best nearly useless, at worst, a danger to the innocents who try to make use of them.

We don’t blame the innovators. At that time, they were dealing with shiny new conjectures. The sky was their oyster! But formalization and mechanization soon escaped the lab. Reckless talk about “test factories” and poorly designed IEEE standards followed. Soon all “respectable” talk about testing was script-oriented. Informal testing was equated to unprofessional testing. The role of thinking, feeling, communicating humans became displaced.

James joined the fray in 1987 and tried to make sense of all this. He discovered, just by watching testing in progress, that “ad hoc” testing worked well for finding bugs and highly scripted testing did not. (Note: We don’t mean to make this discovery sound easy. It wasn’t. We do mean to say that the non-obvious truths about testing are in evidence all around us, when we put aside folklore and look carefully at how people work each day.) He began writing and speaking about his experiences. A few years into his work as a test manager, mostly while testing compilers and other developer tools, he discovered that Cem Kaner had coined a term—”exploratory testing”—to represent the opposite of scripted testing. In that original passage, just a few pages long, Cem didn’t define the term and barely described it, but he was the first to talk directly about designing tests while performing them.

Thus emerged what we, here, call ET 1.0.

(See The History of Definitions of ET for a chronological guide to our terminology.)

ET 1.0: Rebellion

Testing with and without a script are different experiences. At first, we were mostly drawn to the quality of ideas that emerged from unscripted testing. When we did ET, we found more bugs and better bugs. It just felt like better testing. We hadn’t yet discovered why this was so. Thus, the first iteration of exploratory testing (ET) as rhetoric and theory focused on escaping the straitjacket of the script and making space for that “better testing”. We were facing the attitude that “Ad hoc testing is uncontrolled and unmanageable; something you shouldn’t do.” We were pushing against that idea, and in that context ET was a special activity. So, the crusaders for ET treated it as a technique and advocated using that technique. “Put aside your scripts and look at the product! Interact with it! Find bugs!”

Most of the world still thinks of ET in this way: as a technique and a distinct activity. But we were wrong about characterizing it that way. Doing so, we now realize, marginalizes and misrepresents it. It was okay as a start, but thinking that way leads to a dead end. Many people today, even people who have written books about ET, seem to be happy with that view.

This era of ET 1.0 began to fade in 1995. At that time, there were just a handful of people in the industry actively trying to develop exploratory testing into a discipline, despite the fact that all testers unconsciously or informally pursued it, and always have. For these few people, it was not enough to leave ET in the darkness.

ET 1.5: Explication

Through the late ‘90s, a small community of testers beginning in North America (who eventually grew into the worldwide Context-Driven community, with some jumping over into the Agile testing community) was also struggling with understanding the skills and thought processes that constitute testing work in general. To do that, they pursued two major threads of investigation. One was Jerry Weinberg’s humanist approach to software engineering, combining systems thinking with family psychology. The other was Cem Kaner’s advocacy of cognitive science and Popperian critical rationalism. This work would soon cause us to refactor our notions of scripted and exploratory testing. Why? Because our understanding of the deep structures of testing itself was evolving fast.

When James joined ST Labs in 1995, he was for the first time fully engaged in developing a vision and methodology for software testing. This was when he and Cem began their fifteen-year collaboration. This was when Rapid Software Testing methodology first formed. One of the first big innovations on that path was the introduction of guideword heuristics as one practical way of joining real-time tester thinking with a comprehensive underlying model of the testing process. Lists of test techniques or documentation templates had been around for a long time, but as we developed vocabulary and cognitive models for skilled software testing in general, we started to see exploratory testing in a new light. We began to compare and contrast the important structures of scripted and exploratory testing and the relationships between them, instead of seeing them as activities that merely felt different.

In 1996, James created the first testing class called “Exploratory Testing.”  He had been exposed to design patterns thinking and had tried to incorporate that into the class. He identified testing competencies.

Note: During this period, James distinguished between exploratory and ad hoc testing—a distinction we no longer make. ET is an ad hoc process, in the dictionary sense: ad hoc means “to this; to the purpose”. He was really trying to distinguish between skilled and unskilled testing, and today we know better ways to do that. We now recognize unskilled ad hoc testing as ET, just as unskilled cooking is cooking, and unskilled dancing is dancing. The value of the label “exploratory testing” is simply that it is more descriptive of an activity that is, among other things, ad hoc.

In 1999, James was commissioned to define a formalized process of ET for Microsoft. The idea of a “formal ad hoc process” seemed paradoxical, however, and this set up a conflict which would be resolved via a series of constructive debates between James and Cem. Those debates would lead to we here will call ET 2.0.

There was also progress on making ET more friendly to project management. In 2000, inspired by the work for Microsoft, James and Jon Bach developed “Session-Based Test Management” for a group at Hewlett-Packard. In a sense this was a generalized form of the Microsoft process, with the goal of creating a higher level of accountability around informal exploratory work. SBTM was intended to help defend exploratory work from compulsive formalizers who were used to modeling testing in terms of test cases. In one sense, SBTM was quite successful in helping people to recognize that exploratory work was entirely manageable. SBTM helped to transform attitudes from “don’t do that” to “okay, blocks of ET time are things just like test cases are things.”

By 2000, most of the testing world seemed to have heard something about exploratory testing. We were beginning to make the world safe for better testing.

ET 2.0: Integration

The era of ET 2.0 has been a long one, based on a key insight: the exploratory-scripted continuum. This is a sliding bar on which testing ranges from completely exploratory to completely scripted. All testing work falls somewhere on this scale. Having recognized this, we stopped speaking of exploratory testing as a technique, but rather as an approach that applies to techniques (or as Cem likes to say, a “style” of testing).

We could think of testing that way because, unlike ten years earlier, we now had a rich idea of the skills and elements of testing. It was no longer some “creative and mystical” act that some people are born knowing how to do “intuitively”. We saw testing as involving specific structures, models, and cognitive processes other than exploring, so we felt we could separate exploring from testing in a useful way. Much of what we had called exploratory testing in the early 90’s we now began to call “freestyle exploratory testing.”

By 2006, we settled into a simple definition of ET, simultaneous learning, test design, and test execution. To help push the field forward, James and Cem convened a meeting called the Exploratory Testing Research Summit in January 2006. (The participants were James Bach, Jonathan Bach, Scott Barber, Michael Bolton, Elisabeth Hendrickson, Cem Kaner, Mike Kelly, Jonathan Kohl, James Lyndsay, and Rob Sabourin.) As we prepared for that, we made a disturbing discovery: every single participant in the summit agreed with the definition of ET, but few of us agreed on what the definition actually meant. This is a phenomenon we had no name for at the time, but is now called shallow agreement in the CDT community. To combat shallow agreement and promote better understanding of ET, some of us decided to adopt a more evocative and descriptive definition of it, proposed originally by Cem and later edited by several others: “a style of testing that emphasizes the freedom and responsibility of the individual tester to continually optimize the quality of his work by treating test design, test execution, test result interpretation, and learning as mutually supporting activities that continue in parallel throughout the course of the project.” Independently of each other, Jon Bach and Michael had suggested the “freedom and responsibility” part to that definition.

And so we had come to a specific and nuanced idea of exploration and its role in testing. Exploration can mean many things: searching a space, being creative, working without a map, doing things no one has done before, confronting complexity, acting spontaneously, etc. With the advent of the continuum concept (which James’ brother Jon actually called the “tester freedom scale”) and the discussions at the ExTRS peer conference, we realized most of those different notions of exploration are already central to testing, in general. What the adjective “exploratory” added, and how it contrasted with “scripted,” was the dimension of agency. In other words: self-directedness.

The full implications of the new definition became clear in the years that followed, and James and Michael taught and consulted in Rapid Software Testing methodology. We now recognize that by “exploratory testing”, we had been trying to refer to rich, competent testing that is self-directed. In other words, in all respects other than agency, skilled exploratory testing is not distinguishable from skilled scripted testing. Only agency matters, not documentation, nor deliberation, nor elapsed time, nor tools, nor conscious intent. You can be doing scripted testing without any scrap of paper nearby (scripted testing does not require that you follow a literal script). You can be doing scripted testing that has not been in any way pre-planned (someone else may be telling you what to do in real-time as they think of ideas). You can be doing scripted testing at a moment’s notice (someone might have just handed you a script, or you might have just developed one yourself). You can be doing scripted testing with or without tools (tools make testing different, but not necessarily more scripted). You can be doing scripted testing even unconsciously (perhaps you feel you are making free choices, but your models and habits have made an invisible prison for you). The essence of scripted testing is that the tester is not in control, but rather is being controlled by some other agent or process. This one simple, vital idea took us years to apprehend!

In those years we worked further on our notions of the special skills of exploratory testing. James and Jon Bach created the Exploratory Skills and Tactics reference sheet to bring specificity and detail to answer the question “what specifically is exploratory about exploratory testing?”

In 2007, another big slow leap was about to happen. It started small: inspired in part by a book called The Shape of Actions, James began distinguishing between processes that required human judgment and wisdom and those which did not. He called them “sapient” vs. “non-sapient.” This represented a new frontier for us: systematic study and development of tacit knowledge.

In 2009, Michael followed that up by distinguishing between testing and checking. Testing cannot be automated, but checking can be completely automated. Checking is embedded within testing. At first, James objected that, since there was already a concept of sapient testing, the distinction was unnecessary. To him, checking was simply non-sapient testing. But after a few years of applying these ideas in our consulting and training, we came to realize (as neither of us did at first) that checking and testing was a better way to think and speak than sapience and non-sapience. This is because “non-sapience” sounds like “stupid” and therefore it sounded like we were condemning checking by calling it non-sapient.

Do you notice how fine distinctions of language and thought can take years to work out? These ideas are the tools we need to sort out our practical decisions. Yet much like new drugs on the market, it can sometimes take a lot of experience to understand not only benefits, but also potentially harmful side effects of our ideas and terms. That may explain why those of us who’ve been working in the craft a long time are not always patient with colleagues or clients who shrug and tell us that “it’s just semantics.” It is our experience that semantics like these mean the difference between clear communication that motivates action and discipline, and fragile folklore that gets displaced by the next swarm of buzzwords to capture the fancy of management.

ET 3.0: Normalization

In 2011, sociologist Harry Collins began to change everything for us. It started when Michael read Tacit and Explicit Knowledge. We were quickly hooked on Harry’s clear writing and brilliant insight. He had spent many years studying scientists in action, and his ideas about the way science works fit perfectly with what we see in the testing field.

By studying the work of Harry and his colleagues, we learned how to talk about the difference between tacit and explicit knowledge, which allows us to recognize what can and cannot be encoded in a script or other artifacts. He distinguished between behaviour (the observable, describable aspects of an activity) and actions (behaviours with intention) (which had inspired James’ distinction between sapient and non-sapient testing). He untangled the differences between mimeomorphic actions (actions that we want to copy and to perform in the same way every time) and polimorphic actions (actions that we must vary in order to deal with social conditions); in doing that, he helped to identify the extents and limits of automation’s power. He wrote a book (with Trevor Pinch) about how scientific knowledge is constructed; another (with Rob Evans) about expertise; yet another about how scientists decide to evaluate a specific experimental result.

Harry’s work helped lend structure to other ideas that we had gathered along the way.

  • McLuhan’s ideas about media and tools
  • Karl Weick’s work on sensemaking
  • Venkatesh Rao’s notions of tempo which in turn pointed us towards James C. Scott’s notion of legibility
  • The realization (brought to our attention by an innocent question from a tester at Barclays Bank) that the “exploratory-scripted continuum” is actually the “formality continuum.” In other words, to formalize an activity means to make it more scripted.
  • The realization of the important difference between spontaneous and deliberative testing, which is the degree of reflection that the tester is exercising. (This is not the same as exploratory vs. scripted, which is about the degree of agency.)
  • The concept of “responsible tester” (defined as a tester who takes full, personal, responsibility for the quality of his work).
  • The advent of the vital distinction between checking and testing, which replaced need to talk about “sapience” in our rhetoric of testing.
  • The subsequent redefinition of the term “testing” within the Rapid Software Testing namespace to make these things more explicit (see below).

About That Last Bullet Point

ET 3.0 as a term is a bit paradoxical because what we are working toward, within the Rapid Software Testing methodology, is nothing less than the deprecation of the term “exploratory testing.”

Yes, we are retiring that term, after 22 years. Why?

Because we now define all testing as exploratory.  Our definition of testing is now this:

“Testing is the process of evaluating a product by learning about it through exploration and experimentation, which includes: questioning, study, modeling, observation and inference, output checking, etc.”

Where does scripted testing fit, then?  By “script” we are speaking of any control system or factor that influences your testing and lies outside of your realm of choice (even temporarily). This does not refer only to specific instructions you are given and that you must follow. Your biases script you. Your ignorance scripts you. Your organization’s culture scripts you. The choices you make and never revisit script you.

By defining testing to be exploratory, scripting becomes a guest in the house of our craft; a potentially useful but foreign element to testing, one that is interesting to talk about and apply as a tactic in specific situations. An excellent tester should not be complacent or dismissive about scripting, any more than a lumberjack can be complacent or dismissive about heavy equipment. This stuff can help you or ruin you, but no serious professional can ignore it.

Are you doing testing? Then you are already doing exploratory testing. Are you doing scripted testing? If you’re doing it responsibly, you are doing exploratory testing with scripting (and perhaps with checking).  If you’re only doing “scripted testing,” then you are just doing unmotivated checking, and we would say that you are not really testing. You are trying to behave like a machine, not a responsible tester.

ET 3.0, in a sentence, is the demotion of scripting to a technique, and the promotion of exploratory testing to, simply, testing.

History of Definitions of ET

History of the term “Exploratory Testing” as applied to software testing within the Rapid Software Testing methodology space.

For a discussion of the some of the social and philosophical issues surrounding this chronology, see Exploratory Testing 3.0.

1988 First known use of the term, defined variously as “quick tests”; “whatever comes to mind”; “guerrilla raids” – Cem Kaner, Testing Computer Software (There is explanatory text for different styles of ET in the 1988 edition of Testing Computer Software. Cem says that some of the text was actually written in 1983.)
1990 “Organic Quality Assurance”, James Bach’s first talk on agile testing filmed by Apple Computer, which discussed exploratory testing without using the words agile or exploratory.
1993 June: “Persistence of Ad Hoc Testing” talk given at ICST conference by James Bach. Beginning of James’ abortive attempt to rehabilitate the term “ad hoc.”
1995 February: First appearance of “exploratory testing” on Usenet in message by Cem Kaner.
1995 Exploratory testing means learning, planning, and testing all at the same time. – James Bach (Market Driven Software Testing class)
1996 Simultaneous exploring, planning, and testing. – James Bach (Exploratory Testing class v1.0)
1999 An interactive process of concurrent product exploration, test design, and test execution. – James Bach (Exploratory Testing class v2.0)
2001(post WHET #1) The Bach View

Any testing to the extent that the tester actively controls the design of the tests as those tests are performed and uses information gained while testing to design new and better tests.

The Kaner View

Any testing to the extent that the tester actively controls the design of the tests as those tests are performed, uses information gained while testing to design new and better tests, and where the following conditions apply:

  • The tester is not required to use or follow any particular test materials or procedures.
  • The tester is not required to produce materials or procedures that enable test re-use by another tester or management review of the details of the work done.

– Resolution between Bach and Kaner following WHET #1 and BBST class at Satisfice Tech Center.

(To account for both of views, James started speaking of the “scripted/exploratory continuum” which has greatly helped in explaining ET to factory-style testers)

2003-2006 Simultaneous learning, test design, and test execution – Bach, Kaner
2006-2015 An approach to software testing that emphasizes the personal freedom and responsibility of each tester to continually optimize the value of his work by treating learning, test design and test execution as mutually supportive activities that run in parallel throughout the project. – (Bach/Bolton edit of Kaner suggestion)
2015 Exploratory testing is now a deprecated term within Rapid Software Testing methodology. See testing, instead. (In other words, all testing is exploratory to some degree. The definition of testing in the RST space is now: Evaluating a product by learning about it through exploration and experimentation, including to some degree: questioning, study, modeling, observation, inference, etc.)