Exploratory Testing 3.0

[Authors’ note: Others have already made the point we make here: that exploratory testing ought to be called testing. In fact, Michael said that about tests in 2009, and James wrote a blog post in 2010 that seems to say that about testers. Aaron Hodder said it quite directly in 2011, and so did Paul Gerrard. While we have long understood and taught that all testing is exploratory (here’s an example of what James told one student, last year), we have not been ready to make the rhetorical leap away from pushing the term “exploratory testing.” Even now, we are not claiming you should NOT use the term, only that it’s time to begin assuming that testing means exploratory testing, instead of assuming that it means scripted testing that also has exploration in it to some degree.]

[Second author’s note: Some people start reading this with a narrow view of what we mean by the word “script.” We are not referring to text! By “script” we are speaking of any control system or factor that influences your testing and lies outside of your realm of choice (even temporarily). This includes text instructions, but also any form of instructions, or even biases that are not instructions.]

By James Bach and Michael Bolton

In the beginning, there was testing. No one distinguished between exploratory and scripted testing. Jerry Weinberg’s 1961 chapter about testing in his book, Computer Programming Fundamentals, depicted testing as inherently exploratory and expressed caution about formalizing it. He wrote, “It is, of course, difficult to have the machine check how well the program matches the intent of the programmer without giving a great deal of information about that intent. If we had some simple way of presenting that kind of information to the machine for checking, we might just as well have the machine do the coding. Let us not forget that complex logical operations occur through a combination of simple instructions executed by the computer and not by the computer logically deducing or inferring what is desired.”

Jerry understood the division between human work and machine work. But, then the formalizers came and confused everyone. The formalizers—starting officially in 1972 with the publication of the first testing book, Program Test Methods—focused on the forms of testing, rather than its essences. By forms, we mean words, pictures, strings of bits, data files, tables, flowcharts and other explicit forms of modeling. These are things that we can see, read, point to, move from place to place, count, store, retrieve, etc. It is tempting to look at these artifacts and say “Lo! There be testing!” But testing is not in any artifact. Testing, at the intersection of human thought processes and activities, makes use of artifacts. Artifacts of testing without the humans are like state of the art medical clinics without doctors or nurses: at best nearly useless, at worst, a danger to the innocents who try to make use of them.

We don’t blame the innovators. At that time, they were dealing with shiny new conjectures. The sky was their oyster! But formalization and mechanization soon escaped the lab. Reckless talk about “test factories” and poorly designed IEEE standards followed. Soon all “respectable” talk about testing was script-oriented. Informal testing was equated to unprofessional testing. The role of thinking, feeling, communicating humans became displaced.

James joined the fray in 1987 and tried to make sense of all this. He discovered, just by watching testing in progress, that “ad hoc” testing worked well for finding bugs and highly scripted testing did not. (Note: We don’t mean to make this discovery sound easy. It wasn’t. We do mean to say that the non-obvious truths about testing are in evidence all around us, when we put aside folklore and look carefully at how people work each day.) He began writing and speaking about his experiences. A few years into his work as a test manager, mostly while testing compilers and other developer tools, he discovered that Cem Kaner had coined a term—”exploratory testing”—to represent the opposite of scripted testing. In that original passage, just a few pages long, Cem didn’t define the term and barely described it, but he was the first to talk directly about designing tests while performing them.

Thus emerged what we, here, call ET 1.0.

(See The History of Definitions of ET for a chronological guide to our terminology.)

ET 1.0: Rebellion

Testing with and without a script are different experiences. At first, we were mostly drawn to the quality of ideas that emerged from unscripted testing. When we did ET, we found more bugs and better bugs. It just felt like better testing. We hadn’t yet discovered why this was so. Thus, the first iteration of exploratory testing (ET) as rhetoric and theory focused on escaping the straitjacket of the script and making space for that “better testing”. We were facing the attitude that “Ad hoc testing is uncontrolled and unmanageable; something you shouldn’t do.” We were pushing against that idea, and in that context ET was a special activity. So, the crusaders for ET treated it as a technique and advocated using that technique. “Put aside your scripts and look at the product! Interact with it! Find bugs!”

Most of the world still thinks of ET in this way: as a technique and a distinct activity. But we were wrong about characterizing it that way. Doing so, we now realize, marginalizes and misrepresents it. It was okay as a start, but thinking that way leads to a dead end. Many people today, even people who have written books about ET, seem to be happy with that view.

This era of ET 1.0 began to fade in 1995. At that time, there were just a handful of people in the industry actively trying to develop exploratory testing into a discipline, despite the fact that all testers unconsciously or informally pursued it, and always have. For these few people, it was not enough to leave ET in the darkness.

ET 1.5: Explication

Through the late ‘90s, a small community of testers beginning in North America (who eventually grew into the worldwide Context-Driven community, with some jumping over into the Agile testing community) was also struggling with understanding the skills and thought processes that constitute testing work in general. To do that, they pursued two major threads of investigation. One was Jerry Weinberg’s humanist approach to software engineering, combining systems thinking with family psychology. The other was Cem Kaner’s advocacy of cognitive science and Popperian critical rationalism. This work would soon cause us to refactor our notions of scripted and exploratory testing. Why? Because our understanding of the deep structures of testing itself was evolving fast.

When James joined ST Labs in 1995, he was for the first time fully engaged in developing a vision and methodology for software testing. This was when he and Cem began their fifteen-year collaboration. This was when Rapid Software Testing methodology first formed. One of the first big innovations on that path was the introduction of guideword heuristics as one practical way of joining real-time tester thinking with a comprehensive underlying model of the testing process. Lists of test techniques or documentation templates had been around for a long time, but as we developed vocabulary and cognitive models for skilled software testing in general, we started to see exploratory testing in a new light. We began to compare and contrast the important structures of scripted and exploratory testing and the relationships between them, instead of seeing them as activities that merely felt different.

In 1996, James created the first testing class called “Exploratory Testing.”  He had been exposed to design patterns thinking and had tried to incorporate that into the class. He identified testing competencies.

Note: During this period, James distinguished between exploratory and ad hoc testing—a distinction we no longer make. ET is an ad hoc process, in the dictionary sense: ad hoc means “to this; to the purpose”. He was really trying to distinguish between skilled and unskilled testing, and today we know better ways to do that. We now recognize unskilled ad hoc testing as ET, just as unskilled cooking is cooking, and unskilled dancing is dancing. The value of the label “exploratory testing” is simply that it is more descriptive of an activity that is, among other things, ad hoc.

In 1999, James was commissioned to define a formalized process of ET for Microsoft. The idea of a “formal ad hoc process” seemed paradoxical, however, and this set up a conflict which would be resolved via a series of constructive debates between James and Cem. Those debates would lead to we here will call ET 2.0.

There was also progress on making ET more friendly to project management. In 2000, inspired by the work for Microsoft, James and Jon Bach developed “Session-Based Test Management” for a group at Hewlett-Packard. In a sense this was a generalized form of the Microsoft process, with the goal of creating a higher level of accountability around informal exploratory work. SBTM was intended to help defend exploratory work from compulsive formalizers who were used to modeling testing in terms of test cases. In one sense, SBTM was quite successful in helping people to recognize that exploratory work was entirely manageable. SBTM helped to transform attitudes from “don’t do that” to “okay, blocks of ET time are things just like test cases are things.”

By 2000, most of the testing world seemed to have heard something about exploratory testing. We were beginning to make the world safe for better testing.

ET 2.0: Integration

The era of ET 2.0 has been a long one, based on a key insight: the exploratory-scripted continuum. This is a sliding bar on which testing ranges from completely exploratory to completely scripted. All testing work falls somewhere on this scale. Having recognized this, we stopped speaking of exploratory testing as a technique, but rather as an approach that applies to techniques (or as Cem likes to say, a “style” of testing).

We could think of testing that way because, unlike ten years earlier, we now had a rich idea of the skills and elements of testing. It was no longer some “creative and mystical” act that some people are born knowing how to do “intuitively”. We saw testing as involving specific structures, models, and cognitive processes other than exploring, so we felt we could separate exploring from testing in a useful way. Much of what we had called exploratory testing in the early 90’s we now began to call “freestyle exploratory testing.”

By 2006, we settled into a simple definition of ET, simultaneous learning, test design, and test execution. To help push the field forward, James and Cem convened a meeting called the Exploratory Testing Research Summit in January 2006. (The participants were James Bach, Jonathan Bach, Scott Barber, Michael Bolton, Elisabeth Hendrickson, Cem Kaner, Mike Kelly, Jonathan Kohl, James Lyndsay, and Rob Sabourin.) As we prepared for that, we made a disturbing discovery: every single participant in the summit agreed with the definition of ET, but few of us agreed on what the definition actually meant. This is a phenomenon we had no name for at the time, but is now called shallow agreement in the CDT community. To combat shallow agreement and promote better understanding of ET, some of us decided to adopt a more evocative and descriptive definition of it, proposed originally by Cem and later edited by several others: “a style of testing that emphasizes the freedom and responsibility of the individual tester to continually optimize the quality of his work by treating test design, test execution, test result interpretation, and learning as mutually supporting activities that continue in parallel throughout the course of the project.” Independently of each other, Jon Bach and Michael had suggested the “freedom and responsibility” part to that definition.

And so we had come to a specific and nuanced idea of exploration and its role in testing. Exploration can mean many things: searching a space, being creative, working without a map, doing things no one has done before, confronting complexity, acting spontaneously, etc. With the advent of the continuum concept (which James’ brother Jon actually called the “tester freedom scale”) and the discussions at the ExTRS peer conference, we realized most of those different notions of exploration are already central to testing, in general. What the adjective “exploratory” added, and how it contrasted with “scripted,” was the dimension of agency. In other words: self-directedness.

The full implications of the new definition became clear in the years that followed, and James and Michael taught and consulted in Rapid Software Testing methodology. We now recognize that by “exploratory testing”, we had been trying to refer to rich, competent testing that is self-directed. In other words, in all respects other than agency, skilled exploratory testing is not distinguishable from skilled scripted testing. Only agency matters, not documentation, nor deliberation, nor elapsed time, nor tools, nor conscious intent. You can be doing scripted testing without any scrap of paper nearby (scripted testing does not require that you follow a literal script). You can be doing scripted testing that has not been in any way pre-planned (someone else may be telling you what to do in real-time as they think of ideas). You can be doing scripted testing at a moment’s notice (someone might have just handed you a script, or you might have just developed one yourself). You can be doing scripted testing with or without tools (tools make testing different, but not necessarily more scripted). You can be doing scripted testing even unconsciously (perhaps you feel you are making free choices, but your models and habits have made an invisible prison for you). The essence of scripted testing is that the tester is not in control, but rather is being controlled by some other agent or process. This one simple, vital idea took us years to apprehend!

In those years we worked further on our notions of the special skills of exploratory testing. James and Jon Bach created the Exploratory Skills and Tactics reference sheet to bring specificity and detail to answer the question “what specifically is exploratory about exploratory testing?”

In 2007, another big slow leap was about to happen. It started small: inspired in part by a book called The Shape of Actions, James began distinguishing between processes that required human judgment and wisdom and those which did not. He called them “sapient” vs. “non-sapient.” This represented a new frontier for us: systematic study and development of tacit knowledge.

In 2009, Michael followed that up by distinguishing between testing and checking. Testing cannot be automated, but checking can be completely automated. Checking is embedded within testing. At first, James objected that, since there was already a concept of sapient testing, the distinction was unnecessary. To him, checking was simply non-sapient testing. But after a few years of applying these ideas in our consulting and training, we came to realize (as neither of us did at first) that checking and testing was a better way to think and speak than sapience and non-sapience. This is because “non-sapience” sounds like “stupid” and therefore it sounded like we were condemning checking by calling it non-sapient.

Do you notice how fine distinctions of language and thought can take years to work out? These ideas are the tools we need to sort out our practical decisions. Yet much like new drugs on the market, it can sometimes take a lot of experience to understand not only benefits, but also potentially harmful side effects of our ideas and terms. That may explain why those of us who’ve been working in the craft a long time are not always patient with colleagues or clients who shrug and tell us that “it’s just semantics.” It is our experience that semantics like these mean the difference between clear communication that motivates action and discipline, and fragile folklore that gets displaced by the next swarm of buzzwords to capture the fancy of management.

ET 3.0: Normalization

In 2011, sociologist Harry Collins began to change everything for us. It started when Michael read Tacit and Explicit Knowledge. We were quickly hooked on Harry’s clear writing and brilliant insight. He had spent many years studying scientists in action, and his ideas about the way science works fit perfectly with what we see in the testing field.

By studying the work of Harry and his colleagues, we learned how to talk about the difference between tacit and explicit knowledge, which allows us to recognize what can and cannot be encoded in a script or other artifacts. He distinguished between behaviour (the observable, describable aspects of an activity) and actions (behaviours with intention) (which had inspired James’ distinction between sapient and non-sapient testing). He untangled the differences between mimeomorphic actions (actions that we want to copy and to perform in the same way every time) and polimorphic actions (actions that we must vary in order to deal with social conditions); in doing that, he helped to identify the extents and limits of automation’s power. He wrote a book (with Trevor Pinch) about how scientific knowledge is constructed; another (with Rob Evans) about expertise; yet another about how scientists decide to evaluate a specific experimental result.

Harry’s work helped lend structure to other ideas that we had gathered along the way.

  • McLuhan’s ideas about media and tools
  • Karl Weick’s work on sensemaking
  • Venkatesh Rao’s notions of tempo which in turn pointed us towards James C. Scott’s notion of legibility
  • The realization (brought to our attention by an innocent question from a tester at Barclays Bank) that the “exploratory-scripted continuum” is actually the “formality continuum.” In other words, to formalize an activity means to make it more scripted.
  • The realization of the important difference between spontaneous and deliberative testing, which is the degree of reflection that the tester is exercising. (This is not the same as exploratory vs. scripted, which is about the degree of agency.)
  • The concept of “responsible tester” (defined as a tester who takes full, personal, responsibility for the quality of his work).
  • The advent of the vital distinction between checking and testing, which replaced need to talk about “sapience” in our rhetoric of testing.
  • The subsequent redefinition of the term “testing” within the Rapid Software Testing namespace to make these things more explicit (see below).

About That Last Bullet Point

ET 3.0 as a term is a bit paradoxical because what we are working toward, within the Rapid Software Testing methodology, is nothing less than the deprecation of the term “exploratory testing.”

Yes, we are retiring that term, after 22 years. Why?

Because we now define all testing as exploratory.  Our definition of testing is now this:

“Testing is the process of evaluating a product by learning about it through exploration and experimentation, which includes: questioning, study, modeling, observation and inference, output checking, etc.”

Where does scripted testing fit, then?  By “script” we are speaking of any control system or factor that influences your testing and lies outside of your realm of choice (even temporarily). This does not refer only to specific instructions you are given and that you must follow. Your biases script you. Your ignorance scripts you. Your organization’s culture scripts you. The choices you make and never revisit script you.

By defining testing to be exploratory, scripting becomes a guest in the house of our craft; a potentially useful but foreign element to testing, one that is interesting to talk about and apply as a tactic in specific situations. An excellent tester should not be complacent or dismissive about scripting, any more than a lumberjack can be complacent or dismissive about heavy equipment. This stuff can help you or ruin you, but no serious professional can ignore it.

Are you doing testing? Then you are already doing exploratory testing. Are you doing scripted testing? If you’re doing it responsibly, you are doing exploratory testing with scripting (and perhaps with checking).  If you’re only doing “scripted testing,” then you are just doing unmotivated checking, and we would say that you are not really testing. You are trying to behave like a machine, not a responsible tester.

ET 3.0, in a sentence, is the demotion of scripting to a technique, and the promotion of exploratory testing to, simply, testing.

History of Definitions of ET

History of the term “Exploratory Testing” as applied to software testing within the Rapid Software Testing methodology space.

For a discussion of the some of the social and philosophical issues surrounding this chronology, see Exploratory Testing 3.0.

1988 First known use of the term, defined variously as “quick tests”; “whatever comes to mind”; “guerrilla raids” – Cem Kaner, Testing Computer Software (There is explanatory text for different styles of ET in the 1988 edition of Testing Computer Software. Cem says that some of the text was actually written in 1983.)
1990 “Organic Quality Assurance”, James Bach’s first talk on agile testing filmed by Apple Computer, which discussed exploratory testing without using the words agile or exploratory.
1993 June: “Persistence of Ad Hoc Testing” talk given at ICST conference by James Bach. Beginning of James’ abortive attempt to rehabilitate the term “ad hoc.”
1995 February: First appearance of “exploratory testing” on Usenet in message by Cem Kaner.
1995 Exploratory testing means learning, planning, and testing all at the same time. – James Bach (Market Driven Software Testing class)
1996 Simultaneous exploring, planning, and testing. – James Bach (Exploratory Testing class v1.0)
1999 An interactive process of concurrent product exploration, test design, and test execution. – James Bach (Exploratory Testing class v2.0)
2001(post WHET #1) The Bach View

Any testing to the extent that the tester actively controls the design of the tests as those tests are performed and uses information gained while testing to design new and better tests.

The Kaner View

Any testing to the extent that the tester actively controls the design of the tests as those tests are performed, uses information gained while testing to design new and better tests, and where the following conditions apply:

  • The tester is not required to use or follow any particular test materials or procedures.
  • The tester is not required to produce materials or procedures that enable test re-use by another tester or management review of the details of the work done.

– Resolution between Bach and Kaner following WHET #1 and BBST class at Satisfice Tech Center.

(To account for both of views, James started speaking of the “scripted/exploratory continuum” which has greatly helped in explaining ET to factory-style testers)

2003-2006 Simultaneous learning, test design, and test execution – Bach, Kaner
2006-2015 An approach to software testing that emphasizes the personal freedom and responsibility of each tester to continually optimize the value of his work by treating learning, test design and test execution as mutually supportive activities that run in parallel throughout the project. – (Bach/Bolton edit of Kaner suggestion)
2015 Exploratory testing is now a deprecated term within Rapid Software Testing methodology. See testing, instead. (In other words, all testing is exploratory to some degree. The definition of testing in the RST space is now: Evaluating a product by learning about it through exploration and experimentation, including to some degree: questioning, study, modeling, observation, inference, etc.)


Mr. Langella Never Does it the Same Way Twice

This is from the New York Times:

Its other hallmark is that Mr. Langella never does the part the same way twice. This is partly because he’s still in the process of discovering the character and partly because it’s almost a point of honor. “The Brit approach is very different from mine,” he said. “There’s a tendency to value consistency over creativity. You get it, you nail it, you repeat it. I’d rather hang myself. To me, every night within a certain framework — the framework of integrity — you must forget what you did the night before and create it anew every single time you walk out on the stage.”

I love that phrase the framework of integrity. It ties in to what I’ve been saying about integrity and also what is true about informal testing: if you are well prepared, and you are true to yourself, then whatever you do is going to be spontaneously and rather effortlessly okay, even if it changes over time.

I often hear anxiety in testers and managers about how terrible it is to do something once, some particular way, and then to forget it. What a waste, they say. Let’s write it all down and not deviate from our past behavior, they say. Well I don’t think it’s waste, I think it’s mental hygiene. Testing is a performance, and I want to be free to perform better. So, I make notes, sure. But I am properly reluctant about formalizing what I do.

Doing your best work includes having the courage to let go of pretty good work that you’ve done before.

Justifying Real Acceptance Testing

This post is not about the sort of testing people talk about when nearing a release and deciding whether it’s done. I have another word for that. I call it “testing,” or sometimes final testing or release testing. Many projects perform that testing in such a perfunctory way that it is better described as checking, according to the distinction between testing and checking I have previously written of on this blog. As Michael Bolton points out, that checking may better be described as rejection checking since a “fail” supposedly establishes a basis for saying the product is not done, whereas no amount of “passes” can show that it is done.

Acceptance testing can be defined in various ways. This post is about what I consider real acceptance testing, which I define as testing by a potential acceptor (a customer), performed for the purpose of informing a decision to accept (to purchase or rely upon) a product.

Do we need acceptance testing?

Whenever a business decides to purchase and rely upon a component or service, there is a danger that the product will fail and the business will suffer. One approach to dealing with that problem is to adopt the herd solution: follow the thickest part of the swarm; choose a popular product that is advertised or reputed to do what you want it to do and you will probably be okay. I have done that with smartphones, ruggedized laptops, file-sharing services, etc. with good results, though sometimes I am disappointed.

My business is small. I am nimble compared to almost every other company in the world. My acceptance testing usually takes the form of getting a trial subscription to service, or downloading the “basic” version of a product. Then I do some work with it and see how I feel. In this way I learned to love Dropbox, despite its troubling security situation (I can’t lock up my Dropbox files), or the fact that there is a significant chance it will corrupt very large files. (I no longer trust it with anything over half of a gig).

But what if I were advising a large company about whether to adopt a service or product that it will rely upon across dozens or hundreds or thousands of employees? What if the product has been customized or custom built specifically for them? That’s when acceptance testing becomes important.

Doesn’t the Service Level Agreement guarantee that the product will work?

There are a couple of problems with relying on vendor promises. First, the vendor probably isn’t promising total satisfaction. The service “levels” in the contract are probably narrowly and specifically drawn. That means if you don’t think of everything that matters and put that in the contract, it’s not covered. Testing is a process that helps reveal the dimensions of the service that matter.

Second, there’s an issue with timing. By the time you discover a problem with the vendor’s product, you may be already relying on it. You may already have deployed it widely. It may be too late to back out or switch to a different solution. Perhaps your company negotiated remedies in that case, but there are practical limitations to any remedy. If your vendor is very small, they may not be able to afford to fix their product quickly. If you vendor is very large, they may be able to afford to drag their feet on the fixes.

Acceptance testing protects you and makes the vendor take quality more seriously.

Acceptance testing should never be handled by the vendor. I was once hired by a vendor to do penetration testing on their product in order to appease a customer. But the vendor had no incentive to help me succeed in my assignment, nor to faithfully report the vulnerabilities I discovered. It would have been far better if the customer had hired me.

Only the accepting party has the incentive to test well. Acceptance testing should not be pre-agreed or pre-planned in any detail– otherwise the vendor will be sure that the product passes those specific tests. It should be unpredictable, so that the vendor has an incentive to make the product truly meet its requirements in a general sense. It should be adaptive (exploratory) so that any weakness you find can be examined and exploited.

The vendor wants your money. If your company is large enough, and the vendor is hungry, they will move mountains to make the product work well if they know you are paying attention. Acceptance testing, done creatively by skilled testers on a mission, keeps the vendor on its toes.

By explicitly testing in advance of your decision to accept the product, you have a fighting chance to avoid the disaster of discovering too late that the product is a lemon.

My management doesn’t think acceptance testing matters. What do I do?

1. Make the argument, above.
2. Communicate with management, formally, about this. (In writing, so that there is a record.)
It is up to management to make decisions about business risk. They may feel the risk is not worth worrying about. In that case, you must wait and watch. People are usually more persuaded by vivid experiences, rather than abstract principles, so:
1. Collect specific examples of the problems you are talking about. What bugs have you experienced in vendor products?
2. Collect news reports about bugs in products of your vendors (or other vendors) that have been disruptive.
3. In the event you get to do even a little acceptance testing, make a record of the problems you find and be ready to remind management of that history.


A Public Service Announcement About Exploratory Testing

[Updated: I revamped and added some more examples to the list.]

I got this message from Oliver Vilson, today:

Oliver V.: hi James. Just had a chat with Helena_JM. She reminded me something… don’t know if you’ve written blog about it or pushed it into  RST.. One Test lead from another company mentioned me he has problems with his testers. Some of them are saying that they don’t have to do test plans, since your teaching seems to align that…
James Bach: Any more details?
Oliver V.: rough translation from test team lead : “ET seems to have reputation as “excuse for shitty testing”. People can’t explain what they did and why. If you ask them for test plan or explanation, all you get is “but Bach said…”.

I have, from time to time, heard rumors that some people cite my writings and teachings as an excuse to do bad testing. I think it would help to do a public service announcement…

Attention Testers and Managers of Testers

If a tester claims he is justified in doing bad work because of something I’ve published or said, please email me at james@satisfice.com, or Skype me, and I will help you stop that silliness.

I teach skilled software testing for people who intend to do an excellent job. That process is necessarily exploratory in nature. It also necessarily will have some scripted elements– partly due to the nature of thinking and partly due to the requirements of excellent intellectual work.

I do not teach evasiveness or obscurantism. I do not ever tell a tester that he can get away with refusing to explain his test process. Explaining testing is an important part of being a professional.

Why People Get Confused

I reinvented software testing for myself, from first principles. So, I teach from a very different set of premises. This is necessary, because common ideas about testing are so idiotic. But it does result in confusion when my ideas are taken out of context and “mixed in” to the idiocy. Consider: “I won’t create a detailed test plan document” is a perfectly ordinary and potentially reasonable thing to say in RST. It is a statement about things made explicit in a document, not a statement about lack of planning. Yet Factory School methodology confuses documents for content. If you say that to one of them, it may be mistaken for a refusal to apply appropriate rigor to your work.

Here are some examples of how someone might misapply my teachings:

  1. Rapid Software Testing methodology (RST) is not the same thing as exploratory testing. ET is very simple. Anyone can do ET, just as anyone can look at a painting. But there’s a huge difference between a skilled appraisal of a painting by an expert and a bored glance by a schoolkid. RST is a methodology for doing testing (including scripted and exploratory testing) well. Therefore, anyone doing ET badly is not doing my methodology.
  2. In RST, a plan is not a document, it’s a set of ideas. Therefore, I say you don’t need to have a test plan template, or any sort of written test plan document in order to have a good test plan. I often document my test ideas, though, in different ways, when that helps. Therefore, the lack of a test plan (a guiding set of ideas) probably represents an immature and possibly inadequate test process, but the lack of a test plan document is not necessarily a problem.
  3. In RST, a test is not a document, it’s a performance. Therefore the lack of documented tests is not necessarily a problem, but poor testing (which can be determined by direct observation by a skilled tester or test manager, just as poor carpentry or poor doctoring can be detected) is a problem.
  4. In RST, we have no templates for reporting. But reporting is crucial. Reporting skills are crucial. Accountability is crucial. Credibility is crucial. We teach the art of telling a testing story. Therefore, anyone who declines to explain himself when asked about his testing is not practicing RST. I disavow such testers. (However, just because explaining oneself is an important part of testing doesn’t mean a manager can insist on arbitrarily voluminous documentation or arbitrary metrics. I suspect that, in some cases, managers who complain about testers refusing to document or explain themselves are really just obsessed with a specific method of documentation and refusing to accept other viable solutions to the same problem.)
  5. In RST we say that testing cannot be automated, and that tools can become an obsession. This leads some to think I am against tools. No, I am against bad work. Unfortunately, some tools, such as expensive HP/Mercury tools, are often used to wastefully automate weak fact checking at he expense of good testing. Yes., tools and the technical skills to create and apply them play an important role in great testing. It’s not automating testing when I use tools, because testing is whatever testers do, not what tools do. Therefore a tester who refuses to learn and use tools in general is not practicing RST.
  6. In RST we distinguish between checking and testing. This allows us to distinguish between a test process that is appropriately thoughtful and deep, and one (based solely on checking) that would be reckless and shallow. But when we criticize a checking-only test strategy, some people get confused and think we are criticizing the presence of checking rather than the lack of testing. Therefore, a tester who refuses to design or perform checks that are actually economical and helpful is not doing RST.
  7. In RST, we ban unscientific, abusive attempts at using metrics to control the test process. But when some people hear us attack, say, the counting of test cases, they assume that means we don’t believe in even the concept or principle of measurement. Instead, we support using inquiry-focused metrics (which inspire questions rather than dictating decisions), we promote active skepticism about numbers applied to social systems, and we promote the development of observation, reasoning, and social skills that limit the need for quantification. Therefore any tester who simply refuses to consider using metrics of any kind is not doing RST.
  8. Some people hear about the freedom of exploratory testing, and they confuse that with irresponsibility. But that’s silly. If you drive a car, you are free to run over pedestrians or smash into buildings– except you don’t, because you are responsible! Also, it’s against the law. Freedom is not the same thing as having a right. Therefore, anyone who accepts the freedom of exploratory testing and cannot or will not manage that testing appropriately is an incompetent or irresponsible tester.

What Exploratory Testing is Not

Michael Bolton has gone off like a volcano in Iceland, writing a series about what exploratory testing isn’t:






Another thing I would add to this:

Exploratory testing is not defined by any specific example of exploratory testing.

Just as tap dancing does not characterize ballroom dancing, you can’t take any one example of exploratory testing and treat that as representative of the entire concept of ET.

If you were to hear me singing an aria by Mozart, that would be an example of opera singing. It would be an example of BAD opera singing, but it would truly be an example of the style. Similarly, I regularly talk to testers who go “oh yeah I’ve seen that exploratory testing stuff but it’s not structured… not documented… not x… not y… not whatever.” And my reply is “you probably haven’t seen skilled exploratory testing. Would you like to hear me sing an opera now? OR, I could show you a good example of ET in practice.”

Exploratory testing can be done in an unskilled, slapdash, silly way. Just as a unskilled driver behind the wheel of a car is still a driver who is driving a car, a poor tester can still be doing ET– albeit probably not very well.

The cool thing about ET is that, even done badly, it’s still a great way to find some bugs. Michael and I try to help you do much, much better than that.

The core idea of ET remains as it always has been. It’s been expressed in many different ways, but boils down to this: test design and test execution and learning mixed together in a mutually supportive way. Whenever you see that, and to the degree that you see that, you are seeing exploratory testing.

Why Scripted Testing is Not for Novices

…Unless you want bad testing.

Claire Moss writes:

I am surprised that you say that scripted testing is harder for novice
testers. I would have expected that having so much structure around
the tests would make getting into testing easier for someone with less
experience and that the scripted instructions would make up for a lack
of discipline on the part of the tester.

Structure != “being told what to do”
First, you are misusing the word “structure.” All testing is structured. If what you mean by structure is “externally imposed structure” then say that. But even if you are not aware of a structure in your testing, it is there. When I tell a novice tester to test, and don’t tell him how to test, he will be dominated by certain structures he is largely unaware of– or if aware he cannot verbalize or control them much. For instance: the user interface look and feel is a guiding structure for novice testers. They test what they see.

Cognitive science offer plenty of ideas and insights about the structures that guide our thinking and behavior. See the book Predictably Irrational by Dan Ariely for more on this.

Scripted testing always has at least two distinct parts: test design and test execution. They must be considered independently.

Scripted test execution is quite a bit more difficult than exploratory testing, unless you are assuming that the tester following the script has exactly the same knowledge and skill as the test designer (even then it is a qualitatively different sort of cognitive process than designing). An exploratory tester is following (indeed forming as he goes) his own intentions and ideas. But, a scripted tester, to do well, must apprehend the intent of the one who wrote the script. Moreover, the scripted tester must go beyond the stated intent and honor the tacit intent, as well– otherwise it’s just shallow, bad testing.

Try using a script to guide a 10 year-old to drive a car safely on a busy city street. I don’t believe it can be done. You can’t overcome lack of basic skills with written instructions.

And sure, yeah, there is also the discipline issue, but that’s a minor thing, compared to the other things.

As for scripted test design, that also is a special skill. I can ask my son to put together a computer. He knows how to do that. But if I were to ask him for a comprehensive step-by-step set of instructions to allow me to do it, I doubt the result would help me much. Writing a script requires patience, judgment, and lots of empathy for the person who will execute it. He doesn’t yet have those qualities.

Most people don’t like to write. They aren’t motivated. Now give them a task that requires excellent writing. Bad work generally results.

Both on the design side and the execution side, scripted testing done adequately is harder than exploratory testing done adequately. It’s hard to separate an integrated cognitive activity into two pieces and still make it work.

The reason managers assume it’s simpler and easier is that they have low standards for the quality of testing and yet a strong desire for the appearances of order and productivity.

When I am training a new tester, I begin with highly exploratory testing. Eventually, I will introduce elements of scripting. All skilled testers must feel comfortable with scripted testing, for those rare times when it’s quite important.


1. Start browser

2. Go to CNN.com

3. Test CNN.com and report any problems you find.

This looks like a script, and it is sort of a script, but the interesting details of the testing are left unspecified. One of the elements of good test scripting is to match the instructions to the level of the tester as well as to the design goal of the test. In this case, no design goal is apparent.

This script does not necessarily represent bad testing– because it doesn’t represent any testing whatsoever.

1. Open Notepad

2. Type “hello”

3. Verify that “hello” appears on the screen.

This script has the opposite problem. It specifies what is completely unnecessary to specify. If the tester follows this script, he is probably dumbing himself down. There may be some real good reason for these steps, but again, the design goal is not apparent. The tester’s mind is therefore not being effectively engaged. Congratulations, designer, you’ve managed to treat a sophisticated miracle of human procreation, gestation, mothering, socializing, educating, etc. as if he were the equivalent of an animated poking stick. That’s like buying an iPad, then using it as a serving tray for a platter of cheese.

Exploratory Testing is not “Experienced-Based” Testing

Prabhat Nayak is yet another thinking tester recently hired by the rising Indian testing powerhouse, Moolya. Speaking of the ISTQB syllabus, he writes:

One such disagreement of mine is they have put “Exploratory Testing” on purely experienced based testing. James, correct me if I have got ET wrong (and I am always ready to be corrected if I have misunderstood something), a novice tester who has got great cognizance and sapience and can explore things better, can think of different ways the product may fail to perform per requirement can always do a great job in ET than a 5 years experienced tester who has only learned to execute a set of test cases. That is probably one of the beauties of ET. There is of course, always an advantage of having some experience but that alone doesn’t suffice ET to be put under experienced based testing.

You are quite correct Prabhat. Thank you for pointing this out.

The shadowy cabal known as the ISTQB insulates itself from debate and criticism. They make their decisions in secret (under NDA, if you can believe it!) and they don’t invite my opinion, nor anyone’s opinion who has made a dedicated study of exploratory testing. That alone would be a good reason to dismiss whatever they do or claim.

But this case is an especially sad example of incompetent analysis. Let me break it down:

What does “experience-based” mean?

Usually when people in the technical world speak of something as “x-based” they generally mean that it is “organized according to a model of x” or perhaps “dominated by a consideration of x.” The “x”, whatever it is, plays a special role in the method compared to its role in some other “normal” or “typical” method.

What is a normal or typical method of software testing? I’m not aware the the ISTQB explicitly takes a position on that. But by calling ET an experience-based technique, they imply that no other test technique involves the application of experience to a comparable degree. If they have intended that implication– that would be a claim both remarkable and absurd. Why should any test technique not benefit from experience? Do they think that a novice tester and an experienced tester would choose the exact same tests when practicing other test techniques? Do they think there is no value to experience except when using ET? What research have they done to substantiate this opinion? I bet none.

If they have not intended this implication, then by calling ET experience-based it seems to me they are merely making impressive sounds for the sake of it. They might as well have called ET “breathing-based” on the theory that testers will have to breathe while testing, too.

Ah, but maybe there is another interpretation. They may have called ET “experienced-based” not to imply that ET is any more experience-based than other techniques, but rather as a warning that expresses their belief that the ONLY way ET can be valuable is through the personal heroism and mastery of the individual tester. In other words, what they meant to say was that ET is “personal excellence-based” testing, rather than testing whose value derives from an explicit algorithm that is objective and independent of the tester himself.

I suspect that what’s really going on, here: They think the other techniques are concrete and scientific, whereas ET is somehow mystical and perhaps based on the same sort of dodgy magic that you find in Narnia or MiddleEarth. They say “experience-based” to refer to a dark and scary forest that some enter but none ever return therefrom… They say “experienced-based” because they have no understanding of any other basis that ET can possibly have!

Why would it be difficult for Factory School testing thinkers (of which ISTQB is a product) to understand the basis of ET?

It’s difficult for them because Factory School people, by the force of their creed, seek to minimize the role of humanness in any technical activity. They are radical mechanizers. They are looking for algorithms instead of heuristics. They want to focus on artifacts, not thoughts or feelings or activities. They need to deny the role and value of tacit knowledge and skill. Their theory of learning was state of the art in the 18th century: memorization and mimicry. Then, when they encounter ET, they look for something to memorize or mimic, and find nothing.

Those of us who study ET, when we try to share it, talk a lot about cognitive science, epistemology, and modern learning theory. We talk about the importance of practice. This sounds to the Factory Schoolers like incomprehensible new agey incantations in High Elvish. They suspect we are being deliberately obscure just to keep our clients confused and intimidated.

This is also what makes them want to call ET a technique, rather than an approach. I have, since the late nineties, characterized exploratory testing as an approach that applies to any technique. It is a mindset and set of behaviors that occur, to some degree, in ALL testing. To say “Let’s use ET, now” is technically as incoherent as saying “Let’s use knowledge, now.” You are always using knowledge, to some degree, in any work that you do. “Knowledge” is not a technique that you sometimes deploy. However, knowledge plays more a role in some situations and less a role in others. Knowledge is not always and equally applicable, nor is it uniformly applied even when applicable.

For the Factory Schoolers to admit that ET is endemic to all testing, to some degree, would force them to admit that their ignorance of ET is largely ignorance of testing itself! They cannot allow themselves to do that. They have invested everything in the claim that they understand testing.  No, we will have to wait until those very proud and desperately self-inflated personalities retire, dry up, and blow away. The salvation of our craft will come from recruiting smart young testers into a better way of thinking about things like ET. The brain drain will eventually cause the Factory School culture to sink into the sea like a very boring version of Altantis.

Bottom Line: Most testing benefits from experience, but no special experience is necessary to do ET

Exploratory testing is not a technique, so it doesn’t need to be categorized alongside techniques. However, a more appropriate way to characterize ET, if you want to charactize it in some way, is to call it self-managed and self-structured (as opposed to externally managed and externally structured). It is testing wherein the design part of the process and the execution part of the process are parallel and interactive.

You know what else is self-managed and self-structured? Learning how to walk and talk. Does anyone suggest that only “experienced people” should be allowed to do that?

Behavior-Driven Development vs. Testing

The difference between Behavior-Driven Development and testing:

This is a BDD scenario (from Dan North, a man I respect and admire):

+Scenario 1: Account is in credit+
Given the account is in credit
And the card is valid
And the dispenser contains cash
When the customer requests cash
Then ensure the account is debited
And ensure cash is dispensed
And ensure the card is returned

This is that BDD scenario turned into testing:

+Scenario 1: Account is in credit+
Given the account is in credit
And the card is valid
And the dispenser contains cash
When the customer requests cash
Then check that the account is debited
And check that cash is dispensed
And check that the card is returned
And check that nothing happens that shouldn’t happen and everything else happens that should happen for all variations of this scenario and all possible states of the ATM and all possible states of the customer’s account and all possible states of the rest of the database and all possible states of the system as a whole, and anything happening in the cloud that should not matter but might matter.

Do I need to spell it out for you more explicitly? This check is impossible to perform. To get close to it, though, we need human testers. Their sapience turns this impossible check into plausible testing. Testing is a quest within a vast, complex, changing space. We seek bugs. It is not the process of  demonstrating that the product CAN work, but exploring if it WILL.

I think Dan understands this. I sometimes worry about other people who promote tools like Cucumber or jBehave.

I’m not opposed to such tools (although I continue to suspect that Cucumber is an elaborate ploy to spend a lot of time on things that don’t matter at all) but in the face of them we must keep a clear head about what testing is.

A Nice Quote Against Confirmatory Testing

Most of the technology of “confirmatory” non-qualitative research in both the social and natural sciences is aimed at preventing discovery. When confirmatory research goes smoothly, everything comes out precisely as expected. Received theory is supported by one more example of its usefulness, and requires no change. As in everyday social life, confirmation is exactly the absence of insight.  In science, as in life, dramatic new discoveries must almost by definition be accidental (“serendipitous”). Indeed, they occur only in consequence of some mistake.

Kirk, Jerome, and Miller, Marc L., Reliability and Validity in Qualitative Research (Qualitative Research Methods). Sage Publications, Inc, Thousand Oaks, CA, 1985.

Viva exploratory methods in science! Viva exploratory methods in testing! Viva testers who study philosophy and the social sciences!

(Thank you Michael Bolton for finding this quote.)

Who says ET is good for Medical Devices? The FDA!

In a new guidance document discussing the clinical testing of medical devices, the FDA includes a long section about the value of exploratory testing:

The Importance of Exploratory Studies in Pivotal Study

Medical devices often undergo design improvement during development, with evolution and refinement during lifecycles extending from early research through investigational use, initial marketing of the approved or cleared product, and on to later approved or cleared commercial device versions.

For new medical devices, as well as for significant changes to marketed devices, clinical development is marked by the following three stages: the exploratory (first-in-human, feasibility) stage, the pivotal stage (determines the safety and effectiveness of the device), and the postmarket stage (design improvement, better understanding of device safety and effectiveness and development of new intended uses). While these stages can be distinguished, it is important to point out that device development can be an ongoing, iterative process, requiring additional exploratory and pivotal studies as new information is gained and new intended uses are developed. Insights obtained late in development (e.g., from a pivotal study) can raise the need for additional studies, including clinical or non-clinical.

This section focuses on the importance of the exploratory work (in non-clinical and clinical studies) in developing a pivotal study design plan. Non-clinical testing (e.g., bench, cadaver, or animal) can often lead to an understanding of the mechanism of action and can provide basic safety information for those devices that may pose a risk to subjects. The exploratory stage of clinical device development (first-in-human and feasibility studies) is intended to allow for any iterative improvement of the design of the device, advance the understanding of how the device works and its safety, and to set the stage for the pivotal study.

Thorough and complete evaluation of the device during the exploratory stage results in a better understanding of the device and how it is expected to perform. This understanding can help to confirm that the intended use of the device will be aligned with sponsor expectations, and can help with the selection of an appropriate pivotal study design. A robust exploratory stage should also bring the device as close as possible to the form that will be used both in the pivotal trial and in the commercial market. This reduces the likelihood that the pivotal study will need to be altered due to unexpected results, which is an important consideration, since altering an ongoing pivotal study can increase cost, time, and patient resources, and might invalidate the study or lead to its abandonment.

For diagnostic devices, analytical validation of the device to establish performance characteristics such as analytical specificity, precision (repeatability/reproducibility), and limit of detection are often part of the exploratory stage. In addition, for such devices, the exploratory stage may be used to develop an algorithm, determine the threshold(s) for clinical decisions, or develop the version of the device to be used in the clinical study. For both in vivo and in vitro diagnostic devices, results from early clinical studies may prompt device modifications and thus necessitate additional small studies in humans or with specimens from humans.

Exploratory studies may continue even as the pivotal stage of clinical device development gets underway. For example, FDA may require continued animal testing of implanted devices at 6 months, 2 years and 3 years after implant. While the pivotal study might be allowed to begin after the six month data are available, additional data may also need to be collected. For example, additional animal testing might be required if pediatric use is intended. For in vitro diagnostic devices, it is not uncommon for stability testing of the device (e.g., for shelf life) to continue while (or even after) conducting the pivotal study.

While the pivotal stage is generally the definitive stage during which valid scientific evidence is gathered to support the primary safety and effectiveness evaluation of the medical device for its intended use, the exploratory stage should be used to finalize the device design, or the appropriate endpoints for the pivotal stage. This is to ensure that the investigational device is standardized as described in 21 CFR 860.7(f)(2), which states:

“To insure the reliability of the results of an investigation, a well-controlled investigation shall involve the use of a test device that is standardized in its composition or design and performance.”

This is what I’ve been arguing for a couple of years, now. If you want to test a medical device very well, then you have to test it in an exploratory way. This prepares the way for what the FDA here calls the “pivotal study”, which in software terms is basically a scripted demonstration of the product.

Yes, the FDA says, earlier in this guidance document, that it is intended to apply to clinical studies, not necessarily bench testing. But look at the reasoning: this exact reasoning does apply to software development. You might even say it is advocating an agile approach to product design.

Technique: Paired Exploratory Survey

I named a technique the other day. It’s another one of those things I’ve been doing for a while, but only now has come crisply into focus as a distinct heuristic of testing: the Paired Exploratory Survey (PES).

Definition: A paired exploratory survey is a process whereby two testers confront one product at the same time for the purpose of learning a product, preparing for formal testing, and/or characterizing its quality as rapidly as possible, whereby one tester (the “driver”) is responsible for open-ended play and all direct interaction with the product while the other tester (the “navigator” or “leader”) acts as documentarian, mission-minder, and co-test-designer.

Here’s a story about it..

Last week, I was on my way home from the CAST conference with my 17 year-old son Oliver when a client called me with an emergency assignment: “Get down to L.A. and test our product right away!” I didn’t have time to take Oliver home, so we bought some clean clothes, had Oliver’s ID flown in from Orcas Island by bush plane, and headed to SeaTac.

(I love emergencies. They’re exciting. It’s like James Bond, except that my Miss Moneypenny is named Lenore. I got to the airport and two first class tickets were waiting for us. However, a gentle note to potential clients: making me run around like a secret agent can be expensive.)

This is the first time I had Oliver with me while doing professional testing, so I decided to make use of him as an unpaid intern. Basically, this is the situation any tester is in when he employs a non-tester, such as a domain expert, as a partner. In such situations, the professional tester must assure that the non-tester is strongly engaged and having good fun. That’s why I like to make that “honorary tester” drive. I get them twiddling the knobs, punching the buttons, and looking for trouble. Then they’ll say “testing is fun” and help me the next time I ask.

(Oliver is a very experienced video gamer. He has played all the major offline games since he was 3 or 4, and the online ones for the last 5 years. I know from playing with him what this means: he can be relentless once he decides to figure out how a system works. I was hoping his gamer instinct would kick in for this, but I was also prepared for him to get bored and wander off. You shouldn’t set your expectations too high with teenagers.)

The client gave us a briefing about how the device is used. I have already studied up on this, but it’s new for Oliver. The scene reminded me of that part in the movie Inception where Leonardo DiCaprio explains the dynamics of dream invasion.We have a workstation that controls a power unit and connects to a probe which is connected to a pump. It all looks Frankenstein-y.

(I can’t tell you much about the device, in this case. Let’s just say it zaps the patient with “healing energy” and has nothing whatsoever to do with weaponized subconscious projections.)

I set up a camera so that all the testing would be filmed.

(Video is becoming an indispensable tool in my work. My traveling kit consists of a little solid state Sony cam that plugs into the wall so I don’t have to worry about battery life, a micro-tripod so I can pose the camera at any desired angle, and a terabyte hard drive which stores all the work.)

Then, I began the testing just to demonstrate to Oliver the sort of thing I wanted to do. We would begin with a sanity check of the major functions and flows, while letting ourselves deviate as needed to pursue follow-up testing on anything we find that was anomalous. After about 15 minutes, Oliver became the driver, I became the navigator, and that’s how we worked for the next 6 or 7 hours.

Oliver quickly distinguished himself as as a remarkable observer. He noticed flickers on the screen, small changes over time, quirks in the sound the device made. He had a good memory for what he had just been doing, and quickly constructed a mental model of the product.

From the transcript:

“What?!…That could be a problem…check this out…dad…look, right now…settings, unclickable…start…suddenly clickable, during operation…it’s possible to switch its entire mode to something else, when it should be locked!”

and later

“alright… you can’t see the error message every single time because it’s corrupted… but the error message… the error message is exactly what we were seeing before with the sequence bug… the error message comes up for a brief moment and then BOOM, it’s all gone… it’s like… it makes the bug we found with the sequence thing (that just makes it freeze) destructive and takes down the whole system… actually I think that’s really interesting. It’s like this bug is slightly more evolved…”

(You have to read this while imagining the voice of a triumphant teenager who’s just found an easter egg in HALO3. From his point of view, he’s finding ways to “beat the boss of the level.”)

At the start, I frequently took control of the process in order to reproduce the bugs, but as I saw Oliver’s natural enthusiasm and inquisitiveness blossom, I gave him room to run. I explained bug isolation and bug risk and challenged him to find the simplest, yet most compelling form of each problem he uncovered.

Meanwhile, I worked on my notes and noted time stamps of interesting events. As we moved along, I would redirect him occasionally to collect more evidence regarding specific aspects of the evolving testing story.

How is this different from ordinary paired testing?

Paired testing simply means two testers testing one product on the same system at the same time. A PES is a kind of paired testing.

Exploratory testing means an approach to testing whereby learning, test design, and test execution are mutually supportive activities that run in parallel. A PES is exploratory testing, too.

A “survey session,” in the lingo of Session-Based Test Management, is a test session devoted to learning a product and characterizing the general risks and challenges of testing it, while at the same time noticing problems. A survey session contrasts with analysis sessions, deep coverage sessions, and closure sessions, among possible others that aren’t yet identified as a category. A PES is a survey test session.

It’s all of those things, plus one more thing: the senior tester is the one who takes the notes and makes sure that the right areas are touched and right general information comes out. The senior tester is in charge of developing a compelling testing story. The senior tester does that so that his partner can get more engaged in the hunt for vital information. This “hunt” is a kind of play. A delicious dance of curiosity and analysis.

There are lots of ways to do paired testing. A PES is one interesting way.

Hey, I’ve done this before!

While testing with my son, I flashed back to 1997, in one of my first court cases, in which I worked with my brother Jon (who is now a director of testing at eBay, but was then a cub tester). Our job was to apply my Good Enough model of quality analysis to a specific product, and I let Jon drive that time, too. I didn’t think to give a name to that process, at the time, other than ET. The concept of paired testing hadn’t even been named in our community until Cem Kaner suggested that we experiment with it at the first Workshop on Heuristic and Exploratory Techniques in 2001.

I have seen different flavors of a PES, too. I once saw a test lead who stepped to the keyboard specifically because he wanted his intern to design the tests. He felt that that letting the kid lean back in his chair and talk ideas to the ceiling (as he was doing when I walked in) would be the best way to harness certain technical knowledge the intern had which the test lead did not have. In this way, the intern was actually the driver.

I’m feeling good about the name Paired Exploratory Survey. I think it may have legs. Time will tell.

Here’s the report I filed with the client (all specific details changed, but you can see what the report looks like, anyway).

What Testers Find

While testing at eBay, recently, it occurred to me that we need a deeper account of what testers find. It’s not just bugs. Here’s my experimental list:

Testers find bugs. In other words, we look for anything that threatens the value of the product. (This ties directly into Jerry Weinberg’s famous dictum that quality means value to some person, at some time, who matters.) Some people like to say that testers find “defects.” That is also true, but I avoid that word. It tends to make programmers and lawyers upset, and I have trouble enough. Example: a list of countries in a form is missing “France.”

Testers also find risks. We notice situations that seem likely to produce bugs. We notice behaviors of the product that look likely to go wrong in important ways, even if we haven’t yet seen that happen. Example: A web form is using a deprecated HTML tag, which works fine in current browsers, but may stop working in future browsers. This suggests that we ought to do a validation scan. Maybe there are more things like that on the site.

Testers find issues. An issue is something that threatens the value of the project, rather than the product itself. Example: There’s a lot of real-time content on eBay. Ads and such. Am I supposed to test that stuff? How should I test it?

Testers find testability problems. It’s a kind of issue, but it’s worth highlighting. Testers should point out aspects of the product that make it hard to observe and hard to control. There may be small things that the developers can do (adding scriptable interfaces and log files, for instance) that can improve testability. And if you don’t ask for testability, it’s your fault that you don’t get it. Example: You’re staring at a readout that changes five times a second, wondering how to tell if it’s presenting accurate figures. For that, you need a log file.

Testers find artifacts, too. Also a kind of issue, but also worth highlighting, we sometimes see things that look like problems, but turn out to be manifestations of how we happen to be testing. Example: I’m getting certificate errors on the site, but it turns out to be an interaction between the site and Burp Proxy, which is my recording tool.

Testers find curios. We notice surprising and interesting things about our products that don’t threaten the value of the product, but may suggest hidden features or interesting ways of using the product. Some of them may represent features that the programmers themselved don’t know about. They may also suggest new ways of testing. Example: Hmm. I notice that a lot of complex content is stored in Iframes on eBay. Maybe I can do a scan for Iframes and systematically discover important scripts that I need to test.

Maybe there are other things you think should be added to this list. The point is that the outcomes of testing can be quite diverse. Keep your eyes and your mind open.

This is What We Do

In the Context-Driven Testing community, the testing craft is a living, growing thing. This dialog, led by my partner in Rapid Testing, Michael Bolton, is a prime example of the life among us. Read the PDF that Michael refers to, and what will you see? You see many ideas proposed and discarded. You see definitions being made, and remade. You see people struggling to make sense of subtle, yet important distinctions.

In my world, the development of testing skill goes hand-in-hand with the development of our rhetoric of describing testing. The development of personal skill is linked to the development of social skill. This is why we smirk and roll our eyes when people come to us looking for templates and other pre-fabricated answers to what they believe are simple questions. My reaction to many who come to me is “You don’t need to learn the definition of term ‘test case’. You don’t need me to tell you ‘how to create a test plan’. What you need is to learn how to test. You need to struggle with imponderables; sit with them; turn them over in your mind. You need practice, and you need to talk through your practice with other testers.”
Michael’s dialog reminds me of the book Proofs and Refutations, by Imre Lakatos, which studies the exploratory and dialectical nature of mathematics by also using dialog.

Introducing Thread-Based Test Management

Most of the testing world is managed around artifacts: test cases, test documents, bug reports. If you look at any “test management” tool, you’ll see that the artifact-based approach permeates it. “Test” for many people is a noun.

For me test is a verb. Testing is something that I do, not so much something that I create. Testing is the act of exploration of an unknown territory. It is casting questions, like Molotov cocktails, into the darkness, where they splatter and burst into bright revealing fire.

How to Manage Such a Process?

My brother Jon and I created a way to control highly exploratory testing 10 years ago, called session-based test management (SBTM). I recently returned from an intense testing project in Israel, where I used SBTM. But I also experimented with a new idea: thread-based test management (TTM).

Like many of my new ideas, it’s not really new. It’s the christening (with words) and sharpening (with analysis) of something many of us already do. The idea is this: organize management around threads of activity rather than test sessions or artifacts.

Thread-based testing is a generalized form of session-based testing, in that sessions are a form of thread, but a thread is not necessarily a session. In SBTM, you test in uninterrupted blocks of time that each have a charter. A charter is a mission for that session; a light sort of commitment, or contract. A thread, on the other hand, may be interrupted, it may go on and on indefinitely, and does not imply a commitment. Session-based testing can be seen as an extension of thread-based testing for especially high accountability and more orderly situations.

I define a thread as a set of one or more activities intended to solve a problem or achieve an objective. You could think of a thread as a very small project within a project.

Why Thread-Based Test Management?

Because it can work under even the most chaotic and difficult conditions. The only formalism required for TBTM is a list of threads. I use this form of test management when I am dropped into a project with as little a day or two to get it done.

What Does Thread-Based Test Management Looks Like?

It’s simple. Thread-based test management looks like a todo list, except that we organize the todo items into an outline that matches the structure of the testing process. Here’s a mocked-up example:

Test Facilities

  • Power meter calibration method
  • Backup test jig validation
  • Create standard test images

Test Strategy

  • Accuracy Testing
    • Sampling strategy
    • Preliminary-testing
    • Log file analysis program
  • Transaction Flow Testing
  • Essential Performance Testing
  • Safety Testing
    • warnings and errors FRS review
    • tool for forcing errors
  • Compliance Testing
  • Test Protocol V1.0 doc.

Test Management

  • Change protocol definition
  • Build protocol definition
  • Test cycle protocol definition
  • Bug reporting protocol definition
  • Bug triage
  • Fix verifications

This outline describes the high level threads that comprise the test project. I typically use a mind-mapping program like MindManager to organize and present them.

So, you should be thinking, “Is that it? Todo lists?” right about now. Well, no. That’s not it. But that’s one face of it.

What Else Does Thread-Based Test Management Look Like?

It looks like testers gathered around a todo list, talking about what they are going to work on that afternoon. Then they split up and go to work. Several times day they might come together like that. If the team is not co-located, then this meeting is done over instant messaging, email, or perhaps through a wiki.

Is That All it Looks Like?

Well, there is also the status report. Whether written or spoken, the thread-based test management version of a status report lists the threads, who is working on the threads, and the outlook for each thread. It typically also includes an issues list.

Other documentation may be produced, of course. TBTM doesn’t tell you what documents to create. It simply tells you that threads are the organizing principle we use for managing the project.

Where Do Threads Come From?

Threads are first spawned from our model of the testing problem. The Satisfice Heuristic Test Strategy Model is an example of such a model. By working through those lists, we get an idea of the kinds of testing we might want to do: those are the first of the threads. After that, threads might be created in many ways, including splitting off of existing threads as we gain a deeper understanding of what testing needs to be done. Of course, in an Agile environment, each user story kicks off a new testing thread.

Which Threads Do We Work On?

Think priority and progress. We might frequently drop threads, switch threads, and pick them up again. In general, we work on the highest priority threads, but we also work on lower priority threads many times, when we see the possibility for quick and inexpensive progress. If I’m trying to finish a sanity check on the new build, I might interrupt that to discuss the status of a particular known bug if the developer happens to wander by.

Major ongoing threads often become attached to specific people. For instance “client testing” or “performance testing” often become full-time jobs. Testing itself, after all, can be thought of as a thread so challenging to do well, and so different from programming, that most companies have seen fit to hire dedicated testers.

How Do Threads End?

A thread ends either in a cut or knot. Cutting a thread means to cancel that task. A knot, however, is a milestone; an achievement of some kind. This is exactly the meaning of the phrase “tying up the loose ends” and marks either the end of the thread (or group of threads) or a good place to drop it for a while.

How Do We Estimate Work?

In thread-based test management, there is no special provision or method for estimating work, except that this is done on a thread-by-thread basis. Session-based test management may be overlaid onto TBTM in order to estimate work in terms of sessions.

How Do We Evaluate Progress?

In thread-based test management, there is no special provision or method for evaluating progress, either, except that this is done on a thread-by-thread basis, and status reports may be provided frequently, perhaps at the end of each day. Session-based test management is also helpful for that.

So What?

This form of management is actually quite common. But, to my knowledge, no one has yet named and codified it. Without a convenient way to talk about it, we have a hard time explaining and justifying it. Then when the “process improvement” freaks come along, they act like there’s no management happening at all. This form of management has been “illegible” up to now (meaning that it’s there but no one notices it) and my brother and I are going to push to make it fully legible and respectable in the testing arena.

From now on, when asked about my approach to test management, I can say “I practice Rapid Testing methodology, which I track in either a thread-based or session-based manner, depending on the stage of the project, and in a risk-based manner at all times.”

How is TBTM Any Different From Using a TO-DO List?

Michel Kraaij questions the substance of TBTM by wondering how it’s different from the age-old idea of a “to-do” list? See his post here.

This is a good question. Yes, TBTM is different than just using a to-do list, but even so, I don’t think I’ve ever read an article about to-do list based test management (TDBTM?). Most textbooks focus on artifacts, not the activity of testing. Thread-based test management is trying to capture the essence of managing with to-do lists, plus some other things in addition to that.

The main additional element, beyond just making a to-do list, is that a traditional to-do list contains items that can be “done”, whereas many threads might not ever be “done.” They might be cut (abandoned) or knotted (temporarily parked at some level of completion). Some threads maybe tied up with a bow and “done” like a normal task, but not the main ones that I’m thinking of. As I practice testing, for instance, I’m rarely “done” with test strategy. I tinker with the test strategy all the way through the project. That’s why it makes sense to call it a thread.

Once again: Thread-based management is not focused on getting things “done.” In this way it is different from KanBan, Scrum, ToDo lists, Session-based test management,  etc., all of which are big into workflow management of definite units of work.

Another thing to recognize is that the main concern of TBTM is how to know what to put on your thread list. The answer to that invokes the entire framework of Rapid Software testing. So, yeah, it’s more than having an outline of threads, which does look very much like a to-do list– it’s the activity (and skills) of making the list and managing it. If you want to talk about to-do list based test management, then you would have to invent that lore as well. You couldn’t just say “make a to-do list” and claim to have communicated the methodology.

[You can find Jonathan’s take on TBTM here.]

[I credit Sagi Krupetski, the test lead on my recent project, for helping me get this idea. His clockwork status reporting and regular morning question “Where are we on the project and what do you think you need to work on today?” caused me to see the thread structure of our project clearly for the first time. He’s back on the market now (Chicago area), in case you need a great tester or test manager.]