Re-Inventing Testing: What is Integration Testing? (Part 1)

(Thank you, Anne-Marie Charrett, for reviewing my work and helping with this post.)

One of the reasons I obsessively coach other testers is that they help me test my own expertise. Here is a particularly nice case of that, while working with a particularly bright and resilient student, Anita Gujrathi, (whose full name I am using here with her permission).

The topic was integration testing. I chose it from a list of skills Anita made for herself. It stood out because integration testing is one of those labels that everyone uses, yet few can define. Part of what I do with testers is help them become aware of things that they might think they know, yet may have only a vague intuition about. Once we identify those things, we can study and deepen that knowledge together.

Here is the start of our conversation (with minor edits for grammar and punctuation, and commentary in brackets):

What do you mean by integration testing?
[As I ask her this question I am simultaneously asking myself the same question. This is part of a process known as transpection. Also, I am not looking for “one right answer” but rather am exploring and exercising her thought processes, which is called the Socratic Method.]

Integration test is the test conducted when we are integrating two or more systems.
[This is not a wrong answer, but it is shallow, so I will press for more details.

By shallow, I mean that leaves out a lot of detail and nuances. A shallow answer may be fine in a lot of situations, but in coaching it is a black box that I must open.]

What do you mean by integrated?

That means kind of joining two systems such that they give and take data.
[This is a good answer but again it is shallow. She said “kind of” which I take as a signal that she may be not quite sure what words to use. I am wondering if she understands the technical aspects of how components are joined together during integration. For instance, when two systems share an operating space, they may have conflicting dependencies which may be discovered only in certain situations. I want to push for a more detailed answer in order to see what she knows about that sort of thing.]

What does it mean to join two systems?
[This process is called “driving to detail” or “drilling down”. I just keep asking for more depth in the answer by picking key ideas and asking what they mean. Sometimes I do this by asking for an example.]

For example, there is an application called WorldMate which processes the itineraries of the travellers and generates an XML file, and there is another application which creates the trip in its own format to track the travellers using that XML.
[Students will frequently give me an example when they don’t know how to explain a concept. They are usually hoping I will “get it” and thus release them from having to explain anything more. Examples are helpful, of course, but I’m not going to let her off the hook. I want to know how well she understands the concept of joining systems.

The interesting thing about this example is that it illustrates a weak form of integration– so weak that if she doesn’t understand the concept of integration well enough, I might be able to convince her that no integration is illustrated here.

What makes her example a case of weak integration is that the only point of contact between the two programs is a file that uses a standardized format. No other dependencies or mode of interaction is mentioned. This is exactly what designers do when they want to minimize interaction between components and eliminate risks due to integration.]

I still don’t know what it means to join two systems.
[This is because an example is not an explanation, and can never be an explanation. If someone asks what a flower is and you hold up a rose, they still know nothing about what a flower is, because you could hold up a rose in response to a hundred other such questions: what is a plant? what is a living thing? what is botany? what is a cell? what is red? what is carbon? what is a proton? what is your favorite thing? what is advertising? what is danger? Each time the rose is an answer to some specific aspect of the question, but not all aspects, but how do you know what the example of a rose actually refers to? Without an explanation, you are just guessing.]

I am coming to that. So, here we are joining WorldMate (which is third-party application) to my product so that when a traveller books a ticket from a service and receives the itinerary confirmation email, it then goes to WorldMate which generates XML to give it to my product. Thus, we have joined or created the communication between WorldMate and my application.
[It’s nice that Anita asserts herself, here. She sounds confident.

What she refers to is indeed communication, although not a very interesting form of communication in the context of integration risk. It’s not the sort of communication that necessarily requires integration testing, because the whole point of using XML structures is to cleanly separate two systems so that you don’t have to do anything special or difficult to integrate them.]

I still don’t see the answer to my question. I could just as easily say the two systems are not joined. But rather independent. What does join really mean?
[I am pretending not to see the answer in order to pressure her for more clarity. I won’t use this tactic as a coach unless I feel that the student is reasonably confident.]

Okay, basically when I say join I mean that we are creating the communication between the two systems
[This is the beginning of a good answer, but her example shows only a weak sort of communication.]

I don’t see any communication here. One creates an XML, the other reads it. Neither knows about the other.
[It was wrong of me to say I don’t see any communication. I should have said it was simplistic communication. What I was trying to do is provoke her to argue with me, but I regret saying it so strongly.]

It is a one-way communication.
[I agree it is one-way. That’s part of why I say it is a weak form of integration.]

Is Google integrated with Bing?
[One major tactic of the Socratic method is to find examples that seem to fit the student’s idea and yet refute what they were trying to prove. I am trying to test what Anita thinks is the difference between two things that are integrated and two things that are simply “nearby.”]

Ah no?

According to you, they are! Because I can Google something, then I can take the output and feed it to Bing, and Bing will do a search on that. I can Google for a business name and then paste the name into Bing and learn about the business. The example you gave is just an example of two independent programs that happen to deal with the same file.


So, if I test the two independent programs, haven’t I done all the testing that needs to be done? How is integration testing anything more or different or special?

At this point, Anita seems confused. This would be a good time to switch into lecture mode and help her get clarity. Or I could send her away to research the matter. But what I realized in that moment is that I was not satisfied with my own ideas about integration. When I asked myself “what would I say if I were her?” my answers sounded not much deeper than hers. I decided I needed to do some offline thinking about integration testing.

Lots of things in out world are slightly integrated. Some things are very integrated. This seems intuitively obvious, but what exactly is that difference? I’ve thought it through and I have answers now. Before I blog about it, what do you think?

TestInsane’s Mindmaps Are Crazy Cool

Most testing companies offer nothing to the community or the field of testing. They all seem to say they hire only the best experts, but only a very few of them are willing to back that up with evidence. Testing companies, by and large, are all the same, and the sameness is one of mediocrity and mendacity.

But there are a few exceptions. One of them is TestInsane, founded by ex-Moolyan co-founder Santosh Tuppad. This is a company to watch.

The wonderful thing about TestInsane is their mindmaps. More than 100 of them. What lovelies! Check them out. They are a fantastic public contribution! Each mindmap tackles some testing-related subject and lists many useful ideas that will help you test in that area.

I am working on a guide to bug reporting, and I found three maps on their site that are helping me cover all the issues that matter. Thank you TestInsane!

I challenge other testing companies to contribute to the craft, as well.

Note: Santosh offered me money to help promote his company. That is a reasonable request, but I don’t do those kinds of deals. If I did that even once I would lose vital credibility. I tell everyone the same thing: I am happy to work for you if you pay me, but I cannot promote you unless I believe in you, and if I believe in you I will promote you for free. As of this writing, I have not done any work for TestInsane, paid or otherwise, but it could happen in the future.

I have done paid work for Moolya, and Per Scholas, both of which I gush about on a regular basis. I believe in those guys. Neither of them pay me to say good things about them, but remember, anyone who works for a company will never say bad things. There are some other testing companies I have worked for that I don’t feel comfortable endorsing, but neither will I complain about them in public (usually… mostly).

Competent People and Conference Keynotes

My colleague and friend Anne-Marie Charrett has a thing about women. A) She is one. B) She feels that not enough of them are speaking at testing conferences. (See also Fiona Charles’ post on this subject.) I support Anne-Marie’s cause partly because I support the woman herself and it would make her happy. This is how humanity works: we are tribal creatures. Don’t deny it, you will just sound silly.

There is another reason I support their cause, though. It’s related to the fact that we people are not only tribal creatures. We are also creatures of myth, story, and principle. Each of us lives inside a story, and we want that story to “win,” whatever that may mean to us. Apart from tribal struggles, there is a larger meta-tribal struggle over what constitutes the “correct” or “good” or “moral” story.

In other words, it isn’t only whom we like that motivates us, but also what seems right. I’m not religious, so I won’t bother to talk about that aspect of things. But in the West, the professional status of women is a big part of the story of good and proper society; about what seems right.

The story I’m living, these days, is about competence. And I think most people speaking at testing conferences are not competent enough. A lot of what’s talked about at testing conferences is the muttering of idiots. By idiot, I mean functionally stupid people: people who choose not to use their minds to find excellent solutions to important problems, but instead speak ritualistically and uncritically about monsters and angels and mathematically invalid metrics and fraudulent standards and other useless or sinister tools that are designed to amaze and confuse the ignorant.

I want to see at least 50% of people speaking at conferences to be competent. That’s my goal. I think it is achievable, but it will take a lot of work. We are up against an entrenched and powerful interest: the promoters-of-ineptness (few of whom realize the damage they do) who run the world and impose themselves on our craft.

Why are there so many idiots and why do they run the world? The roots and dynamics of idiocracy are deep. It’s a good question but I don’t want to go into it here and now.

What I want to say is that Anne-Marie and Fiona, along with some others, can help me, and I can help them. I want to encourage new voices to take a place in the Great Conversation of testing because I do believe there is an under-tapped pool of talent among the women of testing. I am absolutely opposed to quotas or anything that simply forces smaller people with higher voices onto the stage for the sake of it. Instead let’s find and develop talent that leads us into a better future. This is what SpeakEasy is all about.

Maybe, if we can get more women speaking and writing in the craft, we will be able to imagine a world where more than 50% of keynote speakers are not spouting empty quotes from great thinkers and generic hearsay about projects and incoherent terminology and false dichotomies and ungrounded opinions and unworkable heuristics presented in the form of “best practices.”

I am not a feminist. I’m not going to be one. This is why I have work to do. I am not naturally biased in favor of considering women, and even if I were, can I be so sure that I’m not biased in favor of the attractive ones? Or against them? Research suggests no one can be complacent about overcoming our biology and gender identity. So, it’s a struggle. Any honest man will tell you that. And, I must engage that struggle while maintaining my implacable hostility to charlatans and quacks. The story I am living tells me this is what I must do. Also, Anne-Marie has asked for my help.

Here’s my call to action: To bring new beautiful minds forth to stand against mediocrity, we need to make the world a better, friendlier place especially for the women among us. I’m asking all you other non-feminists out there to consider working with me on this.

Exploratory Testing 3.0

[Authors’ note: Others have already made the point we make here: that exploratory testing ought to be called testing. In fact, Michael said that about tests in 2009, and James wrote a blog post in 2010 that seems to say that about testers. Aaron Hodder said it quite directly in 2011, and so did Paul Gerrard. While we have long understood and taught that all testing is exploratory (here’s an example of what James told one student, last year), we have not been ready to make the rhetorical leap away from pushing the term “exploratory testing.” Even now, we are not claiming you should NOT use the term, only that it’s time to begin assuming that testing means exploratory testing, instead of assuming that it means scripted testing that also has exploration in it to some degree.]

[Second author’s note: Some people start reading this with a narrow view of what we mean by the word “script.” We are not referring to text! By “script” we are speaking of any control system or factor that influences your testing and lies outside of your realm of choice (even temporarily). This includes text instructions, but also any form of instructions, or even biases that are not instructions.]

By James Bach and Michael Bolton

In the beginning, there was testing. No one distinguished between exploratory and scripted testing. Jerry Weinberg’s 1961 chapter about testing in his book, Computer Programming Fundamentals, depicted testing as inherently exploratory and expressed caution about formalizing it. He wrote, “It is, of course, difficult to have the machine check how well the program matches the intent of the programmer without giving a great deal of information about that intent. If we had some simple way of presenting that kind of information to the machine for checking, we might just as well have the machine do the coding. Let us not forget that complex logical operations occur through a combination of simple instructions executed by the computer and not by the computer logically deducing or inferring what is desired.”

Jerry understood the division between human work and machine work. But, then the formalizers came and confused everyone. The formalizers—starting officially in 1972 with the publication of the first testing book, Program Test Methods—focused on the forms of testing, rather than its essences. By forms, we mean words, pictures, strings of bits, data files, tables, flowcharts and other explicit forms of modeling. These are things that we can see, read, point to, move from place to place, count, store, retrieve, etc. It is tempting to look at these artifacts and say “Lo! There be testing!” But testing is not in any artifact. Testing, at the intersection of human thought processes and activities, makes use of artifacts. Artifacts of testing without the humans are like state of the art medical clinics without doctors or nurses: at best nearly useless, at worst, a danger to the innocents who try to make use of them.

We don’t blame the innovators. At that time, they were dealing with shiny new conjectures. The sky was their oyster! But formalization and mechanization soon escaped the lab. Reckless talk about “test factories” and poorly designed IEEE standards followed. Soon all “respectable” talk about testing was script-oriented. Informal testing was equated to unprofessional testing. The role of thinking, feeling, communicating humans became displaced.

James joined the fray in 1987 and tried to make sense of all this. He discovered, just by watching testing in progress, that “ad hoc” testing worked well for finding bugs and highly scripted testing did not. (Note: We don’t mean to make this discovery sound easy. It wasn’t. We do mean to say that the non-obvious truths about testing are in evidence all around us, when we put aside folklore and look carefully at how people work each day.) He began writing and speaking about his experiences. A few years into his work as a test manager, mostly while testing compilers and other developer tools, he discovered that Cem Kaner had coined a term—”exploratory testing”—to represent the opposite of scripted testing. In that original passage, just a few pages long, Cem didn’t define the term and barely described it, but he was the first to talk directly about designing tests while performing them.

Thus emerged what we, here, call ET 1.0.

(See The History of Definitions of ET for a chronological guide to our terminology.)

ET 1.0: Rebellion

Testing with and without a script are different experiences. At first, we were mostly drawn to the quality of ideas that emerged from unscripted testing. When we did ET, we found more bugs and better bugs. It just felt like better testing. We hadn’t yet discovered why this was so. Thus, the first iteration of exploratory testing (ET) as rhetoric and theory focused on escaping the straitjacket of the script and making space for that “better testing”. We were facing the attitude that “Ad hoc testing is uncontrolled and unmanageable; something you shouldn’t do.” We were pushing against that idea, and in that context ET was a special activity. So, the crusaders for ET treated it as a technique and advocated using that technique. “Put aside your scripts and look at the product! Interact with it! Find bugs!”

Most of the world still thinks of ET in this way: as a technique and a distinct activity. But we were wrong about characterizing it that way. Doing so, we now realize, marginalizes and misrepresents it. It was okay as a start, but thinking that way leads to a dead end. Many people today, even people who have written books about ET, seem to be happy with that view.

This era of ET 1.0 began to fade in 1995. At that time, there were just a handful of people in the industry actively trying to develop exploratory testing into a discipline, despite the fact that all testers unconsciously or informally pursued it, and always have. For these few people, it was not enough to leave ET in the darkness.

ET 1.5: Explication

Through the late ‘90s, a small community of testers beginning in North America (who eventually grew into the worldwide Context-Driven community, with some jumping over into the Agile testing community) was also struggling with understanding the skills and thought processes that constitute testing work in general. To do that, they pursued two major threads of investigation. One was Jerry Weinberg’s humanist approach to software engineering, combining systems thinking with family psychology. The other was Cem Kaner’s advocacy of cognitive science and Popperian critical rationalism. This work would soon cause us to refactor our notions of scripted and exploratory testing. Why? Because our understanding of the deep structures of testing itself was evolving fast.

When James joined ST Labs in 1995, he was for the first time fully engaged in developing a vision and methodology for software testing. This was when he and Cem began their fifteen-year collaboration. This was when Rapid Software Testing methodology first formed. One of the first big innovations on that path was the introduction of guideword heuristics as one practical way of joining real-time tester thinking with a comprehensive underlying model of the testing process. Lists of test techniques or documentation templates had been around for a long time, but as we developed vocabulary and cognitive models for skilled software testing in general, we started to see exploratory testing in a new light. We began to compare and contrast the important structures of scripted and exploratory testing and the relationships between them, instead of seeing them as activities that merely felt different.

In 1996, James created the first testing class called “Exploratory Testing.”  He had been exposed to design patterns thinking and had tried to incorporate that into the class. He identified testing competencies.

Note: During this period, James distinguished between exploratory and ad hoc testing—a distinction we no longer make. ET is an ad hoc process, in the dictionary sense: ad hoc means “to this; to the purpose”. He was really trying to distinguish between skilled and unskilled testing, and today we know better ways to do that. We now recognize unskilled ad hoc testing as ET, just as unskilled cooking is cooking, and unskilled dancing is dancing. The value of the label “exploratory testing” is simply that it is more descriptive of an activity that is, among other things, ad hoc.

In 1999, James was commissioned to define a formalized process of ET for Microsoft. The idea of a “formal ad hoc process” seemed paradoxical, however, and this set up a conflict which would be resolved via a series of constructive debates between James and Cem. Those debates would lead to we here will call ET 2.0.

There was also progress on making ET more friendly to project management. In 2000, inspired by the work for Microsoft, James and Jon Bach developed “Session-Based Test Management” for a group at Hewlett-Packard. In a sense this was a generalized form of the Microsoft process, with the goal of creating a higher level of accountability around informal exploratory work. SBTM was intended to help defend exploratory work from compulsive formalizers who were used to modeling testing in terms of test cases. In one sense, SBTM was quite successful in helping people to recognize that exploratory work was entirely manageable. SBTM helped to transform attitudes from “don’t do that” to “okay, blocks of ET time are things just like test cases are things.”

By 2000, most of the testing world seemed to have heard something about exploratory testing. We were beginning to make the world safe for better testing.

ET 2.0: Integration

The era of ET 2.0 has been a long one, based on a key insight: the exploratory-scripted continuum. This is a sliding bar on which testing ranges from completely exploratory to completely scripted. All testing work falls somewhere on this scale. Having recognized this, we stopped speaking of exploratory testing as a technique, but rather as an approach that applies to techniques (or as Cem likes to say, a “style” of testing).

We could think of testing that way because, unlike ten years earlier, we now had a rich idea of the skills and elements of testing. It was no longer some “creative and mystical” act that some people are born knowing how to do “intuitively”. We saw testing as involving specific structures, models, and cognitive processes other than exploring, so we felt we could separate exploring from testing in a useful way. Much of what we had called exploratory testing in the early 90’s we now began to call “freestyle exploratory testing.”

By 2006, we settled into a simple definition of ET, simultaneous learning, test design, and test execution. To help push the field forward, James and Cem convened a meeting called the Exploratory Testing Research Summit in January 2006. (The participants were James Bach, Jonathan Bach, Scott Barber, Michael Bolton, Elisabeth Hendrickson, Cem Kaner, Mike Kelly, Jonathan Kohl, James Lyndsay, and Rob Sabourin.) As we prepared for that, we made a disturbing discovery: every single participant in the summit agreed with the definition of ET, but few of us agreed on what the definition actually meant. This is a phenomenon we had no name for at the time, but is now called shallow agreement in the CDT community. To combat shallow agreement and promote better understanding of ET, some of us decided to adopt a more evocative and descriptive definition of it, proposed originally by Cem and later edited by several others: “a style of testing that emphasizes the freedom and responsibility of the individual tester to continually optimize the quality of his work by treating test design, test execution, test result interpretation, and learning as mutually supporting activities that continue in parallel throughout the course of the project.” Independently of each other, Jon Bach and Michael had suggested the “freedom and responsibility” part to that definition.

And so we had come to a specific and nuanced idea of exploration and its role in testing. Exploration can mean many things: searching a space, being creative, working without a map, doing things no one has done before, confronting complexity, acting spontaneously, etc. With the advent of the continuum concept (which James’ brother Jon actually called the “tester freedom scale”) and the discussions at the ExTRS peer conference, we realized most of those different notions of exploration are already central to testing, in general. What the adjective “exploratory” added, and how it contrasted with “scripted,” was the dimension of agency. In other words: self-directedness.

The full implications of the new definition became clear in the years that followed, and James and Michael taught and consulted in Rapid Software Testing methodology. We now recognize that by “exploratory testing”, we had been trying to refer to rich, competent testing that is self-directed. In other words, in all respects other than agency, skilled exploratory testing is not distinguishable from skilled scripted testing. Only agency matters, not documentation, nor deliberation, nor elapsed time, nor tools, nor conscious intent. You can be doing scripted testing without any scrap of paper nearby (scripted testing does not require that you follow a literal script). You can be doing scripted testing that has not been in any way pre-planned (someone else may be telling you what to do in real-time as they think of ideas). You can be doing scripted testing at a moment’s notice (someone might have just handed you a script, or you might have just developed one yourself). You can be doing scripted testing with or without tools (tools make testing different, but not necessarily more scripted). You can be doing scripted testing even unconsciously (perhaps you feel you are making free choices, but your models and habits have made an invisible prison for you). The essence of scripted testing is that the tester is not in control, but rather is being controlled by some other agent or process. This one simple, vital idea took us years to apprehend!

In those years we worked further on our notions of the special skills of exploratory testing. James and Jon Bach created the Exploratory Skills and Tactics reference sheet to bring specificity and detail to answer the question “what specifically is exploratory about exploratory testing?”

In 2007, another big slow leap was about to happen. It started small: inspired in part by a book called The Shape of Actions, James began distinguishing between processes that required human judgment and wisdom and those which did not. He called them “sapient” vs. “non-sapient.” This represented a new frontier for us: systematic study and development of tacit knowledge.

In 2009, Michael followed that up by distinguishing between testing and checking. Testing cannot be automated, but checking can be completely automated. Checking is embedded within testing. At first, James objected that, since there was already a concept of sapient testing, the distinction was unnecessary. To him, checking was simply non-sapient testing. But after a few years of applying these ideas in our consulting and training, we came to realize (as neither of us did at first) that checking and testing was a better way to think and speak than sapience and non-sapience. This is because “non-sapience” sounds like “stupid” and therefore it sounded like we were condemning checking by calling it non-sapient.

Do you notice how fine distinctions of language and thought can take years to work out? These ideas are the tools we need to sort out our practical decisions. Yet much like new drugs on the market, it can sometimes take a lot of experience to understand not only benefits, but also potentially harmful side effects of our ideas and terms. That may explain why those of us who’ve been working in the craft a long time are not always patient with colleagues or clients who shrug and tell us that “it’s just semantics.” It is our experience that semantics like these mean the difference between clear communication that motivates action and discipline, and fragile folklore that gets displaced by the next swarm of buzzwords to capture the fancy of management.

ET 3.0: Normalization

In 2011, sociologist Harry Collins began to change everything for us. It started when Michael read Tacit and Explicit Knowledge. We were quickly hooked on Harry’s clear writing and brilliant insight. He had spent many years studying scientists in action, and his ideas about the way science works fit perfectly with what we see in the testing field.

By studying the work of Harry and his colleagues, we learned how to talk about the difference between tacit and explicit knowledge, which allows us to recognize what can and cannot be encoded in a script or other artifacts. He distinguished between behaviour (the observable, describable aspects of an activity) and actions (behaviours with intention) (which had inspired James’ distinction between sapient and non-sapient testing). He untangled the differences between mimeomorphic actions (actions that we want to copy and to perform in the same way every time) and polimorphic actions (actions that we must vary in order to deal with social conditions); in doing that, he helped to identify the extents and limits of automation’s power. He wrote a book (with Trevor Pinch) about how scientific knowledge is constructed; another (with Rob Evans) about expertise; yet another about how scientists decide to evaluate a specific experimental result.

Harry’s work helped lend structure to other ideas that we had gathered along the way.

  • McLuhan’s ideas about media and tools
  • Karl Weick’s work on sensemaking
  • Venkatesh Rao’s notions of tempo which in turn pointed us towards James C. Scott’s notion of legibility
  • The realization (brought to our attention by an innocent question from a tester at Barclays Bank) that the “exploratory-scripted continuum” is actually the “formality continuum.” In other words, to formalize an activity means to make it more scripted.
  • The realization of the important difference between spontaneous and deliberative testing, which is the degree of reflection that the tester is exercising. (This is not the same as exploratory vs. scripted, which is about the degree of agency.)
  • The concept of “responsible tester” (defined as a tester who takes full, personal, responsibility for the quality of his work).
  • The advent of the vital distinction between checking and testing, which replaced need to talk about “sapience” in our rhetoric of testing.
  • The subsequent redefinition of the term “testing” within the Rapid Software Testing namespace to make these things more explicit (see below).

About That Last Bullet Point

ET 3.0 as a term is a bit paradoxical because what we are working toward, within the Rapid Software Testing methodology, is nothing less than the deprecation of the term “exploratory testing.”

Yes, we are retiring that term, after 22 years. Why?

Because we now define all testing as exploratory.  Our definition of testing is now this:

“Testing is the process of evaluating a product by learning about it through exploration and experimentation, which includes: questioning, study, modeling, observation and inference, output checking, etc.”

Where does scripted testing fit, then?  By “script” we are speaking of any control system or factor that influences your testing and lies outside of your realm of choice (even temporarily). This does not refer only to specific instructions you are given and that you must follow. Your biases script you. Your ignorance scripts you. Your organization’s culture scripts you. The choices you make and never revisit script you.

By defining testing to be exploratory, scripting becomes a guest in the house of our craft; a potentially useful but foreign element to testing, one that is interesting to talk about and apply as a tactic in specific situations. An excellent tester should not be complacent or dismissive about scripting, any more than a lumberjack can be complacent or dismissive about heavy equipment. This stuff can help you or ruin you, but no serious professional can ignore it.

Are you doing testing? Then you are already doing exploratory testing. Are you doing scripted testing? If you’re doing it responsibly, you are doing exploratory testing with scripting (and perhaps with checking).  If you’re only doing “scripted testing,” then you are just doing unmotivated checking, and we would say that you are not really testing. You are trying to behave like a machine, not a responsible tester.

ET 3.0, in a sentence, is the demotion of scripting to a technique, and the promotion of exploratory testing to, simply, testing.

History of Definitions of ET

History of the term “Exploratory Testing” as applied to software testing within the Rapid Software Testing methodology space.

For a discussion of the some of the social and philosophical issues surrounding this chronology, see Exploratory Testing 3.0.

1988 First known use of the term, defined variously as “quick tests”; “whatever comes to mind”; “guerrilla raids” – Cem Kaner, Testing Computer Software (There is explanatory text for different styles of ET in the 1988 edition of Testing Computer Software. Cem says that some of the text was actually written in 1983.)
1990 “Organic Quality Assurance”, James Bach’s first talk on agile testing filmed by Apple Computer, which discussed exploratory testing without using the words agile or exploratory.
1993 June: “Persistence of Ad Hoc Testing” talk given at ICST conference by James Bach. Beginning of James’ abortive attempt to rehabilitate the term “ad hoc.”
1995 February: First appearance of “exploratory testing” on Usenet in message by Cem Kaner.
1995 Exploratory testing means learning, planning, and testing all at the same time. – James Bach (Market Driven Software Testing class)
1996 Simultaneous exploring, planning, and testing. – James Bach (Exploratory Testing class v1.0)
1999 An interactive process of concurrent product exploration, test design, and test execution. – James Bach (Exploratory Testing class v2.0)
2001(post WHET #1) The Bach View

Any testing to the extent that the tester actively controls the design of the tests as those tests are performed and uses information gained while testing to design new and better tests.

The Kaner View

Any testing to the extent that the tester actively controls the design of the tests as those tests are performed, uses information gained while testing to design new and better tests, and where the following conditions apply:

  • The tester is not required to use or follow any particular test materials or procedures.
  • The tester is not required to produce materials or procedures that enable test re-use by another tester or management review of the details of the work done.

– Resolution between Bach and Kaner following WHET #1 and BBST class at Satisfice Tech Center.

(To account for both of views, James started speaking of the “scripted/exploratory continuum” which has greatly helped in explaining ET to factory-style testers)

2003-2006 Simultaneous learning, test design, and test execution – Bach, Kaner
2006-2015 An approach to software testing that emphasizes the personal freedom and responsibility of each tester to continually optimize the value of his work by treating learning, test design and test execution as mutually supportive activities that run in parallel throughout the project. – (Bach/Bolton edit of Kaner suggestion)
2015 Exploratory testing is now a deprecated term within Rapid Software Testing methodology. See testing, instead. (In other words, all testing is exploratory to some degree. The definition of testing in the RST space is now: Evaluating a product by learning about it through exploration and experimentation, including to some degree: questioning, study, modeling, observation, inference, etc.)


“Are you listening? Say something!”

[Note: J. Michael Hammond suggests that I note right at the top of this post that the dictionary definition of listen does not restrict the word to the mere noticing of sounds. In the Oxford English Dictionary, as well as others, an extended and more active definition of listening is also given. It’s that extended definition of listening I am writing about here.]

I’m tired of hearing the simplistic advice about how to listen one must not talk. That’s not what listening means. I listen by reacting. As an extravert, I react partly by talking. Talking is how I chew on what you’ve told me. If I don’t chew on what you say, I will choke or get tummy aches and nightmares. You don’t want me to have nightmares, do you? Until you interrupt me to say otherwise, I charitably assume you don’t.

Below is an alternative theory of listening; one that does not require passivity. I will show how this theory is consistent the “don’t talk” advice if you consider that being quiet while other people speak is one heuristic of good listening, rather than the definition or foundation of it. I am tempted to say that listening requires talking, but that is not quite true. This is my proposal of a universal truth of listening: Listening requires you to change.

To Listen is to Change

  1. I propose that to listen is to react coherently and charitably to incoming information. That is how I would define listening.
  2. To react is to change. The reactions of listening may involve a change of mood, attention, concept, or even a physical action.

Notice that I said “coherently and charitably” and not “constructively” or “agreeably.” I think I can be listening to a criminal who demands ransom even if I am not constructive in my response to him. Reacting coherently is not the same as accepting someone’s view of the world. If I don’t agree with you or do what you want me to, that is not proof of my poor listening. “Coherently” refers to a way of making sense of something by interpreting it such that it does not contradict anything important that you also believe is true and important about the world. “Charitably” refers to making sense of something in a way most likely to fit the intent of the speaker.

Also, notice that coherence does not require understanding. I would not be a bad listener, necessarily, if I didn’t understand the intent or implications of what was told to me. Understanding is too high a burden to require for listening. Coherence and charitability already imply a reasonable attempt to understand, and that is the important part.

Poor listening would be the inability or refusal to do the following:

  • take in data at a reasonable pace. (“reasonable pace” is subject to disagreement)
  • make sense of data that is reasonably sensible in that context, including empathizing with it. (“reasonably sensible” is subject to disagreement)
  • reason appropriately about the data. (“reason appropriately” is subject to disagreement)
  • take appropriate responsibility for one’s feelings about the data (“appropriate responsibility” is subject to disagreement)
  • make a coherent response. (“coherent response” is subject to disagreement)
  • comprehend the reasonable purposes and nature of the interaction (“reasonable purposes and nature” is subject to disagreement)

Although all these elements are subject to disagreement, you might not choose to actively dispute them in a given situation, because maybe you feel that the disagreement is not very important. (As an example, I originally wrote “dispute” in the text above, which I think is fine, but during review, after hearing me read the above, Michael Bolton suggested changing “dispute” to “disagreement” and that seemed okay, too, so I made the change. In making his suggestion, he did not need to explain or defend his preference, because he’s earned a lot of trust with me and I felt listened to.)

I was recently told, in an argument, that I was not listening. I didn’t bother to reply to the man that I also felt he wasn’t listening to me. For the record, I think I was listening well enough, and what the man wanted from me was not listening– he wanted compliance to his world view, which was the very matter of dispute! Clearly he wasn’t getting the reaction he wanted, and the word he used for that was listening. Meanwhile, I had reacted to his statements with arguments against them. To me, this is close to the essence of listening.

If you really believe someone isn’t listening, it’s unlikely that it will help to say that, unless you have a strong personal relationship. When my wife tells me I’m not listening, that’s a very special case. She’s weaker than me and crucial to my health and happiness, therefore I will use every tool at my disposal to make myself easy for her to talk to. I generally do the same for children, dogs, people who seem mentally unstable, fire, and dangerous things, but not for most colleagues. I do get crossed up sometimes. Absolutely. Especially on Twitter. Sometimes I assume a colleague feels powerful, and respond to him that way, only later to discover he was afraid of me.

(This happened again just the other day on Twitter. Which is why it is unlikely you will see me teach in Finland any time soon! I am bitten by such a mistake a few times a year, at least. For me this is not a reason to be softer with my colleagues. Then again, it may be. I struggle with the pros and cons. There is no simple answer. I regularly receive counsel from my most trusted colleagues on this point.)

A Sign of Being Listened to is the Change that Happens

Introspect for a moment. How do you know that your computer is listening to you? At this moment, as I am typing, the letters I want to see are appearing on the screen as I press the keys. WordPress is talking back to me. WordPress is changing, and its changes seem coherent and reasonable to me. My purposes are apparently being served. The computer is listening. Consider what happens when you don’t see a response from your computer. How many times have you clicked “save” or “print” or “calculate” or “paste” and suffered that sinking feeling as the forest noises go completely silent and your screen goes glassy and gets that faraway grayed out look of the damned? You feel out of control. You want to shout at your screen “Come back! I’ve changed my mind! Undo! Cancel!” How do you feel then? You don’t say to yourself “what a good listener my computer is!”

Why is this so? It’s because you are involved in a cybernetic control loop with your computer. Without frequent feedback from your system you lose your control over it. You don’t know what it needs or what to do about it. It may be listening to something, but when nothing changes in a manner that seems to relate to your input, you suspect it is not listening to you.

Based just on this example I conjecture that we feel listened to when a system responds to our utterances and actions in a harmonious manner that honors our purposes. I further conjecture that the advice to maintain attentive silence in order to listen better is a special case of change in such a way as to foster harmony and supportiveness.

Can we think of a situation where listening to someone means shouting loudly over them? I can. I was recently in a situation where a quiet colleague was trying to get students to return to her tutorial after a break. The hallway was too noisy and few people could hear her. I noticed that, so I repeated her words very loudly that her students might hear. I would argue that I listened and responded harmoniously in support of her needs. I didn’t ask her if she felt that I listened to her. She knows I did. I could tell by her smile.

If my wife cries “brake!” when I’m driving, I hit the brake. The physical action of my foot on the brake is her evidence that I listened, not attentive silence or passivity.

It may be a small change or a large change, but for the person communicating with you to feel listened to, they must see good evidence of an appropriate change (or change process) in you.

Let me tell you about being a father of a strong-minded son. I have been in numerous arguments with my boy. I have learned how to get my point across: plant the idea, argue for a while, and then let go of it. I discovered it doesn’t matter if he seems to reject the idea. In fact, I’ve come to believe he cannot reject any idea of mine unless it is genuinely wrong for him. I know he’s listening because he argues with me. And if he gets upset, that means he must be taking it quite seriously. Then I wait. And I invariably see a response in the days that follow (I mean not a single instance of this not happening comes to mind right now).

One of the tragedies of fatherhood is that many fathers can’t tell when their children are listening because they need to see too specific a response too quickly. Some listening is a long process. I know that my son needs to chew on difficult ideas in order to process them. This is how to think about the listening process. True listening implies digestion and incubation. The mental metabolism is subtle, complicated, and absolutely vital.

Let People Chew on Your Ideas

Listening is not primarily about taking information into yourself, any more than eating is about taking food into yourself. With eating the real point is digestion. And for good listening you need to digest, too. Part of digestion is chewing, and for humans part of listening is reacting to the raw data for the purposes of testing understanding and contrasting the incoming data with other data they have. Listening well about any complicated thing requires testing. Does this apply to your spouse and children, too? Yes! But perhaps it applies differently to them than to a colleague at work, and certainly differently than testing-as-listening to politician or a telemarketer.

Why does this matter so much? Because if we uncritically accept ideas we risk falling prey to shallow agreement, which is the appearance of agreement despite an unrecognized deep disagreement. I don’t want to find out in the middle of a critical moment on a project that your definition of testing, or role, or collaboration, or curiosity doesn’t match mine. I want to have conversations about the meanings of words well before that. Therefore I test my understanding. Too many in the Agile culture seem to confuse a vacant smile with philosophical and practical comprehension. I was told recently that for an Agile tester, “collaboration” may be more important than testing skill. That is probably the stupidest thing I have heard all year. By “stupid” I mean willfully refusing to use one’s mind. I was talking to a smart man who would not use his smarts in that moment, because, by his argument, the better tester is the one who agrees to do anything for anyone, not the one who knows how to find important bugs quickly. In other words, any unskilled day laborer off the street, desperate for work, is apparently a better tester than me. Yeah… Right…

In addition to the idea digestion process, listening also has a critical social element. As I said above, whether or not you are listening is, practically speaking, always a matter of potential dispute. That’s the way of it. Listening practices and instances are all tied up in socially constructed rituals and heuristics. And these rituals are all about making ourselves open to reasonable change in response to each other. Listening is about the maintenance of social order as well as maintaining specific social relationships. This is the source of all that advice about listening by keeping attentively quiet while someone else speaks. What that misses is that the speaker also has a duty to perform in the social system. The speaker cannot blather on in ignorance or indifference to the idea processing practices of his audience. When I teach, I ask my students to interrupt me, and I strive to reward them for doing so. When I get up to speak, I know I must skillfully use visual materials, volume control, rhythm, and other rhetorical flourishes in order to package what I’m communicating into a more digestible form.

Unlike many teachers, I don’t interpret silence as listening. Silence is easy. If an activity can be done better and cheaper by a corpse or an inanimate object, I don’t consider it automatically worth doing as a living human.

I strongly disagree with Paul Klipp when he writes: “Then stop thinking about talking and pretend, if it’s not obvious to you yet, that the person who is talking is as good at thinking as you are. You may suddenly have a good idea, or you may have information that the person speaking doesn’t. That’s not a good enough reason to interrupt them when they are thinking.” Paul implies that interrupting a speaker is an expression of dominance or subversion. Yes, it can be, but it is not necessarily so, and I wish someone trained in Anthropology would avoid such an uncharitable oversimplification. Some interruptions are harmful and some are helpful. In fact, I would say that every social act is both harmful and helpful in some way. We must use our judgment to know what to say, how to say it and when. Stating favorite heuristics as if they were “best practices” is patronizing and unnecessary.

One Heuristic of Listening: Stop Talking

Where I agree with Paul and others like him is that one way of improving the harmony of communication and that feeling of being coherently and charitably responded to is to talk less. I’m more likely to use that in a situation where I’m dealing with someone whom I suspect is feeling weak, and whom I want to encourage to speak to me. However, another heuristic I use in that situation is to speak more. I do this when I want to create a rhetorical framework to help the person get his idea across. This has the side effect of taking pressure of someone who may not want to speak at all. I say this based on the vivid personal experience of my first date with the one who would become my wife. I estimate I spoke many thousands of words that evening. She said about a dozen. I found out later that’s just what she was looking for. How do I know? After two dates we got married. We’ve been married 23 years, so far. I also have many vivid experiences of difficult conversations that required me to sit next to her in silence for as long as 10 minutes until she was ready to speak. Both the “talk more” and “talk less” heuristics are useful for having a conversation.

What does this have to do with testing?

My view of listening can be annoying to people for exactly the same reason that testing is annoying to people. A developer may want me to accept his product without “judgment.” Sorry, man. That is not the tester’s way. A tester who doesn’t subject your product to criticism is, in fact, not taking it seriously. You should not feel honored by that, but rather insulted. Testing is how I honor strong, good products. And arguing with you may be how I honor your ideas.

Listening, I claim, is itself a testing process. It must be, because testing is how we come to comprehend anything deeply. Testing is a practice that enables deep learning and deeply trusting what we know.

Are You Listening to Me?

Then feel free to respond. Even if you disagree, you could well have been listening. I might be able to tell from your response, if that matters to you.

If you want to challenge this post, try reading it carefully… I will understand if you skip parts, or see one thing and want to argue with that. Go ahead. That might be okay. If I feel that there is critical information that you are missing, I will suggest that you read the post again. I don’t require that people read or listen to me thoroughly before responding. I ask only that you make a reasonable and charitable effort to make sense of this.


How Not to Standardize Testing (ISO 29119)

Many years ago I took a management class. One of the exercises we did was on achieving consensus. My group did not reach an agreement because I wouldn’t lower my standards. I wanted to discuss the matter further, but the other guys grew tired of arguing with me and declared “consensus” over my objections. This befuddled me, at first. The whole point of the exercise was to reach a common decision, and we had failed, by definition, to do that– so why declare consensus at all? It’s like getting checkmated in chess and then declaring that, well, you still won the part of the game that you cared about… the part before the checkmate.

Later I realized this is not so bizarre. What they had effectively done is ostracize me from the team. They had changed the players in the game. The remaining team did come to consensus. In the years since, I have found that changing the boundaries or membership of a community is indeed an important pillar of consensus building. I have used this tactic many times to avoid unhelpful debate. It is one reason why I say that I’m a member of the Context-Driven School of Testing. My school does not represent all schools, and the other schools do not represent mine. Therefore, we don’t need consensus with them.

Then what about ISO 29119?

The ISO organization claims to have a new standard for software testing. But ISO 29119 is not a standard for testing. It cannot be a standard for testing.

A standard for testing would have to reflect the values and practices of the world community of testers. Yet, the concerns of the Context-Driven School of thought, which has been in development for at least 15 years have been ignored and our values shredded by this so-called standard and the process used to create it. They have done this by excluding us. There are two organizations explicitly devoted to Context-Driven values (AST and ISST) and our community holds several major conferences a year. Members of our community speak at all the major practitioners conferences, and our ideas are widely cited. Some of the most famous testers in the the world, including me, are Context-Driven testers. We exist, and together with the Agilists, we are the source of nearly every new idea in testing in the last decade.

The reason they have excluded us is that they know we won’t agree to any simplistic standard based on templates or simple formulae. We know those things look pretty but they don’t help. If ISO doesn’t exclude us, they worry they will never finish. They know we will challenge their evidence, and even their ethics and basic competence. This is why I say the craft is not ready for standards. It will be years before all the recognized experts in testing can come together and agree on anything substantial.

The people running the ISO effort know exactly who we are. I personally have had multiple public debates with Stuart Reid, on stage. He cannot pretend we don’t exist. He cannot pretend we are some sort of lunatic fringe. Tens of thousands of testers have watched my video lectures or bought my books. This is not a case where ISO can simply declare us to be outsiders.

The Burden of Proof

The Context-Driven community stands for excellence in testing. This is why we must reject this depraved attempt by ISO to grab power and assert control over our craft. Our craft is still an open marketplace of ideas, and it is full of strong debates. We must protect that marketplace and allow it to evolve. I want the fair chance to put my competitors out of business (or get them to change their business) with the high quality of my work. Context-Driven testing has been growing in strength and numbers over the years. Whereas this ISO effort appears to be a job protection program for people who can’t stomach debate. They can’t win the debate so they want to remake the rules.

The burden of proof is not on me or any of us to show that the standard is wrong, nor is it our job to make it right. The burden is on those who claim that the craft can be standardized to study the craft and recognize and resolve the deep differences among us. Failing that, there can be no ethical or rational basis for standardization.

This blog post puts me on record as opposing the ISO 29119 standard. Together with my colleagues, we constitute a determined and sustained and principled opposition.

Let’s test at Let’s Test

I’ve been telling people that the best conference I know for thinking testers is Let’s Test (followed closely by CAST, which I will also be at, this year, in New York). Let’s Test was created by people who experienced CAST and wanted to be even more dedicated to Context-Driven testing principles.

Now, I’m here in Stockholm once again to be with the most interesting testers in Europe. I’m not done with my presentations, yet. But I still have a couple of days.

(I will presenting a new model of what it means to be an excellent observer, together with one or two observation challenges for participants. And Pradeep Soundararajan and I will be presenting a tutorial on reviewing a specification by testing it.)

Let’s Test is not for the faint of heart. Events go on day and night. I suffer from terrible jet lag, so I probably won’t be seen after dinner. But for you crazy kids, it’s a great place to try a testing exercise, or present one.

(Note: I’m being paid to teach at Let’s Test. I don’t get a percentage of the gate, though– I get paid the same whether anyone shows up or not.)

Australia Let’s Test

I will also be in Australia for the first Let’s Test happening down there, in September. There are some interesting testers in Oz. I’m sure they will all be there. It will be the first great party of ambitious intellectual testers that I know of in the history of Australian testing.

Anne-Marie Charrett and I will be doing our Coaching Testers tutorial, which is the only time this year we will teach it together.

“Intellectual” testers?

Why do I keep saying that? Because the state of the practice in testing is for testers NOT to read about their craft, NOT to study social science or know anything about the proper use of statistics or the meaning of the word “heuristic”, and NOT to challenge the now 40 year stale ideas about making testing into factory work that lead directly to mass outsourcing of testing to lowest bidder instead of the most able tester.

Intellectual testers are not the most common type of tester.

The ISTQB and similar programs require your stupidity and your fear in order to survive. And their business model is working. They don’t debate us for the same reason that HP made billions of dollars selling bad test tools by pitching them to non-testers who had more money than wisdom. Debating us would spoil their racket.

So, don’t be like that. Be smart.

I’ll see you at Let’s Test.

Variable Testers

I once heard a vice president of software engineering tell his people that they needed to formalize their work. That day, I was an unpaid consultant in the building to give a free seminar, so I had even less restraint than normal about arguing with the guy. I raised my hand, “I don’t think you can mean that, sir. Formality is about sameness. Are you really concerned that your people are working in different ways? It seems to me that what you ought to be concerned about is effectiveness. In other words, get the job done. If the work is done a different way every time, but each time done well, would you really have a problem with that? For that matter, do you actually know how your folks work?”

This was years ago. I’m wracking my brain, but I can’t remember specifically how the executive responded. All I remember is that he didn’t reply with anything very specific and did not seem pleased to be corrected by some stranger who came to give a talk.

Oh well, it had to be done.

I have occasionally heard the concern by managers that testers are variable in their work; that some testers are better than others; and that this variability is a problem. But variability is not a problem in and of itself. When you drive a car, there are different cars on the road each day, and you have to make different patterns of turning the wheel and pushing the brake. So what?

The weird thing is how utterly obvious this is. Think about managers, designers, programmers, product owners… think about ANYONE in engineering. We are all variable. Complaining about testers being variable– as if that were a special case– seems bizarre to me… unless…

I suppose there are two things that come to mind which might explain it:

1) Maybe they mean “testers vary between satisfying me and not satisfying me, unlike other people, who always satisfy me.” To examine this we would discover what their expectations are. Maybe they are reasonable or maybe they are not. Maybe a better system for training and leading testers is needed.

2) Maybe they mean “testing is a strictly formal process that by its nature should not vary.” This is a typical belief by people who know nothing about testing. What they need is to have testing explained or demonstrated to them by someone who knows what he’s doing.






No KPIs: Use Discussion

My son is ready to send the manuscript of his novel to publishers. It’s time to see what the interest is. In other words, we are going to beta on it. He made this decision tonight.

What is the quality level of his manuscript? There is no objective measure for that. Even if we might imagine “requirements” we could not say for sure if they are met. I can tell you that the novel is about 800 pages long, representing well more than 1,200 hours of his work alone. I have worked a lot on editing and review. The first half has been rewritten many times– maybe 20 or 30. It’s a mature draft.

The first third is good, in my opinion. I’m biased. I’ve read the parts I’ve read many many times. But it seems good to me. I cannot yet speak about the latter 2/3 because I haven’t gotten there yet. I know it will be good by the time we’ve completed the editing, because he’s using a methodical, competent editing process.

Here’s my point. My son, who relies on me to test his novel, has not asked me to quantify my process nor my results. I have not been asked for a KPI. He cares deeply about the quality of his work, but he doesn’t think that can be reduced to numbers. I think this is partly because my son is no longer a child. He doesn’t need me or anyone else to make complicated life simple for him.

How do you measure quality?

Gather relevant evidence through testing and other means. Then discuss that evidence.

That’s how it works for us. That’s how it works for publishers. That’s how it works for almost everything.

Who can’t accept this?

Children and liars.

But my company demands that I report quality in the form of an objective metric!

I’m sorry that you work for children and/or liars. You must feel awful.


Agile Testing Heuristic: The Power of Looking

Today I broke my fast with a testing exercise from a colleague. (Note: I better not tell you what it is or even who gave it to me, because after you read this it will be spoiled for you, whereas if you read this and at a later time stumble into that challenge, not knowing that’s the one I was talking about, it won’t be spoiled.)

The exercise involved a short spec and an EXE. The challenge was how to test it.

The first thing I checked is if it had a text interface that I could interact with programmatically. It did. So I wrote a program to flood it with “positive” and “negative” input. The results were collected in a log file. I programmatically checked the output and it was correct.

So far this is a perfectly ordinary Agile testing situation. It is consistent with any API testing or systematic domain testing of units you have heard of. The program I wrote performs a check, and the check is produced by my testing thought process and its output analyzed by a similar thought process. That human element qualifies this as testing and not merely naked checking. If I were to hand my automated check to someone else who did not think like a tester, it would not be testing anymore, although the checks would still have some value, probably.

Here’s my public service announcement: Kids! Remember to look at what is happening.

The Power of Looking

One aspect of my strategy I haven’t described yet is that I carefully watched the check as it was running. I do this not as a bored, offhanded, or incidental matter. It’s absolutely vital. I must observe all the output I can observe, rather than just the “pass/fail” status of my checks. I will comb through log files, watch the results in real-time, try things through the GUI, whatever CAN be seen, I want to see it.

As I watched the output flow by in this particular example, I noticed that it was much slower than I expected. Moreover, the speed of the output was variable. It seemed to vary semi-randomly. Since there was nothing in the nature of the program (as I understood it) that would explain slowness or variable timing, this became an instant focus of investigation. Either there’s a bug here or something I need to learn. (Note: that is known as the Explainability Oracle Heuristic.)

It’s possible that I could have anticipated and explicitly checked for performance issues, of course, but my point is that the Power of Looking is a heuristic for discovering lots of things you did NOT anticipate. The models in your mind generate expectations, automatically, that you may not even be aware of until they are violated.

This is important for all testing, but it’s especially important for tool-happy Agile testers, bless their hearts, some of whom consider automation to be next to godliness… Come to think of it, if God has automated his tests for human qualities, that would explain a lot…



Test Jumpers: One Vision of Agile Testing

Many software companies, these days, are organized around a number of small Agile teams. These teams may be working on different projects or parts of the same project. I have often toured such companies with their large open plan offices; their big tables and whiteboards festooned with colorful Post-Its occasionally fluttering to the floor like leaves in a perpetual autumn display; their too many earbuds and not nearly enough conference rooms. Sound familiar, Spotify? Skype?

(This is a picture of a smoke jumper. I wish test jumpers looked this cool.)

I have a proposal for skilled Agile testing in such places: a role called a “test jumper.” The name comes from the elite “smoke jumper” type of firefighter. A test jumper is a trained and enthusiastic test lead (see my Responsible Tester post for a description of a test lead) who “jumps” into projects and from project to project: evaluating the testing, doing testing or organizing people in other roles to do testing. A test jumper can function as test team of one (what I call an omega tester ) or join a team of other testers.

The value of a role like this arises because in a typical dedicated Agile situation, everyone is expected to help with testing, and yet having staff dedicated solely to testing may be unwarranted. In practice, that means everyone remains chronically an amateur tester, untrained and unmotivated. The test jumper role could be a role held by one person, dedicated to the mastery of testing skills and tools, who is shared among many projects. This is a role that I feel close to, because it’s sort of what I already do. I am a consulting software tester who likes to get his hands dirty doing testing and running in-house testing events. I love short-term assignments and helping other testers come up to speed.



What Does a Test Jumper Do?

A test jumper basically asks, How are my projects handling the testing? How can I contribute to a project? How can I help someone test today?

Specifically a test jumper:

  • may spend weeks on one project, acting as an ordinary responsible tester.
  • may spend a few days on one project, organizing and leading testing events, coaching people, and helping to evaluate the results.
  • may spend as little as 90 minutes on one project, reviewing a test strategy and giving suggestions to a local tester or developer.
  • may attend a sprint planning meeting to assure that testing issues are discussed.
  • may design, write, or configure a tool to help perform a certain special kind of testing.
  • may coach another tester about how to create a test strategy, use a tool, or otherwise learn to be a better tester.
  • may make sense of test coverage.
  • may work with designers to foster better testability in the product.
  • may help improve relations between testers and developers, or if there are no other testers help the developers think productively about testing.

Test jumping is a time-critical role. You must learn to triage and split your time across many task threads. You must reassess project and product risk pretty much every day. I can see calling someone a test jumper who never “jumps” out of the project, but nevertheless embodies the skills and temperament needs to work in a very flexible, agile, self-managed fashion, on an intense project.

Addendum #1: Commenter Augusto Evangelisti suggests that I emphasize the point about coaching. It is already in my list, above, but I agree it deserves more prominence. In order to safely “jump” away from a project, the test jumper must constantly lean toward nudging, coaching, or even training local helpers (who are often the developers themselves, and who are not testing specialists, even though they are super-smart and experienced in other technical realms) and local responsible testers (if there are any on that project). The ideal goal is for each team to be reasonably self-sufficient, or at least for the periodic visits of the test jumper to be enough to keep them on a good track.

What Does a Test Jumper Need?

  • The ability and the enthusiasm for plunging in and doing testing right now when necessary.
  • The ability to pull himself out of a specific test task and see the big picture.
  • The ability to recruit helpers.
  • The ability to coach and train testers, and people who can help testing.
  • A wide knowledge of tools and ability to write tools as needed.
  • A good respectful relationship with developers.
  • The ability to speak up in sprint planning meetings about testing-related issues such as testability.
  • A keen understanding of testability.
  • The ability to lead ad hoc groups of people with challenging personalities during occasional test events.
  • An ability to speak in front of people and produce useful and concise documentation as necessary.
  • The ability to manage many threads of work at once.
  • The ability to evaluate and explain testing in general, as well as with respect to particular forms of testing.

A good test jumper will listen to advice from anyone, but no one needs to tell a test jumper what to do next. Test jumpers manage their own testing missions, in consultation with such clients as arise. A test jumper must be able to discover and analyze the testing context, then adapt to it or shape it as necessary. It is a role made for the Context-Driven school of testing.

Does a Test Jumper Need to be a Programmer?

Coding skills help tremendously in this role, but being a good programmer is not absolutely required. What is required is that you learn technical things very quickly and have excellent problem-solving and social skills. Oh, and you ought to live and breathe testing, of course.

How Does a Test Jumper Come to Be?

A test jumper is mostly self-created, much as good developers are. A test jumper can start as a programmer, as I did, and then fall in love with the excitement of testing (I love the hunt for bugs). A test jumper may start as a tester, learn consulting and leadership skills, but not want to be a full-time manager. Management has its consolations and triumphs, of course, but some of us like to do technical things. Test jumping may be part of extending the career path for an experienced and valuable tester.

RST Methodology: “Responsible Tester”

In Rapid Software Testing methodology, we recognize three main roles: Leader, Responsible Tester, and Helper. These roles are situational distinctions. The same person might be a helper in one situation, a leader in another, and a responsible tester in yet another.

Responsible Tester

Rapid Software Testing is a human-centered approach to testing, because testing is a performance and can only be done by humans. Therefore, testing must be traceable to people, or else it is literally and figuratively irresponsible. Hence, a responsible tester is that tester who bears personal responsibility for testing a particular thing in a particular way for a particular project. The responsible tester answers for the quality of that testing, which means the tester can explain and defend the testing, and make it better if needed. Responsible testers also solicit and supervise helpers, as needed (see below).

This contrasts with factory-style testing, which relies on tools and texts rather than people. In the Factory school of testing thought, it should not matter who does the work, since people are interchangeable. Responsibility is not a mantle on anyone’s shoulders in that world, but rather a sort of smog that one seeks to avoid breathing too much of.

Example of testing without a responsible tester: Person A writes a text called a “test case” and hands it to person B. Person B reads the text and performs the instructions in the text. This may sound okay, but what if Person B is not qualified to evaluate if he has understood and performed the test, while at the same time Person A, the designer, is not watching and so also isn’t in position to evaluate it? In such a case, it’s like a driverless car. No one is taking responsibility. No one can say if the testing is good or take action if it is not good. If a problem is revealed later, they may both rightly blame the other.

That situation is a “sin” in Rapid Testing. To be practicing RST, there must always a responsible tester for any work that the project relies upon. (Of course students and otherwise non-professional testers can work unsupervised as practice or in the hopes of finding one more bug. That’s not testing the project relies upon.)

A responsible tester is like being the driver of an automobile or the pilot-in-command of an aircraft.


A helper is someone who contributes to the testing without taking responsibility for the quality of the work AS testing. In other words, if a responsible tester asks someone to do something simple to press a button, the helper may press the button without worrying about whether that has actually helped fulfill the mission of testing. Helpers should not be confused with inexperienced or low-skilled people. Helpers may be very skilled or have little skill. A senior architect who comes in to do testing might be asked to test part of the product and find interesting bugs without being expected to explain or defend his strategy for doing that. It’s the responsible tester whose job it is to supervise people who offer help and evaluate the degree to which their work is acceptable.

Beta testing is testing that is done entirely by helpers. Without responsible testers in the mix, it is not possible to evaluate in any depth what was achieved. One good way to use beta testers is to have them organized and engaged by one or more responsible testers.


A leader is someone whose responsibility is to foster and maintain the project conditions that make good testing possible; and to train, support, and evaluate responsible testers. There are at least two kinds of leader, a test lead and a test manager. The test manager is a test lead with the additional responsibilities of hiring, firing, performance reviews, and possibly budgeting.

In any situation where a leader is responsible for testing and yet has no responsible testers on his team, the leader IS the acting responsible tester. A leader surrounded by helpers is the responsible tester for that team.


Integrity #3: A Testimonial

Oliver Erlewein is an automation specialist. He’s respected in the Context-Driven Testing community of New Zealand and has been an agitator pushing back against the ISTQB. After some years of frustration with bad management he finally went independent. Now he’s back to full-time work. He posted the following as a comment:

Starting 2014, I have given up my self-employment and joined a (sort of) start up. I didn’t think I was ever going back to being employed but this was worth it. I have found a company that respects my professionalism and listens to what I say, where I am responsible for what I produce and get the full control of how to go about it. I and the task I do are respected. The word integrity doesn’t get used here but it is a place that actually has oodles of it.

Every now and again I hear the sentence “you are the expert so what do you suggest we do?” or “do what you think is right, you are the expert” ….and they mean it exactly like that. It makes for a completely different working environment. It motivates, it invigorates and it makes working fun. It puts heaps more pressure and responsibility on me but I am happy as taking that on board because I am convinced that I can do it (even if I still don’t know how right at this moment).

Although this shop is not agile (but more agile than a lot of the shops out there that call themselves agile!) they do something that is one of the main success factors for agile: They re-introduce back the idea of responsibility, professionalism and craftsmanship into (IT) work. And that motivates. I feel like I can call bull**** if it is appropriate to do so or get traction on subjects I think are important.

So although it meant I made a career change away from my original trajectory I made it consciously towards a more ethical work life, where integrity and being the best you can actually counts for something.

Thank you for sharing that with us, Oliver. It goes to show that there are good managers out there who understand craftsmanship and leadership.


A Test is a Performance

Testing is a performance, not an artifact.

Artifacts may be produced before, during, or after the act of testing. Whatever they are, they are not tests. They may be test instructions, test results, or test tools. They cannot be tests.

Note: I am speaking a) authoritatively about how we use terms in Rapid Testing Methodology, b) non-authoritatively of my best knowledge of how testing is thought of more broadly within the Context-Driven school, and c) of my belief about how anyone, anywhere should think of testing if they want a clean and powerful way to talk about it.

I may informally say “I created a test.” What I mean by that is that I designed an experience, or I made a plan for a testing event. That plan itself is not the test, anymore than a picture of a car is a car. Therefore, strictly speaking, the only way to create a test is to perform a test. As Michael Bolton likes to say, there’s a world of difference between sheet music and a musical performance, even though we might commonly refer to either one as “music.” Consider these sentences: “The music at the symphony last night was amazing.” vs. “Oh no, I left the music on my desk at home.”

We don’t always have to speak strictly, but we should know how and know why we might want to.

Why can’t a test be an artifact?

Because artifacts don’t think or learn in the full human sense of that word, that’s why, and thinking is central to the test process. So to claim that an artifact is a test is like wearing a sock puppet on your hand and claiming that it’s a little creature talking to you. That would be no more than you talking to yourself, obviously, and if you removed yourself from that equation the puppet wouldn’t be a little creature, would it? It would be a decorated sock lying on the floor. The testing value of an artifact can be delivered only in concert with an appropriately skilled and motivated tester.

With procedures or code you can create a check. See here for a detailed look at the difference between checking and testing. Checking is part of testing, of course. Anyone who runs checks that fail knows that the next step is figuring out what the failures mean. A tester must also evaluate whether the checks are working properly and whether there are enough of them, or too many, or the wrong kind. All of that is part of the performance of testing.

When a “check engine” light goes on in your car, or any strange alert, you can’t know until you go to a mechanic whether that represents a big problem or a little problem. The check is not testing. The testing is more than the check itself.

But I’ve seen people follow test scripts and only do what the test document tells them to do!

Have you really witnessed that? I think the most you could possibly have witnessed is…


a tester who appeared to do “only” what the test document tells him, while constantly and perhaps unconsciously adjusting and reacting to what’s happening with the system under test. (Such a tester may find bugs, but does so by contributing interpretation, judgment, and analysis; by performing.)


a tester who necessarily missed a lot of bugs that he could have found, either because the test instructions were far too complex, or far too vague, or there was far too little of it (because that documentation is darn expensive) and the tester failed to perform as a tester to compensate.

In either case, the explicitly written or coded “test” artifact can only be an inanimate sock, or a sock puppet animated by the tester. You can choose to suffer without a tester, or to cover up the presence of the tester. Reality will assert itself either way.

What danger could there be in speaking informally about writing “tests?”

It’s not necessarily dangerous to speak informally. However, a possible danger is that non-testing managers and clients of our work will think of testers as “test case writers” instead of as people who perform the skilled process of testing. This may cause them to treat testers as fungible commodities producing “tests” that are comprised solely of explicit rules. Such a theory of testing– which is what we call the Factory school of testing thought– leads to expensive artifacts that uncover few bugs. Their value is mainly in that they look impressive to ignorant people.

If you are talking to people who fully understand that testing is a performance, it is fine to speak informally. Just be on your guard when you hear people say “Where are your tests?” “Have you written any tests?” or “Should you automate those tests?” (I would rather hear “How do you test this?” “Where are you focusing you testing?” or “Are you using tools to help your testing?”)

Thanks to Michael Bolton and Aleksander Simic for reviewing and improving this post.