Rethinking Equivalence Class Partitioning, Part 1

Wikipedia’s article on equivalence class partitioning (ECP) is a great example of the poor thinking and teaching and writing that often passes for wisdom in the testing field. It’s narrow and misleading, serving to imply that testing is some little game we play with our software, rather than an open investigation of a complex phenomenon.

(No, I’m not going to edit that article. I don’t find it fun or rewarding to offer my expertise in return for arguments with anonymous amateurs. Wikipedia is important because it serves as a nearly universal reference point when criticizing popular knowledge, but just like popular knowledge itself, it is not fixable. The populace will always prevail, and the populace is not very thoughtful.)

In this article I will comment on the Wikipedia post. In a subsequent post I will describe ECP my way, and you can decide for yourself if that is better than Wikipedia.

“Equivalence partitioning or equivalence class partitioning (ECP)[1] is a software testing technique that divides the input data of a software unit into partitions of equivalent data from which test cases can be derived.”

Not exactly. There’s no reason why ECP should be limited to “input data” as such. The ECP thought process may be applied to output, or even to versions of products, test environments, or test cases themselves. ECP applies to anything you might consider doing that involves variations which could influence the outcome of a test.

Yes, ECP is a technique, but a better word for it is “heuristic.” A heuristic is a fallible method of solving a problem. ECP is extremely fallible, and yet useful.

“In principle, test cases are designed to cover each partition at least once. This technique tries to define test cases that uncover classes of errors, thereby reducing the total number of test cases that must be developed.”

This text is pretty good. Note the phrase “In principle” and the use of the word “tries.” These are softening words, which are important because ECP is a heuristic, not an algorithm.

Speaking in terms of “test cases that must be developed,” however, is a misleading way to discuss testing. Testing is not about creating test cases. It is for damn sure not about the number of test cases you create. Testing is about performing experiments. And the totality of experimentation goes far beyond such questions as “what test case should I develop next?” The text should instead say “reducing test effort.”

“An advantage of this approach is reduction in the time required for testing a software due to lesser number of test cases.”

Sorry, no. The advantage of ECP is not in reducing the number of test cases. Nor is it even about reducing test effort, as such (even though it is true that ECP is “trying” to reduce test effort). ECP is just a way to systematically guess where the bigger bugs probably are, which helps you focus your efforts. ECP is a prioritization technique. It also helps you explain and defend those choices. Better prioritization does not, by itself, allow you to test with less effort, but we do want to stumble into the big bugs sooner rather than later. And we want to stumble into them with more purpose and less stumbling. And if we do that well, we will feel comfortable spending less effort on the testing. Reducing effort is really a side effect of ECP.

“Equivalence partitioning is typically applied to the inputs of a tested component, but may be applied to the outputs in rare cases. The equivalence partitions are usually derived from the requirements specification for input attributes that influence the processing of the test object.”

Typically? Usually? Has this writer done any sort of research that would substantiate that? No.

ECP is a process that we all do informally, not only in testing but in our daily lives. When you push open a door, do you consciously decide to push on a specific square centimeter of the metal push plate? No, you don’t. You know that for most doors it doesn’t matter where you push. All pushable places are more or less equivalent. That is ECP! We apply ECP to anything that we interact with.

Yes, we apply it to output. And yes, we can think of equivalence classes based on specifications, but we also think of them based on all other learning we do about the software. We perform ECP based on all that we know. If what we know is wrong (for instance if there are unexpected bugs) then our equivalence classes will also be wrong. But that’s okay, if you understand that ECP is a heuristic and not a golden ticket to perfect testing.

“The fundamental concept of ECP comes from equivalence class which in turn comes from equivalence relation. A software system is in effect a computable function implemented as an algorithm in some implementation programming language. Given an input test vector some instructions of that algorithm get covered, ( see code coverage for details ) others do not…”

At this point the article becomes Computer Science propaganda. This is why we can’t have nice things in testing: as soon as the CS people get hold of it, they turn it into a little logic game for gifted kids, rather than a pursuit worthy of adults charged with discovering important problems in technology before it’s too late.

The fundamental concept of ECP has nothing to do with computer science or computability. It has to do with logic. Logic predates computers. An equivalence class is simply a set. It is a set of things that share some property. The property of interest in ECP is utility for exploring a particular product risk. In other words, an equivalence class in testing is an assertion that any member of that particular group of things would be more or less equally able to reveal a particular kind of bug if it were employed in a particular kind of test.

If I define a “test condition” as something about a product or its environment that could be examined in a test, then I can define equivalence classes like this: An equivalence class is a set of tests or test conditions that are equivalent with respect to a particular product risk, in a particular context. 

This implies that two inputs which are not equivalent for the purposes of one kind of bug may be equivalent for finding another kind of bug. It also implies that if we model a product incorrectly, we will also be unable to know the true equivalence classes. Actually, considering that bugs come in all shapes and sizes, to have the perfectly correct set of equivalence classes would be the same as knowing, without having tested, where all the bugs in the product are. This is because ECP is based on guessing what kind of bugs are in the product.
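To make that concrete, here is a minimal Python sketch (the inputs, the valid range, and both risks are invented for illustration; this is not a formal procedure from any source):

```python
# The same inputs partition differently depending on which risk you
# have in mind. Each class is a bet that its members are interchangeable
# for revealing one particular kind of bug.

inputs = [0, 7, 49, 50, 51, 9_999, -1]

# Risk 1: mishandling of a documented valid range of 1-50.
by_range_risk = {
    "below range": [i for i in inputs if i < 1],
    "in range":    [i for i in inputs if 1 <= i <= 50],
    "above range": [i for i in inputs if i > 50],
}

# Risk 2: layout breakage when the value is rendered in a 3-character field.
by_layout_risk = {
    "fits field":      [i for i in inputs if len(str(i)) <= 3],
    "overflows field": [i for i in inputs if len(str(i)) > 3],
}

print(by_range_risk)   # three classes under one risk...
print(by_layout_risk)  # ...two different classes under the other
```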

If you read the technical stuff about Computer Science in the Wikipedia article, you will see that the author has decided that two inputs which cover the same code are therefore equivalent for bug finding purposes. But this is not remotely true! This is a fantasy propagated by people who I suspect have never tested anything that mattered. Off the top of my head, code-coverage-as-gold-standard ignores performance bugs, requirements bugs, usability bugs, data type bugs, security bugs, and integration bugs. Imagine two tests that cover the same code, and both involve input that is displayed on the screen, except that one includes an input which is so long that when it prints it goes off the edge of the screen. This is a bug that the short input didn’t find, even though both inputs are “valid” and “do the same thing” functionally.
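Here is a toy Python illustration of that point (the function is hypothetical): both inputs execute exactly the same lines, so they are “equivalent” by the code-coverage standard, yet only one of them can reveal the rendering problem.

```python
def greet(name: str) -> str:
    # Every input takes the identical path through this function,
    # so any two string inputs produce identical code coverage.
    return "Hello, " + name + "!"

short_input = "Ann"
long_input = "A" * 10_000  # same coverage as "Ann"...

assert greet(short_input) == "Hello, Ann!"
print(len(greet(long_input)))  # ...but only this one can show the greeting
                               # running off the edge of the screen
```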

The Fundamental Problem With Most Testing Advice Is…

The problem with most testing advice is that it is either uncritical folklore that falls apart as soon as you examine it, or else it is misplaced formalism that doesn’t apply to realistic open-ended problems. Testing advice is better when it is grounded in a general systems perspective as well as a social science perspective. Both of these perspectives understand and use heuristics. ECP is a powerful, ubiquitous, and rather simple heuristic, whose utility comes from and is limited by your mental model of the product. In my next post, I will walk through an example of how I use it in real life.

We. Use. Tools.

Context-Driven testers use tools to help ourselves test better. But there is no such thing as test automation.

Want details? Here’s the 10,000 word explanation that Michael Bolton and I have been working on for months.

Editor’s Note: I have just posted version 1.03 of this article. This is the third revision we have made due to typos. Isn’t it interesting how hard it is to find typos in your own work before you ship an article? We used automation to help us with spelling, of course, but most of the typos are down to properly spelled words that are in the wrong context. Spelling tools can’t help us with that. Also, Word spell-checker still thinks there are dozens of misspelled words in our article, because of all the proper nouns, terms of art, and neologisms. Of course there are the grammar checking tools, too, right? Yeah… not really. The false positive rate is very high with those tools. I just did a sweep through every grammar problem the tool reported. Out of the five it thinks it found, only one, a missing hyphen, is plausibly a problem. The rest are essentially matters of writing style.

One of the lines it complained about is this: “The more people who use a tool, the more free support will be available…” The grammar checker thinks we should not say “more free” but rather “freer.” This may be correct, in general, but we are using parallelism, a rhetorical style that we feel outweighs the general rule about comparatives. Only humans can make these judgments, because the rules of grammar are sometimes fluid.

Reinventing Testing: What is Integration Testing? (part 2)

These thoughts have become better because of these specific commenters on part 1: Jeff Nyman, James Huggett, Sean McErlean, Liza Ivinskaia, Jokin Aspiazu, Maxim Mikhailov, Anita Gujarathi, Mike Talks, Amit Wertheimer, Simon Morley, Dimitar Dimitrov, John Stevenson. Additionally, thank you, Michael Bolton, and thanks to the student whose productive confusion helped me discover a blind spot in my work, Anita Gujarathi.

Integration testing is a term I don’t use much, not because it doesn’t matter, but because it is so fundamental that it is already baked into many of the other working concepts and techniques of testing. Still, in the past week, I decided to upgrade my ability to quickly explain integration, integration risk, and integration testing. This is part of a process I recommend for all serious testers. I call it: reinventing testing. Each of us may reinvent testing concepts for ourselves, and engage in vigorous debates about them (see the comments on part 1, which is now the most commented of any post I have ever done).

For those of you interested in getting to a common language for testing, this is what I believe is the best way we have available to us. As each of us works to clarify his own thinking, a de facto consensus about reasonable testing ontology will form over time, community by community.

So here we go…

There are several kinds of testing that involve, overlap with, or may even be synonymous with integration testing, including: regression testing, system testing, field testing, interoperability testing, compatibility testing, platform testing, and risk-based testing. Most testing, in fact, no matter what it’s called, is also integration testing.

Here is my definition of integration testing, based on my own analysis, conversations with RST instructors (mainly Michael Bolton), and stimulated by the many commenters from part 1. All of my assertions and definitions are true within the Rapid Software Testing methodology namespace, which means that you don’t have to agree with me unless you claim to be using RST.

What is integration testing?

Integration testing is:
1. Testing motivated by potential risk related to integration.
2. Tests designed specifically to assess risk related to integration.

Notes:

1. “Motivated by” and “designed specifically to” overlap but are not the same. For instance, if you know that a dangerous criminal is on the loose in your neighborhood you may behave in a generally cautious or vigilant way even if you don’t know where the criminal is or what he looks like. But if you know what he looks like, what he is wearing, how he behaves or where he is, you can take more specific measures to find him or avoid him. Similarly, a newly integrated product may create a situation where any kind of testing may be worth doing, even if that testing is not specifically aimed at uncovering integration bugs, as such; OR you can perform tests aimed at exposing just the sort of bugs that integration typically causes, such as by performing operations that maximize the interaction of components.

The phrase “integration testing” may therefore represent ANY testing performed specifically in an “integration context”, or applying a specific “integration test technique” in ANY context.

This is a special case of the difference between risk-based test management and risk-based test design. The former assigns resources to places where there is potential risk but does not dictate the testing to be performed; whereas the latter crafts specific tests to examine the product for specific kinds of problems.

2. “Potential risk” is not the same as “risk.” Risk is the danger of something bad happening, and it can be viewed from at least three perspectives: probability of a bad event occurring, the impact of that event if it occurs, and our uncertainty about either of those things. A potential risk is a risk about which there is substantial uncertainty (in other words, you don’t know how likely the bug is to be in the product or you don’t know how bad it could be if it were present). The main point of testing is to eliminate uncertainty about risk, so this often begins with guessing about potential risk (in other words, making wild guesses, educated guesses, or highly informed analyses about where bugs are likely to be).

Example: I am testing something for the first time. I don’t know how it will deal with stressful input, but stress often causes failure, so that’s a potential risk. If I were to perform stress testing, I would learn a lot about how the product really handles stress, and the potential risk would be transformed into a high risk (if I found serious bugs related to stress) or a low risk (if the product handled stress in a consistently graceful way).
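As a minimal sketch of that transformation (the product stub and the input sizes below are invented for illustration):

```python
# A crude stress probe: feed progressively larger inputs until something
# breaks or we run out of patience. Either way, the potential risk
# becomes an assessed one.

def product_under_test(data: str) -> str:
    # Stand-in for the real product; imagine a parser, a form field, etc.
    return data.upper()

for size in (10, 1_000, 100_000, 10_000_000):
    try:
        product_under_test("x" * size)
        print(f"{size:>12,} chars: handled")
    except Exception as exc:
        print(f"{size:>12,} chars: FAILED with {exc!r}")  # high risk found
        break
else:
    print("graceful so far: evidence of low risk (not proof)")
```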

What is integration?

General definition from the Oxford English Dictionary: “The making up or composition of a whole by adding together or combining the separate parts or elements; combination into an integral whole: a making whole or entire.”

Based on this, we can make a simple technical definition related to products:

Integration is:
v. the process of constructing a product from parts.
n. a product constructed from parts.

Now, based on General Systems Theory, we make these assertions:

An integration, in some way and to some degree:

  1. Is composed of parts:
  • …that come from differing sources.
  • …that were produced for differing purposes.
  • …that were produced at different times.
  • …that have differing attributes.
  2. Creates or represents an internal environment for its parts:
  • …in which its parts interact among themselves.
  • …in which its parts depend on each other.
  • …in which its parts interact with or depend on an external environment.
  • …in which these things are not visible from the outside.
  3. Possesses attributes relative to its parts:
  • …that depend on them.
  • …that differ from them.

Therefore, you might not be able to discern everything you want to know about an integration just by looking at its parts.

This is why integration risk exists. In complex or important systems, integration testing will be critically important, especially after changes have been made.

It may be possible to gain enough knowledge about an integration to characterize the risk (or to speak more plainly: it may be possible to find all the important integration bugs) without doing integration testing. You might be able to do it with unit testing. However, that process, although possible in some cases, might be impractical: partly because the parts may have been produced by different people with different assumptions, partly because it is difficult to simulate the environment of an integration prior to actual integration, and partly because unit testing tends to focus on what the units CAN do and not on what they ACTUALLY NEED to do. (If you unit test a calculator, that’s a lot of work. But if that calculator will only ever be asked to add numbers under 50, you don’t need to do all that work.)
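As a rough sketch of that calculator point (the unit and the numbers are invented for illustration, not taken from any real project), consider how small the ACTUAL NEED can be compared with everything the unit CAN do:

```python
# Stand-in for the calculator unit; a real one would be more complex.
def add(a, b):
    return int(a) + int(b)

# What the integration ACTUALLY NEEDS: sums of numbers under 50.
# That operational need is a small, fully checkable space...
for a in range(50):
    for b in range(50):
        assert add(a, b) == a + b  # 2,500 cheap checks cover the real need

# ...whereas unit-testing everything add() CAN accept (huge ints, floats,
# numeric strings, mixed types) would be a much larger job.
print("all 2,500 needed cases pass")
```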

Integration testing, although complex in some senses, may actually simplify your testing, since some parts mask the behavior of other parts and perhaps all you need to care about is the final outputs.

Notes:

1. “In some way and to some degree” means that these assertions are to be interpreted heuristically. In any specific situation, these assertions are highly likely to apply in some interesting or important way, but might not. An obvious example is where I wrote above that the parts “interact among themselves.” The stricter truth is that the parts within an integration probably do not EACH directly interact with ALL the other ones, and probably do not interact to the same degree and in the same ways. To think of it heuristically, interpret it as a gentle warning such as “if you integrate something, make it your business to know how the parts might interact or depend on each other, because that knowledge is probably important.”

By using the phrase “in some way and to some degree” as a blanket qualifier, I can simplify the rest of the text, since I don’t have to embed other qualifiers.

2. “Constructing from parts” does not necessarily mean that the parts pre-existed the product, or have a separate existence outside the product, or are unchanged by the process of integration. It just means that we can think productively about pieces of the product and how they interact with other pieces.

3. A product may possess attributes that none of its parts possess, or that differ from them in unanticipated or unknown ways. A simple example is the stability of a tripod, which is not found in any of its individual legs, but in all the legs working together.

4. Disintegration also creates integration risk. When you take things away, or take things apart, you end up with a new integration, and that is subject to much the same risk as putting things together.

5. The attributes of a product and all its behaviors obviously depend largely on the parts that comprise it, but also on other factors such as the state of those parts, the configurations and states of external and internal environments, and the underlying rules by which those things operate (ultimately, physics, but more immediately, the communication and processing protocols of the computing environment).

6. Environment refers to the outside of some object (an object being a product or a part of a product), comprising factors that may interact with that object. A particular environment might be internal in some respects or external in other respects, at the same time.

  • An internal environment is an environment controlled by the product and accessible only to its parts. It is inside the product, but from the vantage point of some of its parts, it’s outside of them. For instance, to a spark plug the inside of an engine cylinder is an environment, but since it is not outside the car as a whole, it’s an internal environment. Technology often consists of deeply nested environments.
  • An external environment is an environment inhabited but not controlled by the product.
  • Control is not an all-or-nothing thing. There are different levels and types of control. For this reason it is not always possible to strictly identify the exact scope of a product or its various and possibly overlapping environments. This fact is much of what makes testing, and especially security testing, such a challenging problem. A lot of malicious hacking is based on the discovery that something the developers thought was outside the product is sometimes inside it.

7. An interaction occurs when one thing influences another thing. (A “thing” can be a part, an environment, a whole product, or anything else.)

8. A dependency occurs when one thing requires another thing to perform an action or possess an attribute (or not to) in order for the first thing to behave in a certain way or fulfill a certain requirement. See connascence and coupling.

9. Integration is not all or nothing; there are differing degrees and kinds. A product may be accidentally integrated, in that it works using parts that no one realizes that it has. It may be loosely integrated, such as a gecko that can jettison its tail, or a browser with a plugin. It may be tightly integrated, such as when we take the code from one product and add it to another product in different places, editing as we go. (Or when you digest food.) It may preserve the existing interfaces of its parts or violate them or re-design them or eliminate them. The integration definition and assertions, above, form a heuristic pattern, a sort of lens, by which we can make better sense of the product and how it might fail. Different people may identify different things as parts, environments or products. That’s okay. We are free to move the lens around and try out different perspectives, too.

Example of an Integration Problem

[Diagram: two components, A and B, each running happily in its own environment, but clashing when installed together on one machine]

This diagram shows a classic integration bug: dueling dependencies. In the top two panels, two components are happy to work within their own environments. Neither is aware of the other while they work on, let’s say, separate computers.

But when they are installed together on the same machine, it may turn out that each depends on factors that exclude the other, even though the components themselves don’t clash (the blue A and B boxes don’t overlap). Often such dependencies are poorly documented, and may be entirely unknown to the developer before integration time.

It is possible to discover this through unit testing… but it is so much easier, and probably cheaper, just to integrate sooner rather than later and test in that context.
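To make the diagram’s point concrete, here is a minimal Python sketch of a dueling dependency (the components and the shared MODE setting are invented for illustration): each component passes its own tests in its own environment, but installed together, one breaks the other.

```python
import os

def install_a():
    os.environ["MODE"] = "legacy"    # A's installer sets a shared factor

def run_a():
    assert os.environ["MODE"] == "legacy", "A broken by foreign environment"

def install_b():
    os.environ["MODE"] = "modern"    # B's installer sets the same factor

def run_b():
    assert os.environ["MODE"] == "modern", "B broken by foreign environment"

# On separate machines, each component is happy:
install_a(); run_a()
install_b(); run_b()

# Installed together, whichever installs last wins and the other fails:
install_a()
install_b()
run_b()                              # fine
try:
    run_a()                          # the dueling dependency bites here
except AssertionError as bug:
    print("integration bug found:", bug)
```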

 

Exploratory Testing 3.0

[Authors’ note: Others have already made the point we make here: that exploratory testing ought to be called testing. In fact, Michael said that about tests in 2009, and James wrote a blog post in 2010 that seems to say that about testers. Aaron Hodder said it quite directly in 2011, and so did Paul Gerrard. While we have long understood and taught that all testing is exploratory (here’s an example of what James told one student, last year), we have not been ready to make the rhetorical leap away from pushing the term “exploratory testing.” Even now, we are not claiming you should NOT use the term, only that it’s time to begin assuming that testing means exploratory testing, instead of assuming that it means scripted testing that also has exploration in it to some degree.]

[Second author’s note: Some people start reading this with a narrow view of what we mean by the word “script.” We are not referring to text! By “script” we are speaking of any control system or factor that influences your testing and lies outside of your realm of choice (even temporarily). This includes text instructions, but also any form of instructions, or even biases that are not instructions.]

By James Bach and Michael Bolton

In the beginning, there was testing. No one distinguished between exploratory and scripted testing. Jerry Weinberg’s 1961 chapter about testing in his book, Computer Programming Fundamentals, depicted testing as inherently exploratory and expressed caution about formalizing it. He wrote, “It is, of course, difficult to have the machine check how well the program matches the intent of the programmer without giving a great deal of information about that intent. If we had some simple way of presenting that kind of information to the machine for checking, we might just as well have the machine do the coding. Let us not forget that complex logical operations occur through a combination of simple instructions executed by the computer and not by the computer logically deducing or inferring what is desired.”

Jerry understood the division between human work and machine work. But, then the formalizers came and confused everyone. The formalizers—starting officially in 1972 with the publication of the first testing book, Program Test Methods—focused on the forms of testing, rather than its essences. By forms, we mean words, pictures, strings of bits, data files, tables, flowcharts and other explicit forms of modeling. These are things that we can see, read, point to, move from place to place, count, store, retrieve, etc. It is tempting to look at these artifacts and say “Lo! There be testing!” But testing is not in any artifact. Testing, at the intersection of human thought processes and activities, makes use of artifacts. Artifacts of testing without the humans are like state of the art medical clinics without doctors or nurses: at best nearly useless, at worst, a danger to the innocents who try to make use of them.

We don’t blame the innovators. At that time, they were dealing with shiny new conjectures. The sky was their oyster! But formalization and mechanization soon escaped the lab. Reckless talk about “test factories” and poorly designed IEEE standards followed. Soon all “respectable” talk about testing was script-oriented. Informal testing was equated to unprofessional testing. The role of thinking, feeling, communicating humans became displaced.

James joined the fray in 1987 and tried to make sense of all this. He discovered, just by watching testing in progress, that “ad hoc” testing worked well for finding bugs and highly scripted testing did not. (Note: We don’t mean to make this discovery sound easy. It wasn’t. We do mean to say that the non-obvious truths about testing are in evidence all around us, when we put aside folklore and look carefully at how people work each day.) He began writing and speaking about his experiences. A few years into his work as a test manager, mostly while testing compilers and other developer tools, he discovered that Cem Kaner had coined a term—”exploratory testing”—to represent the opposite of scripted testing. In that original passage, just a few pages long, Cem didn’t define the term and barely described it, but he was the first to talk directly about designing tests while performing them.

Thus emerged what we, here, call ET 1.0.

(See The History of Definitions of ET for a chronological guide to our terminology.)

ET 1.0: Rebellion

Testing with and without a script are different experiences. At first, we were mostly drawn to the quality of ideas that emerged from unscripted testing. When we did ET, we found more bugs and better bugs. It just felt like better testing. We hadn’t yet discovered why this was so. Thus, the first iteration of exploratory testing (ET) as rhetoric and theory focused on escaping the straitjacket of the script and making space for that “better testing”. We were facing the attitude that “Ad hoc testing is uncontrolled and unmanageable; something you shouldn’t do.” We were pushing against that idea, and in that context ET was a special activity. So, the crusaders for ET treated it as a technique and advocated using that technique. “Put aside your scripts and look at the product! Interact with it! Find bugs!”

Most of the world still thinks of ET in this way: as a technique and a distinct activity. But we were wrong about characterizing it that way. Doing so, we now realize, marginalizes and misrepresents it. It was okay as a start, but thinking that way leads to a dead end. Many people today, even people who have written books about ET, seem to be happy with that view.

This era of ET 1.0 began to fade in 1995. At that time, there were just a handful of people in the industry actively trying to develop exploratory testing into a discipline, despite the fact that all testers unconsciously or informally pursued it, and always have. For these few people, it was not enough to leave ET in the darkness.

ET 1.5: Explication

Through the late ‘90s, a small community of testers beginning in North America (who eventually grew into the worldwide Context-Driven community, with some jumping over into the Agile testing community) was also struggling with understanding the skills and thought processes that constitute testing work in general. To do that, they pursued two major threads of investigation. One was Jerry Weinberg’s humanist approach to software engineering, combining systems thinking with family psychology. The other was Cem Kaner’s advocacy of cognitive science and Popperian critical rationalism. This work would soon cause us to refactor our notions of scripted and exploratory testing. Why? Because our understanding of the deep structures of testing itself was evolving fast.

When James joined ST Labs in 1995, he was for the first time fully engaged in developing a vision and methodology for software testing. This was when he and Cem began their fifteen-year collaboration. This was when Rapid Software Testing methodology first formed. One of the first big innovations on that path was the introduction of guideword heuristics as one practical way of joining real-time tester thinking with a comprehensive underlying model of the testing process. Lists of test techniques or documentation templates had been around for a long time, but as we developed vocabulary and cognitive models for skilled software testing in general, we started to see exploratory testing in a new light. We began to compare and contrast the important structures of scripted and exploratory testing and the relationships between them, instead of seeing them as activities that merely felt different.

In 1996, James created the first testing class called “Exploratory Testing.”  He had been exposed to design patterns thinking and had tried to incorporate that into the class. He identified testing competencies.

Note: During this period, James distinguished between exploratory and ad hoc testing—a distinction we no longer make. ET is an ad hoc process, in the dictionary sense: ad hoc means “to this; to the purpose”. He was really trying to distinguish between skilled and unskilled testing, and today we know better ways to do that. We now recognize unskilled ad hoc testing as ET, just as unskilled cooking is cooking, and unskilled dancing is dancing. The value of the label “exploratory testing” is simply that it is more descriptive of an activity that is, among other things, ad hoc.

In 1999, James was commissioned to define a formalized process of ET for Microsoft. The idea of a “formal ad hoc process” seemed paradoxical, however, and this set up a conflict which would be resolved via a series of constructive debates between James and Cem. Those debates would lead to what we here will call ET 2.0.

There was also progress on making ET more friendly to project management. In 2000, inspired by the work for Microsoft, James and Jon Bach developed “Session-Based Test Management” for a group at Hewlett-Packard. In a sense this was a generalized form of the Microsoft process, with the goal of creating a higher level of accountability around informal exploratory work. SBTM was intended to help defend exploratory work from compulsive formalizers who were used to modeling testing in terms of test cases. In one sense, SBTM was quite successful in helping people to recognize that exploratory work was entirely manageable. SBTM helped to transform attitudes from “don’t do that” to “okay, blocks of ET time are things just like test cases are things.”

By 2000, most of the testing world seemed to have heard something about exploratory testing. We were beginning to make the world safe for better testing.

ET 2.0: Integration

The era of ET 2.0 has been a long one, based on a key insight: the exploratory-scripted continuum. This is a sliding bar on which testing ranges from completely exploratory to completely scripted. All testing work falls somewhere on this scale. Having recognized this, we stopped speaking of exploratory testing as a technique, but rather as an approach that applies to techniques (or as Cem likes to say, a “style” of testing).

We could think of testing that way because, unlike ten years earlier, we now had a rich idea of the skills and elements of testing. It was no longer some “creative and mystical” act that some people are born knowing how to do “intuitively”. We saw testing as involving specific structures, models, and cognitive processes other than exploring, so we felt we could separate exploring from testing in a useful way. Much of what we had called exploratory testing in the early ’90s we now began to call “freestyle exploratory testing.”

By 2006, we settled into a simple definition of ET: simultaneous learning, test design, and test execution. To help push the field forward, James and Cem convened a meeting called the Exploratory Testing Research Summit in January 2006. (The participants were James Bach, Jonathan Bach, Scott Barber, Michael Bolton, Elisabeth Hendrickson, Cem Kaner, Mike Kelly, Jonathan Kohl, James Lyndsay, and Rob Sabourin.) As we prepared for that, we made a disturbing discovery: every single participant in the summit agreed with the definition of ET, but few of us agreed on what the definition actually meant. This is a phenomenon we had no name for at the time, but which is now called shallow agreement in the CDT community. To combat shallow agreement and promote better understanding of ET, some of us decided to adopt a more evocative and descriptive definition of it, proposed originally by Cem and later edited by several others: “a style of testing that emphasizes the freedom and responsibility of the individual tester to continually optimize the quality of his work by treating test design, test execution, test result interpretation, and learning as mutually supporting activities that continue in parallel throughout the course of the project.” Independently of each other, Jon Bach and Michael had suggested the “freedom and responsibility” part of that definition.

And so we had come to a specific and nuanced idea of exploration and its role in testing. Exploration can mean many things: searching a space, being creative, working without a map, doing things no one has done before, confronting complexity, acting spontaneously, etc. With the advent of the continuum concept (which James’ brother Jon actually called the “tester freedom scale”) and the discussions at the ExTRS peer conference, we realized most of those different notions of exploration are already central to testing, in general. What the adjective “exploratory” added, and how it contrasted with “scripted,” was the dimension of agency. In other words: self-directedness.

The full implications of the new definition became clear in the years that followed, as James and Michael taught and consulted in Rapid Software Testing methodology. We now recognize that by “exploratory testing”, we had been trying to refer to rich, competent testing that is self-directed. In other words, in all respects other than agency, skilled exploratory testing is not distinguishable from skilled scripted testing. Only agency matters, not documentation, nor deliberation, nor elapsed time, nor tools, nor conscious intent. You can be doing scripted testing without any scrap of paper nearby (scripted testing does not require that you follow a literal script). You can be doing scripted testing that has not been in any way pre-planned (someone else may be telling you what to do in real-time as they think of ideas). You can be doing scripted testing at a moment’s notice (someone might have just handed you a script, or you might have just developed one yourself). You can be doing scripted testing with or without tools (tools make testing different, but not necessarily more scripted). You can be doing scripted testing even unconsciously (perhaps you feel you are making free choices, but your models and habits have made an invisible prison for you). The essence of scripted testing is that the tester is not in control, but rather is being controlled by some other agent or process. This one simple, vital idea took us years to apprehend!

In those years we worked further on our notions of the special skills of exploratory testing. James and Jon Bach created the Exploratory Skills and Tactics reference sheet to bring specificity and detail to answer the question “what specifically is exploratory about exploratory testing?”

In 2007, another big slow leap was about to happen. It started small: inspired in part by a book called The Shape of Actions, James began distinguishing between processes that required human judgment and wisdom and those which did not. He called them “sapient” vs. “non-sapient.” This represented a new frontier for us: systematic study and development of tacit knowledge.

In 2009, Michael followed that up by distinguishing between testing and checking. Testing cannot be automated, but checking can be completely automated. Checking is embedded within testing. At first, James objected that, since there was already a concept of sapient testing, the distinction was unnecessary. To him, checking was simply non-sapient testing. But after a few years of applying these ideas in our consulting and training, we came to realize (as neither of us did at first) that checking and testing was a better way to think and speak than sapience and non-sapience. This is because “non-sapience” sounds like “stupid” and therefore it sounded like we were condemning checking by calling it non-sapient.

Do you notice how fine distinctions of language and thought can take years to work out? These ideas are the tools we need to sort out our practical decisions. Yet much like new drugs on the market, it can sometimes take a lot of experience to understand not only benefits, but also potentially harmful side effects of our ideas and terms. That may explain why those of us who’ve been working in the craft a long time are not always patient with colleagues or clients who shrug and tell us that “it’s just semantics.” It is our experience that semantics like these mean the difference between clear communication that motivates action and discipline, and fragile folklore that gets displaced by the next swarm of buzzwords to capture the fancy of management.

ET 3.0: Normalization

In 2011, sociologist Harry Collins began to change everything for us. It started when Michael read Tacit and Explicit Knowledge. We were quickly hooked on Harry’s clear writing and brilliant insight. He had spent many years studying scientists in action, and his ideas about the way science works fit perfectly with what we see in the testing field.

By studying the work of Harry and his colleagues, we learned how to talk about the difference between tacit and explicit knowledge, which allows us to recognize what can and cannot be encoded in a script or other artifacts. He distinguished between behaviour (the observable, describable aspects of an activity) and actions (behaviours with intention), a distinction which had inspired James’ distinction between sapient and non-sapient testing. He untangled the differences between mimeomorphic actions (actions that we want to copy and to perform in the same way every time) and polimorphic actions (actions that we must vary in order to deal with social conditions); in doing that, he helped to identify the extents and limits of automation’s power. He wrote a book (with Trevor Pinch) about how scientific knowledge is constructed; another (with Rob Evans) about expertise; yet another about how scientists decide to evaluate a specific experimental result.

Harry’s work helped lend structure to other ideas that we had gathered along the way.

  • McLuhan’s ideas about media and tools
  • Karl Weick’s work on sensemaking
  • Venkatesh Rao’s notions of tempo which in turn pointed us towards James C. Scott’s notion of legibility
  • The realization (brought to our attention by an innocent question from a tester at Barclays Bank) that the “exploratory-scripted continuum” is actually the “formality continuum.” In other words, to formalize an activity means to make it more scripted.
  • The realization of the important difference between spontaneous and deliberative testing, which is the degree of reflection that the tester is exercising. (This is not the same as exploratory vs. scripted, which is about the degree of agency.)
  • The concept of “responsible tester” (defined as a tester who takes full, personal, responsibility for the quality of his work).
  • The advent of the vital distinction between checking and testing, which replaced the need to talk about “sapience” in our rhetoric of testing.
  • The subsequent redefinition of the term “testing” within the Rapid Software Testing namespace to make these things more explicit (see below).

About That Last Bullet Point

ET 3.0 as a term is a bit paradoxical because what we are working toward, within the Rapid Software Testing methodology, is nothing less than the deprecation of the term “exploratory testing.”

Yes, we are retiring that term, after 22 years. Why?

Because we now define all testing as exploratory.  Our definition of testing is now this:

“Testing is the process of evaluating a product by learning about it through exploration and experimentation, which includes: questioning, study, modeling, observation and inference, output checking, etc.”

Where does scripted testing fit, then?  By “script” we are speaking of any control system or factor that influences your testing and lies outside of your realm of choice (even temporarily). This does not refer only to specific instructions you are given and that you must follow. Your biases script you. Your ignorance scripts you. Your organization’s culture scripts you. The choices you make and never revisit script you.

By defining testing to be exploratory, scripting becomes a guest in the house of our craft; a potentially useful but foreign element to testing, one that is interesting to talk about and apply as a tactic in specific situations. An excellent tester should not be complacent or dismissive about scripting, any more than a lumberjack can be complacent or dismissive about heavy equipment. This stuff can help you or ruin you, but no serious professional can ignore it.

Are you doing testing? Then you are already doing exploratory testing. Are you doing scripted testing? If you’re doing it responsibly, you are doing exploratory testing with scripting (and perhaps with checking).  If you’re only doing “scripted testing,” then you are just doing unmotivated checking, and we would say that you are not really testing. You are trying to behave like a machine, not a responsible tester.

ET 3.0, in a sentence, is the demotion of scripting to a technique, and the promotion of exploratory testing to, simply, testing.

History of Definitions of ET

History of the term “Exploratory Testing” as applied to software testing within the Rapid Software Testing methodology space.

For a discussion of some of the social and philosophical issues surrounding this chronology, see Exploratory Testing 3.0.

1988 First known use of the term, defined variously as “quick tests”; “whatever comes to mind”; “guerrilla raids” – Cem Kaner, Testing Computer Software (There is explanatory text for different styles of ET in the 1988 edition of Testing Computer Software. Cem says that some of the text was actually written in 1983.)
1990 “Organic Quality Assurance”, James Bach’s first talk on agile testing filmed by Apple Computer, which discussed exploratory testing without using the words agile or exploratory.
1993 June: “Persistence of Ad Hoc Testing” talk given at ICST conference by James Bach. Beginning of James’ abortive attempt to rehabilitate the term “ad hoc.”
1995 February: First appearance of “exploratory testing” on Usenet in message by Cem Kaner.
1995 Exploratory testing means learning, planning, and testing all at the same time. – James Bach (Market Driven Software Testing class)
1996 Simultaneous exploring, planning, and testing. – James Bach (Exploratory Testing class v1.0)
1999 An interactive process of concurrent product exploration, test design, and test execution. – James Bach (Exploratory Testing class v2.0)
2001 (post WHET #1): The Bach View

Any testing to the extent that the tester actively controls the design of the tests as those tests are performed and uses information gained while testing to design new and better tests.

The Kaner View

Any testing to the extent that the tester actively controls the design of the tests as those tests are performed, uses information gained while testing to design new and better tests, and where the following conditions apply:

  • The tester is not required to use or follow any particular test materials or procedures.
  • The tester is not required to produce materials or procedures that enable test re-use by another tester or management review of the details of the work done.

– Resolution between Bach and Kaner following WHET #1 and BBST class at Satisfice Tech Center.

(To account for both views, James started speaking of the “scripted/exploratory continuum,” which has greatly helped in explaining ET to factory-style testers.)

2003-2006 Simultaneous learning, test design, and test execution – Bach, Kaner
2006-2015 An approach to software testing that emphasizes the personal freedom and responsibility of each tester to continually optimize the value of his work by treating learning, test design and test execution as mutually supportive activities that run in parallel throughout the project. – (Bach/Bolton edit of Kaner suggestion)
2015 Exploratory testing is now a deprecated term within Rapid Software Testing methodology. See testing, instead. (In other words, all testing is exploratory to some degree. The definition of testing in the RST space is now: Evaluating a product by learning about it through exploration and experimentation, including to some degree: questioning, study, modeling, observation, inference, etc.)

 

Test Jumpers: One Vision of Agile Testing

Many software companies, these days, are organized around a number of small Agile teams. These teams may be working on different projects or parts of the same project. I have often toured such companies with their large open plan offices; their big tables and whiteboards festooned with colorful Post-Its occasionally fluttering to the floor like leaves in a perpetual autumn display; their too many earbuds and not nearly enough conference rooms. Sound familiar, Spotify? Skype?

(This is a picture of a smoke jumper. I wish test jumpers looked this cool.)

I have a proposal for skilled Agile testing in such places: a role called a “test jumper.” The name comes from the elite “smoke jumper” type of firefighter. A test jumper is a trained and enthusiastic test lead (see my Responsible Tester post for a description of a test lead) who “jumps” into projects and from project to project: evaluating the testing, doing testing, or organizing people in other roles to do testing. A test jumper can function as a test team of one (what I call an omega tester) or join a team of other testers.

The value of a role like this arises because in a typical dedicated Agile situation, everyone is expected to help with testing, and yet having staff dedicated solely to testing may be unwarranted. In practice, that means everyone remains chronically an amateur tester, untrained and unmotivated. The test jumper role could be held by one person, dedicated to the mastery of testing skills and tools, who is shared among many projects. This is a role that I feel close to, because it’s sort of what I already do. I am a consulting software tester who likes to get his hands dirty doing testing and running in-house testing events. I love short-term assignments and helping other testers come up to speed.

 

 

What Does a Test Jumper Do?

A test jumper basically asks, How are my projects handling the testing? How can I contribute to a project? How can I help someone test today?

Specifically a test jumper:

  • may spend weeks on one project, acting as an ordinary responsible tester.
  • may spend a few days on one project, organizing and leading testing events, coaching people, and helping to evaluate the results.
  • may spend as little as 90 minutes on one project, reviewing a test strategy and giving suggestions to a local tester or developer.
  • may attend a sprint planning meeting to assure that testing issues are discussed.
  • may design, write, or configure a tool to help perform a certain special kind of testing.
  • may coach another tester about how to create a test strategy, use a tool, or otherwise learn to be a better tester.
  • may make sense of test coverage.
  • may work with designers to foster better testability in the product.
  • may help improve relations between testers and developers, or if there are no other testers help the developers think productively about testing.

Test jumping is a time-critical role. You must learn to triage and split your time across many task threads. You must reassess project and product risk pretty much every day. I can see calling someone a test jumper who never “jumps” out of the project, but who nevertheless embodies the skills and temperament needed to work in a very flexible, agile, self-managed fashion, on an intense project.

Addendum #1: Commenter Augusto Evangelisti suggests that I emphasize the point about coaching. It is already in my list, above, but I agree it deserves more prominence. In order to safely “jump” away from a project, the test jumper must constantly lean toward nudging, coaching, or even training local helpers (who are often the developers themselves, and who are not testing specialists, even though they are super-smart and experienced in other technical realms) and local responsible testers (if there are any on that project). The ideal goal is for each team to be reasonably self-sufficient, or at least for the periodic visits of the test jumper to be enough to keep them on a good track.

What Does a Test Jumper Need?

  • The ability and the enthusiasm for plunging in and doing testing right now when necessary.
  • The ability to pull himself out of a specific test task and see the big picture.
  • The ability to recruit helpers.
  • The ability to coach and train testers, and people who can help testing.
  • A wide knowledge of tools and ability to write tools as needed.
  • A good respectful relationship with developers.
  • The ability to speak up in sprint planning meetings about testing-related issues such as testability.
  • A keen understanding of testability.
  • The ability to lead ad hoc groups of people with challenging personalities during occasional test events.
  • An ability to speak in front of people and produce useful and concise documentation as necessary.
  • The ability to manage many threads of work at once.
  • The ability to evaluate and explain testing in general, as well as with respect to particular forms of testing.

A good test jumper will listen to advice from anyone, but no one needs to tell a test jumper what to do next. Test jumpers manage their own testing missions, in consultation with such clients as arise. A test jumper must be able to discover and analyze the testing context, then adapt to it or shape it as necessary. It is a role made for the Context-Driven school of testing.

Does a Test Jumper Need to be a Programmer?

Coding skills help tremendously in this role, but being a good programmer is not absolutely required. What is required is that you learn technical things very quickly and have excellent problem-solving and social skills. Oh, and you ought to live and breathe testing, of course.

How Does a Test Jumper Come to Be?

A test jumper is mostly self-created, much as good developers are. A test jumper can start as a programmer, as I did, and then fall in love with the excitement of testing (I love the hunt for bugs). A test jumper may start as a tester, learn consulting and leadership skills, but not want to be a full-time manager. Management has its consolations and triumphs, of course, but some of us like to do technical things. Test jumping may be part of extending the career path for an experienced and valuable tester.

RST Methodology: “Responsible Tester”

In Rapid Software Testing methodology, we recognize three main roles: Leader, Responsible Tester, and Helper. These roles are situational distinctions. The same person might be a helper in one situation, a leader in another, and a responsible tester in yet another.

Responsible Tester

Rapid Software Testing is a human-centered approach to testing, because testing is a performance and can only be done by humans. Therefore, testing must be traceable to people, or else it is literally and figuratively irresponsible. Hence, a responsible tester is that tester who bears personal responsibility for testing a particular thing in a particular way for a particular project. The responsible tester answers for the quality of that testing, which means the tester can explain and defend the testing, and make it better if needed. Responsible testers also solicit and supervise helpers, as needed (see below).

This contrasts with factory-style testing, which relies on tools and texts rather than people. In the Factory school of testing thought, it should not matter who does the work, since people are interchangeable. Responsibility is not a mantle on anyone’s shoulders in that world, but rather a sort of smog that one seeks to avoid breathing too much of.

Example of testing without a responsible tester: Person A writes a text called a “test case” and hands it to person B. Person B reads the text and performs the instructions in the text. This may sound okay, but what if Person B is not qualified to evaluate whether he has understood and performed the test, while at the same time Person A, the designer, is not watching and so also isn’t in a position to evaluate it? In such a case, it’s like a driverless car. No one is taking responsibility. No one can say if the testing is good or take action if it is not good. If a problem is revealed later, they may both rightly blame the other.

That situation is a “sin” in Rapid Testing. To be practicing RST, there must always be a responsible tester for any work that the project relies upon. (Of course students and otherwise non-professional testers can work unsupervised as practice or in the hopes of finding one more bug. That’s not testing the project relies upon.)

Being a responsible tester is like being the driver of an automobile or the pilot-in-command of an aircraft.

Helper

A helper is someone who contributes to the testing without taking responsibility for the quality of the work AS testing. In other words, if a responsible tester asks someone to do something simple, such as pressing a button, the helper may press the button without worrying about whether that has actually helped fulfill the mission of testing. Helpers should not be confused with inexperienced or low-skilled people. Helpers may be very skilled or have little skill. A senior architect who comes in to do testing might be asked to test part of the product and find interesting bugs without being expected to explain or defend his strategy for doing that. It’s the responsible tester whose job it is to supervise people who offer help and evaluate the degree to which their work is acceptable.

Beta testing is testing that is done entirely by helpers. Without responsible testers in the mix, it is not possible to evaluate in any depth what was achieved. One good way to use beta testers is to have them organized and engaged by one or more responsible testers.

Leader

A leader is someone whose responsibility is to foster and maintain the project conditions that make good testing possible; and to train, support, and evaluate responsible testers. There are at least two kinds of leader, a test lead and a test manager. The test manager is a test lead with the additional responsibilities of hiring, firing, performance reviews, and possibly budgeting.

In any situation where a leader is responsible for testing and yet has no responsible testers on his team, the leader IS the acting responsible tester. A leader surrounded by helpers is the responsible tester for that team.


A Test is a Performance

Testing is a performance, not an artifact.

Artifacts may be produced before, during, or after the act of testing. Whatever they are, they are not tests. They may be test instructions, test results, or test tools. They cannot be tests.

Note: I am speaking a) authoritatively about how we use terms in Rapid Testing Methodology, b) non-authoritatively of my best knowledge of how testing is thought of more broadly within the Context-Driven school, and c) of my belief about how anyone, anywhere should think of testing if they want a clean and powerful way to talk about it.

I may informally say “I created a test.” What I mean by that is that I designed an experience, or made a plan for a testing event. That plan itself is not the test, any more than a picture of a car is a car. Therefore, strictly speaking, the only way to create a test is to perform a test. As Michael Bolton likes to say, there’s a world of difference between sheet music and a musical performance, even though we might commonly refer to either one as “music.” Consider these sentences: “The music at the symphony last night was amazing.” vs. “Oh no, I left the music on my desk at home.”

We don’t always have to speak strictly, but we should know how and know why we might want to.

Why can’t a test be an artifact?

Because artifacts don’t think or learn in the full human sense of those words, that’s why, and thinking is central to the test process. So to claim that an artifact is a test is like wearing a sock puppet on your hand and claiming that it’s a little creature talking to you. That would be no more than you talking to yourself, obviously, and if you removed yourself from the equation the puppet wouldn’t be a little creature, would it? It would be a decorated sock lying on the floor. The testing value of an artifact can be delivered only in concert with an appropriately skilled and motivated tester.

With procedures or code you can create a check. See here for a detailed look at the difference between checking and testing. Checking is part of testing, of course. Anyone who runs checks that fail knows that the next step is figuring out what the failures mean. A tester must also evaluate whether the checks are working properly and whether there are enough of them, or too many, or the wrong kind. All of that is part of the performance of testing.
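To make that concrete, here is a minimal sketch in Python. The product function and the expected values are invented for illustration, not taken from any real project. The check is the mechanical part: one operation, one observation, one verification, yielding a bit. Everything around it is the tester’s performance.

    def discount(price, is_member):
        # Hypothetical product function under scrutiny.
        return price * 0.9 if is_member else price

    def check_member_discount():
        # A check: operate, observe, verify. It returns a bit.
        # It cannot say WHY it failed, whether the expected value
        # is still correct, or whether this check even matters.
        return discount(100.0, True) == 90.0

    print("PASS" if check_member_discount() else "FAIL")
    # Interpreting a FAIL, questioning the oracle, and deciding what
    # to check next: that is testing, not checking.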

When a “check engine” light, or any strange alert, goes on in your car, you can’t know until you visit a mechanic whether it represents a big problem or a little one. The check is not testing. The testing is more than the check itself.

But I’ve seen people follow test scripts and only do what the test document tells them to do!

Have you really witnessed that? I think the most you could possibly have witnessed is…

EITHER:

a tester who appeared to do “only” what the test document tells him, while constantly and perhaps unconsciously adjusting and reacting to what’s happening with the system under test. (Such a tester may find bugs, but does so by contributing interpretation, judgment, and analysis; by performing.)

OR:

a tester who necessarily missed a lot of bugs that he could have found, either because the test instructions were far too complex, or far too vague, or there were far too few of them (because such documentation is darn expensive), and the tester failed to perform as a tester to compensate.

In either case, the explicitly written or coded “test” artifact can only be an inanimate sock, or a sock puppet animated by the tester. You can choose to suffer without a tester, or to cover up the presence of the tester. Reality will assert itself either way.

What danger could there be in speaking informally about writing “tests?”

It’s not necessarily dangerous to speak informally. However, a possible danger is that non-testing managers and clients of our work will think of testers as “test case writers” instead of as people who perform the skilled process of testing. This may cause them to treat testers as fungible commodities producing “tests” that consist solely of explicit rules. Such a theory of testing– which is what we call the Factory school of testing thought– leads to expensive artifacts that uncover few bugs. Their value is mainly in that they look impressive to ignorant people.

If you are talking to people who fully understand that testing is a performance, it is fine to speak informally. Just be on your guard when you hear people say “Where are your tests?” “Have you written any tests?” or “Should you automate those tests?” (I would rather hear “How do you test this?” “Where are you focusing your testing?” or “Are you using tools to help your testing?”)

Thanks to Michael Bolton and Aleksander Simic for reviewing and improving this post.


Rapid Software Testing at Barclays

I’m excited to be working with Barclays on an unprecedented project: creating a professional testing culture based on the Context-Driven principles and my Rapid Software Testing (RST) methodology. The Barclays Global Test Centre (GTC), led by Keith Klain, has hundreds of testers spread around the world. They work in a regulated industry on high-stakes products. But unlike nearly every other large organization in the world, they have decided not to rely on pretense and 40-year-old ideas that were discredited 30 years ago. They are instead putting in place a system to recruit and grow highly skilled and highly motivated testers.

Barclays’ approach in the GTC is to identify and encourage dozens of testing champions in its ranks who are the role models and mentors for the rest of the group. Anyone may aspire to be in this special group, but to be recognized requires that the candidate tester demonstrate vigorous self-education and critical analysis. Some of the testers in the group began as strong skeptics of Rapid Testing. But the methodology is designed for skeptics– it is based on skill development and heuristics rather than pushing “best practices.” In Rapid Testing, the skilled tester is always in charge, not pieces of paper or officious charts.

RST requires each tester to employ his own judgment and technical analysis, much like what airlines expect of pilots, or hospitals expect of doctors. That can’t work on a large scale without a strong corporate commitment to training and personal ethics. Management must drive out fear, so that testers are willing to take the sort of risks that come from making their own decisions about test strategy. But the onus is on the testers to earn personal credibility within an internal community that can effectively police itself. Any tester, at any time, is expected to stand up and explain and defend his work.

I’m aware of only two large companies in the world that have made a commitment to this kind of professionalism, which is an altogether different sort of professionalism than the ceremonial certification variety that is promoted by most organizations. In Barclays’ case, this commitment has strong support from top management, and I have personally witnessed, in my weeks of working with them, that the testers at their Singapore operation have fire in their eyes. There are testers here who deserve to have an international reputation.

This is What We Do

In the Context-Driven Testing community, the testing craft is a living, growing thing. This dialog, led by my partner in Rapid Testing, Michael Bolton, is a prime example of the life among us. Read the PDF that Michael refers to, and what do you see? You see many ideas proposed and discarded. You see definitions being made, and remade. You see people struggling to make sense of subtle, yet important distinctions.

In my world, the development of testing skill goes hand-in-hand with the development of our rhetoric of describing testing. The development of personal skill is linked to the development of social skill. This is why we smirk and roll our eyes when people come to us looking for templates and other pre-fabricated answers to what they believe are simple questions. My reaction to many who come to me is “You don’t need to learn the definition of the term ‘test case’. You don’t need me to tell you ‘how to create a test plan’. What you need is to learn how to test. You need to struggle with imponderables; sit with them; turn them over in your mind. You need practice, and you need to talk through your practice with other testers.”

Michael’s dialog reminds me of the book Proofs and Refutations, by Imre Lakatos, which likewise uses dialog to study the exploratory and dialectical nature of mathematics.

Introducing Thread-Based Test Management

Most of the testing world is managed around artifacts: test cases, test documents, bug reports. If you look at any “test management” tool, you’ll see that the artifact-based approach permeates it. “Test” for many people is a noun.

For me, test is a verb. Testing is something that I do, not so much something that I create. Testing is the act of exploring unknown territory. It is casting questions, like Molotov cocktails, into the darkness, where they splatter and burst into bright revealing fire.

How to Manage Such a Process?

My brother Jon and I created a way to control highly exploratory testing 10 years ago, called session-based test management (SBTM). I recently returned from an intense testing project in Israel, where I used SBTM. But I also experimented with a new idea: thread-based test management (TBTM).

Like many of my new ideas, it’s not really new. It’s the christening (with words) and sharpening (with analysis) of something many of us already do. The idea is this: organize management around threads of activity rather than test sessions or artifacts.

Thread-based testing is a generalized form of session-based testing, in that sessions are a form of thread, but a thread is not necessarily a session. In SBTM, you test in uninterrupted blocks of time that each have a charter. A charter is a mission for that session; a light sort of commitment, or contract. A thread, on the other hand, may be interrupted, it may go on and on indefinitely, and does not imply a commitment. Session-based testing can be seen as an extension of thread-based testing for especially high accountability and more orderly situations.

I define a thread as a set of one or more activities intended to solve a problem or achieve an objective. You could think of a thread as a very small project within a project.
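If a data model helps, here is a tiny sketch in Python. The class and field names are my own invention, not an official part of TBTM. Notice that the status vocabulary anticipates how threads end (see “How Do Threads End?” below): they are cut or knotted, rather than “done.”

    from dataclasses import dataclass, field
    from enum import Enum

    class Status(Enum):
        ACTIVE = "active"    # being worked, possibly interrupted and resumed
        CUT = "cut"          # cancelled; we decided not to pursue it further
        KNOTTED = "knotted"  # milestone reached; parked at a good stopping point

    @dataclass
    class Thread:
        name: str
        owner: str = "unassigned"  # threads may have no owner yet
        priority: int = 3          # 1 (highest) through 5 (lowest)
        status: Status = Status.ACTIVE
        notes: list = field(default_factory=list)  # outlook and observations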

Why Thread-Based Test Management?

Because it can work under even the most chaotic and difficult conditions. The only formalism required for TBTM is a list of threads. I use this form of test management when I am dropped into a project with as little as a day or two to get the job done.

What Does Thread-Based Test Management Look Like?

It’s simple. Thread-based test management looks like a to-do list, except that we organize the to-do items into an outline that matches the structure of the testing process. Here’s a mocked-up example:

Test Facilities

  • Power meter calibration method
  • Backup test jig validation
  • Create standard test images

Test Strategy

  • Accuracy Testing
    • Sampling strategy
    • Preliminary testing
    • Log file analysis program
  • Transaction Flow Testing
  • Essential Performance Testing
  • Safety Testing
    • warnings and errors FRS review
    • tool for forcing errors
  • Compliance Testing
  • Test Protocol V1.0 doc.

Test Management

  • Change protocol definition
  • Build protocol definition
  • Test cycle protocol definition
  • Bug reporting protocol definition
  • Bug triage
  • Fix verifications

This outline describes the high level threads that comprise the test project. I typically use a mind-mapping program like MindManager to organize and present them.

So, right about now you should be thinking, “Is that it? To-do lists?” Well, no. That’s not it. But that’s one face of it.

What Else Does Thread-Based Test Management Look Like?

It looks like testers gathered around a to-do list, talking about what they are going to work on that afternoon. Then they split up and go to work. Several times a day they might come together like that. If the team is not co-located, this meeting happens over instant messaging, email, or perhaps a wiki.

Is That All It Looks Like?

Well, there is also the status report. Whether written or spoken, the thread-based version of a status report lists the threads, who is working on each, and the outlook for each. It typically also includes an issues list.

Other documentation may be produced, of course. TBTM doesn’t tell you what documents to create. It simply tells you that threads are the organizing principle we use for managing the project.
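To show how thin that report can be, here is a sketch that continues the hypothetical Thread model from above; the thread names and people are invented:

    def status_report(threads):
        # Render the report described above: each thread, who is on it,
        # its state, and the latest outlook. An issues list would follow.
        lines = ["TEST STATUS REPORT", ""]
        for t in sorted(threads, key=lambda t: t.priority):
            outlook = t.notes[-1] if t.notes else "no outlook recorded"
            lines.append(f"[P{t.priority}] {t.name} ({t.owner}, {t.status.value}): {outlook}")
        return "\n".join(lines)

    threads = [
        Thread("Accuracy testing", owner="Lin", priority=1,
               notes=["sampling strategy drafted; log analysis tool next"]),
        Thread("Compliance testing", priority=2,
               notes=["blocked on Test Protocol V1.0 doc"]),
        Thread("Create standard test images", owner="Raj",
               status=Status.KNOTTED, notes=["good enough for this cycle"]),
    ]
    print(status_report(threads))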

Where Do Threads Come From?

Threads are first spawned from our model of the testing problem. The Satisfice Heuristic Test Strategy Model is an example of such a model. By working through those lists, we get an idea of the kinds of testing we might want to do: those are the first of the threads. After that, threads might be created in many ways, including splitting off of existing threads as we gain a deeper understanding of what testing needs to be done. Of course, in an Agile environment, each user story kicks off a new testing thread.

Which Threads Do We Work On?

Think priority and progress. We might frequently drop threads, switch threads, and pick them up again. In general, we work on the highest-priority threads, but we often work on lower-priority threads too, when we see the possibility of quick and inexpensive progress. If I’m trying to finish a sanity check on the new build, I might interrupt that to discuss the status of a particular known bug if the developer happens to wander by.

Major ongoing threads often become attached to specific people. For instance “client testing” or “performance testing” often become full-time jobs. Testing itself, after all, can be thought of as a thread so challenging to do well, and so different from programming, that most companies have seen fit to hire dedicated testers.

How Do Threads End?

A thread ends either in a cut or a knot. Cutting a thread means cancelling that task. A knot, however, is a milestone: an achievement of some kind. This is exactly the meaning of the phrase “tying up the loose ends.” A knot marks either the end of the thread (or group of threads) or a good place to drop it for a while.

How Do We Estimate Work?

In thread-based test management, there is no special provision or method for estimating work, except that this is done on a thread-by-thread basis. Session-based test management may be overlaid onto TBTM in order to estimate work in terms of sessions.

How Do We Evaluate Progress?

In thread-based test management, there is no special provision or method for evaluating progress, either, except that this is done on a thread-by-thread basis, and status reports may be provided frequently, perhaps at the end of each day. Session-based test management is also helpful for that.

So What?

This form of management is actually quite common. But, to my knowledge, no one has yet named and codified it. Without a convenient way to talk about it, we have a hard time explaining and justifying it. Then when the “process improvement” freaks come along, they act like there’s no management happening at all. This form of management has been “illegible” up to now (meaning that it’s there but no one notices it) and my brother and I are going to push to make it fully legible and respectable in the testing arena.

From now on, when asked about my approach to test management, I can say “I practice Rapid Testing methodology, which I track in either a thread-based or session-based manner, depending on the stage of the project, and in a risk-based manner at all times.”

How is TBTM Any Different From Using a TO-DO List?

Michel Kraaij questions the substance of TBTM by wondering how it’s different from the age-old idea of a “to-do” list. See his post here.

This is a good question. Yes, TBTM is different from just using a to-do list, but even so, I don’t think I’ve ever read an article about to-do-list-based test management (TDBTM?). Most textbooks focus on artifacts, not the activity of testing. Thread-based test management is trying to capture the essence of managing with to-do lists, plus some other things in addition.

The main additional element, beyond just making a to-do list, is that a traditional to-do list contains items that can be “done,” whereas many threads might never be “done.” They might be cut (abandoned) or knotted (temporarily parked at some level of completion). Some threads may be tied up with a bow and “done” like a normal task, but not the main ones that I’m thinking of. As I practice testing, for instance, I’m rarely “done” with test strategy. I tinker with the test strategy all the way through the project. That’s why it makes sense to call it a thread.

Once again: thread-based management is not focused on getting things “done.” In this way it is different from Kanban, Scrum, to-do lists, session-based test management, etc., all of which are big into workflow management of definite units of work.

Another thing to recognize is that the main concern of TBTM is how to know what to put on your thread list. The answer to that invokes the entire framework of Rapid Software Testing. So, yeah, it’s more than having an outline of threads, which does look very much like a to-do list– it’s the activity (and skills) of making the list and managing it. If you want to talk about to-do-list-based test management, then you would have to invent that lore as well. You couldn’t just say “make a to-do list” and claim to have communicated the methodology.

[You can find Jonathan’s take on TBTM here.]

[I credit Sagi Krupetski, the test lead on my recent project, for helping me get this idea. His clockwork status reporting and regular morning question “Where are we on the project and what do you think you need to work on today?” caused me to see the thread structure of our project clearly for the first time. He’s back on the market now (Chicago area), in case you need a great tester or test manager.]

Sapience and Blowing People’s Minds

I told a rival that I don’t use the term “manual testing” and that I prefer the term “sapient testing” because it’s more to the point. This is evident in the first definition of the word “manual” in the Oxford English Dictionary: 1. a. Of work, an action, a skill, etc.: of or relating to the hand or hands; done or performed with the hands; involving physical rather than mental exertion. Freq. in manual labour. Sapient, on the other hand, means “wise.”

He laughed and said “Bach, you are always making words up,” and then told me that in his opinion manual testing did not evoke the concept of unskilled manual labor. Now, other than establishing that the guy doesn’t have an online account with the O.E.D. (definition of “sweeeeet!”: “an online O.E.D. account.”), or perhaps doesn’t consider dictionaries to be useful sources of information about what words mean, I see something else in his reaction: I blew his mind. What I said doesn’t intersect with anything in his education.

To understand me, the man will have to use his sapience, rather than responding manually (i.e. with his hands).

In other words, I notice that some of my rivals in the testing industry don’t merely disagree with me, they apparently don’t comprehend what I’m saying. Example: after some ten hours of solid debate with me, over several sittings, Stuart Reid (who is working on a software testing standard of all preposterous things), told a colleague of mine that he believed I don’t truly mean the things I said in those debates, but merely said them to “be provocative.” Huh. That’s some serious cognitive dissonance you got going, Stu, when the only way you can process what I’m saying is to declare, essentially, that it was all just a dream.

Of course, I don’t think this is an intelligence problem. I think this is a lack-of-effort-to-use-intelligence problem. It’s not convenient for certain consultants to tear up their out-of-date books and classes and respond to the challenge of, um, the last 30 years of development of the craft. So they continue to teach and preach ideas from the seventies (or create testing standards based on them, because they believe not enough people appreciate testing disco).

Anyway, in the Context-Driven community’s latest attempt to explain the ins and outs and vital subtleties of testing, Michael Bolton has come up with a promising tack. Maybe this will help. He’s drawing a distinction between testing and checking.

Brace yourselves for insight. A lot of what people call testing is actually mere checking. But even checking requires testing intelligence to design and do well. This gives more specifics to my concept of sapient testing. Here are Michael’s seminal posts on the subject:

  1. http://www.developsense.com/2009/08/testing-vs-checking.html
  2. http://www.developsense.com/2009/09/transpection-and-three-elements-of.html
  3. http://www.developsense.com/2009/09/pass-vs-fail-vs-is-there-problem-here.html
  4. http://www.developsense.com/2009/09/elements-of-testing-and-checking.html

When Michael first made the distinction between testing and checking, I was annoyed. Truly. It blew my mind in that bad way. I thought he was manufacturing a distinction that we didn’t need. I decided to ignore it. Then he called me and asked “So what do you think of my checking vs. testing article?” I had to say I didn’t like it at all. We argued…

…and he convinced me that it was a good idea. Thank you dialectical learning! Debate has saved me again!

I now agree that it’s a practical distinction that can be used as a lens to focus on the quality of a test process. I do have to get used to the words, though. I now see a difference between automated testing and automated checking, for instance: automated testing means testing supported by tools; automated checking means specific operations, observations, and verifications that are carried out entirely with tools. Automated testing may INCLUDE automated checking as one component; however, automated checking does NOT include testing.

Making this distinction is exactly like distinguishing between a programmer and a compiler. We do not speak of a compiler “writing a program” in assembly language when it compiles C++ code. We do not think that we can fire the programmers because the compiler provides “automated programming.” The same thing goes for testing. Or… does that blow your mind?
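If code makes the distinction easier to see, here is a small sketch in Python; the parsing function and the little input generator are both hypothetical. The first loop is automated checking: fixed operations, observations, and verifications carried out entirely by the machine. The second loop is a tool supporting testing: it throws varied input at the product and records what happens, but a human must study the output, notice surprises, and decide what they mean.

    import random

    def parse_quantity(text):
        # Hypothetical product function under scrutiny.
        return int(text.strip())

    # Automated checking: the machine does it all, and reports bits.
    checks = {"7": 7, " 42 ": 42, "-1": -1}
    for given, expected in checks.items():
        assert parse_quantity(given) == expected, f"check failed for {given!r}"

    # Tool-supported testing: the tool extends the tester; it does not replace one.
    random.seed(0)
    for _ in range(5):
        candidate = "".join(random.choice(" -0123456789x") for _ in range(6))
        try:
            print(f"{candidate!r} -> {parse_quantity(candidate)}")
        except Exception as e:
            print(f"{candidate!r} -> raised {type(e).__name__}: {e}")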

Michelle Smith: True Test Leadership

I’m delighted to read Michelle Smith’s play-by-play description of how she is coaching new testers. Take a look.

Let me catalog the coolnesses:

1. “The team I work with was previously exposed to Rapid Software Testing. This exposure caused me to wonder what would happen if these new folks were exposed to some of these ideas early on?”

Notice that she uses the word “wonder”? That’s the attitude I hope to foster in people who take my class. It’s an attitude of curiosity and personal responsibility. She doesn’t speak of applying practices as if the people on the team were milling machines waiting to be programmed. She implies that her testers are learners under their own control. Her attitude is one of establishing a productive but not coercive relationship.

I don’t know if she got this from my class– probably she had it beforehand– but it’s an attitude I share with her.

2. “I went in their shared office and opened up a five minute conversation with them by asking “What is a bug?” and following that with “who are the people that matter?”.”

Michelle mentions “five minute conversations” a few times. And notice how most of her interactions were in the form of puzzles and questions. It speaks of a light touch with coaching. Light touch is good. Especially with novice testers, experience speaks louder than lecture. Introduce an idea then try it, or try something first, then talk about the idea. Either way, I like how she was getting them working.

3. She had them practice explaining themselves, both in writing and by voice.

4. She was concerned more with the cognitive parts of testing than with the artifacts. That’s good because excellent testing artifacts, such as bug reports and test cases, come from the thinking of the testers. Think well and the rest will follow.

5. She has them work on several aspects of testing. Notice how she deals with oracles, tools, mission, the social process and gaining product knowledge.

I bet what Michelle is doing will lead to better, more passionate testers, and more dynamic, flexible testing. Compare this to what we see so often in our industry: testers simply told to sit down and create test cases. Look, even if you think pre-defined test cases make for great testing, to succeed with that approach you have to build it on skilled and knowledgeable testers. Michelle is creating that foundation with her team.

Overall, what I’m most happy with is that Michelle has made Rapid Testing her own thing. This is vital. This is fundamental to the spirit of my teaching. I want to grow colleagues who confidently think for themselves. Hats off to you, Michelle, for doing that and blogging about it.

Finally, Michelle writes, “I have no idea if what I am doing is going to produce any benefits to them, to the team, or to the stakeholders. Time will tell.”

No best practices nonsense, here. No certification mentality. Just healthy skepticism. Thank you, Michelle!

A Question About Test Strategy

Maria writes:

A) Your presentation Test Strategy: “What is it? What does it look like?” applies to creating a test strategy for a specific application (I’ve also read Ingrid B. Ottevanger’s article “A Risk-Based Test Strategy”). How can I apply the idea to an overall test strategy for the company that I’m working for? Is it possible to create a really good test strategy for a company so that it covers several diverse applications? Having difficulty finding a way to create a well-stated strategy from which we can build an efficient test process leaves me with another question: “How do we draw a clear line between the overall test strategy and the company test process?”

B) The preconditions for this activity are unfortunately not the best, leaving us with a tight time schedule and very little time to do thorough work. My concern is that neither the strategy nor the test process will actually be usable, leaving us with, as you say, “a string of test technique buzzwords.” So how can I argue that the test strategy and the test process are not just two documents that we have to create, but that it’s the thinking behind the documents that matters?

Test strategy is an important yet little-described aspect of test methodology. Let me introduce three definitions:

Test Plan: the set of ideas that guide a test project

Test Strategy: the set of ideas that guide test design

Test Logistics: the set of ideas that guide the application of resources to fulfill a test strategy

I find these ideas to be a useful jumping-off point. Here are some implications:

  • The test plan is the sum of test strategy and test logistics.
  • The test plan document does not necessarily contain a test plan. This is because many test plan documents are created by people who are following templates without understanding them, or writing things to please their bosses, without knowing how to fulfill their promises, or simply because it once was a genuine test plan but now is obsolete.
  • Conversely, a genuine test plan is not necessarily documented. This is because new ideas may occur to you each day that change how you test. In my career, I have mostly operated without written test plans.

One quick way to think about test strategy is to realize that testing is (usually) a process of constructing an explanation of the status of the product. Therefore, the ideas that should guide our testing are those that relate to the marshalling of evidence for that explanation.

Here’s an example: Let’s say I need to test Inkscape, an open source drawing program. The people I work for want this product to be a viable alternative to Adobe Photoshop.

This leads to an overarching question for the testing: “Given the general capabilities of Inkscape, is the program sufficiently reliable and are its capabilities well enough deployed that a serious user would consider that Inkscape is a viable alternative to Photoshop?” This is a question about the status of Inkscape. Answering it is not merely a process of determining yes or no, because, as a tester, I must supply an explanation that justifies my answer.

Working backwards, I would have to do something like the following:

  1. Catalog the capabilities of Inkscape and match them to Photoshop.
  2. Determine the major objectives users might have in using a drawing program, as well as the various kinds of users.
  3. Learn about the product. Get a feel for it in terms of its structures, functions, data, and platforms.
  4. List the general kinds of problems that might occur in the product, based on my knowledge of the technology and the users.
  5. Decide which parts of the product are more likely to fail and/or are more important. Focus more attention on those areas.
  6. Determine what kinds of operations I need to do and which systematic observations I need to make in order to detect problems in the product (area by area and capability by capability) and compare it to Photoshop. (Here’s where I would also apply a variety of test techniques.)
  7. Carry out those test activities and repeat as necessary.
  8. Consider testability and automation as I go.

In doing these things, I would be gathering the evidence I need to argue for the specific ways in which Inkscape does or does not stand up to Photoshop.

Company-wide Test Strategy

In my way of thinking, a good test strategy is product-specific. You can have a generic test strategy, but since you test only specific products, not generic ones, the strategy becomes better as it is made specific to what you are testing at the moment.

Perhaps what you are talking about is a strategy that relates to what you need for a test lab infrastructure, or for developing the appropriate product-specific test skills? Or perhaps you are thinking of creating materials to aid test leads in producing specific test strategies?

If so, one thing to consider is a risk catalog (aka bug taxonomy). A risk catalog is an outline of the kinds of problems that typically occur in products that use a particular technology. You can create one of these based on your experience testing a product, then reuse it for any other similar product.
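The form can be very plain. Here is a hypothetical fragment sketched as Python data, just to show the shape; the technology areas and risks are inventions for illustration:

    # Hypothetical fragment of a risk catalog for one technology area.
    RISK_CATALOG = {
        "file handling": [
            "locked or read-only files",
            "very long paths or unusual characters in file names",
            "disk full during save",
        ],
        "image rendering": [
            "extreme zoom levels",
            "corrupt or truncated image files",
            "color profile mismatches",
        ],
        "printing": [
            "missing or offline printer drivers",
            "page sizes that differ from the canvas",
        ],
    }

    # Reuse: walk the catalog for any product built on the same technology.
    for area, risks in RISK_CATALOG.items():
        print(area.upper())
        for risk in risks:
            print("  - " + risk)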

Company Test Process

I suggest using the term methodology instead of process. “Process” is the way things happen. “Methodology” is a system of methods. You use a methodology; but you participate in a process. When you have an idea and say, “I think I’ll try doing that” the idea itself is probably a method. When you do it, it becomes a practice (a practice is something that you actually do), and therefore it influences the process.

I use these words carefully because process is a very rich and complex reality that I do not want to oversimplify. For instance, it may be a part of your methodology to create a test plan, but at the same time it may be a genuine part of your process that your test plan document is ignored. “Ignore the test plan document” is not going to be written down in anyone’s methodology, yet it can be an important part of the process, especially if the test plan document is full of bad ideas.

The dividing line between test strategy and test methodology is not hard to find, I think. A test strategy is product specific, and a test methodology is not. Another important element you haven’t mentioned is test skill. Your methodology is useless without skilled testers to apply it.

I would suggest that a more important dividing line for you to consider is the line between skill and method. How much do you rely on skilled people to select the right thing to do next, and how much are you trying to program them with methodology? Many process people get this all mixed up. They treat testers as if they were children, or idiots, trying to dictate solutions to problems instead of letting the testers solve the problems for themselves. What the process people then get is either bad testing or, with luck, a methodology that is ignored while the testers do a good job anyway.

When I develop a test methodology, as I have done for a number of companies, I focus on training the testers and on creating a systematic training and coaching mechanism. That way the methodology documentation is much thinner and less expensive to maintain.

Some Useful Definitions

I use the following. I find these definitions to be flexible, inclusive, and consistent with the dictionary:

Technique: method.

Method: a way of doing something; an idea or ideas that specify behavior.

Methodology: a system of methods.

Approach: a way of enacting a method; a characteristic pattern that modifies method. E.g. “the stress testing technique may be performed using either a scripted or exploratory approach”

Practice: what somebody actually does; a way of doing something that someone actually uses. (usage note: a method is a practice only in the context wherein someone uses that method)

Process: how something happens; a causally-related chain of events. (usage note: a practice or method may describe or affect a process, but process encompasses the totality of events, not just the parts that people might do or think of themselves as doing)

Test technique: test method; a heuristic or algorithm for designing and/or executing a test; a recipe for a test.

Test strategy: the set of ideas (i.e. methods and objectives) that guide test design and execution.

Test logistics: the set of ideas that guide the application of resources to fulfilling the test strategy.

Test plan: the set of ideas that guide a test project; the totality of test strategy and test logistics. (usage note: A test plan document does not necessarily contain a test plan, and a test plan may not necessarily be expressed in written form. Beware confusing a genuine test plan with a document that merely has “test plan” as its title.)

Testing: questioning a product in order to evaluate it (Bach version); technical investigation of a product, on behalf of stakeholders, with the objective of exposing quality-related information of the kind they seek (Kaner version).

Test Idea: an idea for testing something.

Test: a particular instance or instances of questioning a product in order to evaluate it; or a document, artifact, or idea that represents such a thing.

Test case: see Test