What Exploratory Practitioners Are Called

An exploratory DOCTOR is known as… a “doctor.”
An exploratory WELDER is known as… a “welder.”
An exploratory PILOT is known as… a “pilot.”
An exploratory WRITER is known as… a “writer.”
An exploratory SCIENTIST is known as… a “scientist.”
An exploratory TRUCK DRIVER is known as… a “truck driver.”

A non-exploratory doctor is known as… “irresponsible.”
A non-exploratory welder is known as… “irresponsible.”
A non-exploratory pilot is known as… “killed in a plane crash.”
A non-exploratory writer is known as… “a plagiarist.”
A non-exploratory scientist is known as… “a tobacco company scientist.” (also “Creationist”)
A non-exploratory truck driver is known as… “lost.”

Can you spot the pattern here?

There is no such thing as an “exploratory tester” except inasmuch as a good tester obviously can and will do exploration as a basic part of his work.

Three New Testing Heuristics

A lot of what I do is give names to testing behaviors and patterns that have been around a long time but that people are not systematically studying or using. I’m not seeking to create a standard language, but simply by applying some kind of terminology, I want to make these patterns easier to apply and to study.

This is a quick note about three testing heuristics I named this week:

Steeplechase Heuristic (of exploratory boundary testing)

When you are exploring boundaries, think of your data as having to get to the boundary and then having to go other places down the line. Picture it as one big obstacle course with the boundary you are testing right in the middle.

Then consider that very large, long, extreme data that the boundary is designed to stop might founder on some obstacle before it ever gets to the boundary you want to test. In other words, a limit of 1,000 characters on a field might work fine unless you paste 1,000,000 characters in, in which case it may crash the program instantly before the boundary check ever gets a chance to reject the data.

But also look downstream, and consider that extreme data which barely gets by your boundary may get mangled on another boundary down the road. So don’t just stop testing when you see one boundary is handled properly. Take that data all around to the other functions that process it.
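
Here is a minimal sketch of that idea in Python, assuming a hypothetical application object with submit_form(), save_record(), and export_report() methods standing in for whatever pipeline your product actually has (none of these names come from a real API):

    BOUNDARY = 1000

    def steeplechase_values():
        # At the boundary, just over it, and far beyond it. The huge value checks
        # whether something upstream falls over before the boundary check even runs.
        return ["x" * BOUNDARY, "x" * (BOUNDARY + 1), "x" * 1000000]

    def run_steeplechase(app):
        for value in steeplechase_values():
            record = app.submit_form(value)      # obstacles before the boundary
            if record is None:
                print("rejected at the boundary: length", len(value))
                continue
            # Don't stop at the boundary: chase the accepted data downstream.
            app.save_record(record)              # obstacle after the boundary
            app.export_report(record)            # another downstream obstacle
            print("survived the whole course: length", len(value))

The point of the sketch is the shape of the loop: every value runs the whole obstacle course, not just the one check we happen to be thinking about.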

Galumphing (style of test execution)

Galumphing means doing something in a deliberately over-elaborate way. I’ve been doing this for a long time in my test execution. I add lots of unnecessary but inert actions that are inexpensive and shouldn’t (in theory) affect the test outcome. The idea is that sometimes– surprise!– they do affect it, and I get a free bug out of it.

An example is how I frequently click on background areas of windows while moving my mouse pointer to the button I intend to push. Clicking on blank space shouldn’t matter, right? Doesn’t hurt, right?
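
If I were to automate a galumphing step, it might look something like this sketch, assuming a hypothetical ui object with click(x, y) and press(name) methods (the names are mine, not any real library’s):

    import random

    def galumph_to(ui, button_name, blank_spots):
        # Wander first: click a few background areas that "shouldn't" matter.
        for _ in range(random.randint(1, 4)):
            x, y = random.choice(blank_spots)
            ui.click(x, y)
        # Then perform the action the test actually calls for.
        ui.press(button_name)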

I actually learned the term from the book “Free Play” by Stephen Nachmanovitch, who pointed out that it is justified by the Law of Requisite Variety. But I didn’t connect it with my test execution practice until jogged by a student in my recent Sydney testing class, Ted Morris Dawson.

Creep & Leap (for pattern investigation)

If you think you understand the pattern of how a function works, try performing some tests that just barely violate that pattern (expecting an error or some different behavior), and try some tests that boldly take that behavior to an extreme without violating it. The former I call creeping; the latter is leaping.

The point here is that we are likely to learn a little more from a mildly violating test than from a hugely violating test because the mildly violating test is much more likely to surprise us, and the surprise will be easier to sort out.

Meanwhile, stretching legal input and expectations as far as they can reasonably go also can teach us a lot.

Creep & Leap is useful for investigating boundaries, of course, but it works in situations without classic boundaries, too, such as when we creep by feeding a function a type of data that it is supposed to reject.
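
As a sketch, here is what creeping and leaping might look like against a hypothetical parse_quantity() function that we believe accepts whole numbers from 1 to 99 (the function and the believed pattern are made up for illustration):

    CREEP_CASES = [0, 100, 1.5, "7 ", None]   # just barely violate the believed pattern
    LEAP_CASES = [1, 50, 99]                  # boldly legal, including both edges

    def creep_and_leap(parse_quantity):
        for value in CREEP_CASES:
            try:
                print("creep", repr(value), "->", parse_quantity(value))
            except Exception as error:
                print("creep", repr(value), "-> rejected:", error)
        for value in LEAP_CASES:
            print("leap ", repr(value), "->", parse_quantity(value))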

New Version of the ET Dynamics Lists

Michael Bolton, my brother Jon, and I have produced a new version of our Exploratory Testing Dynamics document. We unveiled it last week at STPCON.

This document describes the elements of exploratory testing as we currently understand them. This new version has not yet been reviewed by Cem Kaner or any of our colleagues who normally have opinions on this, but I don’t anticipate that it will be revised much when they do. It should be fairly stable, for now.

ET cannot be summed up fairly as “unstructured playing around” and if you look at this document you will see why. If nothing else, count the elements in it. Crikey.

The document consists of four lists:

1. Evolving Work Products

Exploratory testing is an approach that emphasizes evolution. We start with something and make it better and better. Often testers can be overly focused on finding bugs, but look at all the things that get better. For instance, you can run your automated regression tests a thousand times and the tests don’t learn and neither do the button pushing testers. But you learn a lot by sapiently retesting– you evolve. One of the products of ET is a better tester.

Some evolution also happens in highly scripted testing, but in ET it’s a primary goal. It’s the point.

2. Skills and Tactics

What does an exploratory tester do? Well, feast your eyes on the Skills and Tactics list. Each of those items is something that seems to us to be reasonably independent, teachable, and observable. Some of them are also skills that a highly scripted tester would need, but they work a little differently for explorers. For instance, the skill of reading and analyzing documents means doing that to prepare for or aid exploration of the product, rather than to prepare detailed instructions for scripted test execution.

3. ET Polarities

The Polarities list is unique to ET. An exploratory tester must manage his own mental process, and that process is about developing ideas and using those ideas to search the product for trouble. We’ve found over years of experience doing this that a tester loses steam if he spends too long in one train or type of thought. So, what can he do?

One answer is alternation. We alternate among complementary activities. Two activities are complementary if performing one of them rejuvenates the other. A great example is reading about a product and directly interacting with it.

4. Test Strategy

The Heuristic Test Strategy model is a set of guideword heuristics that we publish as part of the Rapid Software Testing class. In addition to the general ideas of exploration, evolution, and alternation, the guidewords represent the specific subject matter of testing– the conditions we need to succeed, the things we look at, the things we look for, and the specific test techniques we use to produce tests.

With these lists, I can systematically evaluate a tester and identify strengths and weaknesses. We in the Context-Driven School of testing need models like this, because we focus on skills, rather than techniques and tools. This document allows us to focus our efforts and develop specific exercises for tester training.

Skaters Redux

An open letter to James Whittaker:

You wrote: “I had an amicable hallway conversation with James Bach. His blogger angst at my use of the title ‘Exploratory Testing’ didn’t spill over to a face-to-face discussion. Frankly, I am not surprised. I’ve never claimed the term as my own, I simply took it and made it work in the hands of real testers on real software under real ship pressure. Consultants can coin all the terms they want, but when us practitioners add meat to their pie, why cry foul? Is it not a better reaction to feel happy that there are people actually doing something with the idea?”

None of that is true.

I would not describe our conversation as amicable. Perhaps you thought it was amicable because we didn’t talk about anything important, and during that moment, I didn’t raise my voice. Or punch you.

My criticism of you is not “blogger angst”, it’s my opinion based on studying for 20 years something you’ve hardly studied at all. Every substantive conversation we’ve had has consisted of you denying whatever I happen to say, without offering evidence and in most cases without offering an argument. You have a zeal for dismissing my work that is truly extraordinary– you once even denied, again without evidence, that I knew how to run a file compare tool. Wow.

Now you say you made ET work? Well, first, you don’t know what ET is. Second, you’re an academic. You stayed in school and studied formal methods (that no one uses) while I was cutting my teeth in Silicon Valley. I have taught and demonstrated ET all over the world. I’m not alone, but work with a community of like-minded testers and thinkers, comparing notes with them, and deepening our understanding of exploratory learning applied to testing. You have not been a part of that.

I don’t think you’re adding meat, I think you’re serving thin gruel.

ET does work. My community repeatedly shows that it does. We will patiently continue to teach and develop it.

Subtlety hasn’t worked with you. So I’m saying this publicly: You’re a good speaker, but as a practitioner, if your prose is any indication, you don’t know much about what you are doing. If you applied yourself, you could become a good tester. But I’ve seen no evidence of that, yet.

Exploratory Testing Skaters

When Cem Kaner introduced the term “exploratory testing” in the mid-80’s, everyone ignored it. When I picked up the term and ran with it, I was mostly ignored. But slowly, it spread through the little community that would become the Context-Driven School. I began talking about it in 1990, and created the first ET class in 1996. It wasn’t until 1999 that Cem and I looked around and noticed that people who were not part of our school had begun to speak and write about it, too.

When we looked at what some of those people were saying, yikes! There was a lot of misunderstanding out there. So, we just kept plugging along and running our peer conferences and hoping that the good would outweigh the bad. I still think that will happen in the long run.

But sometimes it’s hard to stomach how the idea gets twisted. Case in point: James Whittaker, an academic who has not been part of the ET leadership group, and also has little or no experience on an industrial software project as a tester or test manager, has published a book called Exploratory Software Testing.

Whatever Whittaker means when he talks about exploratory testing is NOT what those of us mean who’ve been working on nurturing and developing ET for the last 20 years. As far as I can tell, he has not made more than a shallow study of it. I will probably not write a detailed review (though his publisher asked me to look at it before it was published), because I get too angry when I talk about it, and I would rather not be angry. But Adam Goucher has published his review here.

Another guy who shows up at the conferences, B.J. Rollison, also gets ET wrong. He’s done what he calls “empirical research” into ET, at Microsoft. Since he, again, has not engaged the community that first developed the concept and practices of ET, it’s not altogether surprising that his “research” is based on a poor understanding of ET (for instance, he insists that it’s a technique, not an approach, which is like confusing the institution of democracy with the mechanics of voting) and apparently was carried out with untrained subjects, since Rollison himself is not trained in what I recognize as exploratory testing skills.

Experimental research into ET can be done, but of course any such work is in the realm of social science, not computer science, because ET is a social and psychological phenomenon. (See the book Exploring Science for an example of what such research looks like.)

Now even within the group of us who’ve been sharing notes, debating, and discovering the roots of professional exploratory thinking in the fields of epistemology and cognitive psychology and the philosophy and study of scientific practice, there are strong differences of opinion. There are people I disagree with (or who just dislike me) whom I still recognize as thoughtful leaders in the realm of exploratory testing (James Lyndsay and Elisabeth Hendrickson are two examples). Perhaps Whittaker and Rollison will become rivals who make interesting discoveries and contributions, at some point. Time will tell. Right now, in my opinion, they are just skating on the surface of this subject.

The IMVU Shuffle

Michael Bolton reported on our quick test of IMVU, whose development team brags about having no human-mediated test process before deploying their software to the field.

Some commenters have pointed out that the bugs we found in our twenty-minute review weren’t serious– or couldn’t have been– because the IMVU developers feel successful in what they have produced and, apparently, there are satisfied users of the service.

Hearing that, I’m reminded of the Silver Bridge, which fell down suddenly, one day, after forty years of standing up. Up to that day, it must have seemed quite reasonable to claim that the bridge was a high quality bridge, because– look!– it’s still standing! But lack of information is not proof of excellence, it turns out. That’s why we test. Testing doesn’t provide all possible information, but it provides some. Good testing will provide lots of useful information.

I don’t know if the IMVU system is good enough. I do know that IMVU has no basis to claim that their “continuous integration” process, with all their “automated test cases,” has anything to do with their success. By exactly the same “not dead yet” argument, they could justify not running any test cases at all. I can’t help but mention that the finance industry used the same logic to defend deregulation and weak enforcement of the existing laws, which allowed Ponzi schemes and credit default swaps to cripple the world economy. Oops, there goes a few trillion dollars– hey, maybe we should have been doing better oversight all these years!

It may be that no possible problem that could be revealed by competent testing would be considered a bad problem by IMVU. If that is the case, then the true reason they are successful is that they have chosen to offer a product that doesn’t matter to people who will accept anything they are offered. Of course, they could use ANY set of practices to do that.

Clearly, what they think they’ve done is establish a test process through automation that will probably discover any important problem that could happen before they release. That’s why Michael and I tested it, and we quickly verified what we expected to find: several problems that materially interfered with the claimed functionality of IMVU, and numerous glitches that suggested the presence of more serious problems nearby. Maybe its present users are willing to put up with it, or maybe they are willing to put up with it for now. But that’s not the point.

The point is that IMVU is not doing certain ordinary and obvious things that would reveal problems in their product and they promote that approach to doing business as if it’s an innovation instead of an evasion of responsibility.

The IMVU people can’t know whether there are, in fact, serious problems in their product, because they have chosen not to discover them. That they promote this as a good practice (and claim that manual testing doesn’t scale, which is also bullshit) tells me that they don’t know what testing is for and they don’t know the difference between testing and a set of computerized behaviors called “test cases”.

They are setting themselves up to rediscover what many others have before them– why we test. Their own experiences will be the best teacher. I predict they will have some doozies.

Could the Military Be Waking Up?

Ever since I got into the testing field, almost 20 years ago, it’s been a truism that military software development is moribund. It’s not that they love process, it’s that they love bad process. Documentation? Bad documentation. Who can look upon 2167A or Mil-Std-499 without dismay? I’ll tell you who: paper mills and people paid to arrange the ink on all that paper. It’s just a scam and a travesty.

I was asked, in 1998, to analyze two military software test plans, each for a different major weapons system. I told them up front that I was neither interested in nor qualified to assess the test plans against DoD standards, such as Mil-Std-499. I was told, no problem, assess them against best commercial practice. Interpreting “best commercial practice” as what I would generally recommend doing for a life-critical or otherwise high-stakes project in the commercial sector, I created a model for analyzing test plans (now published on my website and also in the appendices of Lessons Learned in Software Testing). I then applied the model to the test plans I was given. What was immediately apparent was that the military test documentation had very little information density. It looked like the 75-page documents had been automatically generated by a set of macros operating on a short bullet list.

I made a bunch of suggestions, including showing how the same information could productively be packaged in 5 pages or so. That way, at least we could wage war on something other than trees. They replied, after two months of silence, that my analysis was not useful to them. Why? Because my ideas about test plan documentation were not consistent with Mil-Std-499. That was the only feedback I received about that work. Way to squander taxpayer money, guys! Hoowaa!

A New Hope

The defense department may be waking up to the problem, at long last. See the NDIA Top Issues Report. Notice that two issues I like to harp about are in the top five challenges: skills and testing. Notice that the military is now officially concerned about wasteful documentation and low-skilled workers.

Maybe not coincidentally, I recently taught my class at Eglin Air Force Base, with F-15s thundering regularly overhead. I was surprised that they would invite me to teach there. I was a bit more surprised that they were quite receptive to the concept of skilled and agile software testing, wherein our documentation is produced only because, and only to the extent that, it actually serves a useful purpose.

Exploratory Testing Research

Good research on testing is hard to find. Why? One reason is that testing does not belong to the field of Computer Science. I mean, sure, some of it does. There is some value to describing and testing an algorithm to efficiently cover a directed graph. But covering directed graphs is not my problem, most of the time. Most of the time, my problem is how to work with other people to simplify a complex world. Most of the time, the testing problem is an exploration and modeling problem within a socially distributed cognitive system. Whew! Whatever that is, it ain’t Computer Science.

Therefore, I am delighted to present two excellent examples of rigorous scientific research into exploratory testing– both of them coming from the field of Cognitive Science.

  1. Jerry Weinberg’s 1965 Doctoral Thesis. Here, Jerry runs an experiment to determine the strategies people use when trying to comprehend a pattern of behavior in a system. In this case, the system is a set of symbols that keep changing, and the task is to predict the symbols that will come next. By observing the pattern of predictions made by his test subjects, Jerry is able to draw inferences about the evolution of their mental models of the system. The upshot is this: to some extent it is possible to see how testers think while they are thinking. I use this principle to evaluate testers and coach them to think better.
  2. Collaborative Discovery in a Scientific Domain. This paper by Takeshi Okada and Herbert Simon is fantastic! They study how pairs of scientists, working together, design and conduct experiments to discover a scientific principle. This is EXACTLY the same thought process used by testers to investigate the behavior of systems. Notice how Okada and Simon collect information about the thought processes of their subjects. It’s very much like Weinberg’s approach, and shows again that it is possible to draw solid inferences and make interesting distinctions about the thought processes of testers. This is important stuff, because we need to make the case that exploratory testing is a rich activity that can be observed, evaluated, and also systematically taught and improved. These two papers deal with the observation and evaluation part, but I think they suggest ways to teach and improve.

Surprise Heuristic

At the recent Workshop on Training Software Testers, Morven Gentleman showed us a chart of some test results. I was surprised to see a certain pattern in the results. I began to think of new and better tests to probe the phenomenon.

Morven told us that the tester who produced that chart did not see anything strange about it. This intrigued me. Why did Morven and I see something worth investigation when the tester did not?

Then I stopped myself and tried to discover my own thought process on this. A few minutes later this exploratory testing heuristic came to mind:

I MAKE AN OBSERVATION DURING A TEST…

1. I experience surprise associated with a pattern within the observation.

That triggers REFLECTION about PLAUSIBILITY…

2. The pattern seems implausible relative to my current model of the phenomenon.

That triggers REFLECTION about RISK…

3. I can bring to mind a risk associated with that implausible pattern.

That triggers REFLECTION on MAGNITUDE OF RISK…

4. The risk seems important.

That triggers TEST REDESIGN…

Now, I don’t really know if this is my thought process, but it’s a pattern I might be able to use to explain to new testers how surprise can be a test tool.

Dead Bee Heuristic

Have you ever had a software problem that disappeared even when you did nothing to correct it? Or have you ever fixed a bug by doing something that seems as if it shouldn’t have fixed anything?

Whenever that happens to me, I A) remain wary, and B) remove the fix so that by seeing the problem again I have additional evidence that the “fix” was truly the fix. I call this the dead bee heuristic, because if there’s a bee in my living room, I don’t want it to mysteriously disappear; I want to see it fly out my window or be dead on my floor.

This applies to testing situations, too. If I change a data file and see that it no longer crashes the application I’m testing, the next thing I do is change it back again so I can see the crash one more time.
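
As a sketch, the check might look like this in Python, assuming a hypothetical run_app(data_file) helper that returns True if the application crashes (the helper and the file names are placeholders, not part of any real harness):

    def dead_bee_check(run_app, original_file, edited_file):
        # The apparent fix: the edited data no longer crashes the application.
        assert not run_app(edited_file), "still crashing; the edit was not the fix"
        # The dead bee: restore the original data and insist on seeing the crash
        # again, so we know the edit (and nothing else) made the difference.
        assert run_app(original_file), "crash gone on its own; stay wary"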

“Say, was you ever bit by a dead bee?…You know, you got to be careful of dead bees if you’re goin’ around barefooted, ’cause if you step on them they can sting you just as bad as if they was alive, especially if they was kind of mad when they got killed. I bet I been bit a hundred times that way.” — Walter Brennan as “Eddie” in To Have and Have Not

And always bear in mind that killing the “bee” may not have solved the real problem, or may have created new problems.

How to Investigate Intermittent Problems

The ability and the confidence to investigate an intermittent bug is one of the things that marks an excellent tester. The most engaging stories about testing I have heard have been stories about hunting a “white whale” sort of problem in an ocean of complexity. Recently, a thread on the SHAPE forum made me realize that I had not yet written about this fascinating aspect of software testing.

Unlike a mysterious non-intermittent bug, an intermittent bug is more of a testing problem than a development problem. A lot of programmers will not want to chase that white whale, when there’s other fishing to do.

Intermittent behavior itself is no big deal. It could be said that digital computing is all about the control of intermittent behavior. So, what are we really talking about?

We are not concerned about intermittence that is both desirable and non-mysterious, even if it isn’t exactly predictable. Think of a coin toss at the start of a football game, or a slot machine that comes up all 7’s once in a long while. We are not even concerned about mysterious intermittent behavior if we believe it can’t possibly cause a problem. For the things I test, I don’t care much about transient magnetic fields or minor random power spikes, even though they are happening all the time.

Many intermittent problems have not yet been observed at all, perhaps because they haven’t manifested, yet, or perhaps because they have manifested and not yet been noticed. The only thing we can do about that is to get the best test coverage we can and keep at it. No algorithm can exist for automatically detecting or preventing all intermittent problems.

So, what we typically call an intermittent problem is: a mysterious and undesirable behavior of a system, observed at least once, that we cannot yet manifest on demand.

Our challenge is to transform the intermittent bug into a regular bug by resolving the mystery surrounding it. After that it’s the programmer’s headache.

Some Principles of Intermittent Problems:

  • Be comforted: the cause is probably not evil spirits.
  • If it happened once, it will probably happen again.
  • If a bug goes away without being fixed, it probably didn’t go away for good.
  • Be wary of any fix made to an intermittent bug. By definition, a fixed bug and an unfixed intermittent bug are indistinguishable over some period of time and/or input space.
  • Any software state that takes a long time to occur, under normal circumstances, can also be reached instantly, by unforeseen circumstances.
  • Complex and baffling behavior often has a simple underlying cause.
  • Complex and baffling behavior sometimes has a complex set of causes.
  • Intermittent problems often teach you something profound about your product.
  • It’s easy to fall in love with a theory of a problem that is sensible, clever, wise, and just happens to be wrong.
  • The key to your mystery might be resting in someone else’s common knowledge.
  • An intermittent problem in the lab might be easily reproducible in the field.
  • The Pentium Principle of 1994: an intermittent technical problem may pose a *sustained and expensive* public relations problem.
  • The problem may be intermittent, but the risk of that problem is ever present.
  • The more testability is designed into a product, the easier it is to investigate and solve intermittent problems.
  • When you have eliminated the impossible, whatever remains, however improbable, could have done a lot of damage by then! So, don’t wait until you’ve fully researched an intermittent problem before you report it.
  • If you ever get in trouble over an intermittent problem that you could not lock down before release, you will fare a lot better if you made a faithful, thoughtful, vigorous effort to find and fix it. The journey can be the reward, you might say.

Some General Suggestions for Investigating Intermittent Problems:

  • Recheck your most basic assumptions: are you using the computer you think you are using? are you testing what you think you are testing? are you observing what you think you are observing?
  • Eyewitness reports leave out a lot of potentially vital information. So listen, but DO NOT BECOME ATTACHED to the claims people make.
  • Invite more observers and minds into the investigation.
  • Create incentives for people to report intermittent problems.
  • If someone tells you what the problem can’t possibly be, consider putting extra attention into those possibilities.
  • Check tech support websites for each third party component you use. Maybe the problem is listed.
  • Seek tools that could help you observe and control the system.
  • Improve communication among observers (especially with observers who are users in the field).
  • Establish a central clearinghouse for mystery bugs, so that patterns among them might be easier to spot.
  • Look through the bug list for any other bug that seems like the intermittent problem.
  • Make more precise observations (consider using measuring instruments).
  • Improve testability: Add more logging and scriptable interfaces.
  • Control inputs more precisely (including sequences, timing, types, sizes, sources, iterations, combinations).
  • Control state more precisely (find ways to return to known states).
  • Systematically cover the input and state spaces.
  • Save all log files. Someday you’ll want to compare patterns in old logs to patterns in new ones.
  • If the problem happens more often in some situations than in others, consider doing a statistical analysis of the variance between input patterns in those situations.
  • Consider controlling things that you think probably don’t matter.
  • Simplify. Try changing only one variable at a time; try subdividing the system. (helps you understand and isolate the problem when it occurs)
  • Complexify. Try changing more variables at once; let the state get “dirty”. (helps you make a lottery-type problem happen)
  • Inject randomness into states and inputs (possibly by loosening controls) in order to reach states that may not fit your typical usage profile.
  • Create background stress (high loads; large data).
  • Set a trap for the problem, so that the next time it happens, you’ll learn much more about it (see the sketch after this list).
  • Consider reviewing the code.
  • Look for interference among components created by different organizations.
  • Celebrate and preserve stories about intermittent problems and how they were resolved.
  • Systematically consider the conceivable causes of the problem (see below).
  • Beware of burning huge time on a small problem. Keep asking, is this problem worth it?
  • When all else fails, let the problem sit a while, do something else, and see if it spontaneously recurs.
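
To make the “set a trap” item concrete, here is a minimal Python sketch: it wraps a suspect operation so that the next time the surprise happens, the inputs, timing, and stack trace are logged automatically. The operation and the looks_wrong() check are hypothetical placeholders for whatever you happen to be hunting.

    import logging
    import time
    import traceback

    logging.basicConfig(filename="trap.log", level=logging.INFO)

    def trapped(operation, looks_wrong, *args, **kwargs):
        # Run the suspect operation; if anything surprising happens, record
        # as much as we can about the moment it happened.
        start = time.time()
        try:
            result = operation(*args, **kwargs)
        except Exception:
            logging.error("trap sprung (exception) args=%r kwargs=%r\n%s",
                          args, kwargs, traceback.format_exc())
            raise
        if looks_wrong(result):
            logging.warning("trap sprung (odd result) after %.3fs: args=%r result=%r",
                            time.time() - start, args, result)
        return result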

Considering the Causes of Intermittent Problems

When investigating an intermittent problem, it may be worth considering the kinds of things that cause such problems. The list of guideword heuristics below may help you do that analysis systematically. There is some redundancy among the items in the list, because causes can be viewed from different perspectives.

Possibility 1: The system is NOT behaving differently. The apparent intermittence is an artifact of the observation.

  • Bad observation: The observer may have made a poor observation. (e.g. “Inattentional blindness” is a phenomenon whereby an observer whose mind is occupied may not see things that are in plain view. When presented with the scene a second time, the observer may see new things in the scene and assume that they weren’t there before. Also, certain optical illusions cause apparently intermittent behavior in an unchanging scene. See “the scintillating grid.”)
  • Irrelevant observation: The observer may be looking at differences that don’t matter. The things that matter may not be intermittent. This can happen when an observation is too precise for its purpose.
  • Bad memory: The observer may have mis-remembered the observation, or records of the observation could have been corrupted. (There’s a lot to observe when we observe! Our minds immediately compact the data and relate it to other data. Important data may be edited out. Besides, a lot of system development and testing involves highly repetitive observations, and we sometimes get them mixed up.)
  • Misattribution: The observer may have mis-attributed the observation. (“Microsoft Word crashed” might mean that *Windows* crashed for a reason that had nothing whatsoever to do with Word. Word didn’t “do” anything. This is a phenomenon also known as “false correlation” and often occurs in the mind of an observer when one event follows hard on the heels of another event, making one appear to be caused by the other. False correlation is also chiefly responsible for many instances whereby an intermittent problem is mistakenly construed to be a non-intermittent problem with a very complex and unlikely set of causes)
  • Misrepresentation: The observer may have misrepresented the observation. (There are various reasons for this. An innocent reason is that the observer is so confident in an inference that they have the honest impression that they did observe it and report it as such. I once asked my son if his malfunctioning Playstation was plugged in. “Yes!” he said impatiently. After some more troubleshooting, I had just concluded that the power supply was shot when I looked down and saw that it was obviously not plugged in.)
  • Unreliable oracle: The observer may be applying an intermittent standard for what constitutes a “problem.” (We may get the impression that a problem is intermittent only because some people, some of the time, don’t consider the behavior to be a problem, even if the behavior is itself predictable. Different observers may have different tolerances and sensitivities; and the same observer may vary in that way from one hour to the next.)
  • Unreliable communication: Communication with the observer may be inconsistent. (We may get the impression that a problem is intermittent simply because reports about it don’t consistently reach us, even if the problem is itself quite predictable. “I guess people aren’t seeing the problem anymore” may simply mean that people no longer bother to complain.)

Possibility 2: The system behaved differently because it was a different system.

  • Deus ex machina: A developer may have changed it on purpose, and then changed it back. (This can occur easily when multiple developers or teams are simultaneously building or servicing different parts of an operational server platform without coordinating with each other. Another possibility, of course, is that the system has been modified by a malicious hacker.)
  • Accidental change: A developer may be making accidental changes. (The changes may have unanticipated side effects, leading to the intermittent behavior. Also, a developer may be unwittingly changing a live server instead of a sandbox system.)
  • Platform change: A platform component may have been swapped or reconfigured. (An administrator or user may have changed, intentionally or not, a component on which the product depends. Common sources of these problems include Windows automatic updates, memory and disk space reconfigurations.)
  • Flakey hardware: A physical component may have transiently malfunctioned. (Transient malfunctions may be due to factors such as inherent natural variation, magnetic fields, excessive heat or cold, low-battery conditions, poor maintenance, or physical shock.)
  • Trespassing system: A foreign system may be intruding. (For instance, in web testing, I might get occasionally incorrect results due to a proxy server somewhere at my ISP that provides a cached version of pages when it shouldn’t. Other examples are background virus scans, automatic system updates, other programs, or other instances of the same program.)
  • Executable corruption: The object code may have become corrupted. (One of the worst bugs I ever created in my own code (in terms of how hard it was to find) involved machine code in a video game that occasionally wrote data over a completely unrelated part of the same program. Because of the nature of that data, the system didn’t crash, but rather the newly corrupted function passed control to the function that immediately followed it in memory. Took me days (and a chip emulator) to figure it out.)
  • Split personality: The “system” may actually be several different systems that perform as one. (For instance, I may get inconsistent results from Google depending on which Google server I happen to get; or I might not realize that different machines in the test lab have different versions of some key component; or I might mistype a URL and accidentally test on the wrong server some of the time.)
  • Human element: There may be a human in the system, making part of it run, and that human is behaving inconsistently.

Possibility 3: The system behaved differently because it was in a different state.

  • Frozen conditional: A decision that is supposed to be based on the status of a condition may have stopped checking that condition. (It could be stuck in an “always yes” or “always no” state.)
  • Improper initialization: One or more variables may not have been initialized. (The starting state of a computation would therefore depend on the state of some previous computation of the same or another function. A sketch of this cause appears after this list.)
  • Resource denial: A critical file, stream, or other variable may not be available to the system. (This could happen either because the object does not exist, has become corrupted, or is locked by another process.)
  • Progressive data corruption: A bad state may have slowly evolved from a good state by small errors propagating over time. (Examples include timing loops that are slightly off, or rounding errors in complicated or reflexive calculations.)
  • Progressive destabilization: There may be a classic multi-stage failure. (The first part of the bug creates an unstable state– such as a wild pointer– when a certain event occurs, but without any visible or obvious failure. The second part precipitates a visible failure at a later time based on the unstable state in combination with some other condition that occurs down the line. The lag time between the destabilizing event and the precipitating event makes it difficult to associate the two events to the same bug.)
  • Overflow: Some container may have filled to beyond its capacity, triggering a failure or an exception handler. (In an era of large memories and mass storage, overflow testing is often shortchanged. Even if the condition is properly handled, the process of handling it may interact with other functions of the system to cause an emergent intermittent problem.)
  • Occasional functions: Some functions of a system may be invoked so infrequently that we forget about them. (These include exception handlers, internal garbage collection functions, auto-save, and periodic maintenance functions. These functions, when invoked, may interact in unexpected ways with other functions or conditions of the system. Be especially wary of silent and automatic functions.)
  • Different mode or option setting: The system can be run in a variety of modes and the user may have set a different mode. (The new mode may not be obviously different from the old one.)
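
Here is a contrived Python sketch of the “improper initialization” item above: a shared buffer that should be reset on every call is not, so the output of one call depends on what an earlier call left behind. Nothing about it corresponds to any real product.

    _lines = []   # shared state that should be re-initialized on every call

    def format_report(items, reset=True):
        if reset:
            _lines.clear()   # forget this, and old data leaks into new reports
        _lines.extend(str(item) for item in items)
        return "\n".join(_lines)

    print(format_report(["a", "b"]))            # "a\nb"
    print(format_report(["c"], reset=False))    # "a\nb\nc" -- leftover state shows up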

Possibility 4: The system behaved differently because it was given different input.

  • Accidental input: User may have provided input or changed the input in a way that shouldn’t have mattered, yet did. (This might also be called the Clever Hans syndrome, after the mysteriously repeatable ability of Clever Hans, the horse, to perform math problems. It was eventually discovered by Oskar Pfungst that the horse was responding to subtle physical cues that its owner was unintentionally conveying. In the computing world, I once experienced an intermittent problem due to sunlight coming through my office window and hitting an optical sensor in my mouse. The weather conditions outside shouldn’t have constituted different input, but they did. Another more common example is different behavior that may occur when using the keyboard instead of mouse to enter commands. The accidental input might be invisible unless you use special tools or recorders. For instance, two identical texts, one saved in RTF format from Microsoft Word and one saved in RTF format from Wordpad, will be very similar on the disk but not exactly identical.)
  • Secret boundaries and conditions: The software may behave differently in some parts of the input space than it does in others. (There may be hidden boundaries, or regions of failure, that aren’t documented or anticipated in your mental model of the product. I once tested a search routine that invoked different logic as the total returned hits crossed 1,000 and 50,000. Only by accident did I discover these undocumented boundaries.)
  • Different profile: Some users may have different profiles of use than other users. (Different biases in input will lead to different experiences of output. Users with certain backgrounds, such as programmers, may be systematically more or less likely to experience, or notice, certain behaviors.)
  • Ghost input: Some other machine-based source than the user may have provided different input. (Such input is often invisible to the user. This includes variations due to different files, different signals from peripherals, or different data coming over the network.)
  • Deus Ex Machina: A third party may be interacting with the product at the same time as the user. (This may be a fellow tester, a friendly user, or a malicious hacker.)
  • Compromised input: Input may have been corrupted or intercepted on its way into the system. (Especially a concern in client-server systems.)
  • Time as input: Intermittence over time may be due to time itself. (Time is the one thing that constantly changes, no matter what else you control. Whenever time and date, or time and date intervals, are used as input, bugs in that functionality may appear at some times but not others. See the sketch after this list.)
  • Timing lottery: Variations in input that normally don’t matter may matter at certain times or at certain loads. (The Mars Rover suffered from a problem like this involving a three microsecond window of vulnerability when a write operation could write to a protected part of memory.)
  • Combination lottery: Variations in input that normally don’t matter may matter when combined in a certain way.
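
A small Python sketch of the “time as input” item: a renewal-date calculation that naively bumps the month number. It works on most days of the year and then fails every December, an intermittence driven entirely by when you happen to run it (the function is invented for illustration).

    import datetime

    def naive_renewal_date(today):
        # Breaks when today.month == 12, because month 13 does not exist.
        return today.replace(month=today.month + 1)

    for day in (datetime.date(2009, 5, 14), datetime.date(2009, 12, 14)):
        try:
            print(day, "->", naive_renewal_date(day))
        except ValueError as error:
            print(day, "->", "fails:", error)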

Possibility 5: The other possibilities are magnified because your mental model of the system and what influences it is incorrect or incomplete in some important way.

  • You may not be aware of each variable that influences the system.
  • You may not be aware of sources of distortion in your observations.
  • You may not be aware of available tools that might help you understand or observe the system.
  • You may not be aware of all the boundaries of the system and all the characteristics of those boundaries.
  • The system may not actually have a function that you think it has; or maybe it has extra functions.
  • A complex algorithm may behave in a surprising way, intermittently, that is entirely correct (e.g. mathematical chaos can look like random behavior).

On Answering Questions About ET

People ask me a lot of questions about ET. I want to be helpful in my answers. A problem I struggle with is that questions about ET often come with a lot of assumptions, and the first thing I have to do is to make the assumptions visible and try to clear away the ones that aren’t helpful. Otherwise, my answers will sound crazy.

Questions of any kind rest on premises. That’s cool, and normally it’s not a big problem. It becomes a problem when questions are asked across a paradigmatic chasm. And there’s a big chasm between the premises of “traditional testing” and those of context-driven test methodology, and those of Rapid Software Testing, which is what I call my test methodology.

Starting in 1987, I tried to learn software testing. Starting in 1989, I started reinventing testing for myself, having become disillusioned with the empty calories of folklore that I found in books by folks like William Perry, or the misanthropic techniquism of Boris Beizer (Boris once told me that it didn’t bother him if people find his advice impractical, since he was merely concerned with documenting “best practices”, a phenomenon that he seemed to think has nothing to do with applicability or utility).

I “invented” testing (with the help of many colleagues) mainly by discovering that the problems of testing have already been solved in the fields of cognitive psychology, epistemology, and general systems thinking. The lessons of these much broader and older fields have been studiously ignored by the majority of authors in our field. This puts me in the odd position of having to defend exploratory thinking in technical work as if it’s some kind of newfangled idea, rather than a prime driver of scientific progress since the advent of science itself.

Anyway, now my island of testing metaphysics is mostly complete. I can plan and do and defend my testing without any reference to ideas published in testing “textbooks” or any oral folklore tradition. Instead I reference ideas from logic, the study of cognition, and the philosophy of science. My system works, but it’s a big job to explain it to testing traditionalists, unless they read broadly. For instance, if I were to say that I test the way Richard Feynman used to test, some people get it right away.

Let me illustrate my difficulty: Julian Harty asks “Do you expect an Exploratory Tester to be well versed in [traditional testing] techniques? Do you check that they are competent in them, etc?”

I’ve had some discussions with Julian. He seems like a friendly fellow. My brother Jonathan, who’s had more discussions with him, says “Julian is one of us.” That’s a serious endorsement. So, I don’t want to alienate Julian. I hope I can turn him into an ally.

Still, his question poses a challenge.

Not “exploratory tester”, just “tester.”
First, there is no such thing as an “exploratory tester”, separate from a “traditional tester”, except as a rhetorical device. I sometimes call myself an exploratory tester in debates, by which I mean someone who studies exploratory testing and tries to do it well. But that doesn’t seem to be how Julian is using the term. The truth is all testers are exploratory testers, in that we all test in exploratory ways. Some of us know how to do it well; fewer of us can explain it or teach it.

Testers are testers. Some testers are especially good at simultaneous learning, test design, and test execution, an intellectual blend called exploratory testing.

Exploratory testing is not a technique, it’s an approach.
A technique is a gimmick. A technique is a little thing. There are a buh-zillion techniques. Exploratory thinking is not a technique, but an approach, just as scripted testing is an approach. Approaches modify techniques. Any technique of testing can be approached in an exploratory way or a scripted way, or some combination of the two.

Traditional testing techniques are often not really techniques of testing, they are symbols in a mythology of testing.
Consider the technique “boundary testing.” One would think that this involves analyzing boundaries, somehow, and searching for boundary-related bugs in software products. But actually, the way testing is written about and taught, almost no guidance is given to testers about how to analyze anything, including boundaries. Boundary testing isn’t so much a technique as a label, and by repeating the label to each other, we think we are accomplishing something. Now, I do have an exploratory approach to boundary testing. I use various heuristics as part of the boundary testing process, but for the most part, boundary testing is ordinary testing. The technique is a tiny part of it compared to the generic skills of modeling, observing, and evaluating that underlie all skilled testing.

I don’t teach boundary testing in my classes because it’s too trivial to worry about.

So, with that preamble, I can answer the question:

Julian, I assume by “traditional test techniques” you aren’t referring to the tradition of using one’s eyes and brain to test something, but rather to certain high-sounding labels like equivalence class partitioning (a fancy way of saying “run a different test instead of the same test over and over again”) or black-box testing (a fancy way of saying “test without knowing everything”) or cause-effect graphing (a way of saying “I repeat things I see in books even if the ideas are totally impractical”). I don’t teach those labels to novice testers, Julian, because they don’t help a new tester actually test anything, and I want novices to learn how to test.

But to be an educated tester who is effective at explaining test methodology, I think you need to know the buzzwords; you need to know the folklore. This is true whether you are a tester who embraces exploratory testing, or one who still pretends that you don’t do ET.

A tester– any tester, not just one who follows my rapid testing vision– needs to develop the cognitive skills to effectively question technology. Gain those and you automatically gain everything important about “traditional test techniques”, in my opinion.

To see such a skill in action, ask yourself this question: how many dimensions of a wine glass can you list? Then watch what your mind does next. To answer this question you need a skill that I have come to call “factoring”, which is a component of modeling skill. It is a skill, not a technique, though there may be many techniques we might apply in the course of exhibiting our skill.

Why Talk About Exploratory Testing?

So, there I was at the Dutch Testing Day, last year. I was a featured speaker, talking about exploratory testing. ET is one of my favorite subjects. It is helpful and powerful, and yet by some strange quirk of history and collective delusion, our industry hasn’t yet embraced it.

They asked me to participate in a panel discussion on ET. But when I took the stage and heard the opening statements by the three other panelists, I realized to my shock that the other guys knew almost nothing about ET. That is to say, each of them seemed to have spent ten minutes or so looking at my web site, but indicated no other preparation, background, or study of any kind on this topic.

The panel discussion quickly became a debate. Everybody against me. Part of me loves that. My favorite literary character is Cyrano De Bergerac, and my inner Cyrano delights to be ambushed by brigands on the high road. As a freshman in high school I once was invited to square off alone against the entire senior sociology class, where the issue was whether morality is real or just a convenient human contrivance. Guess which side I was arguing? Hint: Somebody on the other side screamed “Baby killer!” as part of her counterargument.

A lot of things were said. One exchange in particular is instructive:

Panelist: I would never use exploratory testing on a life critical product.

Me: Really? I think it would be irresponsible not to. But let me get this straight. Are you saying that you would NEVER change a test based on something you learned while testing?

Panelist: I change tests often. Everybody does. Is that what you call exploratory testing?

Me: Basically, yes.

Panelist: Well, what does it mean to advocate a practice that everybody already does? That’s like telling us we should breathe.

Me: I’m not advocating that you DO exploratory testing. I’m advocating that you learn to do it WELL. There is a huge difference.

Our poor testing craft is afflicted with diseases. One is testcasetosis, which is the inability to imagine test activities unless packaged in chicklets called test cases. Here I’m concerned with techniquism. That’s the inability to comprehend testing as a skill, but instead only as some set of more or less mechanical behaviors called test techniques.

Exploratory testing is not a technique, it’s an approach. That is to say, any technique can be practiced in an exploratory or non-exploratory way. Exploration itself is not testing, but it modifies testing.

The question “Should I do exploratory testing?” is not helpful. Instead ask “In what way is my testing exploratory and in what way is it scripted? How might my testing benefit from more exploration or more scripting?”

But few people are doing this because exploratory testing is not being discussed. It’s still a closet activity. I go into projects and see lots of ET, but usually no ET is mentioned in any of their officially defined processes.

Come out of the damn closet!

You already do exploratory testing. Learn to see it and talk about it. The constituent skills of exploratory testing are simply the skills of testing, applied in the moment. When you turn the key and your car doesn’t start, the things you do next probably constitute an exploratory testing process.

There’s a lot to this idea of thinking on your feet and changing your approach based on what happens. The games Mastermind and Twenty Questions, jigsaw puzzles, and Sudoku are all exploratory activities.

I will go into more detail and itemize the skills of testing in another post.

— James

Amateur Penetration Testing

It’s traveling season for me. London, Amsterdam, New Zealand, Australia, and so forth. That means I’m killing time in a lot of airports and hotel lobbies. And THAT means I’m testing software on kiosks and Internet access stations across this great planet of ours.

At first I was just trying to find interesting bugs, but more recently I’ve become interested in penetration testing. I’m an amateur at it, but I seem to be getting better with practice. Wherever possible, I send the details of any exploits to the companies that run those systems. Even though my behavior is probably illegal (when I actually do a takeover), I like being on the sunny side of morality.

I want to share some of the details with you because I think companies should take security a lot more seriously. Maybe they will if more people know how to circumvent bad security.

Recently, I was able to break into and take over an Internet access station at a hotel in London. Later, I failed to take control of one at Heathrow, but I did find some unsightly bugs in it. A different station at the Red Carpet Club (based on Windows 95, would you believe) fell pretty quickly.

These are the major techniques I use:

1. Look for and access generic, hidden Windows features.

A lot of these systems are based on Windows and use Internet Explorer. Attacking them is not very difficult, in most cases, because the Windows O/S is brimming with rich, creamy functionality. This functionality is difficult to disable, and much of it is hidden.

In my PDA, I keep a list of every keystroke shortcut in Windows (conveniently available from Microsoft). I try each one. Did you know that Shift-Alt-PrtSc activates high contrast mode in XP? Not only that, it gives you access to a bunch of accessibility settings.

In one kiosk I was able to invoke the XP “Help Center”. (Probably something to do with a very short vulnerability window or some other special state, because I couldn’t make it open later on.) Once open, I tried to install a printer. This allowed me to browse for printer drivers, but through that interface I could see any file on the system. What’s more, I could delete any of those files. By strategically deleting files, a hacker could take over the system on reboot. (No, I didn’t do that.)

While I was playing with this, a notice came up informing me that Windows was ready to install Service Pack 2. By following that process half-way, I was able to invoke an IE session that let me browse and access the files on the hard drive. I discovered information that would have allowed me to attack the servers of the company that manufactured that kiosk. Yikes.

So, don’t turn on automatic updates, you public kiosk developers!

In addition to accessing hidden functions through hotkeys, I also go through each menu of each program I can access, especially Internet Explorer, looking for any function that starts a new application or gives me a file open/save dialog.

For example, in another kiosk, I was able to invoke the Microsoft newsreader application through Explorer. On the help menu there’s an option to view the readme. When invoked, Notepad comes up. Ah ha! With Notepad I created a batch file that opened a command prompt. I put that file in the startup folder. Then I rebooted and took control of the machine.

2. Look for special content on web pages.

A lot of kiosks give you controlled access to web pages. The access is easy to control if the web pages are simple HTML, but when third party plug-ins or special dynamic content is involved, there be dragons…

In yet another kiosk, which was better secured than the other two, I was allowed free access to the website for Heathrow Airport. No problem so far. Looking around that website, I found some PDF files. Acrobat Reader started up within IE, and gave me a new toolbar. One of the functions was Save a Copy, which allowed me to browse the files on the hard drive. I didn’t try it, but I bet I could have saved the PDF with the same file name as one of the files related to the application that drove the kiosk. This might be called a lobotomy attack. I don’t know what the pros call it.

3. Find the hidden technical support login interface, guess passwords, and otherwise beat on it.

Public kiosks almost always have a hidden interface that lets a technician log in and perform maintenance. That interface is often pretty easy to find and poorly tested.

In one case, the tech login was accessed by touching a particular word on a particular page. On another, it was clicking three times on one button, then three times on the button next to it (it took me all of five minutes to stumble onto it– so much for security through obscurity). Another kiosk hardly bothered to hide it– you just click in the bottom left corner.

Tapping chaotically on the touchscreen of the United EasyCheckIn system caused a supervisor password prompt to come up. I have not yet been able to repro that, but now that I know it exists, I will keep trying until I learn how to do it on demand.

4. Create error conditions and see if that disables security or reveals functions.

The Del Discover Club kiosk at the Del Coronado hotel has a vulnerability that occurs within a 1/5 second window after causing a certain error message to occur.

5. Make a lot of things happen at once and see if they trip over each other.

Concurrency is so hard for programmers to get right. They make assumptions about states, and those assumptions are often faulty.

6. Make the system reboot, try to interrupt the process, and gather clues about system configuration.

I was unable to reboot the book search system at Barnes and Noble, but I was able to make a modal error dialog appear. By moving it off the screen, I could make the terminal appear to be frozen (unless the user presses Enter first). I didn’t try this, but it occurs to me that the store personnel would probably reboot the system for me if I reported that a terminal was frozen. If they walked away after that, I might be able to launch my attack.

Some of these kiosks have an unprotected power cord, believe it or not; and in the UK all outlets have switches on them, so I don’t have to pull the plug to cut the power.

To Repeat Tests or Not to Repeat

One of the serious social diseases of the testing craft is the obsession with repetition. Is that test repeatable? Is that test process repeatable? Have we repeated those tests? These questions are often asked in a tone of worry or accusation, sometimes accompanied by rhetorical quips about the importance of a disciplined process– without explanation of how discipline requires repetition.

(Before you go on, I urge you to carefully re-read the previous paragraph, and notice that I used the word obsession. I am not arguing with repeatability, as such. Just as one can argue against an addiction to food without being against eating, what I’m trying to do is wipe out obsession. Please help me.)

There is one really good reason not to repeat a test: the value of a new test is greater than the value of an old test (all other things being equal). It’s greater because a new test can find problems that have always been in the product, and not yet found, while an old test has a non-zero likelihood of revealing the same old thing it revealed the last time you performed it. New tests always provide new information. Old tests sometimes do.

This one powerful reason to run new tests is based on the idea that testing is a sampling process, and that running a single test, whatever the test, is to collect a tiny sample of behavior from a very large population of potential behaviors. More tests means a bigger sample. Re-running tests belabors the same sample, over and over.

Test repetition is often justified based on arguments that sound like blatant discrimination against the unborn test, as if manifested tests have some kind of special citizenship denied to mere potential tests. One reason for this bias may be a lack of appreciation for the vastness of testing possibilities. If you believe that your tests already comprise all the tests that matter, you won’t have much urgency about making new ones.

Another reason may be an inappropriate analogy to scientific experiments. We were all told in 5th grade science class about the importance of the controlled, repeatable experiment to the proper conduct of science. But what we weren’t told is that a huge amount of less controlled and less easily repeated exploratory work precedes the typical controlled experiment. Otherwise, an amazing amount of time would be wasted on well-controlled but uninteresting experiments. Science embraces exploratory as well as confirmatory research.

One thought experiment I find useful is to take the arguments for repetition to their logical extreme and suppose that we have just one and only one test for a complex product. We run that test again and again. The absurdity of that image helps me see reasons to run more tests. No complex product with a high quality standard can be considered well tested unless a wide variety of tests have been performed against it.

(You probably have noticed that it’s important to consider what I mean by “test” and “run that test again and again”. Depending on how you think of it, it may well be one test would be enough, but then it would have to be an extremely complex test or one that incorporates within itself an extreme amount of variation.)

The Product is a Minefield

In order to replace obsession with informed choice, we need a way to consider a situation and decide if repetition is warranted, and how much repetition. I have found that the analogy of a minefield helps me work through those considerations.

The minefield is an evocative analogy that expresses the sampling argument: if you want to avoid stepping on a mine, walk in the footsteps of the last successful person to traverse the minefield. Repetition avoids finding a mine by limiting new contact between your feet and the ground. By the same principle, variation will increase the possibility of finding a mine.

I like this analogy because it is a meaningful and valid argument that also has important flaws that help us argue in favor of repetition. The analogy helps us explore both sides of the issue.
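
The sampling argument behind the analogy can be illustrated with a toy simulation (a thought experiment in code, not a model of any real project): bugs are hidden at random points of a large input space, one strategy re-runs the same fifty tests every cycle, the other picks fifty fresh tests each cycle, and we count how many bugs each strategy has stumbled onto after twenty cycles.

    import random

    random.seed(1)
    SPACE, BUGS, TESTS_PER_CYCLE, CYCLES = 100000, 200, 50, 20
    bug_points = set(random.sample(range(SPACE), BUGS))

    fixed_suite = set(random.sample(range(SPACE), TESTS_PER_CYCLE))
    found_by_repeating = set()
    found_by_varying = set()

    for _ in range(CYCLES):
        found_by_repeating |= bug_points & fixed_suite
        found_by_varying |= bug_points & set(random.sample(range(SPACE), TESTS_PER_CYCLE))

    print("repeating the same suite found", len(found_by_repeating), "bugs")
    print("varying the tests found", len(found_by_varying), "bugs")

The repeated suite can never find more than whatever its fifty points happen to touch; the varied tests keep covering new ground.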

In my classes, I make the minefield argument and then challenge students to find problems in it. Each problem is then essentially a reason why, in a certain context, repetition might be better than variation.

I won’t force you to go through that exercise. But before you click on the link below, you may want to think it through for yourself.

I know of nine interestingly distinct reasons to repeat tests. How many can you think of?

Click this link when you are ready to see my list and how the argument applies to test-driven design: Ten Reasons to Repeat Tests