We. Use. Tools.

We Context-Driven testers use tools to help ourselves test better. But there is no such thing as test automation.

Want details? Here’s the 10,000-word explanation that Michael Bolton and I have been working on for months.

Editor’s Note: I have just posted version 1.03 of this article. This is the third revision we have made due to typos. Isn’t it interesting how hard it is to find typos in your own work before you ship an article? We used automation to help us with spelling, of course, but most of the typos are down to properly spelled words that are in the wrong context. Spelling tools can’t help us with that. Also, Word’s spell-checker still thinks there are dozens of misspelled words in our article, because of all the proper nouns, terms of art, and neologisms. Of course there are the grammar-checking tools, too, right? Yeah… not really. The false positive rate is very high with those tools. I just did a sweep through every grammar problem the tool reported. Of the five problems it thinks it found, only one, a missing hyphen, is plausibly a problem. The rest are essentially matters of writing style.

One of the lines it complained about is this: “The more people who use a tool, the more free support will be available…” The grammar checker thinks we should not say “more free” but rather “freer.” This may be correct, in general, but we are using parallelism, a rhetorical style that we feel outweighs the general rule about comparatives. Only humans can make these judgments, because the rules of grammar are sometimes fluid.

Agile Testing Heuristic: The Power of Looking

Today I broke my fast with a testing exercise from a colleague. (Note: I’d better not tell you what it is or even who gave it to me, because after you read this it will be spoiled for you, whereas if you read this and at a later time stumble into that challenge, not knowing that’s the one I was talking about, it won’t be spoiled.)

The exercise involved a short spec and an EXE. The challenge was how to test it.

The first thing I checked was whether it had a text interface that I could interact with programmatically. It did. So I wrote a program to flood it with “positive” and “negative” input. The results were collected in a log file. I programmatically checked the output, and it was correct.
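
For the curious, here is a rough sketch of the sort of throwaway harness I mean. The program name, the input range, the expected-output rule, and the log format are all invented for illustration, so as not to spoil the real exercise.

    # check_flood.py -- a rough sketch of a throwaway input-flooding harness.
    # "target.exe", the input range, and the expected() rule are hypothetical;
    # the real exercise and its spec are deliberately not described here.
    import subprocess
    import time

    def run_case(value):
        """Feed one input to the program's text interface and capture its reply."""
        result = subprocess.run(
            ["target.exe"],
            input=f"{value}\n",
            capture_output=True,
            text=True,
            timeout=10,
        )
        return result.stdout.strip()

    def expected(value):
        # Stand-in decision rule supposedly derived from the spec.
        return "OK" if value >= 0 else "ERROR"

    with open("flood.log", "w") as log:
        for value in range(-50, 50):           # "negative" and "positive" input
            start = time.time()
            output = run_case(value)
            elapsed = time.time() - start
            verdict = "PASS" if output == expected(value) else "FAIL"
            # Log everything, not just the verdict; the elapsed column is the
            # kind of incidental output that rewards actually looking at it.
            log.write(f"{value}\t{output}\t{elapsed:.3f}\t{verdict}\n")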

So far this is a perfectly ordinary Agile testing situation. It is consistent with any API testing or systematic domain testing of units you have heard of. The program I wrote performs a check; the check was produced by my testing thought process, and its output is analyzed by a similar thought process. That human element qualifies this as testing and not merely naked checking. If I were to hand my automated check to someone else who did not think like a tester, it would not be testing anymore, although the checks would probably still have some value.

Here’s my public service announcement: Kids! Remember to look at what is happening.

The Power of Looking

One aspect of my strategy I haven’t described yet is that I carefully watched the check as it was running. I do this not in a bored, offhanded, or incidental way. It’s absolutely vital. I must observe all the output I can observe, rather than just the “pass/fail” status of my checks. I will comb through log files, watch the results in real time, and try things through the GUI. Whatever CAN be seen, I want to see it.

As I watched the output flow by in this particular example, I noticed that it was much slower than I expected. Moreover, the speed of the output was variable. It seemed to vary semi-randomly. Since there was nothing in the nature of the program (as I understood it) that would explain slowness or variable timing, this became an instant focus of investigation. Either there’s a bug here or something I need to learn. (Note: that is known as the Explainability Oracle Heuristic.)

It’s possible that I could have anticipated and explicitly checked for performance issues, of course, but my point is that the Power of Looking is a heuristic for discovering lots of things you did NOT anticipate. The models in your mind generate expectations, automatically, that you may not even be aware of until they are violated.

This is important for all testing, but it’s especially important for tool-happy Agile testers, bless their hearts, some of whom consider automation to be next to godliness… Come to think of it, if God has automated his tests for human qualities, that would explain a lot…


A Test is a Performance

Testing is a performance, not an artifact.

Artifacts may be produced before, during, or after the act of testing. Whatever they are, they are not tests. They may be test instructions, test results, or test tools. They cannot be tests.

Note: I am speaking a) authoritatively about how we use terms in Rapid Testing Methodology, b) non-authoritatively of my best knowledge of how testing is thought of more broadly within the Context-Driven school, and c) of my belief about how anyone, anywhere should think of testing if they want a clean and powerful way to talk about it.

I may informally say “I created a test.” What I mean by that is that I designed an experience, or I made a plan for a testing event. That plan itself is not the test, any more than a picture of a car is a car. Therefore, strictly speaking, the only way to create a test is to perform a test. As Michael Bolton likes to say, there’s a world of difference between sheet music and a musical performance, even though we might commonly refer to either one as “music.” Consider these sentences: “The music at the symphony last night was amazing.” vs. “Oh no, I left the music on my desk at home.”

We don’t always have to speak strictly, but we should know how and know why we might want to.

Why can’t a test be an artifact?

Because artifacts don’t think or learn in the full human sense of that word, that’s why, and thinking is central to the test process. So to claim that an artifact is a test is like wearing a sock puppet on your hand and claiming that it’s a little creature talking to you. That would be no more than you talking to yourself, obviously, and if you removed yourself from that equation the puppet wouldn’t be a little creature, would it? It would be a decorated sock lying on the floor. The testing value of an artifact can be delivered only in concert with an appropriately skilled and motivated tester.

With procedures or code you can create a check. See here for a detailed look at the difference between checking and testing. Checking is part of testing, of course. Anyone who runs checks that fail knows that the next step is figuring out what the failures mean. A tester must also evaluate whether the checks are working properly and whether there are enough of them, or too many, or the wrong kind. All of that is part of the performance of testing.

When a “check engine” light, or any other strange alert, goes on in your car, you can’t know until you go to a mechanic whether it represents a big problem or a little one. The check is not testing. The testing is more than the check itself.

But I’ve seen people follow test scripts and only do what the test document tells them to do!

Have you really witnessed that? I think the most you could possibly have witnessed is…

EITHER:

a tester who appeared to do “only” what the test document told him, while constantly and perhaps unconsciously adjusting and reacting to what’s happening with the system under test. (Such a tester may find bugs, but does so by contributing interpretation, judgment, and analysis; by performing.)

OR:

a tester who necessarily missed a lot of bugs that he could have found, either because the test instructions were far too complex, or far too vague, or there were far too few of them (because such documentation is darn expensive), and the tester failed to perform as a tester to compensate.

In either case, the explicitly written or coded “test” artifact can only be an inanimate sock, or a sock puppet animated by the tester. You can choose to suffer without a tester, or to cover up the presence of the tester. Reality will assert itself either way.

What danger could there be in speaking informally about writing “tests?”

It’s not necessarily dangerous to speak informally. However, a possible danger is that non-testing managers and clients of our work will think of testers as “test case writers” instead of as people who perform the skilled process of testing. This may cause them to treat testers as fungible commodities producing “tests” that consist solely of explicit rules. Such a theory of testing– which is what we call the Factory school of testing thought– leads to expensive artifacts that uncover few bugs. Their value lies mainly in looking impressive to ignorant people.

If you are talking to people who fully understand that testing is a performance, it is fine to speak informally. Just be on your guard when you hear people say “Where are your tests?” “Have you written any tests?” or “Should you automate those tests?” (I would rather hear “How do you test this?” “Where are you focusing your testing?” or “Are you using tools to help your testing?”)

Thanks to Michael Bolton and Aleksander Simic for reviewing and improving this post.


Testing and Checking Refined

This post is co-authored with Michael Bolton. We have spent hours arguing about nearly every sentence. We also thank Iain McCowatt for his rapid review and comments.

Testing and tool use are two things that have characterized humanity from its beginnings. (Not the only two things, of course, but certainly two of the several characterizing things.) But while testing is cerebral and largely intangible, tool use is out in the open. Tools encroach into every process they touch and tools change those processes. Hence, for at least a hundred or a thousand centuries the more philosophical among our kind have wondered “Did I do that or did the tool do that? Am I a warrior or just a spear-throwing platform? Am I a farmer or a plow pusher?” As Marshall McLuhan said, “We shape our tools, and thereafter our tools shape us.”

This evolution can be an insidious process that challenges how we label ourselves and things around us. We may witness how industrialization changes cabinet craftsmen into cabinet factories, and that may tempt us to speak of the changing role of the cabinet maker, but the cabinet factory worker is certainly not a mutated cabinet craftsman. The cabinet craftsmen are still out there– fewer of them, true– nowhere near a factory, turning out expensive and well-made cabinets. The skilled cabineteer (I’m almost motivated enough to Google whether there is a special word for cabinet expert) is still in demand, to solve problems IKEA can’t solve. This situation exists in the fields of science and medicine, too. It exists everywhere: what are the implications of the evolution of tools for skilled human work? Anyone who seeks excellence in his craft must struggle with the appropriate role of tools.

Therefore, let’s not be surprised that testing, today, is a process that involves tools in many ways, and that this challenges the idea of a tester.

This has always been a problem– I’ve been working with and arguing over this since 1987, and the literature on it goes back at least to 1961– but something new has happened: large-scale mobile and distributed computing. Yes, this is new. I see this as the greatest challenge to testing as we know it since the advent of micro-computers. Why exactly is it a challenge? Because in addition to the complexity of products and platforms, which has been growing steadily for decades, there now exists a vast marketplace for software products that are expected to be distributed and updated instantly.

We want to test a product very quickly. How do we do that? It’s tempting to say “Let’s make tools do it!” This puts enormous pressure on skilled software testers and those who craft tools for testers to use. Meanwhile, people who aren’t skilled software testers have visions of the industrialization of testing similar to those early cabinet factories. Yes, there have always been these pressures, to some degree. Now the drumbeat for “continuous deployment” has opened another front in that war.

We believe that skilled cognitive work is not factory work. That’s why it’s more important than ever to understand what testing is and how tools can support it.

Checking vs. Testing

For this reason, in the Rapid Software Testing methodology, we distinguish between aspects of the testing process that machines can do versus those that only skilled humans can do. We have done this linguistically by adapting the ordinary English word “checking” to refer to what tools can do. This is exactly parallel with the long-established convention of distinguishing between “programming” and “compiling.” Programming is what human programmers do. Compiling is what a particular tool does for the programmer, even though what a compiler does might appear to be, technically, exactly what programmers do. Come to think of it, no one speaks of automated programming or manual programming. There is programming, and there is lots of other stuff done by tools. Once a tool is created to do that stuff, it is never called programming again.

Now that Michael and I have had over three years’ experience working with this distinction, we have sharpened our language even further, with updated definitions and a new distinction between human checking and machine checking.

First let’s look at testing and checking. Here are our proposed new definitions, which soon will replace the ones we’ve used for years (subject to review and comment by colleagues):

Testing is the process of evaluating a product by learning about it through exploration and experimentation, which includes to some degree: questioning, study, modeling, observation, inference, etc.

(A test is an instance of testing.)

Checking is the process of making evaluations by applying algorithmic decision rules to specific observations of a product.

(A check is an instance of checking.)

Explanatory notes:

  • “evaluating” means making a value judgment; is it good? is it bad? pass? fail? how good? how bad? Anything like that.
  • “evaluations” as a noun refers to the product of the evaluation, which in the context of checking is going to be an artifact of some kind; a string of bits.
  • “learning” is the process of developing one’s mind. Only humans can learn in the fullest sense of the term as we are using it here, because we are referring to tacit as well as explicit knowledge.
  • “exploration” implies that testing is inherently exploratory. All testing is exploratory to some degree, but may also be structured by scripted elements.
  • “experimentation” implies interaction with a subject and observation of it as it is operating, but we are also referring to “thought experiments” that involve purely hypothetical interaction. By referring to experimentation, we are not denying or rejecting other kinds of learning; we are merely trying to express that experimentation is a practice that characterizes testing. It also implies that testing is congruent with science.
  • the list of words in the testing definition is not exhaustive of everything that might be involved in testing, but represents the mental processes we think are most vital and characteristic.
  • “algorithmic” means that it can be expressed explicitly in a way that a tool could perform.
  • “observations” is intended to encompass the entire process of observing, and not just the outcome.
  • “specific observations” means that the observation process results in a string of bits (otherwise, the algorithmic decision rules could not operate on them).

There are certain implications of these definitions:

  • Testing encompasses checking (if checking exists at all), whereas checking cannot encompass testing.
  • Testing can exist without checking. A test can exist without a check. But checking is a very popular and important part of ordinary testing, even very informal testing.
  • Checking is a process that can, in principle, be performed by a tool instead of a human, whereas testing can only be supported by tools. Nevertheless, tools can be used for much more than checking.
  • We are not saying that a check MUST be automated. But the defining feature of a check is that it can be COMPLETELY automated, whereas testing is intrinsically a human activity.
  • Testing is an open-ended investigation– think “Sherlock Holmes”– whereas checking is short for “fact checking” and focuses on specific facts and rules related to those facts.
  • Checking is not the same as confirming. Checks are often used in a confirmatory way (most typically during regression testing), but we can also imagine them used for disconfirmation or for speculative exploration (i.e. a set of automatically generated checks that randomly stomp through a vast space, looking for anything different).
  • One common problem in our industry is that checking is confused with testing. Our purpose here is to reduce that confusion.
  • A check is describable; a test might not be (that’s because, unlike a check, a test involves tacit knowledge).
  • An assertion, in the Computer Science sense, is a kind of check. But not all checks are assertions, and even in the case of assertions, there may be code before the assertion which is part of the check, but not part of the assertion (see the sketch just after this list).
  • These definitions are not moral judgments. We’re not saying that checking is an inherently bad thing to do. On the contrary, checking may be very important to do. We are asserting that for checking to be considered good, it must happen in the context of a competent testing process. Checking is a tactic of testing.
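
To make the assertion point above concrete, here is a minimal sketch of a single machine check, in Python. The Account class is an invented stand-in for “the product”; the point is only that the setup and the observation-gathering belong to the check, even though just one line is the assertion.

    # A minimal sketch of one machine check. The Account class is an invented
    # stand-in for the product under test; nothing here comes from a real project.

    class Account:
        """Toy product: a trivially simple bank account."""
        def __init__(self):
            self.balance = 0

        def deposit(self, amount):
            self.balance += amount

        def withdraw(self, amount):
            self.balance -= amount

    def check_withdrawal_debits_account():
        account = Account()        # setup: part of the check, not the assertion
        account.deposit(100)
        account.withdraw(30)       # interacting to produce a specific observation...
        observed = account.balance
        assert observed == 70      # ...and only this line is the assertion

    check_withdrawal_debits_account()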

Whither Sapience?

If you follow our work, you know that we have made a big deal about sapience. A sapient process is one that requires an appropriately skilled human to perform. However, in several years of practicing with that label, we have found that it is nearly impossible to avoid giving the impression that a non-sapient process (i.e. one that does not require a human but could involve a very talented and skilled human nonetheless) is a stupid process for stupid people. That’s because the word sapience sounds like intelligence. Some of our colleagues have taken strong exception to our discussion of non-sapient processes based on that misunderstanding. We therefore feel it’s time to offer this particular term of art its gold watch and wish it well in its retirement.

Human Checking vs. Machine Checking

Although sapience is problematic as a label, we still need to distinguish between what humans can do and what tools can do. Hence, in addition to the basic distinction between checking and testing, we also distinguish between human checking and machine checking. This may seem a bit confusing at first, because checking is, by definition, something that can be done by machines. You could be forgiven for thinking that human checking is just the same as machine checking. But it isn’t. It can’t be.

In human checking, humans are attempting to follow an explicit algorithmic process. In the case of tools, however, the tools aren’t just following that process, they embody it. Humans cannot embody such an algorithm. Here’s a thought experiment to prove it: tell any human to follow a set of instructions. Get him to agree. Now watch what happens if you make it impossible for him ever to complete the instructions. He will not just sit there until he dies of thirst or exposure. He will stop himself and change or exit the process. And that’s when you know for sure that this human– all along– was embodying more than just the process he agreed to follow and tried to follow. There’s no getting around this if we are talking about people with ordinary, or even minimal cognitive capability. Whatever procedure humans appear to be following, they are always doing something else, too. Humans are constantly interpreting and adjusting their actions in ways that tools cannot. This is inevitable.

Humans can perform motivated actions; tools can only exhibit programmed behaviour (see Harry Collins and Martin Kusch’s brilliant book The Shape of Actions, for a full explanation of why this is so). The bottom line is: you can define a check easily enough, but a human will perform at least a little more during that check– and also less in some ways– than a tool programmed to execute the same algorithm.

Please understand, a robust role for tools in testing must be embraced. As we work toward a future of skilled, powerful, and efficient testing, this requires careful attention to both the human side and the mechanical side of the testing equation. Tools can help us in many ways far beyond the automation of checks. But in this, they necessarily play a supporting role to skilled humans; and the unskilled use of tools may have terrible consequences.

You might also wonder why we don’t just call human checking “testing.” Well, we do. Bear in mind that all this is happening within the sphere of testing. Human checking is part of testing. However, we think that when a human is explicitly trying to restrict his thinking to the confines of a check– even though he will fail to do that completely– it’s now a specific and restricted tactic of testing and not the whole activity of testing. It deserves a label of its own within testing.

With all of this in mind, and with the goal of clearing confusion, sharpening our perception, and promoting collaboration, recall our definition of checking:

Checking is the process of making evaluations by applying algorithmic decision rules to specific observations of a product.

From that, we have identified three kinds of checking:

Human checking is an attempted checking process wherein humans collect the observations and apply the rules without the mediation of tools.

Machine checking is a checking process wherein tools collect the observations and apply the rules without the mediation of humans.

Human/machine checking is an attempted checking process wherein both humans and tools interact to collect the observations and apply the rules.

In order to explain this thoroughly, we will need to talk about specific examples. Look for those in an upcoming post.

Meanwhile, we invite you to comment on this.

UPDATE APRIL 10th: As a result of intense discussions at the SWET5 peer conference, I have updated the diagram of checking and testing. Notice that the word “testing” now sits outside the box, since it labels the whole thing, while a description of testing sits inside it. Human checking is characterized by a cloud, because its boundary with non-checking aspects of testing is not always clearly discernible. Machine checking is characterized by a precise dashed line, because although its boundary is clear, it is an optional activity. Technically, human checking is also optional, but it would be a strange test process indeed that didn’t include at least some human checking. I thank the attendees of SWET5 for helping me with this: Rikard Edgren, Martin Jansson, Henrik Andersson, Michael Albrecht, Simon Morley, and Micke Ulander.

Why Scripted Testing is Not for Novices

…Unless you want bad testing.

Claire Moss writes:

I am surprised that you say that scripted testing is harder for novice testers. I would have expected that having so much structure around the tests would make getting into testing easier for someone with less experience and that the scripted instructions would make up for a lack of discipline on the part of the tester.

Structure != “being told what to do”
First, you are misusing the word “structure.” All testing is structured. If what you mean by structure is “externally imposed structure,” then say that. But even if you are not aware of a structure in your testing, it is there. When I tell a novice tester to test, and don’t tell him how to test, he will be dominated by certain structures he is largely unaware of– or, if he is aware of them, he cannot verbalize or control them much. For instance: the user interface look and feel is a guiding structure for novice testers. They test what they see.

Cognitive science offers plenty of ideas and insights about the structures that guide our thinking and behavior. See the book Predictably Irrational by Dan Ariely for more on this.

Scripted testing always has at least two distinct parts: test design and test execution. They must be considered independently.

Scripted test execution is quite a bit more difficult than exploratory testing, unless you are assuming that the tester following the script has exactly the same knowledge and skill as the test designer (even then it is a qualitatively different sort of cognitive process than designing). An exploratory tester is following (indeed forming as he goes) his own intentions and ideas. But a scripted tester, to do well, must apprehend the intent of the one who wrote the script. Moreover, the scripted tester must go beyond the stated intent and honor the tacit intent as well– otherwise it’s just shallow, bad testing.

Try using a script to guide a 10 year-old to drive a car safely on a busy city street. I don’t believe it can be done. You can’t overcome lack of basic skills with written instructions.

And sure, yeah, there is also the discipline issue, but that’s a minor thing, compared to the other things.

As for scripted test design, that also is a special skill. I can ask my son to put together a computer. He knows how to do that. But if I were to ask him for a comprehensive step-by-step set of instructions to allow me to do it, I doubt the result would help me much. Writing a script requires patience, judgment, and lots of empathy for the person who will execute it. He doesn’t yet have those qualities.

Most people don’t like to write. They aren’t motivated. Now give them a task that requires excellent writing. Bad work generally results.

Both on the design side and the execution side, scripted testing done adequately is harder than exploratory testing done adequately. It’s hard to separate an integrated cognitive activity into two pieces and still make it work.

The reason managers assume it’s simpler and easier is that they have low standards for the quality of testing and yet a strong desire for the appearance of order and productivity.

When I am training a new tester, I begin with highly exploratory testing. Eventually, I will introduce elements of scripting. All skilled testers must feel comfortable with scripted testing, for those rare times when it’s quite important.

Examples

1. Start browser

2. Go to CNN.com

3. Test CNN.com and report any problems you find.

This looks like a script, and it is sort of a script, but the interesting details of the testing are left unspecified. One of the elements of good test scripting is to match the instructions to the level of the tester as well as to the design goal of the test. In this case, no design goal is apparent.

This script does not necessarily represent bad testing– because it doesn’t represent any testing whatsoever.

1. Open Notepad

2. Type “hello”

3. Verify that “hello” appears on the screen.

This script has the opposite problem. It specifies what is completely unnecessary to specify. If the tester follows this script, he is probably dumbing himself down. There may be some real good reason for these steps, but again, the design goal is not apparent. The tester’s mind is therefore not being effectively engaged. Congratulations, designer, you’ve managed to treat a sophisticated miracle of human procreation, gestation, mothering, socializing, educating, etc. as if he were the equivalent of an animated poking stick. That’s like buying an iPad, then using it as a serving tray for a platter of cheese.

Behavior-Driven Development vs. Testing

The difference between Behavior-Driven Development and testing:

This is a BDD scenario (from Dan North, a man I respect and admire):

+Scenario 1: Account is in credit+
Given the account is in credit
And the card is valid
And the dispenser contains cash
When the customer requests cash
Then ensure the account is debited
And ensure cash is dispensed
And ensure the card is returned

This is that BDD scenario turned into testing:

+Scenario 1: Account is in credit+
Given the account is in credit
And the card is valid
And the dispenser contains cash
When the customer requests cash
Then check that the account is debited
And check that cash is dispensed
And check that the card is returned
And check that nothing happens that shouldn’t happen and everything else happens that should happen for all variations of this scenario and all possible states of the ATM and all possible states of the customer’s account and all possible states of the rest of the database and all possible states of the system as a whole, and anything happening in the cloud that should not matter but might matter.

Do I need to spell it out for you more explicitly? This check is impossible to perform. To get close to it, though, we need human testers. Their sapience turns this impossible check into plausible testing. Testing is a quest within a vast, complex, changing space. We seek bugs. It is not the process of demonstrating that the product CAN work, but of exploring whether it WILL.

I think Dan understands this. I sometimes worry about other people who promote tools like Cucumber or jBehave.

I’m not opposed to such tools (although I continue to suspect that Cucumber is an elaborate ploy to spend a lot of time on things that don’t matter at all) but in the face of them we must keep a clear head about what testing is.
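
For readers who have not used such tools: each Given/When/Then line in a scenario like Dan’s gets bound to a little piece of code. Here is a framework-free sketch of roughly what those bindings amount to; the Account and Atm classes are my own invention, not anything from Dan North’s example or from Cucumber itself.

    # A framework-free sketch of the checks hiding inside Scenario 1. The Account
    # and Atm classes are invented for illustration; real step definitions in a
    # tool like Cucumber or jBehave bind each Given/When/Then line to code much
    # like this.

    class Account:
        def __init__(self, balance):
            self.balance = balance

    class Atm:
        def __init__(self, cash_on_hand):
            self.cash_on_hand = cash_on_hand
            self.cash_dispensed = 0
            self.card_returned = False

        def request_cash(self, account, card_valid, amount):
            if card_valid and account.balance >= amount and self.cash_on_hand >= amount:
                account.balance -= amount
                self.cash_on_hand -= amount
                self.cash_dispensed = amount
            self.card_returned = True

    # Given the account is in credit, the card is valid, and the dispenser contains cash
    account = Account(balance=100)
    atm = Atm(cash_on_hand=1000)

    # When the customer requests cash
    atm.request_cash(account, card_valid=True, amount=60)

    # Then: three narrow checks, and nothing about all the things that shouldn't
    # happen, or about any other state of the ATM, the account, or the system.
    assert account.balance == 40     # the account is debited
    assert atm.cash_dispensed == 60  # cash is dispensed
    assert atm.card_returned         # the card is returned

That is all the scenario can literally check; everything in the longer “testing” version above is left to humans.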

A Nice Quote Against Confirmatory Testing

Most of the technology of “confirmatory” non-qualitative research in both the social and natural sciences is aimed at preventing discovery. When confirmatory research goes smoothly, everything comes out precisely as expected. Received theory is supported by one more example of its usefulness, and requires no change. As in everyday social life, confirmation is exactly the absence of insight.  In science, as in life, dramatic new discoveries must almost by definition be accidental (“serendipitous”). Indeed, they occur only in consequence of some mistake.

Kirk, Jerome, and Marc L. Miller. Reliability and Validity in Qualitative Research (Qualitative Research Methods). Thousand Oaks, CA: Sage Publications, 1985.

Viva exploratory methods in science! Viva exploratory methods in testing! Viva testers who study philosophy and the social sciences!

(Thank you Michael Bolton for finding this quote.)

When Does a Test End?

The short answer is: you never know for sure that a test has ended.

Case in point. The license plate on my car is “tester.” It looks like this:

On December 20th, I received this notice in the mail:

As you see, it seems that the city of Everett, which is located between Orcas Island (where I live) and Seattle (where I occasionally visit) felt that I owed them for a parking violation. This is strange because I have never before parked in Everett, much less received a ticket there. A second reason this is strange is that the case number, apparently, is “111111111”.

At first I thought this was a hoax, but the phone number and address are real. The envelope was sent from Livonia, Michigan, and that turns out to be where Alliance One Receivables Management, Inc. is based. They collect money on behalf of many local governments, so that makes sense. It all looked legitimate, except that I’m not guilty, and the case number is weird.

Then it occurred to me that this may have been a TEST! Imagine a tester checking out the system. He might type “tester” for a license plate, not realizing (or not caring) that someone in Washington actually has that plate. He keys in a fake case number of “111111111” because that’s easy to type, and then he forgets to remove that test data from the database.

Praise the Humans

I called the county clerk’s office to ask about this. At first I was worried, because they used an automated phone service. But I quickly got through to a competent human female. What can humans do? Troubleshoot. She told me that there indeed was a record in their system that I owed them money, but that the case number did not refer to a real case. In fact, she said that the number was incorrectly formatted: all their case numbers start with a “10.”

“This can’t be right,” she said.

“Could it be test data? Are you just starting to use Alliance One?” I asked.

“We’ve been using Alliance One for years. Oh, but we’re just starting to use their electronic ticketing system.”

She told me I was probably right about it being a test, but that she would investigate and get back to me.

A few days later I received this notice:

So, there you have it. Someone ran a test on November 9th that did not conclude until December 23rd, when it was stopped via a court order! Thank you, Judge Timothy B. Odell.

I’m sure this will appear on an episode of Law and Order: Clerical Intent one of these days.

Just imagine if this hadn’t been a parking ticket program, but rather something that told the FBI to go and break down my door…

Morals of the Story

  1. Beware of testing on the production system.
  2. Always give the humans a way to correct the automation when it goes out of control. (Hear that, Skynet?)
  3. You never know when your test is over.
  4. If your name is “tester” or “test” or “testing”, eventually you will show up as test data in somebody’s project. Beware also if your name is “12345”, “asdf”, “qwerty”, “foobar”, or “999999999999999999999999.”

Yaron Sinai Says Stop Thinking, Stupid Tester

The Factory School is that community of process people who believe testing benefits from eliminating the human element as much as possible. They wish to mechanize testing, and to condition the humans within it to see themselves as machines and emulate machines as much as possible. It’s an idea that has a number of advantages, with the important caveat that it makes good software testing impossible by leaving no room in the process for skill and thinking.

Sometimes when I complain about the Factory School of software testing, people think I’m exaggerating or making it up.

But check out this quote from Yaron Sinai, CEO of Elementools:

“With Test Case, the team doesn’t think,” Sinai said. “They just need to follow the steps. And for you, as a testing manager, you know that once they completed a set of tests, you know they followed the steps that needed to be followed.”

At first when I saw this, I thought it was a joke news story. Apparently not. (For the sake of Mr. Sinai, I hope he was misquoted. I will gladly post a retraction if that is the case.)

The man’s tool is called Test Case, which is emblematic right there. Its focus is on test cases, not good testing. His view of test cases appears to be test procedure steps, and he wants testers to stop thinking, dammit, and just follow steps. Like factory robots. Robots that don’t question or talk back. Nice. Saaafe. Rooooooboootttts.

I haven’t found much information about Mr. Sinai online. He apparently hasn’t written about testing, per se. At least he is consistent: he has contributed no ideas to the testing craft, and now he sells an idea prevention system in the guise of a test management tool.

I want to ask Mr. Sinai whether, as a CEO, he faithfully follows his “CEO cases” each day, written for him by other, smarter CEOs. Or does he, gasp, think for himself? I bet he would reply that although he is a smart CEO, not all CEOs are smart enough to make their own decisions, and thus it’s only reasonable that their work should be scripted. When I suggest that a minimum requirement to be a CEO should be the ability to think about business problems and make decisions on one’s own behalf, I’m sure he will say “that’s just not practical.” By which he will mean, of course, that HE doesn’t know how to do it.

The Factory School promotes the antithesis of engineering, while often using the word engineering as if it were some corpse impaled on a stick. Their approach to managing an engineering process is to kill it. Indeed, a dead process, like a dead horse, is much easier to manage once you get used to the smell.

And a lot of top managers buy this crap because the demos are simple, they are unaware of alternatives that would actually help, and the perfume on their cravats tends to mask the stench of tool vendors who don’t know anything about the craft.

The IMVU Shuffle

Michael Bolton reported on our quick test of IMVU, whose development team brags about having no human-mediated test process before deploying their software to the field.

Some commenters have pointed out that the bugs we found in our twenty-minute review weren’t serious– or couldn’t have been– because the IMVU developers feel successful in what they have produced, and apparently, there are satisfied users of the service.

Hearing that, I’m reminded of the Silver Bridge, which fell down suddenly, one day, after forty years of standing up. Up to that day, it must have seemed quite reasonable to claim that the bridge was a high-quality bridge, because– look!– it’s still standing! But lack of information is not proof of excellence, it turns out. That’s why we test. Testing doesn’t provide all possible information, but it provides some. Good testing will provide lots of useful information.

I don’t know if the IMVU system is good enough. I do know that IMVU has no basis to claim that their “continuous integration” process, with all their “automated test cases,” has anything to do with their success. By exactly the same “not dead yet” argument, they could justify not running any test cases at all. I can’t help but mention that the finance industry used the same logic to defend deregulation and a weak enforcement of the existing laws, which allowed Ponzi schemes and credit default swaps to cripple the world economy. Oops, there go a few trillion dollars– hey, maybe we should have been doing better oversight all these years!

It may be that no possible problem that could be revealed by competent testing would be considered a bad problem by IMVU. If that is the case, then the true reason they are successful is that they have chosen to offer a product that doesn’t matter to people who will accept anything they are offered. Of course, they could use ANY set of practices to do that.

Clearly, what they think they’ve done is establish a test process through automation that will probably discover any important problem that could happen before they release. That’s why Michael and I tested it, and we quickly verified what we expected to find: several problems that materially interfered with the claimed functionality of IMVU, and numerous glitches that suggested the presence of more serious problems nearby. Maybe its present users are willing to put up with it, or maybe they are willing to put up with it for now. But that’s not the point.

The point is that IMVU is not doing certain ordinary and obvious things that would reveal problems in their product and they promote that approach to doing business as if it’s an innovation instead of an evasion of responsibility.

The IMVU people can’t know whether there are, in fact, serious problems in their product, because they have chosen not to discover them. That they promote this as a good practice (and claim that manual testing doesn’t scale, which is also bullshit) tells me that they don’t know what testing is for and they don’t know the difference between testing and a set of computerized behaviors called “test cases”.

They are setting themselves up to rediscover what many others have before them– why we test. Their own experiences will be the best teacher. I predict they will have some doozies.

We Need Better Testing Bloggers

I don’t understand the mentality of bloggers like this guy. His view of the history of testing is a fantasy that seems designed to insult people who study testing. It applies at most to certain companies, not to the field itself.

He says we need a better way to test. Those of us who are serious testers have actually been developing and demonstrating better ways to test for decades, as we keep up with technology. Where have you been, Steve? Get out much do ya?

He thinks automation is the answer. What a surprise that a programmer would say that. But the same thing was said in 1972 at the Chapel Hill Symposium. We’ve tried that already. Many many times we’ve tried it.

We know why automation is not the grand solution to the testing problem.

As a board member of AST, I should mention the upcoming CAST Conference— the most advanced practitioner’s testing conference I know. Go to CAST, Steve, and tell Jerry Weinberg to his face (the programmer who started the first independent test group, made up of programmers) all about your theory of testing history.

Also, Jerry’s new book Perfect Software and Other Illusions About Testing, will be available soon. It addresses misconceptions like “Just automate the testing!” along with many others. Jerry is not just an old man of testing. He’s the oldest among us.

Question: How Many Times Should You Run a Test?

Kevin asks: What is the best or industry standard for how many times a test case should be run?

There are questions that should not be answered. For instance, “What size unicorn do you wear?” or “How many cars should I own?” Sure, I could answer them, but the answers are worthless. My answers are A) I don’t wear unicorns and B) 2. In these cases, the more helpful reply is to question the question. For the first question, perhaps you said “uniform” and I misheard you. For the second question, perhaps you own a railroad and you were talking about train cars of different kinds, whereas I assumed you have a small family and were asking about automobiles.

I can tell you this for sure: No one I respect in the testing field will give you a direct answer to the general question of how many times a test should be run (except maybe as a joke).

Imagine if the answer was 100,000. Would you believe it? What if the answer was 7? Wouldn’t you wonder what was wrong with 6? I can imagine 7 being the right answer, but only for a very specific hypothetical case, not as any sort of general principle.

The first potentially useful answer I have is to tell you that this question would not even occur to you if you knew how to test, therefore, what you really need to do is start learning how to test. I mean if someone was re-wiring your house, and during that process he asked you what “voltage” is, wouldn’t you get someone else to wire your house? Like electrical work, plumbing, computer programming, or welding, good testing is a skilled activity.

I rarely give that answer, though, because I worry I will just leave people feeling discouraged.

The closest thing to a direct answer I can give you is this:

There exist no testing industry standards that are universally binding or even, in my opinion, more than negligibly helpful. Yes, there are documents that purport to be standards. If you are bound by them then you already know that. You aren’t subject to standards unless one has been imposed upon you by a regulating authority or by contract. Therefore, considering that testing costs money and time, I suggest that you don’t run any tests unless there is a reason to do so. In general, don’t do the same work a second time if you have already done it once. Certainly, if your clients would benefit from you running a test again, go for it. Otherwise, you are just indulging in compulsive/obsessive behavior, and you need help of a different kind than I offer.

A problem with this answer is that it raises the question of how you know when to run a test again. Fortunately, I wrote an essay on possible reasons to repeat tests. I can think of ten good reasons that you may want to repeat any given test (along with one big reason not to).

That’s a pretty good answer, but I think I can offer a little more:

Your job is probably to discover if there are terrible as-yet-unknown problems in your very complex product that you have little time to test. To do that job really well requires that you design and perform many tests, more tests than you probably have time to run. Therefore, when you run a test a second time, you are spending precious time and resources (even if it’s automated, though possibly less so) on something other than running a test you have not yet run that may find one of those big bugs you haven’t yet found. Get it?

So, how about having a small set of very basic tests that touch upon a lot of features of the product? You may even want to automate these. It should take ten minutes to run these tests, ideally. Perhaps as long as an hour. Repeat those for every build. Their purpose is to quickly detect huge obvious things that may be wrong. Call that the smoke test suite. For everything else, make a test coverage outline that lists every significant element of the product and every significant element of data. Visit the items on that list and test each one according to its importance and potential for failure. Whenever any part of the product changes, try to figure out what could have been affected, and retest that area– but using different tests; perhaps variations on what you’ve already done.

By the way, the more you learn about testing, the less you will find advice like the preceding paragraph useful, because you will carry within you the ability to design your own test strategy that fits your specific purposes and contexts.
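
Still, if it helps to see the shape of the smoke-suite idea above, here is a rough sketch in pytest. The product, its feature list, and the AppClientStub are all invented placeholders; your own smoke suite would talk to your actual product.

    # test_smoke.py -- a rough sketch of a small, fast smoke suite (pytest-style).
    # AppClientStub and the feature list are invented placeholders standing in
    # for whatever interface your product actually offers.
    import pytest

    class AppClientStub:
        """Stand-in for the real product interface (API client, UI driver, etc.)."""
        def start(self):
            return True

        def open_screen(self, name):
            return f"{name} screen"

        def save_record(self, data):
            return {"status": "saved", **data}

    @pytest.fixture
    def app():
        return AppClientStub()

    def test_product_starts(app):
        # Huge and obvious: does the product even come up?
        assert app.start()

    @pytest.mark.parametrize("screen", ["home", "search", "settings", "reports"])
    def test_major_screens_open(app, screen):
        # One shallow touch per major feature area, drawn from the coverage outline.
        assert app.open_screen(screen)

    def test_basic_save(app):
        result = app.save_record({"name": "smoke"})
        assert result["status"] == "saved"

The deliberate shallowness is the point: this suite exists to catch huge obvious breakage fast, while the deeper, varied retesting of changed areas happens outside it.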

Manual Tests Cannot Be Automated (DEPRECATED)

[Note: This post is here only to serve as a historical example of how I used to speak about “automated testing.” My language has evolved. The sentiment of this post is still valid, but I have become more careful– and I think more professional– in my use of terms.]

I enjoy using tools to support my testing. As a former production coder, I find that automated tests can be a refreshing respite from the relatively imponderable world of product analysis and heuristic test design (I solve sudoku puzzles for the same reason). You know, the first tests I ever wrote were automated. I didn’t even distinguish between automated and manual tests for the first couple of years of my career.

Also for the first six years, or so, I had no way to articulate the role of skill in testing. Looking back, I remember making a lot of notes, reading a lot of books, and having a feeling of struggling to wake up. Not until 1993 did my eyes start to open.

My understanding of cognitive skills of testing and my understanding of test automation are linked, so it was some years before I came to understand what I now propose as the first rule of test automation:

Test Automation Rule #1: A good manual test cannot be automated.

No good manual test has ever been automated, nor ever will be, unless and until the technology to duplicate human brains becomes available. Well, wait, let me check the Wired magazine newsfeed… Nope, still no human brain scanner/emulators.

(Please, before you all write comments about the importance and power of automated testing, read a little bit further.)

It is certainly possible to create a powerful and useful automated test. That test, however, will never have been a good manual test. If you then read and hand-execute the code– if you do exactly what it tells you– then congratulations, you will have performed a poor manual test.

Automation rule #1 is based on the fact that humans have the ability to do things, notice things, and analyze things that computers cannot. This is true even of “unskilled” testers. We all know this, but just in case, I sprinkle exercises to demonstrate this fact throughout my testing classes. I give students products to test that have no specifications. They are able to report many interesting bugs in these products without any instructions from me, or any other “programmer.”

A classic approach to process improvement is to dumb down humans to make them behave like machines. This is done because process improvement people generally don’t have the training or inclination to observe, describe, or evaluate what people actually do. Human behavior is frightening to such process specialists, whereas machines are predictable and lawful. Someone more comfortable with machines sees manual tests as just badly written algorithms performed ineptly by sugar-carbon blobs wearing contractor badges who drift about like slightly-more-motivated-than-average jellyfish.

Rather than banishing human qualities, another approach to process improvement is to harness them. I train testers to take control of their mental models and devise powerful questions to probe the technology in front of them. This is a process of self-programming. In this way of working, test automation is seen as an extension of the human mind, not a substitute.

A quick image of this paradigm might be the Mars Rover program. Note that the Mars Rovers are completely automated, in the sense that no human is on Mars. Yet they are completely directed by humans. Another example would be a deep sea research submarine. Without the submarine, we couldn’t explore the deep ocean. But without humans, the submarines wouldn’t be exploring at all.

I love test automation, but I rarely approach it by looking at manual tests and asking myself “how can I make the computer do that?” Instead, I ask myself how I can use tools to augment and improve the human testing activity. I also consider what things the computers can do without humans around, but again, that is not automating good manual tests, it is creating something new.
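
To give a flavor of what “tools as an extension of the human mind” can mean in practice, here is a rough sketch of a small helper that doesn’t pass or fail anything; it just surfaces oddities in a log for a human to look at and judge. The log format, the word list, and the timing threshold are all invented for illustration.

    # observe_helper.py -- a rough sketch of a tool that supports human observation
    # rather than replacing it: it flags lines a human might want to examine, and
    # renders no verdict. The log format and thresholds are invented placeholders.
    import re
    import sys

    SUSPICIOUS = re.compile(r"error|warn|retry|timeout|exception", re.IGNORECASE)

    def flag_oddities(lines, slow_threshold=2.0):
        """Yield (line_number, reason, text) for anything worth a human look."""
        for number, line in enumerate(lines, start=1):
            if SUSPICIOUS.search(line):
                yield number, "suspicious word", line.rstrip()
            match = re.search(r"elapsed=(\d+\.\d+)", line)
            if match and float(match.group(1)) > slow_threshold:
                yield number, "unusually slow", line.rstrip()

    if __name__ == "__main__":
        with open(sys.argv[1]) as log:
            for number, reason, text in flag_oddities(log):
                print(f"line {number} ({reason}): {text}")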

I have seen bad manual tests automated. This is depressingly common, in my experience. Just let me suggest some corollaries to Rule #1:

Rule #1B: If you can truly automate a manual test, it couldn’t have been a good manual test.

Rule #1C: If you have a great automated test, it’s not the same as the manual test that you believe you were automating.

My fellow sugar blobs, reclaim your heritage and rejoice in your nature. You can conceive of questions; ask them. You are wonderfully distractible creatures; let yourselves be distracted by unexpected bugs. Your fingers are fumbly; press the wrong keys once in a while. Your minds have the capacity to notice hundreds of patterns at once; turn the many eyes of your minds toward the computer screen and evaluate what you see.

Studying Jeff Atwood’s Paint Can

I just found Jeff Atwood’s Coding Horror blog. He’s an interesting writer and thinker.

One of his postings presents a good example of the subtle role of skill even in highly scripted activities. He writes about following the instructions on a paint can. His article links to an earlier article, so you might want to read both.

The article is based on a rant by Steve McConnell in his book Rapid Development about the importance of following instructions. Steve talks about how important it is to follow instructions on a paint can when you are painting.

I want to talk, here, about the danger of following instructions, and more specifically, the danger of following people who tell you to follow instructions when they are not taking responsibility for the quality of your work. The instruction-following myth is one of those cancers on our craft, like certification and best practices.

[Full Disclosure: Relations between me and McConnell are strained. In the same book, Rapid Development, in the Classic Mistakes section, Steve misrepresented my work with regard to the role of heroism in software projects. He cited an article I wrote as if it were indicative of a point of view that I do not hold. It was as if he hadn’t read the article he cited, but only looked at the title. When I brought the error to his attention, he insisted that he did indeed understand my article and that his citation was correct.]

Let’s step through some of what Jeff writes:

“But what would happen if I didn’t follow the instructions on the paint can? Here’s a list of common interior painting mistakes:

The single most common mistake in any project is failure to read and follow manufacturer’s instructions for tools and materials being used.”

  • Jeff appears to be citing a study of some kind. What is this study? Is it trustworthy? Is Jeff himself telling me something, or is Jeff channelling a discarnate entity?
  • When he says “the most common mistake” does he mean the one that most frequently is committed by everyone who uses paint? Novices? Professionals? Or is he referring to the most serious mistakes? Or is he referring to the complete set of possible mistakes that are worth mentioning?
  • Is it important for everyone to follow the instructions, or are the instructions there for unskilled people only?
  • Why is it a “mistake” not to read-and-follow instructions? Mistake is a loaded term; one of those danger words that I circle in red pencil and put a question mark next to. It may be a mistake not to follow certain instructions in a certain context. On the other hand, it may be a mistake to follow them.

Consider all the instructions you encounter and do not read. Consider the software you install without reading the “quickstart” guide. Consider the clickwrap licenses you don’t read, or the rental cars you drive without ever consulting the driver’s manual, in states where you have not studied the local driving laws. Consider the doors marked push that you pull upon. Consider the shampoo bottle that says “wash, rinse, repeat.” Well, I have news for the people who make Prell: I don’t repeat. Did you hear me? I don’t repeat.

I would have to say that most instructions I come across are unimportant and some are harmful. Most instructions I get about software development process, I would say, would be harmful if I believed them and followed them. Most software process instructions I encounter are fairy tales, both in the sense of being made up and in the sense of being cartoonish. Some things that look like instructions, such as “do not try this at home” or “take out the safety card and follow along,” are not properly instructions at all, they are really just ritual phrases uttered to dispel the evil spirits of legal liability. Other things that really are instructions are too vague to follow, such as “use common sense” or “be creative” or “follow the instructions.”

There are, of course, instructions I could cite that have been helpful to me. I saw a sign over a copy room that said “Do not use three hole paper in this copy machine… unless you want it to jam.” and one next to it that said “Do not use the Microwave oven while making copies… unless you want the fuse to blow.” I often find instructions useful when putting furniture together; and I find signs at airports generally useful, even though I have occasionally been steered wrong.

Instructions can be useful, or useless, or something in between. Therefore, I propose that we each develop a skill: the skill of knowing when, where, why and how to follow instructions in specific contexts. Also, let’s develop the skill of giving instructions.

Jeff goes on to write:

“In regard to painting, the most common mistakes are:

* Not preparing a clean, sanded, and primed (if needed) surface.
* Failure to mix the paints properly.
* Applying too much paint to the applicator.
* Using water-logged applicators.
* Not solving dampness problems in the walls or ceilings.
* Not roughing up enamel paint before painting over it.”

Again with the “most common.” Says who? I can’t believe that the DuPont company is hiding in the bushes watching everybody use paint. How do they know what the most common mistakes are?

My colleague Michael Bolton suggested that the most common mistake is “getting the paint on the wrong things.” Personally, I suspect that the truly most common mistake is to try to paint something, but you won’t see THAT on the side of a paint can. As I write this, my bathroom is being repainted. Notice that I am writing and someone else is painting. Someone, I bet, who knows more about painting than I do. I have not committed the mistake of trying to paint my own bathroom, nor of attempting to read paint can instructions. Can I prove that is the most common mistake? Of course not. But notice that the rhetoric of following instructions is different if you draw a different set of lines around the cost/value equation represented by the word “mistake.”

Also, not knowing much about painting, I don’t understand these “mistakes.” For instance:

  • What is a clean surface? How do I sand it? What does “primed” mean and how do I know if that is needed?
  • How do I mix paints? Why would I even need to mix them? What paints should I mix?
  • What is the applicator and how do I apply paint to it? How much is enough?
  • What is a “water-logged” applicator? How does it get water-logged? Is there a “just enough” level of water?
  • How does one recognize and solve a “dampness problem”?
  • I assume that “roughing up” enamel paint means something other than trying to intimidate it. I assume it means sanding it somehow? Am I right? If so, how rough does it need to be and how do I recognize the appropriate level of roughness?

I am not kidding, I really don’t know this stuff.

Then Jeff writes:

“What I find particularly interesting is that none of the mistakes on this checklist have anything to do with my skill as a painter.”

I think what Jeff meant to say is that they have nothing to do with what he recognizes as his skill as a painter. I would recognize these mistakes, assuming for the moment that they are mistakes, as being strongly related to his painting skill. Perhaps since I don’t have any painting skill, it’s easier for me to see it than for him. Or maybe he means something different by the idea of skill than I do. (I think of skill as an improvable ability to do something.) Either way, there’s nothing slam-dunk obvious about his point. I don’t see how it can be just a matter of “read the instructions, stupid.”

Jeff writes:

“My technical proficiency (or lack thereof) as a painter doesn’t even register!”

Wait a minute, Jeff, think about this. Then what does have to do with your proficiency as a painter? You must have something in mind. If proficiency is a meaningful idea, then you must believe there is a detectable difference between having proficiency and not having proficiency, and it must go beyond this list of issues. Rather than concluding that your skill doesn’t enter into it, perhaps one could look at the same list of issues and interpret it as a list of things unskilled people frequently do when they try to paint, things that often lead them to regret the results. It’s a warning for the unskilled, not a message for skilled painters. A skilled painter might actually want to do these things; for instance, painting with a water-logged applicator to get some particular artistic effect.

Jeff writes:

“To guarantee a reasonable level of quality, you don’t have to spend weeks practicing your painting skills. You don’t even have to be a good painter. All you have to do is follow the instructions on the paint can!”

Now I have logic vertigo. How did we get from avoiding obvious mistakes, where we started, to “reasonable quality”? Would a professional house painter agree that there is no skill required to achieve reasonable quality? Would a professional artist say that?

(And what is reasonable quality?)

Even following simple instructions requires skill and tradeoff decisions. A paint can is neither a supervisor, nor a mentor, nor a judge of quality. Don’t “follow instructions.” Instead, I suggest, consider applying this heuristic: instructions might help.

And one more thing… Does anyone else find it ironic that Jeff’s article about reading instructions on paint cans would include a photo of a paint can where the instructions have been partly obscured by paint? Perhaps the first instruction should be “Check that you see this sentence. If not, please wait for legible instructions.”

Lack of Will

A core problem with quality in our industry is lack of will.

Lack of “will work”, that is. This is because it’s much easier to tell that a product can work than that it will work. And too often it turns out that products will not work in some situations even though they can in others.

Yet, many testers, developers, and managers are recklessly confident in the will part when they’ve only observed the can part.

I often hear someone say that their smoke test suite “just checks that the basic functionality works.” But even this modest-sounding goal is impossible to achieve. You can’t derive will from can, unless you give up certainty (“it will work, and I might be wrong”), or you run every possible test (and you can’t do that).

So, the claim of “…it works” is shorthand for something more uncertain, like this:

“During the tests I performed, I looked for cases where the product did not sufficiently fulfill the requirements I was testing for, but I did not see any. Furthermore, I have performed enough of the right kind of tests to justify confidence that the product probably will fulfill those requirements in the future for other people in other cases.”

Or more simply:

“It appeared to meet [some requirement] to [some degree] while I was testing it. It’s possible that the product works.”