Six Things That Go Wrong With Discussions About Testing

Talking about software testing is not easy. It’s not natural! Testing is a “meta” activity. It’s not just a task, but a task that generates new tasks (by finding bugs that should be fixed or finding new risks that must be examined). It’s a task that can never be “completed” yet must get “done.”

Confusion about testing leads to ineffective conversations that focus on unimportant issues while ignoring the things that matter. Here are some specific ways that testing conversations fail:

  1. When people care about how many test cases they have instead of what their testing actually does. The number of test cases (e.g. 500, 257, 39345) tells nothing to anyone about “how much testing” you are doing. The reason that developers don’t brag about how many files they created today while developing their product is that everyone knows that it’s silly to count files, or keystrokes, or anything like that. For the same reasons, it is silly to count test cases. The same test activity can be represented as one test case or one million test cases. What if a tester writes software that automatically creates 100,000 variations of a single test case? Is that really “100,000” test cases, or is it one big test case, or is it no test case at all? The next time someone gives you a test case count, practice saying to yourself “that tells me nothing at all.” Then ask a question about what the tests actually do: What do they cover? What bugs can they detect? What risks are they motivated by?
  2. When people speak of a test as an object rather than an event. A test is not a physical object, although physical things such as documentation, data, and code can be a part of tests. A test is a performance; an activity; it’s something that you do. By speaking of a test as an object rather than a performance, you skip right over the most important part of a test: the attention, motivation, integrity, and skill of the tester. No two different testers ever perform the “same test” in the “same way” in all the ways that matter. Technically, you can’t take a test case and give it to someone else without changing the resulting test in some way (just as no quarterback or baseball player will execute the same play in the same way twice) although the changes don’t necessarily matter.
  3. When people can’t describe their test strategy as it evolves. Test strategy is the set of ideas that guide your choices about what tests to design and what tests to perform in any given situation. Test strategy could also be called the reasoning behind the actions that comprise each test. Test strategy is the answer to questions such as “why are these tests worth doing?” “why not do different tests instead?” “what could we change if we wanted to test more deeply?” “what would we change if we wanted to test more quickly?” “why are we doing testing this way?” These questions arise not just after the testing, but right at the start of the process. The ability to design and discuss test strategy is a hallmark of professional testing. Otherwise, testing would just be a matter of habit and intuition.
  4. When people talk as if automation does testing instead of humans. If developers spoke of development the way that so many people speak of testing, they would say that their compiler created their product, and that all they do is operate the compiler. They would say that the product was created “automatically” rather than by particular people who worked hard and smart to write the code. And management would become obsessed with “automating development” by getting ever better tools instead of hiring and training excellent developers. A better way to speak about testing is the same way we speak about development: it’s something that people do, not tools. Tools help, but tools do not do testing. There is no such thing as an automated test. The most a tool can do is operate a product according to a script and check for specific output according to a script. That would not be a test, but rather a fact check about the product. Tools can do fact checking very well. But testing is more than fact checking because testers must use technical judgment and ingenuity to create the checks and evaluate them and maintain and improve them. The name for that entire human process (supported by tools) is testing. When you focus on “automated tests” you usually defocus from the skills, judgment, problem-solving, and motivation that actually control the quality of the testing.
  5. When people talk as if there is only one kind of test coverage. There are many ways you can cover the product when you test it. Each method of assessing coverage is different and has its own dynamics. No one way of talking about it (e.g. code coverage) gives you enough of the story. Just as one example, if you test a page that provides search results for a query, you have covered the functionality represented by the kind of query that you just did (function coverage), and you have covered it with the particular data set of items that existed at that time (data coverage). If you change the query to invoke a different kind of search, you will get new functional coverage. If you change the data set, you will get new data coverage. Either way, you may find a new bug with that new coverage. Functions interact with data; therefore good testing involves covering not just one or the other but both together in different combinations.
  6. When people talk as if testing is a static task that is easily formalized. Testing is a learning task; it is fundamentally about learning. If you tell me you are testing, but not learning anything, I say you are not testing at all. And the nature of any true learning is that you can’t know what you will discover next– it is an exploratory enterprise. It’s the same way with many things we do in life, from driving a car to managing a company. There are indeed things that we can predict will happen and patterns we might use to organize our actions, but none of that means you can sleepwalk through it by putting your head down and following a script. To test is to continually question what you are doing and seeing.

    The process of professional testing is not “design test cases and then follow the test cases.” No responsible tester works this way. Responsible testing is a constant process of investigation and experiment design. This may involve designing procedures and automation that systematically collect data about the product, but all of that must be done with the understanding that we respond to the situation in front of us as it unfolds. We deviate frequently from procedures we establish because software is complicated and surprising; and because the organization has shifting needs; and because we learn of better ways to test as we go.
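
As a rough illustration of point 5, the sketch below treats function coverage and data coverage as two separate dimensions and reports which combinations have never been tried. The query kinds, data set names, and the `coverage_report` helper are all hypothetical, invented here only to make the idea concrete; they do not describe any real product or tool.

```python
from itertools import product

# Hypothetical query kinds (function coverage) and data sets (data coverage).
QUERY_KINDS = ["keyword", "phrase", "wildcard"]
DATA_SETS = ["empty catalog", "single item", "many items"]


def coverage_report(sessions):
    """Summarize which functions, data sets, and combinations were exercised.

    `sessions` is a list of (query_kind, data_set) pairs, one per thing tried.
    """
    functions = {kind for kind, _ in sessions}
    data = {data_set for _, data_set in sessions}
    pairs = set(sessions)
    all_pairs = set(product(QUERY_KINDS, DATA_SETS))
    return {
        "functions_covered": len(functions),
        "data_sets_covered": len(data),
        "pairs_covered": len(pairs),
        "pairs_untried": sorted(all_pairs - pairs),
    }


# Three sessions that touch every query kind and every data set once.
sessions = [
    ("keyword", "empty catalog"),
    ("phrase", "single item"),
    ("wildcard", "many items"),
]
report = coverage_report(sessions)
print(report["functions_covered"], "query kinds,",
      report["data_sets_covered"], "data sets,",
      len(report["pairs_untried"]), "combinations never tried")
```

Measured on either dimension alone, these three sessions look complete, yet most of the function-and-data combinations remain unexplored. No single coverage number tells that story.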

Through these and other failures in testing conversations, people persist in the belief that good testing is just a matter of writing ever more “test cases” (regardless of what they do); automating them (regardless of what automation can’t do); passing them from one untrained tester to another; all the while fetishizing the files and scripts themselves instead of looking at what the testers are doing with them from day to day.
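
To see why a test case count carries so little information (point 1 above), consider this minimal sketch. It packages one and the same idea either as a single "test case" or as 100,000 of them. The property being checked and the function names are made up for illustration.

```python
def round_trips(n):
    """The single underlying idea: an integer survives formatting and parsing."""
    return int(str(n)) == n


def one_big_case():
    """Representation 1: one 'test case' that loops over 100,000 inputs."""
    return all(round_trips(n) for n in range(100_000))


def many_small_cases():
    """Representation 2: 100,000 separate 'test cases', one per input."""
    return [lambda n=n: round_trips(n) for n in range(100_000)]


cases = many_small_cases()
counts = {"one_big_case": 1, "many_small_cases": len(cases)}
same_verdict = one_big_case() == all(case() for case in cases)

print(counts)
print("same verdict either way:", same_verdict)
```

Both representations exercise exactly the same inputs and reach the same verdict, so a report of "1" or "100,000" reflects only how the work was packaged. The useful questions are still what the checks cover and what bugs they could detect.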

Regression Test Tool for Trash Walking

My recent flirtation with trash-pickup-as-physical-exercise has led me down a familiar path. Even though it is not my responsibility to clean a public road in the first place, once I do it, I find that I feel irrational ownership of it. I want it to stay clean. But since I’ve adopted about 9 miles of road so far, it takes too long to walk the whole route in a day (remember I have to make one pass for each side of the road, or else I am going to miss a lot of trash). Regression trash walking takes too much effort!

I want automation!

I can travel faster in a car, but there are few places I can safely stop the car. I was thinking maybe I should get a motor-scooter instead; a Vespa or something. But that defeats the primary purpose of my trash walking– which is supposed to be exercise. So, now I’m thinking about maybe a bike will be the ticket. I could combine this with the Steel Grip grabber tool to quickly nab the trash and get back on the road.

Just as with software testing, a big problem with introducing tools to a human process is that it can change the process to make it less sensitive (or far too sensitive). In this case, any vehicle that moves fast will cause me to miss some trash. On the other hand, I will still catch a lot of the trash. It’s probably a good enough solution.

On the whole I think it is a good idea to use a bicycle. The remaining problem is that my wife is terrified I will be hit by a car.

Test Coverage Parallels in Trash Walking

First, about scope…

As I began my trash walking (see here and here), I quickly found that I needed guidelines on what counts as my work space and work product. I am collecting trash along the road, so what does that entail? Here is what I came up with:

  • I began with a broad operational definition of trash: “any loose, inanimate object of low value that may disturb the tranquility of the touring experience.” This applies obviously to the tranquility of pedestrians, cyclists, or motorists, and possibly others.
  • I ignore anything that seems especially toxic or a bio-hazard. Thus no dog poop or road kill (because I am not equipped for that).
  • I ignore things that seem to be serving a purpose by being there.
  • I ignore things that are too large for my trash bag.
  • I ignore things that are too small to pick up.
  • I ignore things that require substantial digging to free from the ground.
  • I ignore groups of things that are too numerous (e.g. one thousand toothpicks)

In testing terms, we call all this my oracle (alternatively, you can say that each list item is an oracle; it makes no difference, since we never count oracles, we just use them). An oracle is a means to recognize a problem when you encounter it. Oracles define what is and is not your business as a tester in terms of what you are looking for. Notice that I have described my oracles only in a high-level sense. The truth is I have a lot more oracles that I don’t know how to describe. For instance I know how to recognize a broken plastic container, and distinguish that from a sea shell, even though I don’t know how to describe that knowledge. Written oracles are almost always approximations or summaries of the real oracles that a tester uses.
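
The written guidelines above can be sketched as a small Python function. This is only an approximation made for illustration: the `Item` fields, the `MAX_GROUP_SIZE` cutoff, and the name `looks_like_trash` are hypothetical, and every field still has to be filled in by a person looking at the object, which is exactly the part the written rules leave out.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """What a walker has already judged about an object (all fields assumed)."""
    description: str
    loose: bool = True
    animate: bool = False
    low_value: bool = True
    hazardous: bool = False          # e.g. road kill
    serving_a_purpose: bool = False  # e.g. markers left beside a sign post
    fits_in_bag: bool = True
    big_enough_to_grab: bool = True
    needs_digging: bool = False
    count: int = 1


MAX_GROUP_SIZE = 100  # hypothetical cutoff for "too numerous"


def looks_like_trash(item):
    """Approximate written oracle: True means 'pick it up'."""
    if not item.loose or item.animate or not item.low_value:
        return False
    if item.hazardous or item.serving_a_purpose:
        return False
    if not item.fits_in_bag or not item.big_enough_to_grab:
        return False
    if item.needs_digging or item.count > MAX_GROUP_SIZE:
        return False
    return True


print(looks_like_trash(Item("beer bottle")))
print(looks_like_trash(Item("spilled toothpicks", count=1000)))
```

The function only combines judgments that have already been made. Deciding whether something is of low value, or is serving a purpose by being there, happens before the function is called, and that is where the undescribed oracles do their work.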

Sometimes the oracle is challenging. Examples:

  • I once found two flipchart markers on the ground next to a driveway and an upright stick. I left them there thinking that maybe someone was putting up a sign. When I returned the next day they were still there, so I decided they must be trash.
  • I saw a child’s pair of prescription glasses on the beach. I left them in case the owners returned but they also were still there the next day. Conclusion: trash!
  • I saw two sneakers and socks on the beach, far from any person. I kept my eye on them, and eventually someone collected them. Close one. I really wanted to put those in my trash bag.
  • I found an envelope taped to a park bench that said “Blue Clue #6.” I left it alone in case it was for some kind of puzzle game that hadn’t yet been played. If it’s still there tomorrow, I’ll get a clue.

Scope is part of mission. My scope is the problem set that belongs to me as opposed to someone else. The totality of my oracles is one aspect of scope, because they dictate what counts as a problem. Another thing that defines scope is what things I am supposed to be looking at. In this case, what is my work surface? What is the place I am searching? I determined this to be:

  • the road itself
  • the shoulder and the ditch (one side of the road on each pass)
  • potentially anywhere visible to a tourist from the road
  • potentially any property which my wife frequents
  • NOT anywhere that is too difficult to access, where difficulty is a subjective assessment related to energy (“that’s too far away”), injury risk (“I’m not climbing down that bank”), and social transgression (“there is a no trespassing sign on that tree”)

Finally I decide on a route. I determined that according to the travel patterns of my primary client: my wife, Lenore. You could say I looked at her “use cases” of road use. Apart from exercise, her respect and pleasure are the big reason I’m doing all this. I want her not to see trash anywhere on the island. (Interestingly, I was unconscious of that motivation until I had already done more than 30 miles of trash-walking.)

My scope is therefore anywhere my wife is likely to see from her car or on foot on Orcas Island. My mission is to remove trash from that area.

My coverage, on the other hand, is what I actually look at. Here is a map of my coverage (data collected with Gaia GPS on an iPhone, then exported to Google Earth):

Let’s zoom in and note some parallels with software testing.

1. My coverage analysis tool is not as accurate as I would wish.

According to this I was wiggling all over the road. But I promise I wasn’t. There are several meters of random inaccuracy in the GPS data.

Similarly, in testing, I rarely get the fantastic logging that lets me say exactly what was and was not tested. Remember also that even if the coverage map were perfectly accurate, I would still not be able to tell whether the tester was paying attention during that testing. The power of the oracles varies depending on the focus of the tester, unless the oracles are automated. And many vital oracles cannot be automated.

2. Sometimes your client asks for specific coverage.

Lenore asked me to clean the beach, since she often walks there. She and I covered this together as a pair. The beach was too wide to do all at once, so we did a pass on the high part and then a pass on the low part. Lenore was a bit obsessive about what counted as trash, so we picked up literally anything that was visible to the naked eye and seemed like trash. This included plastic particles the size of a penny.

This is similar to risk-based testing. You focus on areas more intensively if they are more critical to your client– defined as a person whose opinion of quality matters.

3. Sometimes you test where it’s easy, not where the bugs are.

This is a private property where my wife likes to walk. When I walk with her, I carry a trash bag. We did find a little trash but only a little, because the owners are pretty clean.

4. Sometimes you decide on your own that deeper coverage is needed.

This is a little public park. I couldn’t walk by when I saw the trash there, even though my wife never goes there.

5. Sometimes you get clues from users.

A fellow in a car pulled into the substation and said “hey! you missed something over here!” That was helpful. I think most people look at me and assume I am their tax dollars at work. I like that. Life is better when people appreciate their government.

6. Sometimes your coverage decisions reflect vanity rather than business sense.

That is the parking lot of the medical center where my doctor works. I wanted her to see me picking up trash so that she knows I really am exercising.

And it’s true in testing, too. Sometimes I want to test in a way that is accessible and impressive to outsiders rather than merely reasonable and sensible. Sometimes I need a little appreciation.

Test Talk About Trash Walks

So, for exercise, I’m picking up trash. Here is a picture of me all kitted up:

Perfectly equipped for road trash collection!

So far, I’ve done 37 miles of trash collecting. And I can’t help but see some interesting parallels with software testing…

Just Like Testing #1: I can use tools, but I cannot automate my work.

I have to make a lot of judgments about what to pick up and what to leave. It would be difficult even to write a detailed and complete specification for what constitutes trash and what does not, let alone design a machine to pick it up. Yes, there are semi-automated street sweeping machines, and they do great things– but they are also expensive, loud, and disruptive. They also work only on flat paved surfaces, as far as I know, whereas I am cleaning along country roads and fishing garbage out of ditches.

Just Like Testing #2: I crave trouble. If the product is too clean I feel depressed.

I smile when I see a nice juicy old beer bottle. That is paydirt, baby. Aluminum cans and brightly colored drinking cups are almost as sweet. Apart from anything else, they weigh down my trash bag so that it doesn’t flap in the wind, but mainly it is these undeniably unsightly pieces of rubbish that give me a charge.

On the other hand, when I don’t see trash, I feel like I haven’t done anything. I know that I have: my eyes have searched for trash and that’s a service. But finding trash gives me something to show for my work. I can drop the bag in front of my wife and say “seeee? I’m useful!”

Just Like Testing #3: Trouble that is most likely to upset normal people makes me most happy.

Brightly colored candy wrappers are terrible to see on a country road surrounded by nature, but that same bright color makes it easy for me to spot. So, I hope candy and soda companies don’t start marketing their wares in camouflaged containers. Similarly in testing, when we see a dramatic crash or data loss in testing, we testers give each other high-fives and yessses and “you are a steely eyed missile man”-type comments. It takes extraordinary restraint not to do that right in front of the developer whose product has just failed.

Just Like Testing #4: Gratuitous trouble makes me tired and depressed.

I have sometimes come across caches of garbage, as if someone just hurled a kitchen trash bag off the back of a truck. This is not fun. I don’t mind the ordinary careless litter, to some degree, but industrial scale contempt for the environment makes me feel disgust instead of fun.

Most of the trash I find is rather innocent. It falls into these categories:

  • Food wrappers: things a kid might throw out a car window.
  • Bric-a-brac: things that might fall out of the back of a contractor’s pickup truck.
  • Featherweight trash: things that accidentally blow out the window of a car
  • Cycling debris: things that a cyclist might drop accidentally; occasional items of clothing
  • Auto debris: pieces of cars
  • Transported trash: things blown onto the road from adjoining property

But when an item or items of garbage seem diabolical or contemptuous, or systematically careless, I do get a little angry. This is similar to the feeling a tester gets when the developer won’t even do the most basic of testing before throwing it over the wall for a test cycle.

Lots of nice brightly colored things in there, but also some weeds that accidentally got caught up with the gripper… Just goes to show that tools aren’t perfect.

Just Like Testing #5: I became hyper-sensitive to regressions.

Today I drove into town and saw at least four pieces of trash along the way that had not been there yesterday. I am annoyed. This was a perfectly good road when I last cleaned it and now it’s all messed up again. Now, I know, objectively, it’s not “all messed up.” It is still far cleaner than it was when I started working it. But all I can think about is that new trash! Who did it? BURN THEM!

Testers also tend to get oversensitive and find it hard to accept that quality can be good enough when we know that there are unfixed bugs in the product. I guess anything you invest yourself in becomes sharper and larger and more important in that way.

Just Like Testing #6: I overlook some trash no matter how hard I try to look for it.

My wife helped me clean the local beach. We went single file, so she caught some of the things that I missed. There were a lot of them. Some trash I didn’t see was pretty big. My inner experience is “How did I miss that???!?!?!” But I know how I missed it: inattentional blindness.

Inattentional blindness is when you don’t see something that is in your field of view because your attention is on something else. This can have the effect of feeling as if an object literally appeared out of thin air when it was right in front of you all the time. I once covered an area, then turned and looked behind me, and saw a medium sized plastic bag just a few feet behind me. I had walked right over it without seeing it. It’s frustrating, but it’s a fact of life I must accept.

This is why pair testing, group testing, or making multiple passes through the same product helps so much. I always want redundancy. Along the main roads, I want to make at least two passes on each side before I move on to another road.
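
A back-of-the-envelope sketch of why redundancy helps: if each pass has some chance of spotting a given item, the chance that the item survives every pass shrinks with each additional pass. The 0.7 detection figure below is invented, and the calculation assumes the passes are independent, which is a simplification. Two passes by the same tired person are probably less independent than one pass each by two different people.

```python
def chance_still_missed(detect_probability, passes):
    """Probability that an item is missed on every pass.

    Assumes each pass spots the item independently with the same probability.
    """
    if not 0.0 <= detect_probability <= 1.0:
        raise ValueError("detect_probability must be between 0 and 1")
    if passes < 0:
        raise ValueError("passes must not be negative")
    return (1.0 - detect_probability) ** passes


# Hypothetical figure: a single pass spots a given item 70% of the time.
for passes in (1, 2, 3):
    missed = chance_still_missed(0.7, passes)
    print(f"{passes} pass(es): about {missed:.1%} chance the item is still there")
```

Under these assumptions the miss rate falls from roughly 30% after one pass to roughly 9% after two and about 3% after three. The exact numbers matter less than the shape: no single pass, however careful, gets everything.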

When I am “regression walking,” I might expect to find only newly dropped trash since my last walk. Instead, just like in testing life, I often find old trash that has been there all along but never before noticed.

[Added August 7th, 2017]

Just Like Testing #7: My quality standards are not fixed or absolute; they vary and they are relative.

I notice that when I am cleaning a very cluttered area, I tend to ignore very small pieces of trash, or trash that is hard to access. But when I am covering a clean area, I raise my standards and pick up even tiny pieces (smaller than a bottle cap), as if I am “hungry” for trash.

Similarly, I might pick up a small piece because it is next to a large piece, since I am already “in the neighborhood.” Also, if a large piece has been shattered into small pieces, like a broken beer bottle, I will pick up even tiny pieces of the bottle in order to get the “whole bottle.”

All this is evidence that I do not judge trash on an absolute scale, but rather judge it differently according to a variety of factors, including what’s nearby, what I’ve recently seen, my fatigue, my self-judgment, etc. It’s the same with bugs. I want to find something, but I also have limited energy. And this is why it is good for me to take multiple passes through an area. It helps me to square my selection heuristics with my general and absolute sense of my mission and proper quality standard.

The Unnecessary Tool

My wife bought a Steel Grip 36in Lightweight Aluminum Pick Up Tool.

I saw it on our combination dining room/craft/office table and asked her what it was for.

“My eye pillow fell behind the bed and I can’t reach it,” she told me. (This led to some confusion for me at first because I thought she was referring to an iPillow, presumably an Apple product I had never heard of.)

“I can easily get that for you,” I eventually replied while reaching behind the bed and retrieving her iPillow.

That seemed to end the conversation. But I was still surprised that she bought an entire new gadget to accomplish something that is pretty easy to solve with ordinary human effort– such as asking her husband. I couldn’t resist teasing her about it as I discovered that the squeaky gripper was also a good tool for annoying my dogs. Lenore is usually the epitome of sensible practicality. She’s usually the one restraining me from buying unnecessary things. So, it felt good to see her have a little lapse, for once.

In testing, I see a lot of that: introducing tools that aren’t needed and mostly just clutter up the place. All over the industry, technocrats seem to turn to tools at the slightest excuse. Tools will save us! More tools. Never mind the maintenance costs. Never mind what we lose by distancing ourselves from our problems. Automation!

(Please don’t bother commenting about your useful tool kit. I’m not talking about useful tools, here. I’m talking about a tool that was purchased specifically to solve a problem that was already easily solved without it. I am talking about an unnecessary tool.)

So then what happens…?

A few weeks later, I am getting bored with my walks. Well, let me back up: I am at the age where physical fitness is no longer about looking sharp, or even feeling good. It’s becoming a matter of do I want to keep living or what? The answer is yes I want to live, Clarence. That means I must exercise. This year I have been walking intensively.

But it’s boring. I can’t get anything done when I’m walking. I don’t like listening to music, and anyway I feel uncomfortable being cut off from the sounds of my surroundings. Therefore, I trudge along: bored.

One day I realized I could have more fun walking if I picked up garbage along my way. That way I would be making the world better as I walked. At first I carried a little trash sack at my waist, but my ambitions soon grew, and within days I decided it was time to walk the main road into town with a 50-gallon industrial trash bag and a high viz vest.

As I was leaving on my first mission, Lenore handed me the gripper.

It was the perfect tool.

It was exactly what I needed.

It would save my back and knees.

My gripper gets a lot of use, now. I’m wondering if I need to upgrade to a titanium and carbon fiber version. I’m thinking of crafting a holster for it.

Is There a Moral Here? Yes.

One of the paradoxes of Context-Driven testing is that on the one hand, you must use the right solution for the situation; while, on the other hand, you can only know what the right solution can be if you have already learned about it, and therefore used it, BEFORE you needed it. In other words, to be good problem solvers, we also need to dabble with and be curious about potential solutions even in the absence of a problem.

The gripper spent a few weeks lying around our home until suddenly it became my indispensable friend.

I guess what that means is that it’s good to have some tolerance and playfulness about experimenting with tools. Even useless ones.