We. Use. Tools.

We Context-Driven testers use tools to help ourselves test better. But there is no such thing as test automation.

Want details? Here’s the 10,000 word explanation that Michael Bolton and I have been working on for months.

Editor’s Note: I have just posted version 1.03 of this article. This is the third revision we have made due to typos. Isn’t it interesting how hard it is to find typos in your own work before you ship an article? We used automation to help us with spelling, of course, but most of the typos are down to properly spelled words that are in the wrong context. Spelling tools can’t help us with that. Also, Word spell-checker still thinks there are dozens of misspelled words in our article, because of all the proper nouns, terms of art, and neologisms. Of course there are the grammar checking tools, too, right? Yeah… not really. The false positive rate is very high with those tools. I just did a sweep through every grammar problem the tool reported. Out of the five it thinks it found, only one, a missing hyphen, is plausibly a problem. The rest are essentially matters of writing style.

One of the lines it complained about is this: “The more people who use a tool, the more free support will be available…” The grammar checker thinks we should not say “more free” but rather “freer.” This may be correct, in general, but we are using parallelism, a rhetorical style that we feel outweighs the general rule about comparatives. Only humans can make these judgments, because the rules of grammar are sometimes fluid.

Behavior-Driven Development vs. Testing

The difference between Behavior-Driven Development and testing:

This is a BDD scenario (from Dan North, a man I respect and admire):

+Scenario 1: Account is in credit+
Given the account is in credit
And the card is valid
And the dispenser contains cash
When the customer requests cash
Then ensure the account is debited
And ensure cash is dispensed
And ensure the card is returned

This is that BDD scenario turned into testing:

+Scenario 1: Account is in credit+
Given the account is in credit
And the card is valid
And the dispenser contains cash
When the customer requests cash
Then check that the account is debited
And check that cash is dispensed
And check that the card is returned
And check that nothing happens that shouldn’t happen and everything else happens that should happen for all variations of this scenario and all possible states of the ATM and all possible states of the customer’s account and all possible states of the rest of the database and all possible states of the system as a whole, and anything happening in the cloud that should not matter but might matter.

Do I need to spell it out for you more explicitly? This check is impossible to perform. To get close to it, though, we need human testers. Their sapience turns this impossible check into plausible testing. Testing is a quest within a vast, complex, changing space. We seek bugs. It is not the process of demonstrating that the product CAN work, but of exploring whether it WILL.

I think Dan understands this. I sometimes worry about other people who promote tools like Cucumber or jBehave.

I’m not opposed to such tools (although I continue to suspect that Cucumber is an elaborate ploy to spend a lot of time on things that don’t matter at all) but in the face of them we must keep a clear head about what testing is.

Avoiding My Curse on Tool Vendors

Adam Goucher noticed that I recently laid a curse upon commercial test tool vendors (with the exception of Hexawise, Blueberry Consultants, and Atlassian). He wondered to me how a tool vendor might avoid my curse.

First, I’m flattered that he would even care who I curse. But, it’s a good question. Here’s my answer:

Test tool vendors that bug me:

  • Any vendor who wants me to pay for every machine I use their tool upon. Guys, the nature of testing is that I need to work with a lot of machines. Sell me the tool for whatever you want to charge, but you are harming my testing by putting obstacles between me and my test lab.
  • Any vendor that sells tools conceived and designed by a goddamn developer who hates to goddamn test. How do I know about the developer of a test tool? Well, when I’m looking at a tool and I find myself asking “Have these vendor bozos ever actually had to test something in their lives? Did they actually want a tool like this to help them? I bet this tool will triple the amount of time and energy I have to put into testing, and make me hate every minute of it” then I begin to suspect there are no great lovers of testing in the house. This was my experience when I worked with Rational Test Manager, in 2001. I met the designer of that tool: a kid barely out of MIT with no testing or test management experience who informed me that I, a Silicon Valley test management veteran, wasn’t qualified to criticize his design.
  • Any vendor selling me the opportunity, at great cost, to simulate a dim-witted test executioner. Most tool vendors don’t understand the difference between testing and checking, and they think what I want is a way to “test while I sleep.” Yes, I do want the ability to extend my power as a tester, but that doesn’t mean I’m happy to continually tweak and maintain a brittle set of checks that have weak oracles and weak coverage.
  • Any vendor who designs tools by guessing what will impress top managers in large companies who know nothing about testing. In other words: tools to support ceremonial software testing. Cem and I once got a breathless briefing about a “risk-based test management” tool from Compuware. Cem left the meeting early, in disgust. I lingered and tried to tell them why their tool was worthless. (Have you ever said that to someone, and they reacted by saying “I know it’s not perfect” and you replied by saying “Yes, it’s not perfect. I said it’s worthless, therefore it would follow that it’s also not perfect. You could not pay me to use this tool. This tool further erodes my faith in the American public education system, and by extension the American experiment itself. I’m saying that you just ruined America with your stupid stupid tool. So yeah, it’s not perfect.”) I think what bugged Cem and me the most is that these guys were happy to get our endorsement, if we wanted to give it, but they were not at all interested in our advice about how the tool could be re-designed into being a genuine risk-based testing tool. Ugh, marketers.
  • Vendors who want to sell me a tool that I can code up in Perl in a day. I don’t see the value of Cucumber. I don’t need FIT (although to his credit, the creator of FIT also doesn’t see the big deal of FIT). But if I did want something like that, it’s no big deal to write a tool in Perl. And both of those tools require that you write code, anyway. They are not tools that take coding out of our hands. So why not DIY? (A sketch of what I mean appears just below.)
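
To make that last point concrete, here is roughly the sort of do-it-yourself table runner I have in mind, sketched in Perl. It is only an illustration: the function under test is a stand-in for your own code, and the pipe-delimited table format is one I just made up.

#!/usr/bin/perl
# A FIT-flavored table runner in a few lines of Perl.
# under_test() is a placeholder; point it at your own code.
use strict;
use warnings;

sub under_test { my ($a, $b) = @_; return $a + $b }

while (my $row = <DATA>) {
    chomp $row;
    next if $row =~ /^\s*(#|$)/;                    # skip comments and blank rows
    my ($a, $b, $expected) = split /\s*\|\s*/, $row;
    my $got = under_test($a, $b);
    printf "%-4s  %s | %s -> got %s, expected %s\n",
        ($got == $expected ? 'pass' : 'FAIL'), $a, $b, $got, $expected;
}

__DATA__
# a | b | expected
2   | 3 | 5
-99 | 99 | 0
# the next row is deliberately wrong, to show a FAIL
10  | 10 | 21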

Tool vendors I like:

  • Vendors who care what working testers think of their tools and make changes to impress them. Blueberry, Hexawise, and Sirius Software have all done that.
  • Vendors who have tools that give me vast new powers. I love the idea of virtual test labs. VMWare, for instance.
  • Vendors who don’t shackle me to restrictive licenses. I love ActivePerl, which I can use all over the place. And I happily pay for things like their development kit.
  • Vendors who enjoy testing. Justin Hunter, of Hexawise, is like that. He’s the only vendor speaking at CAST, this year, you know.

We Need Better Testing Bloggers

I don’t understand the mentality of bloggers like this guy. His view of the history of testing is a fantasy that seems designed to insult people who study testing. It applies at most to certain companies, not to the field itself.

He says we need a better way to test. Those of us who are serious testers have actually been developing and demonstrating better ways to test for decades, as we keep up with technology. Where have you been, Steve? Get out much do ya?

He thinks automation is the answer. What a surprise that a programmer would say that. But the same thing was said in 1972 at the Chapel Hill Symposium. We’ve tried that already. Many many times we’ve tried it.

We know why automation is not the grand solution to the testing problem.

As a board member of AST, I should mention the upcoming CAST Conference— the most advanced practitioner’s testing conference I know. Go to CAST, Steve, and tell Jerry Weinberg to his face (the programmer who started the first independent test group, made up of programmers) all about your theory of testing history.

Also, Jerry’s new book, Perfect Software and Other Illusions About Testing, will be available soon. It addresses misconceptions like “Just automate the testing!” along with many others. Jerry is not just an old man of testing. He’s the oldest among us.

The Future Will Need Us to Reboot It

I’ve been reading a bit about the Technological Singularity. It’s an interesting and chilling idea conceived by people who aren’t testers. It goes like this: the progress of technology is increasing exponentially. Eventually the A.I. technology will exist that will be capable of surpassing human intelligence and increasing its own intelligence. At that point, called the Singularity, the future will not need us… Transhumanity will be born… A new era of evolution will begin.

I think a tester was not involved in this particular project plan. For one thing, we aren’t even able to define intelligence, except as the ability to perform rather narrow and banal tasks super-fast, so how do we get from there to something human-like? It seems to me that the efforts to create machines that will fool humans into believing that they are smart are equivalent to carving a Ferrari out of wax. Sure you could fool someone, but it’s still not a Ferrari. Wishing and believing doesn’t make it a Ferrari.

Because we know how a Ferrari works, it’s easy to understand that a wax Ferrari is very different from a real one. Since we don’t know what intelligence really is, even smart people will easily mistake wax intelligence for real intelligence. In testing terms, however, I have to ask “What are the features of artificial intelligence? How would you test them? How would you know they are reliable? And most importantly, how would you know that human intelligence doesn’t possess secret and subtle features that have not yet been identified?” Being beaten in chess by a chess computer is no evidence that such a computer can help you with your taxes, or advise you on your troubles with girls. Impressive feats of “intelligence” simply do not encompass intelligence in all the forms that we routinely experience it.

The Google Grid

One example is the so-called Google Grid. I saw a video, the other day, called Epic 2014. It’s about the rise of a collection of tools from Google that create an artificial mass intelligence. One of the features of this fantasy is an “algorithm” that automatically writes news stories by cobbling pieces from other news stories. The problem with that idea is that whoever dreamed it up seems to know nothing about writing. Writing is not merely text manipulation. Writing is not snipping and remixing. Writing requires modeling a world, modeling a reader’s world, conceiving of a communication goal, and finding a solution to achieve that goal. To write is to express a point of view. What the creators of Epic 2014 seemed to be imagining is a system capable of really really bad writing. We already have that. It’s called Racter. It came out years ago. The Google people are thinking of creating a better Racter, essentially. The chilling thing about that is that it will fool a lot of people, whose lives will be a little less rich for it.

I think the only way we can get to an interesting artificial intelligence is to create conditions for certain interesting phenomena of intelligence to emerge and self-organize in some sort of highly connectionist networked soup of neuron-like agents. We won’t know if it really is “human-like”, except perhaps after a long period of testing, but growing it will have to be a delicate and buggy process, for the same reason that complex software development is complex and buggy. Just like Hal in 2001, maybe it’s really smart, or maybe it’s really crazy and tells lies. Call in the testers, please.

(When Hal claimed in the movie that no 9000 series computers had ever made an error, I was ready to reboot him right then.)

No, you say? You will assemble the intelligence out of trillions of identical simple components and let nature and data stimulation build the intelligence automatically? Well, that’s how evolution works, and look how buggy THAT is! Look how long it takes. Look at how narrow the intelligences are that it has created. And if we turn a narrow and simplistic intelligence to the task of redesigning itself, why suppose that it is more likely to do a good job than a terrible job?

Although humans have written programs, no program yet has written a human. There’s a reason for that. Humans are oodles more sophisticated than programs. So, the master program that threatens to take over humanity would require an even more masterful program to debug itself with. But there can’t be one, because THAT program would require a program to debug itself… and so on.

The Complexity Barrier

So, I predict that the singularity will be drowned and defeated by what might be called the Complexity Barrier. The more complex the technology, the more prone to breakdown. In fact much of the “progress” of technology seems to be accompanied by a process of training humans to accept increasingly fragile technology. I predict that we will discover that the amount of energy and resources needed to surmount the complexity barrier will approach infinity.

In the future, technology will be like weather. We will be able to predict it somewhat, but things will go mysteriously wrong on a regular basis. Things fall apart; the CPU will not hold.

Until I see a workable test plan for the Singularity, I can’t take it seriously.

Confused Methodology Talk #1

This posting by Corey Goldberg illustrates an interesting and all too common kind of confusion people get into when discussing methods and practices. It’s worth pondering.

On SQAForums, someone stated:

“ISEB defines automated tested as useful only in mature testing environments and where functionality is not changing i.e. at regression testing.”

to which Corey replied:

“…and ISEB would be completely wrong on that point. web services testing should be fully automated, as there is no UI, just an API.”

Let’s analyze these statements. The first writer seems to be under the sway of ISEB, which immediately induces a heavy sigh in the pit of my soul.

(There are now thousands of people who might be called “certification zombies” lurching around in an ISEB or ISTQB-induced fog, trying to apply what they learned in a few days of memorizing to the complex reality of testing.)

When the first writer says that ISEB “defines” automation as useful only in a certain context, that’s a perfect example of the inability to separate context and method. To think clearly about methodology, you must be able to sift these things apart. Best practice thinking can’t help you do this, and in fact discourages you from trying.

I don’t know if ISEB actually defines or discusses test automation in that way, but if it does, I can tell you what ISEB is probably thinking.

(BTW, one of the big problems with certification programs is the depersonalization of convictions. I say “ISEB” when what I want to say is Dorothy Graham or one of those people who support and edit the ISEB syllabus. You can’t argue with a document. Only people can have a point of view. To argue with ISEB itself is to argue with an anonymous sock puppet. But that’s the way they want it. Certificationists quite purposefully create a bureaucratic buffer of paper between themselves and any dissenters. To pick someone whom I believe advocates the ISEB way, I will choose Dorothy Graham.)

If Dot advocates that belief, then she is probably thinking about GUI-level automation of some aspects of test execution; a set of detailed scripted actions programmed into a software agent to exercise a system under test. If so then it is indeed likely that modifying the system under test in certain ways will break the test automation. This often leads to a situation where you are constantly trying to fix the automation instead of enjoying the benefits of it. This is especially a problem when the testing is happening via a GUI, because little changes that don’t bother a human will instantly disable a script.

So, even though the first writer appears to be reading off the ISEB script, there is some validity to his claim, in some context.

Now look at Corey’s reply. Corey is not under the sway of ISEB, but I worry that he may be under the sway of a typical affliction common among programmers who talk about testing: the reification fallacy. This is the tendency to think of an abstraction or an emergent process as if it were a fixed concrete thing. Hence if a programmer sees me punch a few keys in the course of my testing, and writes a program that punches those same keys in the same order, he might announce that he has “automated the test”, as if the test were nothing more than a pattern of input and output. Certainly, it is possible to automate some aspects of testing, but the aspect of it that requires human reflection cannot be automated. In fact, it can’t even be precisely duplicated by another human. It is an emergent phenomenon.

(Some would say that I am splitting hairs too finely, and that imprecise duplication may be close enough. I agree that it may be close enough in certain contexts. What I caution against is taking the attitude that most of what is valuable about testing, most of the time, is easy to automate. When I have seen that attitude in practice, the resulting automation has generally been too expensive and too shallow. Rich, interesting, cost-effective test automation, in my experience, is a constructive partnership between human thinkers and their tools. I believe, based on my knowledge of Corey, that he actually is interacting constructively with his tools. But in this case, he’s not talking that way.)

What Corey can do is use tools to interact with a system under test. He uses his un-automatable human mind to program those tools to provide certain input and look for certain output. His tools will be able to reveal certain bugs. His tools, in conjunction with un-automatable human assistance during and after execution, and un-automatable human assistance to re-program the tests as needed, will reveal many more bugs.

The reification fallacy leads to certain absurdities when you consider different frames of reference. Corey points out that a web service has no “user interface”, and therefore is accessible only via a tool, and anything that is accessible only by a tool must therefore require “fully automated” testing. By that reasoning, we can say that all testing is always fully automated because in all cases there is some kind of hardware or software that mediates our access to the object of our test. Therefore, the fact that I am using a keyboard to type this blog posting and a screen to view it, by Corey’s logic, must be fully automated writing! I wonder what will be written next by my magic keyboard?

From one frame of reference, a web service has no user interface. From another frame of reference we can say that it does have a user interface, just not a human interface– its user is another program. How we test such a thing is to write or employ a program that does have a human interface to manipulate the web service. We can operate this interface in batch mode: write a program to submit data, run it, review the results, and re-write the program as needed. Or we can operate the interface interactively: write a program to submit data, present results, then wait for us to type in a new query.
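
To make the batch-versus-interactive distinction a bit more concrete, here is a bare-bones sketch of the interactive mode in Perl, using LWP::UserAgent. The endpoint URL and the “query” parameter are invented for illustration; a real web service would dictate its own interface and payload.

#!/usr/bin/perl
# A thin human interface wrapped around a web service, so a tester can poke it
# one query at a time. The URL and form field below are hypothetical.
use strict;
use warnings;
use LWP::UserAgent;

my $ua  = LWP::UserAgent->new( timeout => 10 );
my $url = 'http://localhost:8080/lookup';      # stand-in for the service under test

print "query> ";
while (my $q = <STDIN>) {
    chomp $q;
    last if $q eq 'quit';
    my $resp = $ua->post( $url, { query => $q } );   # send it, show exactly what came back
    print $resp->status_line, "\n", $resp->decoded_content, "\n";
    print "query> ";
}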

Corey and the first writer are not in a helpful dialog, because they are talking about different things. I would tell the first writer to treat ISEB as having no authority or wisdom, and to instead learn to reason for himself. The relevant reasoning here, I think, is to wonder what kind of tool we could find or write that would allow us to interact with the web service. At the same time, we need to consider how the web service interface might change. We might stick to highly interactive testing for a while, instead of investing in a batching system with a lot of automatic oracles, if we feel that the interface and functionality are changing too fast. On the other hand, one of the nice things about testing through an API is that it is often rather inexpensive to script sequences and batches and simple oracles; and consequently inexpensive to fix them when the system under test changes. I suspect that belief informed Corey’s response, although I wish he would make that belief more apparent to people who are used to thinking of testing as a human-driven process.

As a programmer, I am aware of the urge, sometimes, to say “I didn’t do it, my program did.” In testing this naturally turns into “I didn’t test that, my program I wrote to test that did.” The crucial difficulty with this way of speaking, when it comes to testing, is the way it obscures the many, many choices the programmer made while designing the program, as if the program itself made those choices, or as if there were no choices to be made. The thing is, I don’t care, for a regular program, how many other ways it could have been written, or how many other things it could have done. But these are vital concerns when the program is meant to test another program.

Manual Tests Cannot Be Automated (DEPRECATED)

[Note: This post is here only to serve as a historical example of how I used to speak about “automated testing.” My language has evolved. The sentiment of this post is still valid, but I have become more careful– and I think more professional– in my use of terms.]

I enjoy using tools to support my testing. As a former production coder, I find that automated tests can be a refreshing respite from the relatively imponderable world of product analysis and heuristic test design (I solve sudoku puzzles for the same reason). You know, the first tests I ever wrote were automated. I didn’t even distinguish between automated and manual tests for the first couple of years of my career.

Also for the first six years, or so, I had no way to articulate the role of skill in testing. Looking back, I remember making a lot of notes, reading a lot of books, and having a feeling of struggling to wake up. Not until 1993 did my eyes start to open.

My understanding of cognitive skills of testing and my understanding of test automation are linked, so it was some years before I came to understand what I now propose as the first rule of test automation:

Test Automation Rule #1: A good manual test cannot be automated.

No good manual test has ever been automated, nor ever will be, unless and until the technology to duplicate human brains becomes available. Well, wait, let me check the Wired magazine newsfeed… Nope, still no human brain scanner/emulators.

(Please, before you all write comments about the importance and power of automated testing, read a little bit further.)

It is certainly possible to create a powerful and useful automated test. That test, however, will never have been a good manual test. If you then read and hand-execute the code– if you do exactly what it tells you– then congratulations, you will have performed a poor manual test.

Automation rule #1 is based on the fact that humans have the ability to do things, notice things, and analyze things that computers cannot. This is true even of “unskilled” testers. We all know this, but just in case, I sprinkle exercises to demonstrate this fact throughout my testing classes. I give students products to test that have no specifications. They are able to report many interesting bugs in these products without any instructions from me, or any other “programmer.”

A classic approach to process improvement is to dumb down humans to make them behave like machines. This is done because process improvement people generally don’t have the training or inclination to observe, describe, or evaluate what people actually do. Human behavior is frightening to such process specialists, whereas machines are predictable and lawful. Someone more comfortable with machines sees manual tests as just badly written algorithms performed ineptly by sugar-carbon blobs wearing contractor badges who drift about like slightly-more-motivated-than-average jellyfish.

Rather than banishing human qualities, another approach to process improvement is to harness them. I train testers to take control of their mental models and devise powerful questions to probe the technology in front of them. This is a process of self-programming. In this way of working, test automation is seen as an extension of the human mind, not a substitute.

A quick image of this paradigm might be the Mars Rover program. Note that the Mars Rovers are completely automated, in the sense that no human is on Mars. Yet they are completely directed by humans. Another example would be a deep sea research submarine. Without the submarine, we couldn’t explore the deep ocean. But without humans, the submarines wouldn’t be exploring at all.

I love test automation, but I rarely approach it by looking at manual tests and asking myself “how can I make the computer do that?” Instead, I ask myself how I can use tools to augment and improve the human testing activity. I also consider what things the computers can do without humans around, but again, that is not automating good manual tests, it is creating something new.

I have seen bad manual tests be automated. This is depressingly common, in my experience. Just let me suggest some corollaries to Rule #1:

Rule #1B: If you can truly automate a manual test, it couldn’t have been a good manual test.

Rule #1C: If you have a great automated test, it’s not the same as the manual test that you believe you were automating.

My fellow sugar blobs, reclaim your heritage and rejoice in your nature. You can conceive of questions; ask them. You are wonderfully distractible creatures; let yourselves be distracted by unexpected bugs. Your fingers are fumbly; press the wrong keys once in a while. Your minds have the capacity to notice hundreds of patterns at once; turn the many eyes of your minds toward the computer screen and evaluate what you see.

Quick Oracle: Blink Testing

Background:

  1. In testing, an “oracle” is a way to recognize a problem that appears during testing. This contrasts with “coverage”, which has to do with getting a problem to appear. All tests cover a product in some way. Every test must include an oracle of some kind, or else you would call it just a tour rather than a test. (You might also call it a test idea, but not a complete test.)
  2. A book called Blink: The Power of Thinking Without Thinking has recently been published on the subject of snap decisions. I took one look at it, flipped quickly through it, and got the point. Since the book is about making decisions based on little information, I can’t believe the author, Malcolm Gladwell, seriously expected me to sit down and read every word.

“Blink testing” represents an oracle heuristic I find quite helpful, quite often. (I used to call it “grokking”, but Michael Bolton convinced me that blink is better. The instant he suggested the name change, I felt he was right.)

What you do in blink testing is plunge yourself into an ocean of data– far too much data to comprehend. And then you comprehend it. Don’t know how to do that? Yes you do. But you may not realize that you know how.

You can do it. I can prove this to you in less than one minute. You will get “blink” in a wink.

Imagine an application that adds two numbers together. Imagine that it has two fields, one for each number, and it has a button that selects random numbers to be added. The numbers chosen are in the range -99 to 99.

Watch this application in action by looking at this movie (which is an interactive EXE packaged in a ZIP file) and ask yourself if you see any bugs. Once you think you have it, click here for my answer.

  • How many test cases do you think that was?
  • Did it seem like a lot of data to process?
  • How did you detect the problem(s)?
  • Isn’t it great to have a brain that notices patterns automatically?
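
If you can’t run the movie, here is a crude stand-in in Perl: it spews a fast stream of random additions with a planted bug (not necessarily the same bug as the one in the demo), so you can feel how quickly your eyes pick the bad lines out of the pattern.

#!/usr/bin/perl
# Blink-testing stand-in: a fast stream of random additions, a few of them wrong.
use strict;
use warnings;

for (1 .. 300) {
    my $a   = int(rand(199)) - 99;                # random operands in -99 .. 99
    my $b   = int(rand(199)) - 99;
    my $sum = $a + $b;
    $sum = -$sum if $sum != 0 && rand() < 0.05;   # planted bug: occasional sign flip
    printf "%4d + %4d = %4d\n", $a, $b, $sum;
}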

There are many examples of blink testing, including:

  • Page through a long file super rapidly (holding your thumb on the Page Down button), notice the pattern of blurry text on the screen, and look for strange variations in that pattern.
  • Take a 60,000 line log file, paste it into Excel, and set the zoom level to 8%. Scroll down and notice the pattern of line lengths. You can also use conditional formatting in Excel to turn lines red if they meet certain criteria, then notice the pattern of red flecks in the gray lines of text, as you scroll. (See the sketch at the end of this section.)
  • Flip back and forth rapidly between two similar bitmaps. What catches your eye? Astronomers once did this routinely to detect comets.
  • Take a five hundred page printout (it could be technical documentation, database records, or anything) and flip quickly through it. Ask yourself what draws your attention most about it. Ask yourself to identify three interesting patterns in it.
  • Convert a huge mass of data to sound in some way. Listen for unusual patterns amidst the noise.

All of these involve pattern recognition on a grand scale. Our brains love to do this; our brains are designed to do this. Yes, you will miss some things; no, you shouldn’t care that you are missing some things. This is just one technique, and you use other techniques to find those other problems. We already have test techniques that focus on the trees; it also helps to look at the forest.
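
Here is a sketch of the log-squinting variation in Perl: boil each line of a huge log down to a single character, so that thousands of lines become a texture you can scan for odd flecks. The length thresholds and the error pattern are arbitrary choices of mine; tune them to your own logs.

#!/usr/bin/perl
# Condense a log into a texture: one character per line, 120 per row.
use strict;
use warnings;

my $cols = 120;
my $i    = 0;
while (my $line = <>) {
    my $c = '.';                              # ordinary line
    $c = ':' if length($line) > 80;           # long line
    $c = '#' if length($line) > 200;          # very long line
    $c = '!' if $line =~ /error|exception/i;  # lines worth a second look jump out
    print $c;
    print "\n" unless ++$i % $cols;
}
print "\n";

Run it as, say, perl squint.pl server.log and scan the output the way you would scan the zoomed-out spreadsheet.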

Test Messy with Microbehaviors

James Lyndsay sent me a little Flash app once that was written to be a testing brainteaser. He challenged me to test it and I had great fun. I found a few bugs, and have since used it in my testing class. “More, more!” I told him. So, he recently sent me a new version of that app. But get this: he fixed the bugs in it.

In a testing class, a product that has known bugs in it makes a much better working example than a product that has only unknown bugs. The imperfections are part of its value, so that testing students have something to find, and the instructor has something to talk about if they fail to find them.

So, Lyndsay’s new version is not, for me, an improvement.

This has a lot to do with a syndrome in test automation: automation is too clean. Now, unit tests can be very clean, and there’s no sin in that. Simple tests that do a few things exactly the same way every time can have value. They can serve the purposes of change detection during refactoring. No, I’m talking about system-level, industrial strength please-find-bugs-fast test automation.

It’s too clean.

It’s been oversimplified, filed down, normalized. In short, the microbehaviors have been removed.

The testing done by a human user interacting in real time is messy. I use a web site, and I press the “back” button occasionally. I mis-type things. I click on the wrong link and try to find my way back. I open additional windows, then minimize them and forget them. I stop in the middle of something and go to lunch, letting my session expire. I do some of this on purpose, but a lot of it is by accident. My very infirmity is a test tool.

I call the consequences of my human infirmity “microbehaviors”, those little ticks and skips and idiosyncrasies that will be different in the behavior of any two people using a product even if they are trying to do the same exact things.

Test automation can have microbehavior, too, I suppose. It would come from subtle differences in timing and memory use due to other processes running on the computer, interactions with peripherals, or network latency. But nothing like the gross variations inherent in human interaction, such as:

  • Variations in the order of apparently order independent actions, such as selecting several check boxes before clicking OK on a dialog box. (But maybe there is some kind of order dependence or timing relationship that isn’t apparent to the user)
  • The exact path of the mouse, which triggers mouse over events.
  • The exact timing and sequence of keyboard input, which occurs in patterns that change relative to the typing skill and physical state of the user.
  • Entering then erasing data.
  • Doing something, then undoing it.
  • Navigating the UI without “doing” anything other than viewing windows and objects. Most users assume this does not at all affect the state of an application.
  • Clicking on the wrong link or button, then backing out.
  • Leaving an application sitting in any state for hours on end. (My son leaves his video games sitting for days; I hope they are tested that way.)
  • Experiencing error messages, dismissing them (or not dismissing them) and trying the same thing again (or something different).
  • Navigating with the keyboard instead of the mouse, or vice versa.
  • Losing track of the application, assuming it is closed, then opening another instance of it.
  • Selecting the help links or the customer service links before returning to complete an activity.
  • Changing browser or O/S configuration settings in the middle of an operation.
  • Dropping things on the keyboard by accident.
  • Inadvertently going into hibernation mode while using the product, because the batteries ran out on the laptop.
  • Losing network contact at the coffee shop. Regaining it. Losing it again…
  • Accidentally double-clicking instead of single-clicking.
  • Pressing enter too many times.
  • Running other applications at the same time, such as anti-virus scanners, that may pop up over the application under test and take focus.

What makes a microbehavior truly micro is that it’s not supposed to make a difference, or that the difference it makes is easily recoverable. That’s why they are so often left out of automated tests. They are optimized away as irrelevant. And yet part of the point of testing is to challenge ideas about what might be relevant.

In a study done at Florida Tech, Pat McGee discovered that automated regression tests for one very complex product found more problems when the order of the tests was varied. Everything else was kept exactly the same. And, anecdotally, every tester with a little experience can probably cite a case where some inadvertent motion or apparently irrelevant variation uncovered a bug.

Even a test suite with hundreds of simple procedural scripts in it cannot hope to flush out all, or probably even most, of the bugs that matter in any complex product. Well, you could hope, but your hope would be naive.

So, that’s why I strive to put microbehaviors into my automation. Among the simplest measures is to vary timing and ordering of actions. I also inject idempotent actions (meaning that they end in the same apparent state they started with) on a random basis. These measures are usually very cheap to implement, and I believe they greatly improve my chances of finding certain state-related or timing-related bugs, as well as bugs in exception handling code.
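
For what it’s worth, here is a toy sketch of that idea in Perl. The “steps” and the idempotent “noise” actions are just print statements standing in for whatever your harness really drives; the point is the shuffled order, the random delays, and the occasional pointless detour.

#!/usr/bin/perl
# Injecting microbehaviors into a scripted check: random order, random timing,
# and occasional idempotent actions. The actions themselves are placeholders.
use strict;
use warnings;
use List::Util qw(shuffle);
use Time::HiRes qw(sleep);

my @steps = (
    sub { print "check box A\n" },
    sub { print "check box B\n" },
    sub { print "check box C\n" },
);

my @noise = (
    sub { print "open the help window, then close it\n" },
    sub { print "type junk into a field, then erase it\n" },
    sub { print "navigate away, then come back\n" },
);

for my $step (shuffle @steps) {                    # vary the order on every run
    sleep rand(2);                                 # vary the timing: 0-2 seconds
    $noise[ rand @noise ]->() if rand() < 0.3;     # sometimes take an idempotent detour
    $step->();
}
print "click OK\n";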

What about those Flash applications that Mr. Lyndsay sent me? He might legitimately assert that his purpose was not to write a buggy Flash app for testers, but a nice clean brainteaser. That’s fine, but the “mistakes” he made in execution turned into bonus brainteasers for me, so I got the original, plus more. And that’s the same with testing.

I want to test on purpose AND by accident, at the same time.

Counterstrings: Self-Describing Test Data

I was at a conference some months ago when Danny Faught showed me a Perl package for manipulating the Windows clipboard. I turned it into a little tool for helping me test text fields.

It’s called PerlClip. Feel free to download it. You don’t need Perl to run it.

One of the things PerlClip does is allow you to produce what I call “counterstrings”. A counterstring is a graduated string of arbitrary length. No matter where you are in the string, you always know the character position. This comes in handy when you are pasting huge strings into fields and they get truncated at a certain point. You want to know how many characters that is.

Here is a 35 character counterstring:
2*4*6*8*11*14*17*20*23*26*29*32*35*

Each asterisk in the string occurs at a position specified by the immediately preceding number. Thus, the asterisk following the 29 is the 29th character in that string. So, you can chop the end of the string anywhere, and you know exactly where it was cut. Without having to count, you know that the string “2*4*6*8*11*14*17*2” has exactly 18 characters in it. This saves some effort when you’re dealing with a half million characters. I pasted a 4000 character counterstring into the address field of Explorer and it was truncated at “2045*20”, meaning that 2047 characters were pasted.
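
Here is roughly how a counterstring can be generated, sketched in Perl. This is not the PerlClip code itself (PerlClip ends the string exactly on the requested position, while this sketch simply truncates), but it shows the self-describing idea.

#!/usr/bin/perl
# Build a counterstring: "2*4*6*8*11*..." where each '*' sits at the position
# named by the digits just before it.
use strict;
use warnings;

sub counterstring {
    my ($n) = @_;
    my $s = '';
    while (length($s) < $n) {
        # The next '*' lands at position p, where p = current length + digits(p) + 1.
        my $p = length($s) + 2;
        $p = length($s) + 1 + length($p) while $p != length($s) + 1 + length($p);
        $s .= $p . '*';
    }
    return substr($s, 0, $n);    # unlike PerlClip, just chop at the requested length
}

print counterstring(35), "\n";   # prints 2*4*6*8*11*14*17*20*23*26*29*32*35*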

I realize this may not be a very interesting sort of testing, except perhaps for security purposes or when you’re first getting to know the app. But security is an increasingly important issue in our field, and sometimes when no one tells you the limits and dynamics of text fields, this can come in handy.

Testability Through Audibility

I was working with a client today who complained that there were hidden errors buried in a log file produced by the product he was testing. So, I wrote him a tool that continuously monitors any text file, such as a server log (as long as it is accessible through the file system, as in the case of a test server running locally) and plays WAV files whenever certain string patterns appear in the stream.

With this little tool, a streaming verbose log can be rendered as a stream of clicks and whirrs, if you want, or you can just have it yell “ERROR!” when an error pops up in the log. All this in real time without taking your eyes off the application. Using this, I found a bug in a browser based app whereby perfectly ordinary looking HTML displayed on the screen coincided with a Java null pointer exception in the log.

I released this bit of code with the GPL 2.0 license and you can find it here:

http://www.satisfice.com/tools/log-watch.zip
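
For flavor, here is a minimal sketch of the idea (not the actual tool behind that link): poll a growing log file and ring the terminal bell when a pattern shows up. Swapping the bell for a WAV player, or for clicks and whirrs, is left to you.

#!/usr/bin/perl
# Watch a growing text file and complain audibly when a pattern appears.
# Usage: perl logwatch-sketch.pl server.log "ERROR|Exception"
use strict;
use warnings;

my ($file, $pattern) = @ARGV;
die "usage: $0 <logfile> <regex>\n" unless defined $file && defined $pattern;

open my $fh, '<', $file or die "can't open $file: $!";
seek $fh, 0, 2;                     # start at the end; only new lines matter

while (1) {
    while (my $line = <$fh>) {
        if ($line =~ /$pattern/) {
            print "\a";             # terminal bell; substitute a WAV player if you like
            print "MATCH: $line";
        }
    }
    sleep 1;
    seek $fh, 0, 1;                 # clear the EOF flag so new lines are picked up
}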

By the way, this is an example of what I call agile test tooling. I paired with a tester. I heard a complaint. I offered a tool idea. The tester said “yes, please.” I delivered the tool the next day. As we were playing with it, I added a couple of features. I don’t believe you have to be a programmer to be a great tester, but it helps to have a programmer or two on the testing staff. It’s nice work for programmers like me, who get bored with long term production coding.

Tools Come at a Cost

One of the experiences I share with a lot of people in this modern world is that I forget phone numbers. I never used to. The problem is that my mobile phone remembers them for me. So, phone numbers no longer stick in my own head. If I want to call a colleague, I first look for my phone. If I can’t find my phone, I don’t make the call.

Another way of looking at this is that my life has been simplified in some ways by my mobile phone, and in some ways it has been made more complicated. I would argue that it was simpler for me when I was forced to memorize phone numbers. It was simpler in that my use of many useful phone numbers was completely independent of external equipment.

Any tool that helps me also costs something. Any tool, agent, or organization that abstracts away a detail may also take away a resource that I might sometimes need, one that may atrophy if not used on a regular basis.

Test Tools Come at a Cost– Even if They are Free

This weekend, I attended the 5th Austin Workshop on Test Automation. This is a group of “test automation” people who are sharing information about test tools– specifically, open source test tools. It’s wonderful. I’m learning a lot about free stuff that might help me.

But I notice a pattern that concerns me: an apparent assumption by some of my helpful tool developer friends that a tool of theirs that handles something for me (so that I don’t have to do it myself) is obviously better than not having that tool.

So, let’s consider what is offered when someone offers me a tool that solves a problem that crops up in the course of doing a task:

  • Some capability I may not already have.
  • Possibly a new set of abstractions that help me think better about my task.
  • Possibly a higher standard of “good enough” quality in my task that I can attain because of the new capability and abstractions.

But what is also offered is this:

  • Problems in that tool.
  • New problems due to how the tool changes my task.
  • New problems due to how the tool interacts with my technological or social environment.
  • One more thing to install on all the platforms I use.
  • The necessity of paying the operating costs of the tool to the extent I choose to use it.
  • The necessity of investing time to learn the tool if I choose to use it (and to keep up with that learning).
  • The necessity of investing effort in using the tool (creating tool specific artifacts, for instance) that might not pay off as well as an alternative.
  • Having invested effort, the possibility of losing that investment when the tool becomes obsolete.
  • Avoidance of the learning and mastery of details that I might get by solving the problem myself.
  • A relationship with one more thing over which I have limited influence; and a potentially weaker relationship with something else that I know today.
  • Possible dependence on the owner of the tool to keep it current.
  • Possible legal entanglements from using the tool.
  • A sense of obligation to the provider of the tool.

I find it useful to explore tools. I want to learn enough to hold in my mind a useful index of possible solutions. And of course, I use many test tools, small and large. But I’m wary of claims that a new tool will make my life simpler. I appreciate a certain simplicity in the complexity of my world.

— James

Bug Report: Weird Charges and Customer Service

Two months ago, I had a $732 charge on my Sprint phone bill for “telephone equipment.” I had not purchased any telephone equipment, so I called customer service to dispute the bill.

The service agent told me “we haven’t sold phone equipment for months, so this can’t be right. I’ll remove the charge.” (Notice how the human used inference to provide me a service that I can’t imagine any automated system providing as part of resolving a billing dispute).

The following month I received a notice of impending disconnect if I failed to pay the past due $732. So, I got back on the phone with Sprint, ready to breathe fire on them. The friendly agent told me there were notations on my customer record about the problem, then put me on hold for 10 minutes or so while she researched it. When she got back on the line, she told me that the first agent had failed to fill out the proper forms to remove the charge. It turned out that filling out those forms takes several minutes, so she asked me to stay on the line with her “or else someone else will call and I may forget to do it.”

The interesting thing about the second agent is that she told me how the charge had occurred. Apparently, I bought a telephone from Sprint three years ago on installments. When I moved my business line to a new location, a couple months ago, the information about the long since paid for telephone was moved with it and somehow became refreshed as if it was a new transaction. Hence the new charge for equipment not even any longer sold by Sprint. (Notice that the second agent– her name is Virginia and she’s based in Sprint’s Orlando office– also behaved in a way wonderfully human, developing a causal theory of the problem instead of treating it merely as One of Those Things).

What does this mean as far as possible bugs in Sprint’s software? I think any of the following:

  • An obsolete feature remains active in their software. It could have been deactivated, or caught by some auditing process that knows no new telephone equipment charges are possible.
  • Simply changing the address of an existing telephone line apparently means recreating the entire customer record, and the process of doing that apparently copies things that should not be copied, such as old records of purchases from years ago. This seems to be a design flaw.
  • A purchase order should require a specific authorizing action by a human agent. Either there is a usability problem in the software whereby the user accidentally signalled that I had purchased equipment, or the system automatically decided that I had purchased it without a specific authorizing step.

I’ve had a number of problems with Sprint, with their flaky website and billing systems. But I have not yet had a bad experience with their customer service people. Which only goes to show: quality is the whole relationship with the customer, not just what happens with the products. Virginia gave me her full name and direct line in case the problem comes back. I’m impressed.

And then there was Dell…

I no longer do business with Dell. Their customer service has been terrible. I once sent my malfunctioning laptop to Dell for service and they not only announced that they could not find any problem with it (when it came back to me the problem was even worse) but it came back with scratches on the screen. I could have sent it back to them with more specific instructions for making the problem happen, but by then I had so little respect for the ability of their tech support people I just found a workaround and suffered with it until I could afford to replace my laptop with a Toshiba.

But that’s not the bug I want to report. The bug has to do with paying off a computer lease I once had with Dell. Leasing a computer from Dell is just a bad idea all around. It’s very expensive for one thing. But the other thing is that their record keeping is shockingly bad.

I tried to pay off my lease early. And get this. They called me SEVEN times over the next several months to demand further payments, because their systems had no record of my final payment.

I had FedExed the payment to them. I had records of the FedEx being received and the check being cashed. Each time their collections people called I gave them the same evidence, and each time they accepted it, apparently making some note in my records. But every few weeks, their computer would once again kick out an alarm that I had not paid. These calls got increasingly farcical. On one of them, the guy called me demanding payment, then immediately announced that he could see from his system that I owed them nothing, apologized, and hung up. I don’t think I said a word other than “hello.” and “I know.”

During my various investigations of the problem, working with various agents, I discovered that there are three different departments that must all coordinate at Dell in order for Dell The Corporation to believe that a lease is paid off. This coordination was apparently not happening in my case. The reason for this was not known, or if known, I wasn’t told. The collections department is apparently yet another independent group, and is incapable of doing anything to solve such problems. Do not lease with Dell.

What’s the bug, there? I don’t know. It appears to be a completely inadequate system. I might complain about this to them, but what’s the point? I learned my lesson. I buy my computers strictly through my local computer store, now, which has amazing customer service (http://www.royaloakcomputers.com). I think they ship computers, too.

Dell seems to be trying to cut costs by automating more and training its people less. All the more need, then, to test test test those systems.

Bug Report: United Airlines Self-Check-In

I recently used the United Airlines “self check-in” kiosk. It’s a touchscreen system at the airport that allows passengers to get a boarding pass for a flight without having to see a live ticket agent. United is keen for people to use them. The more popular the automation is, the less staff United needs for the ticket counters.

I’ve tried those kiosks twice, now, and twice they have failed me.

Here’s how they work. You touch the screen to start, and the system asks you for a credit card or any of various other forms of identification. After receiving that, the system conducts you through a series of screens to identify your flight and seat assignments. The boarding pass pops out and you’re done. Theoretically.

The first time I used one was just after they appeared at Washington Dulles airport, my home base. On my flight that day, I had applied for and received an upgrade to business class. A passenger can upgrade using paper coupons, electronic “coupons” stored in a frequent flyer account, or frequent flyer miles. The self-check-in system correctly informed me that my upgrade had been granted, and then asked whether I wanted to provide paper upgrade coupons or use electronic ones from my frequent flyer account. My answer was “none of the above, I want to use miles” but there was no way to select that option. That option was missing. Since I had no paper coupons, and had no electronic coupons in my account, I was forced to abort the process and get in line for a human. The humans, it turned out, knew what to do.

If it is an ordinary operation to use miles for a service upgrade, why couldn’t the kiosk handle it?

For months after that incident I just ignored the kiosks. It’s too annoying to stand in line to use the automated system only to have to go stand in another line to see a human when the automation breaks down. I am a very frequent flyer. I often use my miles to upgrade. Whether the omission of that feature was an oversight or a calculated decision, I think United screwed up. Even if there’s some compelling reason to omit that capability, the system could still have displayed a message to the effect that the feature was coming soon, or that it was available from the human staff. The incident left me not just dissatisfied with the kiosks, but with the feeling that United is careless about the needs of its customers.

The next time I considered using the kiosks was on New Year’s Eve, 2003. The line for humans looked pretty long, but the self check-in kiosks seemed deserted. What the heck, I thought, I’ll try them again. I expected no trouble with my upgrade, since I was using coupons.

The system took me through all of the screens, then announced that the printer was broken and that I should try again with another kiosk or see a human agent. Grumbling, I stepped up to the kiosk next to it, which cheerfully informed me that I had already-checked-in-everything’s-fine-have-a-nice-flight-goodbye. I suppose it assumed that I already had my boarding passes. Meanwhile, the first kiosk I tried had reverted back to its welcome screen, luring unsuspecting travellers into the same time-wasting trap.

What are the bugs, here? I think any of the following:

  • The kiosk apparently does not specifically log whether or not a boarding pass was actually issued to the passenger. Or if it is logged, that function failed for me.
  • The printer failure message directs passengers to use another kiosk, even though that is guaranteed never to help in the case of a printer failure.
  • The kiosk assumes that passengers must have no need for its services if they’ve already checked in. But what if I’d lost my boarding pass, or wanted to change my seat, or wanted to un-upgrade or re-upgrade? The kiosk should not just kick me off if I’ve already checked-in.
  • Even if the kiosk is intended only for initial check-ins, there could and should be a message directing passengers to see a human agent in the case of a lost boarding pass or a seat change request, etc.
  • The kiosk continues to take new users even when it knows (after the first failure) that its printer is broken.

Many testers who work on a kiosk like this might think of testing the printer failed condition, but most of those, I suspect, would have stopped after seeing that the apparently “correct” error message appeared. But the bug I found is only revealed when you go beyond the simple function tests and look at the complete lifecycle of a passenger’s interaction with the system.

The Moral
The moral, here, might be “scenario testing gives you value that function testing does not.” Another moral might be “watch users use your system.” If I were United, I would be studying log files and surveillance video of users trying to figure out the kiosks. I would count the number of people who struck out at the kiosks and defected to seek an audience with a friendly and expensive human.

Oh, and that’s another moral: friendly and expensive humans are difficult to replace with computers, because humans can cope with things that go awry. That’s why pilotless aircraft will never ferry human passengers. We can accept humans making mistakes, but we can’t accept machines making a mistake that a human might have averted.

United’s brittle software has cost it real money, too. The reason I was flying to Las Vegas on New Year’s Eve was to get 4000 more miles to qualify for a special status as a United frequent flyer. I flew to Vegas and immediately got back on the very same plane to come home. I would have been happy to purchase a ticket and not fly anywhere, just to get the miles. United could sell the seat to someone else, and I could stay comfortably at home, confident in my exalted new status as an elite United customer. Everybody wins. For whatever reason, United can’t do that.

But when I got to the airport, they announced that the flight was overbooked and asked for volunteers to be bumped from the flight. I considered volunteering, as long as I was credited with the mileage. The gate agent said that might be possible, but on reflection, I realized there was nothing the agent could say that would convince me that United’s computer was smart enough to give me the right kind of mileage credit. My lack of confidence in United therefore led directly to them kicking someone else off the flight and giving them a free ticket or travel voucher.

A Personal Appeal to United
Hire me. I’ll give you a discounted rate. I’ll work for miles (literally and figuratively). Let me help you test your kiosks so that your customers will have more confidence in you. Look, I can’t afford for you to go out of business. I’ve invested too much in you. Don’t make me start over with another airline.