James Lyndsay sent me a little Flash app once that was written to be a testing brainteaser. He challenged me to test it and I had great fun. I found a few bugs, and have since used it in my testing class. “More, more!” I told him. So, he recently sent me a new version of that app. But get this: he fixed the bugs in it.
In a testing class, a product with known bugs in it makes a much better working example than a product with only unknown bugs. The imperfections are part of its value: testing students have something to find, and the instructor has something to talk about if they fail to find it.
So, Lyndsay’s new version is not, for me, an improvement.
This has a lot to do with a syndrome in test automation: automation is too clean. Now, unit tests can be very clean, and there’s no sin in that. Simple tests that do a few things exactly the same way every time can have value. They can serve the purpose of change detection during refactoring. No, I’m talking about system-level, industrial-strength, please-find-bugs-fast test automation.
It’s too clean.
It’s been oversimplified, filed down, normalized. In short, the microbehaviors have been removed.
The testing done by a human user interacting in real time is messy. I use a web site, and I press the “back” button occasionally. I mis-type things. I click on the wrong link and try to find my way back. I open additional windows, then minimize them and forget them. I stop in the middle of something and go to lunch, letting my session expire. I do some of this on purpose, but a lot of it is by accident. My very infirmity is a test tool.
I call the consequences of my human infirmity “microbehaviors”: those little tics and skips and idiosyncrasies that will differ between any two people using a product, even if they are trying to do exactly the same things.
Test automation can have microbehavior, too, I suppose. It would come from subtle differences in timing and memory use due to other processes running on the computer, interactions with peripherals, or network latency. But nothing like the gross variations inherent in human interaction, such as:
- Variations in the order of apparently order-independent actions, such as selecting several check boxes before clicking OK on a dialog box. (But maybe there is some kind of order dependence or timing relationship that isn’t apparent to the user.)
- The exact path of the mouse, which triggers mouse over events.
- The exact timing and sequence of keyboard input, which occurs in patterns that change relative to the typing skill and physical state of the user.
- Entering then erasing data.
- Doing something, then undoing it.
- Navigating the UI without “doing” anything other than viewing windows and objects. Most users assume this does not affect the state of an application at all.
- Clicking on the wrong link or button, then backing out.
- Leaving an application sitting in any state for hours on end. (My son leaves his video games sitting for days; I hope they are tested that way.)
- Experiencing error messages, dismissing them (or not dismissing them) and trying the same thing again (or something different).
- Navigating with the keyboard instead of the mouse, or vice versa.
- Losing track of the application, assuming it is closed, then opening another instance of it.
- Selecting the help links or the customer service links before returning to complete an activity.
- Changing browser or O/S configuration settings in the middle of an operation.
- Dropping things on the keyboard by accident.
- Inadvertently going into hibernation mode while using the product, because the batteries ran out on the laptop.
- Losing network contact at the coffee shop. Regaining it. Losing it again…
- Accidentally double-clicking instead of single-clicking.
- Pressing enter too many times.
- Running other applications at the same time, such as anti-virus scanners, that may pop up over the application under test and take focus.
What makes a microbehavior truly micro is that it’s not supposed to make a difference, or that the difference it makes is easily recoverable. That’s why they are so often left out of automated tests. They are optimized away as irrelevant. And yet part of the point of testing is to challenge ideas about what might be relevant.
In a study done at Florida Tech, Pat McGee discovered that automated regression tests for one very complex product found more problems when the order of the tests was varied. Everything else was kept exactly the same. And, anecdotally, every tester with a little experience can probably cite a case where some inadvertent motion or apparently irrelevant variation uncovered a bug.
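The cheapest way to exploit that finding is simply to shuffle the order of an existing suite while keeping everything else fixed. Here is a minimal sketch of that idea in Python; the runner, the seed handling, and the result format are my own illustration, not any particular framework’s API. Recording the seed matters, because an order-dependent failure is only useful if you can replay the exact order that produced it.

```python
import random

def run_in_random_order(tests, seed=None):
    """Run a list of test callables in a shuffled order.

    Everything else stays the same; only the ordering varies.
    Returns the order used (so a failure can be replayed with the
    same seed) and a (name, outcome) result for each test.
    """
    rng = random.Random(seed)          # seeded for reproducibility
    order = list(range(len(tests)))
    rng.shuffle(order)
    results = []
    for i in order:
        name = getattr(tests[i], "__name__", f"test_{i}")
        try:
            tests[i]()
            results.append((name, "pass"))
        except AssertionError as e:
            results.append((name, f"fail: {e}"))
    return order, results
```

A test that passes in every shuffled order is weak evidence the tests are independent; one that fails only under certain orders has just exposed a hidden state dependency, exactly the kind of thing McGee’s study turned up.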
Even a test suite with hundreds of simple procedural scripts cannot hope to flush out all, or probably even most, of the bugs that matter in any complex product. Well, you could hope, but your hope would be naive.
So, that’s why I strive to put microbehaviors into my automation. Among the simplest measures is to vary the timing and ordering of actions. I also inject idempotent actions (actions that end in the same apparent state they started in) on a random basis. These measures are usually very cheap to implement, and I believe they greatly improve my chances of finding certain state-related or timing-related bugs, as well as bugs in exception-handling code.
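A sketch of what that injection can look like, assuming a driver object with `click`/`type`/`clear`/`pause` methods: the `FakeDriver` below is a stand-in that just records actions (a real harness would wrap Selenium or similar), and the specific “noise” actions and their names are illustrative, not a prescribed catalog.

```python
import random

class FakeDriver:
    """Stand-in for a real UI driver; records actions instead of
    performing them, so the sketch is self-contained."""
    def __init__(self):
        self.log = []
    def click(self, target):
        self.log.append(("click", target))
    def type(self, field, text):
        self.log.append(("type", field, text))
    def clear(self, field):
        self.log.append(("clear", field))
    def pause(self, seconds):
        self.log.append(("pause", seconds))

def noisy_steps(driver, steps, rng, noise_rate=0.3):
    """Run scripted steps, randomly interleaving idempotent 'noise':
    actions meant to leave the apparent state unchanged, so the
    script still works, but timing and event traffic vary."""
    noise = [
        # enter then erase data (hypothetical field name)
        lambda d: (d.type("search", "zzz"), d.clear("search")),
        # vary timing between actions
        lambda d: d.pause(rng.uniform(0.0, 2.0)),
        # wander off to a link assumed to be harmless
        lambda d: d.click("help"),
    ]
    for step in steps:
        if rng.random() < noise_rate:
            rng.choice(noise)(driver)
        step(driver)
```

The scripted steps and their oracle stay exactly as before; the noise merely varies the path taken to reach each of them, which is where the microbehaviors live.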
What about those Flash applications that Mr. Lyndsay sent me? He might legitimately assert that his purpose was not to write a buggy Flash app for testers, but a nice clean brainteaser. That’s fine, but the “mistakes” he made in execution turned into bonus brainteasers for me, so I got the original, plus more. And that’s the same with testing.
I want to test on purpose AND by accident, at the same time.