Two “Scoops” of “Bugs”

I have often said something like “We found a hundred bugs!” Lots of people have heard me say it. Statements like that are very valuable to me. But we should ask some vital questions about them.

Consider Raisin Bran cereal. If you lived in America and weren’t in solitary confinement during the 80’s and 90’s, you would have seen this commercial for Raisin Bran at some point (or one like it):

Two scoops of raisins!

Huh? Two scoops of raisins? What does that mean?

Perhaps the conversation went like this:

“I want my Bran to have MANY raisins!” barked Boss Kellogg.

“But, Mr. Kellogg, we already include nearly one full standard scoop,” replied the Chief Cereal Mixer. “No one has more raisins than we do.”

“Increase to maximum scoop!”

“But sir! That would violate every–”

“TWO SCOOPS! And damn the consequences.”

“The skies will be black with raisins!”

“Then we shall eat in the shade.”

I doubt anything like that happened, though. I suspect what happened is that somebody mixed some raisins with some bran flakes until it tasted pretty good. Maybe he adjusted it a little to optimize cost of goods (and perhaps they adjust the bran/raisin ratio as the cost of goods changes). Later, I bet, and completely unrelated to the engineering and manufacturing process, Kellogg’s advertising agency decided to create the impression that customers are getting a lot of value for the money, so they invented a distinguishing characteristic that actually makes no sense at all: an absolute measurement called a “scoop”. And began to speak of it AS IF it were meaningful.

The reason the measurement makes no sense at all is that the “Two Scoops” slogan was pasted onto boxes of substantially different sizes. But even if the measurement makes no sense, the pretentious claim makes a lot of sense, because we humans don’t think through the rational basis of measurements like this unless we are A) rather well trained, and more importantly B) highly motivated. So our unconscious lizard brain says to itself “two means yummy. two means yummy. means two yummy. yummy two…”

At some point, someone (an intern, perhaps) may have asked “But are there actually two scoops of raisins in those boxes?” and the answer was much laughter. Because it could be argued that if there are at least two raisins in the box, then there are two scoops of raisins in the box. It could be argued that if there is one raisin in the box and you used two scoops to measure it (“measure twice and cut once”) then there are two scoops of raisins in the box. If you make up your own measuring unit, such as, say, “scoop”, you can go on to make any other claim you want. This is exactly the point of Jerry Weinberg’s famous dictum “If quality doesn’t matter, you can achieve any other goal you want.”

I was thinking about doing a scientific analysis of this, but someone beat me to it.

Oh What Silliness… OR IS IT?

We have a real problem in testing, and no good solution for it. We are supposed to report the ground truth. Concrete reality. But this turns out to be a very difficult matter. Apart from all the problems of observation and interpretation, we have to summarize our findings. When we do that we are tempted to use scientific tropes (such as nonsensical measurements) to bolster our reports, even when they are poorly founded. We are often encouraged to do this by managers who were raised on Kellogg’s commercials and therefore confuse numbers with food.

Let’s look once again at the Raisin Bran situation and consider what might be the reasonable communication hidden there:

Maybe “two scoops” is intended to mean “ample” or “amply supplied with raisins.” In other words they are saying “You won’t regret buying our Raisin Bran, which always has enough raisins for you. While you’re eating it, we predict you will hum the ‘two scoops of raisins!’ song instead of calling a lawyer or becoming a cereal killer.”

I think there’s a scale built into all of us. It’s a comparative scale. It goes like this:

  • Minimum Possible
  • Nothing
  • Hardly any
  • Some
  • Enough
  • Plenty
  • Remarkable
  • “OMG! That must be a record!”
  • Maximum Possible

This scale is a bit of a mess. Some of its values move around (e.g. maximum possible may not be enough in some situations). The others, although fixed relative to each other, aren’t fixed in any way more definite than their ordering. The scale is highly situational. It’s relative to our understanding of the situation. For instance, you might be impressed to learn that the Colonia cable ship, which was the largest cable ship in the world in 1925, could carry 300 miles of cable in her hold. If so, you would be very easily impressed, because I just lied to you… According to that article, it actually could hold 3,000 miles of cable. (However, bonus points if you were thinking “what KIND of cable?”)

What I do with bug numbers, etc.

I want you to notice my first paragraph in this post. Notice that every sentence in that paragraph invokes an unspecified quantity.

  • “I have often…” Often compared to what?
  • “Lots of…” Lots compared to what?
  • “Very…” Very compared to what?
  • “Vital…” Vital compared to what?

You could say “He’s not saying anything definite in those sentences.” I agree, I’m not. I’m just giving an impression. My point is this: an impression is a start. An impression might be reasonable. An impression may make conversation possible. An impression may make conversation successful.

Most engineering statements like this don’t stand alone. Like flower buds, they blossom under the sunlight of questioning. And that’s why I can’t take any engineer seriously who gets offended when his facts are questioned. They cry: “Don’t you believe me?” I answer: “I don’t know what you mean, so belief has no meaning, yet.”

So, as a professional tester who prides himself on self-examination, I am ready for the probing perspective question that might follow my attempt to send an impression: “compared to what?” I am ready for the data question, too: “what did you see or hear that leads you to say this?”

I strive (meaning I consciously and consistently work on this) to be reasonable and careful in my use of qualifiers, quantifiers, quantities, and intensifiers. For instance, you will notice that I just used the word “reasonable”, by which I intend to invoke images of normal professional practice in your mind (A LOT like invoking the image of two healthy reasonable scoops of delicious raisins).

One important and definite thing that is accomplished by this relatively loose use of language is that it allows us to talk to each other without bogging down the conversation with ALL the specifics RIGHT NOW.

Kellogg’s used the method mostly to trick you into buying their bran-smothered raisin products. They didn’t have any reasoning behind “two scoops.” But we can use the same technique wisely and ethically, if we choose. We can be ready to back up our claims.

For Bugs: If I tell you I “found X bugs!!” in your product, the number of exclamation points indicates the true message. An exclamation point means “remarkable” or “lots.” If I tell you I found a lot of bugs in your product, I mean I found substantially more than I expected to find in the product, and more than a reasonable and knowledgeable person in this situation would consider acceptable. And by “more” I don’t mean quantity of bug reports, I mean the totality of diversity of problems, impact of problems, and frequency of occurrence of problems. The headline for that is “lots of bugs” or maybe I should say “two scoops of bugs!”

Three New Testing Heuristics

A lot of what I do is give names to testing behaviors and patterns that have been around a long time but that people are not systematically studying or using. I’m not seeking to create a standard language, but simply by applying some kind of terminology, I want to make these patterns easier to apply and to study.

This is a quick note about three testing heuristics I named this week:

Steeplechase Heuristic (of exploratory boundary testing)

When you are exploring boundaries, think of your data as having to get to the boundary and then having to go other places down the line. Picture it as one big obstacle course with the boundary you are testing right in the middle.

Then consider that very large, long, extreme data that the boundary is designed to stop might founder on some obstacle before it ever gets to the boundary you want to test. In other words, a limit of 1,000 characters on a field might work fine unless you paste 1,000,000 characters in, in which case it may crash the program instantly before the boundary check ever gets a chance to reject the data.

But also look downstream, and consider that extreme data which barely gets by your boundary may get mangled on another boundary down the road. So don’t just stop testing when you see one boundary is handled properly. Take that data all around to the other functions that process it.
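
Here is a minimal sketch of how that might look in Python. Everything in it is hypothetical: the 1,000-character limit, the validate_comment() boundary check, and the downstream pipeline steps are stand-ins for whatever your product actually does.

    # A sketch of the Steeplechase Heuristic. All names are hypothetical placeholders.

    BOUNDARY = 1_000   # assumed character limit on the field under test

    def steeplechase_cases():
        yield "a" * (BOUNDARY - 1)     # just under the boundary
        yield "a" * BOUNDARY           # exactly at the boundary
        yield "a" * (BOUNDARY + 1)     # just over: should be rejected
        yield "a" * 1_000_000          # extreme: may crash before the check runs

    def run_steeplechase(validate, pipeline_steps):
        for data in steeplechase_cases():
            try:
                accepted = validate(data)
            except Exception as error:
                print(f"len={len(data)}: crashed BEFORE the boundary check: {error!r}")
                continue
            if not accepted:
                print(f"len={len(data)}: rejected at the boundary")
                continue
            # The data got past the boundary; now take it around the rest of the course.
            for step in pipeline_steps:
                try:
                    data = step(data)
                except Exception as error:
                    print(f"mangled downstream in {step.__name__}: {error!r}")
                    break

    # Hypothetical usage:
    # run_steeplechase(validate_comment, [save_comment, export_report, render_page])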

Galumphing (style of test execution)

Galumphing means doing something in a deliberately over-elaborate way. I’ve been doing this for a long time in my test execution. I add lots of unnecessary but inert actions that are inexpensive and shouldn’t (in theory) affect the test outcome. The idea is that sometimes– surprise!– they do affect it, and I get a free bug out of it.

An example is how I frequently click on background areas of windows while moving my mouse pointer to the button I intend to push. Clicking on blank space shouldn’t matter, right? Doesn’t hurt, right?

I actually learned the term from the book “Free Play” by Stephen Nachmanovitch, who pointed out that it is justified by the Law of Requisite Variety. But I didn’t connect it with my test execution practice until jogged by a student in my recent Sydney testing class, Ted Morris Dawson.
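
If you drive your tests from code, galumphing can be sprinkled in mechanically. The sketch below is only an illustration; the inert actions named in the usage comment (clicking blank space, resizing the window) are hypothetical stand-ins for whatever your UI driver can do cheaply.

    import random

    # A sketch of galumphing during scripted test execution: interleave cheap,
    # inert actions that shouldn't (in theory) affect the outcome between the
    # steps the test actually needs. Seed the randomness so a surprise can be replayed.

    def galumph(steps, inert_actions, rate=0.5, seed=None):
        rng = random.Random(seed)
        for step in steps:
            while rng.random() < rate:       # zero or more detours before each step
                rng.choice(inert_actions)()
            step()

    # Hypothetical usage: the test only needs these three steps, but we wander.
    # galumph(
    #     steps=[open_dialog, enter_name, press_ok],
    #     inert_actions=[click_blank_area, resize_window, tab_away_and_back],
    #     seed=42,
    # )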

Creep & Leap (for pattern investigation)

If you think you understand the pattern of how a function works, try performing some tests that just barely violate that pattern (expecting an error or some different behavior), and try some tests that boldly take that behavior to an extreme without violating it. The former I call creeping; the latter is leaping.

The point here is that we are likely to learn a little more from a mildly violating test than from a hugely violating test because the mildly violating test is much more likely to surprise us, and the surprise will be easier to sort out.

Meanwhile, stretching legal input and expectations as far as they can reasonably go also can teach us a lot.

Creep & Leap is useful for investigating boundaries, of course, but works in situations without classic boundaries, too, such as when we creep by trying a different type of data that the function is supposed to reject.
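
Here is a small sketch of what that can look like in code, assuming (hypothetically) a set_quantity() function that is believed to accept whole numbers from 1 to 100:

    # Creep & Leap against a believed pattern. set_quantity() and the 1-100 rule
    # are assumptions made for the sake of illustration.

    creep_cases = [0, 101, -1, 1.5, "50"]   # barely violate the pattern; expect
                                            # rejection or visibly different behavior
    leap_cases = [1, 100]                   # push legal input to its extremes
                                            # without violating the pattern

    def try_case(fn, value):
        try:
            return fn(value)
        except Exception as error:
            return f"raised {type(error).__name__}"

    def investigate(set_quantity):
        for value in creep_cases:
            print("creep:", repr(value), "->", try_case(set_quantity, value))
        for value in leap_cases:
            print("leap:", repr(value), "->", try_case(set_quantity, value))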

Tester Pilot

Richard drove up to the hangar just as I was checking the oil on the Husky, his prized baby float plane. Nuts. He was right on time. I was late. I’m supposed to have the plane ready to go when he arrives.

“Hey Dad, looks like a good day for flying. I’m just in the middle of the pre-flight.”

“Where are we going today?” he asked.

“I haven’t been up for a few months, so I figured just a sightseeing tour around the islands and then some pattern work at Friday Harbor.” I hate pattern work: landing and taking off while talking on the radio to the other pilots. That’s exactly why I need to do it, though. I must get over my nerves; must become a safe pilot. It’s a lesson from testing: focus on the risk.

“How much fuel do we need for that?”

“There’s about 18 or 20 gallons on board. That’s actually enough, but I figure it would be better to bring it to 35, just in case.”

“How much will you put in each tank, then?”

“7 gallons.”

“7 plus 7 plus 18 doesn’t add up to 35. Decide on the fuel you want and get that. If you’re going to fudge, fudge toward having more fuel. What are the four most useless things in the world?”

Oh I know this… “Um, altitude above you… runway behind you… gas in the gas truck, and… Um–”

“–and a tenth of a second ago” he finished. “But you remembered the important one. Gas. We don’t want that terrible feeling when they close the airport for 30 minutes to clean something up on the runway, and we don’t have the fuel to divert.”

I could have quibbled with him. What we would actually do in that situation is land 10 minutes away at Friday Harbor airport, or heck, anywhere, because we’re a float plane in the midst of an archipelago. But that’s not the point. The point was the habit of precision; of being conscious and calculated about the risks I take, wherever reasonable. Again this relates to testing. When I’m testing, the habit of precision comes out in considering and controlling the states of my test platforms and test data, and of knowing what to look for in test results.

Dad called the flight center for a briefing. He already knew what they’d say, since he always checked the weather before he left home, but Richard Bach is an especially fastidious pilot. He’s not exactly “by the book.” It’s more that he prides himself on safety. Nothing surprises him without itself being surprised at how prepared he is for it. Yes he knew you were coming; here’s your cake.

[Photo: Dad and me]

The Tester’s Attitude in the Air

My father’s philosophy dovetails perfectly with the tester’s mindset. We expect things to go wrong. We know we will be surprised at times, but we anticipate the kinds of surprises that might occur. That means we perform checks that might seem unnecessary at times. And we keep our eyes open.

I was almost done with the walkaround when he got off the phone.

“Three knots at zero six zero over at Friday,” he announced.

I paused to visualize, then tried to sound authoritative. “That’s a crosswind favoring runway three four.”

“Yes. We have the same conditions here.”

Cool. I got it right. I’m supposed to pretend to be the pilot. Officially, Dad is the pilot-in-command, but I do everything he would do, while he supervises and is ready to take over in case there’s a problem. While I’m doing the preflight, he’s doing it too, but not telling me anything– unless I miss something important. Each time we fly, I’m determined to find a problem with the aircraft that he hasn’t noticed, first. I haven’t yet succeeded.

“Dad, what’s this rust-colored streak coming out of this bolt?” Yay, I found something! “There’s one on each side of the elevator.”

“Just a little bit of rust.” He smiled and materialized a can of WD-40 and blasted the bolts with it. This airplane is pristine, so even a little blemish of rust really stands out.

“Were you flying recently?”

“Yeah, I went out last week and splashed around at Lake Whatcom.”

“That explains the streaks. Water spray on the tail. Did you pump out the floats afterward?”

“No, but I doubt there’s more than a pint of water in there.”

“Let’s see about that.” I retrieved the hand pump while he popped out the drain plugs. He was right again: I couldn’t suck out more than a cup of water from the floats, total, from all the compartments.

But there was something odd about the last one.

“This water is PINK, Dad!”

Now he was not smiling.

“Unless you landed at the rainbow lake on Unicorn Planet, there may be a hydraulic leak in there.”

He put his fingers in the residue and sniffed it like a bush guide on the trail of a white tiger. “Yeah, that’s what it is. Let’s pop the hatch and take a look.”

Testing With Open Expectations

This is an example of why good testing is inherently an open investigation. Yes, I had definite ideas of what I was testing for: leaky floats. Yes, I had a specific method in mind for performing that test. Had I not had a specific method, I would have developed one on the fly. That’s the testing discipline. My oracles were judging the amount of water I was able to pump out of the floats compared to other occasions, and I also tasted the water a couple of times to detect if it was salty. It shouldn’t have been, because it had been several flights since we had landed in salt water, but I checked just in case there was a previously undetected leak from before. If salt water gets in there, we could have a serious corrosion problem.

I had no conscious intent to check the color of the water. But in testing we take every anomaly seriously. We take data in and ask ourselves, does this make sense? In that way, we are like Secret Service agents, trained to deal with known threats and to have a good chance to discern fresh and unusual threats, too.

The question “Can I explain everything that I see?” is a good heuristic to keep in mind while testing.

But if I had automated my float pump tests, I would never have found this problem. Because unlike a human, a program can’t look at something and just try to explain it. It must be told exactly what to look for.

I got an email today…

Geoff confirmed the hydraulic leak at the connection in the left float, and will be sealing it, probably tomorrow.  He’ll move the Husky to the big hangar to do the work. Nice, that you decided to pump the floats!

Dad

“Mipping”: A Strategy for Reporting Iffy Bugs

When I first joined ST Labs, years ago, we faced a dilemma. We had clients telling us what kind of bugs we should not report. “Don’t worry about the installer. Don’t report bugs on that. We have that covered.” No problem, dear customer, we cheerfully replied. Then after the project we would hear complaints about all the installation bugs we “missed”.

So, we developed a protocol called Mention In Passing, or “mipping”. All bugs shall be reported, without exception. Any bug that seems questionable or prohibited we will “mention in passing” in our status reports or emails. In an extreme case we mention it by voice, but I generally want to have a written record. That way we are not accused of wasting time investigating and reporting the bug formally, but we also can’t be accused of missing it entirely.

If a client tells me to stop bothering him about those bugs, even in passing, I might switch to batching them, or I might write a memo to all involved that I will henceforth not report that kind of problem. But if there is reasonable doubt in my mind that my client and I have a strong common understanding of what should and should not be reported, I simply tell them that I “mip” bugs to periodically check to see if I have accidentally misconstrued the standard for reporting, or to see if the standard has changed.

Designing Experiments

I experience intellectual work, such as testing, as a web of interconnected activities. If I were to suggest what is at the center of the testing web, on my short list would be: designing experiments. A good test is, ultimately, an experiment.

I’ve been looking around online for some good references about how to design experiments (since most testers I talk to have a lot of trouble with it). Here is a good one.

If you know of any other straightforward description of the logic of experiments, please let me know. I have some good books. I just need more online material.

How to Investigate Intermittent Problems

The ability and the confidence to investigate an intermittent bug is one of the things that marks an excellent tester. The most engaging stories about testing I have heard have been stories about hunting a “white whale” sort of problem in an ocean of complexity. Recently, a thread on the SHAPE forum made me realize that I had not yet written about this fascinating aspect of software testing.

Unlike a mysterious non-intermittent bug, an intermittent bug is more of a testing problem than a development problem. A lot of programmers will not want to chase that white whale, when there’s other fishing to do.

Intermittent behavior itself is no big deal. It could be said that digital computing is all about the control of intermittent behavior. So, what are we really talking about?

We are not concerned about intermittence that is both desirable and non-mysterious, even if it isn’t exactly predictable. Think of a coin toss at the start of a football game, or a slot machine that comes up all 7’s once in a long while. We are not even concerned about mysterious intermittent behavior if we believe it can’t possibly cause a problem. For the things I test, I don’t care much about transient magnetic fields or minor random power spikes, even though they are happening all the time.

Many intermittent problems have not yet been observed at all, perhaps because they haven’t manifested, yet, or perhaps because they have manifested and not yet been noticed. The only thing we can do about that is to get the best test coverage we can and keep at it. No algorithm can exist for automatically detecting or preventing all intermittent problems.

So, what we typically call an intermittent problem is: a mysterious and undesirable behavior of a system, observed at least once, that we cannot yet manifest on demand.

Our challenge is to transform the intermittent bug into a regular bug by resolving the mystery surrounding it. After that it’s the programmer’s headache.

Some Principles of Intermittent Problems:

  • Be comforted: the cause is probably not evil spirits.
  • If it happened once, it will probably happen again.
  • If a bug goes away without being fixed, it probably didn’t go away for good.
  • Be wary of any fix made to an intermittent bug. By definition, a fixed bug and an unfixed intermittent bug are indistinguishable over some period of time and/or input space.
  • Any software state that takes a long time to occur, under normal circumstances, can also be reached instantly, by unforeseen circumstances.
  • Complex and baffling behavior often has a simple underlying cause.
  • Complex and baffling behavior sometimes has a complex set of causes.
  • Intermittent problems often teach you something profound about your product.
  • It’s easy to fall in love with a theory of a problem that is sensible, clever, wise, and just happens to be wrong.
  • The key to your mystery might be resting in someone else’s common knowledge.
  • An intermittent problem in the lab might be easily reproducible in the field.
  • The Pentium Principle of 1994: an intermittent technical problem may pose a *sustained and expensive* public relations problem.
  • The problem may be intermittent, but the risk of that problem is ever present.
  • The more testability is designed into a product, the easier it is to investigate and solve intermittent problems.
  • When you have eliminated the impossible, whatever remains, however improbable, could have done a lot of damage by then! So, don’t wait until you’ve fully researched an intermittent problem before you report it.
  • If you ever get in trouble over an intermittent problem that you could not lock down before release, you will fare a lot better if you made a faithful, thoughtful, vigorous effort to find and fix it. The journey can be the reward, you might say.

Some General Suggestions for Investigating Intermittent Problems:

  • Recheck your most basic assumptions: are you using the computer you think you are using? are you testing what you think you are testing? are you observing what you think you are observing?
  • Eyewitness reports leave out a lot of potentially vital information. So listen, but DO NOT BECOME ATTACHED to the claims people make.
  • Invite more observers and minds into the investigation.
  • Create incentives for people to report intermittent problems.
  • If someone tells you what the problem can’t possibly be, consider putting extra attention into those possibilities.
  • Check tech support websites for each third party component you use. Maybe the problem is listed.
  • Seek tools that could help you observe and control the system.
  • Improve communication among observers (especially with observers who are users in the field).
  • Establish a central clearinghouse for mystery bugs, so that patterns among them might be easier to spot.
  • Look through the bug list for any other bug that seems like the intermittent problem.
  • Make more precise observations (consider using measuring instruments).
  • Improve testability: Add more logging and scriptable interfaces.
  • Control inputs more precisely (including sequences, timing, types, sizes, sources, iterations, combinations).
  • Control state more precisely (find ways to return to known states).
  • Systematically cover the input and state spaces.
  • Save all log files. Someday you’ll want to compare patterns in old logs to patterns in new ones.
  • If the problem happens more often in some situations than in others, consider doing a statistical analysis of the variance between input patterns in those situations.
  • Consider controlling things that you think probably don’t matter.
  • Simplify. Try changing only one variable at a time; try subdividing the system. (helps you understand and isolate the problem when it occurs)
  • Complexify. Try changing more variables at once; let the state get “dirty”. (helps you make a lottery-type problem happen)
  • Inject randomness into states and inputs (possibly by loosening controls) in order to reach states that may not fit your typical usage profile.
  • Create background stress (high loads; large data).
  • Set a trap for the problem, so that the next time it happens, you’ll learn much more about it (see the sketch after this list).
  • Consider reviewing the code.
  • Look for interference among components created by different organizations.
  • Celebrate and preserve stories about intermittent problems and how they were resolved.
  • Systematically consider the conceivable causes of the problem (see below).
  • Beware of burning huge time on a small problem. Keep asking, is this problem worth it?
  • When all else fails, let the problem sit a while, do something else, and see if it spontaneously recurs.
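
To illustrate the “set a trap” suggestion, here is a minimal sketch in Python. The operation, the invariant, and snapshot_state() are all placeholders you would supply; the point is simply to capture as much context as possible at the moment the anomaly finally recurs.

    import json
    import time
    import traceback

    # A sketch of a trap for an intermittent problem: wrap the suspect operation,
    # check an invariant after every call, and log a rich record whenever the
    # anomaly shows up. All the callables here are hypothetical placeholders.

    def set_trap(operation, invariant, snapshot_state, log_path="trap.log"):
        def trapped(*args, **kwargs):
            anomaly = None
            result = None
            try:
                result = operation(*args, **kwargs)
                if not invariant(result):
                    anomaly = f"invariant failed for result {result!r}"
            except Exception:
                anomaly = traceback.format_exc()
            if anomaly is not None:
                record = {
                    "time": time.time(),
                    "args": repr(args),
                    "kwargs": repr(kwargs),
                    "state": snapshot_state(),   # environment, config, counters, etc.
                    "anomaly": anomaly,
                }
                with open(log_path, "a") as log:
                    log.write(json.dumps(record) + "\n")
            return result
        return trapped

    # Hypothetical usage:
    # search = set_trap(search, invariant=lambda r: r is not None, snapshot_state=collect_env)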

Considering the Causes of Intermittent Problems

When investigating an intermittent problem, it may be worth considering the kinds of things that cause such problems. The list of guideword heuristics below may help you systematically do that analysis. There is some redundancy among the items in the list, because causes can be viewed from different perspectives.

Possibility 1: The system is NOT behaving differently. The apparent intermittence is an artifact of the observation.

  • Bad observation: The observer may have made a poor observation. (e.g. “Inattentional Blindness” is a phenomenon whereby an observer whose mind is occupied may not see things that are in plain view. When presented with the scene a second time, the observer may see new things in the scene and assume that they weren’t there before. Also, certain optical illusions cause apparently intermittent behavior in an unchanging scene. See “the scintillating grid”)
  • Irrelevant observation: The observer may be looking at differences that don’t matter. The things that matter may not be intermittent. This can happen when an observation is too precise for its purpose.
  • Bad memory: The observer may have mis-remembered the observation, or records of the observation could have been corrupted. (There’s a lot to observe when we observe! Our minds immediately compact the data and relate it to other data. Important data may be edited out. Besides, a lot of system development and testing involves highly repetitive observations, and we sometimes get them mixed up.)
  • Misattribution: The observer may have mis-attributed the observation. (“Microsoft Word crashed” might mean that *Windows* crashed for a reason that had nothing whatsoever to do with Word. Word didn’t “do” anything. This is a phenomenon also known as “false correlation” and often occurs in the mind of an observer when one event follows hard on the heels of another event, making one appear to be caused by the other. False correlation is also chiefly responsible for many instances whereby an intermittent problem is mistakenly construed to be a non-intermittent problem with a very complex and unlikely set of causes)
  • Misrepresentation: The observer may have misrepresented the observation. (There are various reasons for this. An innocent reason is that the observer is so confident in an inference that they have the honest impression that they did observe it and report it as such. I once asked my son if his malfunctioning Playstation was plugged in. “Yes!” he said impatiently. After some more troubleshooting, I had just concluded that the power supply was shot when I looked down and saw that it was obviously not plugged in.)
  • Unreliable oracle: The observer may be applying an intermittent standard for what constitutes a “problem.” (We may get the impression that a problem is intermittent only because some people, some of the time, don’t consider the behavior to be a problem, even if the behavior is itself predictable. Different observers may have different tolerances and sensitivities; and the same observer may vary in that way from one hour to the next.)
  • Unreliable communication: Communication with the observer may be inconsistent. (We may get the impression that a problem is intermittent simply because reports about it don’t consistently reach us, even if the problem is itself quite predictable. “I guess people aren’t seeing the problem anymore” may simply mean that people no longer bother to complain.)

Possibility 2: The system behaved differently because it was a different system.

  • Deus ex machina: A developer may have changed it on purpose, and then changed it back. (This can occur easily when multiple developers or teams are simultaneously building or servicing different parts of an operational server platform without coordinating with each other. Another possibility, of course, is that the system has been modified by a malicious hacker.)
  • Accidental change: A developer may be making accidental changes. (The changes may have unanticipated side effects, leading to the intermittent behavior. Also, a developer may be unwittingly changing a live server instead of a sandbox system.)
  • Platform change: A platform component may have been swapped or reconfigured. (An administrator or user may have changed, intentionally or not, a component on which the product depends. Common sources of these problems include Windows automatic updates and memory or disk space reconfigurations.)
  • Flakey hardware: A physical component may have transiently malfunctioned. (Transient malfunctions may be due to factors such as inherent natural variation, magnetic fields, excessive heat or cold, low battery conditions, poor maintenance, or physical shock.)
  • Trespassing system: A foreign system may be intruding. (For instance, in web testing, I might get occasionally incorrect results due to a proxy server somewhere at my ISP that provides a cached version of pages when it shouldn’t. Other examples are background virus scans, automatic system updates, other programs, or other instances of the same program.)
  • Executable corruption: The object code may have become corrupted. (One of the worst bugs I ever created in my own code (in terms of how hard it was to find) involved machine code in a video game that occasionally wrote data over a completely unrelated part of the same program. Because of the nature of that data, the system didn’t crash, but rather the newly corrupted function passed control to the function that immediately followed it in memory. Took me days (and a chip emulator) to figure it out.)
  • Split personality: The “system” may actually be several different systems that perform as one. (For instance, I may get inconsistent results from Google depending on which Google server I happen to get; or I might not realize that different machines in the test lab have different versions of some key component; or I might mistype a URL and accidentally test on the wrong server some of the time.)
  • Human element: There may be a human in the system, making part of it run, and that human is behaving inconsistently.

Possibility 3: The system behaved differently because it was in a different state.

  • Frozen conditional: A decision that is supposed to be based on the status of a condition may have stopped checking that condition. (It could be stuck in an “always yes” or “always no” state.)
  • Improper initialization: One or more variables may not have been initialized. (The starting state of a computation would therefore depend on the state of some previous computation of the same or other function.)
  • Resource denial: A critical file, stream, or other variable may not be available to the system. (This could happen either because the object does not exist, has become corrupted, or is locked by another process.)
  • Progressive data corruption: A bad state may have slowly evolved from a good state by small errors propagating over time. (Examples include timing loops that are slightly off, or rounding errors in complicated or reflexive calculations.)
  • Progressive destabilization: There may be a classic multi-stage failure. (The first part of the bug creates an unstable state– such as a wild pointer– when a certain event occurs, but without any visible or obvious failure. The second part precipitates a visible failure at a later time based on the unstable state in combination with some other condition that occurs down the line. The lag time between the destabilizing event and the precipitating event makes it difficult to associate the two events to the same bug.)
  • Overflow: Some container may have filled to beyond its capacity, triggering a failure or an exception handler. (In an era of large memories and mass storage, overflow testing is often shortchanged. Even if the condition is properly handled, the process of handling it may interact with other functions of the system to cause an emergent intermittent problem.)
  • Occasional functions: Some functions of a system may be invoked so infrequently that we forget about them. (These include exception handlers, internal garbage collection functions, auto-save, and periodic maintenance functions. These functions, when invoked, may interact in unexpected ways with other functions or conditions of the system. Be especially wary of silent and automatic functions.)
  • Different mode or option setting: The system can be run in a variety of modes and the user may have set a different mode. (The new mode may not be obviously different from the old one.)

Possibility 4: The system behaved differently because it was given different input.

  • Accidental input: The user may have provided input or changed the input in a way that shouldn’t have mattered, yet did. (This might also be called the Clever Hans syndrome, after the mysteriously repeatable ability of Clever Hans, the horse, to perform math problems. It was eventually discovered by Oskar Pfungst that the horse was responding to subtle physical cues that its owner was unintentionally conveying. In the computing world, I once experienced an intermittent problem due to sunlight coming through my office window and hitting an optical sensor in my mouse. The weather conditions outside shouldn’t have constituted different input, but they did. Another more common example is different behavior that may occur when using the keyboard instead of the mouse to enter commands. The accidental input might be invisible unless you use special tools or recorders. For instance, two identical texts, one saved in RTF format from Microsoft Word and one saved in RTF format from Wordpad, will be very similar on the disk but not exactly identical.)
  • Secret boundaries and conditions: The software may behave differently in some parts of the input space than it does in others. (There may be hidden boundaries, or regions of failure, that aren’t documented or anticipated in your mental model of the product. I once tested a search routine that invoked different logic when the total returned hits crossed 1,000 and 50,000. Only by accident did I discover these undocumented boundaries.)
  • Different profile: Some users may have different profiles of use than other users. (Different biases in input will lead to different experiences of output. Users with certain backgrounds, such as programmers, may be systematically more or less likely to experience, or notice, certain behaviors.)
  • Ghost input: Some other machine-based source than the user may have provided different input. (Such input is often invisible to the user. This includes variations due to different files, different signals from peripherals, or different data coming over the network.)
  • Deus Ex Machina: A third party may be interacting with the product at the same time as the user. (This may be a fellow tester, a friendly user, or a malicious hacker.)
  • Compromised input: Input may have been corrupted or intercepted on its way into the system. (Especially a concern in client-server systems.)
  • Time as input: Intermittence over time may be due to time itself. (Time is the one thing that constantly changes, no matter what else you control. Whenever time and date, or time and date intervals, are used as input, bugs in that functionality may appear at some times but not others.)
  • Timing lottery: Variations in input that normally don’t matter may matter at certain times or at certain loads. (The Mars Rover suffered from a problem like this involving a three microsecond window of vulnerability when a write operation could write to a protected part of memory.)
  • Combination lottery: Variations in input that normally don’t matter may matter when combined in a certain way.

Possibility 5: The other possibilities are magnified because your mental model of the system and what influences it is incorrect or incomplete in some important way.

  • You may not be aware of each variable that influences the system.
  • You may not be aware of sources of distortion in your observations.
  • You may not be aware of available tools that might help you understand or observe the system.
  • You may not be aware of all the boundaries of the system and all the characteristics of those boundaries.
  • The system may not actually have a function that you think it has; or maybe it has extra functions.
  • A complex algorithm may behave in a surprising way, intermittently, that is entirely correct (e.g. mathematical chaos can look like random behavior).