The Bestselling Software Intended for People Who Couldn’t Use It.

In 1983, my boss, Dale Disharoon, designed a little game called Alphabet Zoo. My job was to write the Commodore 64 and Apple II versions of that game. Alphabet Zoo is a game for kids who are learning to read. The child uses a joystick to move a little character through a maze, collecting letters to spell words.

We did no user testing on this game until the very day we sent the final software to the publisher. On that day, we discovered that our target users (five-year-olds) could not use a joystick well enough to play the game. They just got frustrated and gave up.

We shipped anyway.

This game became a bestseller.

[Image: Alphabet Zoo]

It was placed on a list of educational games recommended by the National Education Association.

[Image: list of recommended educational games]

Source: Information Please Almanac, 1986

So, how to explain this?

Some years later, when I became a father, I understood. My son was able to play a lot of games that were too hard for him because I operated the controls. I spoke to at least one dad who did exactly that with Alphabet Zoo.

I guess the moral of the story is: we don’t necessarily know the value of our own creations.

Reinventing Testing: What is Integration Testing? (part 2)

These thoughts have become better because of these specific commenters on part 1: Jeff Nyman, James Huggett, Sean McErlean, Liza Ivinskaia, Jokin Aspiazu, Maxim Mikhailov, Anita Gujarathi, Mike Talks, Amit Wertheimer, Simon Morley, Dimitar Dimitrov, John Stevenson. Additional thanks to Michael Bolton, and to Anita Gujarathi, the student whose productive confusion helped me discover a blind spot in my work.

Integration testing is a term I don’t use much– not because it doesn’t matter, but because it is so fundamental that it is already baked into many of the other working concepts and techniques of testing. Still, in the past week, I decided to upgrade my ability to quickly explain integration, integration risk, and integration testing. This is part of a process I recommend for all serious testers. I call it: reinventing testing. Each of us may reinvent testing concepts for ourselves, and engage in vigorous debates about them (see the comments on part 1, which is now the most commented of any post I have ever done).

For those of you interested in getting to a common language for testing, this is what I believe is the best way we have available to us. As each of us works to clarify his own thinking, a de facto consensus about reasonable testing ontology will form over time, community by community.

So here we go…

There are several kinds of testing that involve, overlap with, or may even be synonymous with integration testing, including: regression testing, system testing, field testing, interoperability testing, compatibility testing, platform testing, and risk-based testing. Most testing, in fact, no matter what it’s called, is also integration testing.

Here is my definition of integration testing, based on my own analysis, conversations with RST instructors (mainly Michael Bolton), and stimulated by the many commenters from part 1. All of my assertions and definitions are true within the Rapid Software Testing methodology namespace, which means that you don’t have to agree with me unless you claim to be using RST.

What is integration testing?

Integration testing is:
1. Testing motivated by potential risk related to integration.
2. Tests designed specifically to assess risk related to integration.

Notes:

1. “Motivated by” and “designed specifically to” overlap but are not the same. For instance, if you know that a dangerous criminal is on the loose in your neighborhood you may behave in a generally cautious or vigilant way even if you don’t know where the criminal is or what he looks like. But if you know what he looks like, what he is wearing, how he behaves or where he is, you can take more specific measures to find him or avoid him. Similarly, a newly integrated product may create a situation where any kind of testing may be worth doing, even if that testing is not specifically aimed at uncovering integration bugs, as such; OR you can perform tests aimed at exposing just the sort of bugs that integration typically causes, such as by performing operations that maximize the interaction of components.

The phrase “integration testing” may therefore represent ANY testing performed specifically in an “integration context”, or applying a specific “integration test technique” in ANY context.

This is a special case of the difference between risk-based test management and risk-based test design. The former assigns resources to places where there is potential risk but does not dictate the testing to be performed; whereas the latter crafts specific tests to examine the product for specific kinds of problems.

2. “Potential risk” is not the same as “risk.” Risk is the danger of something bad happening, and it can be viewed from at least three perspectives: probability of a bad event occurring, the impact of that event if it occurs, and our uncertainty about either of those things. A potential risk is a risk about which there is substantial uncertainty (in other words, you don’t know how likely the bug is to be in the product or you don’t know how bad it could be if it were present). The main point of testing is to eliminate uncertainty about risk, so this often begins with guessing about potential risk (in other words, making wild guesses, educated guesses, or highly informed analyses about where bugs are likely to be).

Example: I am testing something for the first time. I don’t know how it will deal with stressful input, but stress often causes failure, so that’s a potential risk. If I were to perform stress testing, I would learn a lot about how the product really handles stress, and the potential risk would be transformed into a high risk (if I found serious bugs related to stress) or a low risk (if the product handled stress in a consistently graceful way).
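
To make that concrete, here is a minimal sketch of a stress probe in Python. The function under test and the input sizes are hypothetical; the point is that running something like this converts a potential risk into an observed result we can reason about.

    # A minimal stress-probe sketch (hypothetical function and sizes).
    # Running it turns "potential risk" into an observed result.
    import time

    def stress_probe(func, make_input, sizes):
        results = []
        for size in sizes:
            data = make_input(size)
            start = time.time()
            try:
                func(data)
                status = "ok"
            except Exception as exc:
                status = "failed: " + repr(exc)
            results.append((size, status, round(time.time() - start, 3)))
        return results

    # Hypothetical usage against a parser we are worried about:
    # print(stress_probe(parse_record, lambda n: "x" * n, [10**3, 10**5, 10**7]))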

What is integration?

General definition from the Oxford English Dictionary: “The making up or composition of a whole by adding together or combining the separate parts or elements; combination into an integral whole: a making whole or entire.”

Based on this, we can make a simple technical definition related to products:

Integration is:
v. the process of constructing a product from parts.
n. a product constructed from parts.

Now, based on General Systems Theory, we make these assertions:

An integration, in some way and to some degree:

  1. Is composed of parts:
  • …that come from differing sources.
  • …that were produced for differing purposes.
  • …that were produced at different times.
  • …that have differing attributes.
  2. Creates or represents an internal environment for its parts:
  • …in which its parts interact among themselves.
  • …in which its parts depend on each other.
  • …in which its parts interact with or depend on an external environment.
  • …in which these things are not visible from the outside.
  3. Possesses attributes relative to its parts:
  • …that depend on them.
  • …that differ from them.

Therefore, you might not be able to discern everything you want to know about an integration just by looking at its parts.

This is why integration risk exists. In complex or important systems, integration testing will be critically important, especially after changes have been made.

It may be possible to gain enough knowledge about an integration to characterize the risk (or to speak more plainly: it may be possible to find all the important integration bugs) without doing integration testing. You might be able to do it with unit testing. However, that process, although possible in some cases, might be impractical. This is partly because the parts may have been produced by different people with different assumptions, partly because it is difficult to simulate the environment of an integration prior to actual integration, and partly because unit testing tends to focus on what the units CAN do and not on what they ACTUALLY NEED to do. (If you unit test a calculator, that’s a lot of work. But if that calculator will only ever be asked to add numbers under 50, you don’t need to do all that work.)
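
As a toy illustration of that last point, here is a sketch in Python. The calculator and the under-50 requirement are just the example from the paragraph above, not code from any real product.

    # Toy sketch: unit-testing only what the integration actually needs.
    # If the surrounding system will only ever ask this calculator to add
    # numbers under 50, the second test class may be all that matters.
    import unittest

    def add(a, b):
        return a + b

    class TestEverythingTheUnitCanDo(unittest.TestCase):
        # Exhaustive-style testing: huge values, and so on.
        def test_large_values(self):
            self.assertEqual(add(10**18, 1), 10**18 + 1)

    class TestWhatTheIntegrationActuallyNeeds(unittest.TestCase):
        def test_small_sums(self):
            for a in range(50):
                for b in range(50):
                    self.assertEqual(add(a, b), a + b)

    if __name__ == "__main__":
        unittest.main()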

Integration testing, although in some senses complex, may actually simplify your testing, since some parts mask the behavior of other parts, and perhaps all you need to care about is the final outputs.

Notes:

1. “In some way and to some degree” means that these assertions are to be interpreted heuristically. In any specific situation, these assertions are highly likely to apply in some interesting or important way, but might not. An obvious example is where I wrote above that the parts “interact among themselves.” The stricter truth is that the parts within an integration probably do not EACH directly interact with ALL the other ones, and probably do not interact to the same degree and in the same ways. To think of it heuristically, interpret it as a gentle warning such as “if you integrate something, make it your business to know how the parts might interact or depend on each other, because that knowledge is probably important.”

By using the phrase “in some way and to some degree” as a blanket qualifier, I can simplify the rest of the text, since I don’t have to embed other qualifiers.

2. “Constructing from parts” does not necessarily mean that the parts pre-existed the product, or have a separate existence outside the product, or are unchanged by the process of integration. It just means that we can think productively about pieces of the product and how they interact with other pieces.

3. A product may possess attributes that none of its parts possess, or that differ from them in unanticipated or unknown ways. A simple example is the stability of a tripod, which is not found in any of its individual legs, but in all the legs working together.

4. Disintegration also creates integration risk. When you take things away, or take things apart, you end up with a new integration, and that is subject to much the same risk as putting them together.

5. The attributes of a product and all its behaviors obviously depend largely on the parts that comprise it, but also on other factors such as the state of those parts, the configurations and states of external and internal environments, and the underlying rules by which those things operate (ultimately, physics, but more immediately, the communication and processing protocols of the computing environment).

6. Environment refers to the outside of some object (an object being a product or a part of a product), comprising factors that may interact with that object. A particular environment might be internal in some respects or external in other respects, at the same time.

  • An internal environment is an environment controlled by the product and accessible only to its parts. It is inside the product, but from the vantage point of some of its parts, it’s outside of them. For instance, to a spark plug the inside of an engine cylinder is an environment, but since it is not outside the car as a whole, it’s an internal environment. Technology often consists of deeply nested environments.
  • An external environment is an environment inhabited but not controlled by the product.
  • Control is not an all-or-nothing thing. There are different levels and types of control. For this reason it is not always possible to strictly identify the exact scope of a product or its various and possibly overlapping environments. This fact is much of what makes testing– and especially security testing– such a challenging problem. A lot of malicious hacking is based on the discovery that something that the developers thought was outside the product is sometimes inside it.

7. An interaction occurs when one thing influences another thing. (A “thing” can be a part, an environment, a whole product, or anything else.)

8. A dependency occurs when one thing requires another thing to perform an action or possess an attribute (or not to) in order for the first thing to behave in a certain way or fulfill a certain requirement. See connascence and coupling.

9. Integration is not all or nothing– there are differing degrees and kinds. A product may be accidentally integrated, in that it works using parts that no one realizes that it has. It may be loosely integrated, such as a gecko that can jettison its tail, or a browser with a plugin. It may be tightly integrated, such as when we take the code from one product and add it to another product in different places, editing as we go. (Or when you digest food.) It may preserve the existing interfaces of its parts or violate them or re-design them or eliminate them. The integration definition and assertions, above, form a heuristic pattern– a sort of lens– by which we can make better sense of the product and how it might fail. Different people may identify different things as parts, environments or products. That’s okay. We are free to move the lens around and try out different perspectives, too.

Example of an Integration Problem

[Diagram: dueling dependencies]

This diagram shows a classic integration bug: dueling dependencies. In the top two panels, two components are happy to work within their own environments. Neither is aware of the other while they work on, let’s say, separate computers.

But when they are installed together on the same machine, it may turn out that each depends on factors that exclude the other, even though the components themselves don’t clash (the blue A and B boxes don’t overlap). Often such dependencies are poorly documented, and may be entirely unknown to the developer before integration time.

It is possible to discover this through unit testing… but it is much easier, and probably cheaper, just to integrate sooner rather than later and test in that context.
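
Here is a contrived sketch of the dueling-dependencies situation in Python. The components and the shared setting are invented for illustration: each component is happy in its own environment, but in one shared environment one of them must lose.

    # Contrived "dueling dependencies" sketch. Each component requires a
    # different value of the same shared setting. Alone, each works;
    # installed together, the integration fails.

    def component_a(env):
        if env.get("LOCALE") != "en_US":
            raise RuntimeError("A requires LOCALE=en_US")
        return "A ok"

    def component_b(env):
        if env.get("LOCALE") != "de_DE":
            raise RuntimeError("B requires LOCALE=de_DE")
        return "B ok"

    # On separate machines, both are happy:
    print(component_a({"LOCALE": "en_US"}), component_b({"LOCALE": "de_DE"}))

    # On one shared machine, one of them must lose:
    shared = {"LOCALE": "en_US"}
    print(component_a(shared))
    try:
        print(component_b(shared))
    except RuntimeError as exc:
        print("integration failure:", exc)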

 

Justifying Real Acceptance Testing

This post is not about the sort of testing people talk about when nearing a release and deciding whether it’s done. I have another word for that. I call it “testing,” or sometimes final testing or release testing. Many projects perform that testing in such a perfunctory way that it is better described as checking, according to the distinction between testing and checking I have previously written about on this blog. As Michael Bolton points out, that checking may be better described as rejection checking, since a “fail” supposedly establishes a basis for saying the product is not done, whereas no amount of “passes” can show that it is done.

Acceptance testing can be defined in various ways. This post is about what I consider real acceptance testing, which I define as testing by a potential acceptor (a customer), performed for the purpose of informing a decision to accept (to purchase or rely upon) a product.

Do we need acceptance testing?

Whenever a business decides to purchase and rely upon a component or service, there is a danger that the product will fail and the business will suffer. One approach to dealing with that problem is to adopt the herd solution: follow the thickest part of the swarm; choose a popular product that is advertised or reputed to do what you want it to do and you will probably be okay. I have done that with smartphones, ruggedized laptops, file-sharing services, etc. with good results, though sometimes I am disappointed.

My business is small. I am nimble compared to almost every other company in the world. My acceptance testing usually takes the form of getting a trial subscription to a service, or downloading the “basic” version of a product. Then I do some work with it and see how I feel. In this way I learned to love Dropbox, despite its troubling security situation (I can’t lock up my Dropbox files) and the significant chance that it will corrupt very large files. (I no longer trust it with anything over half a gig.)

But what if I were advising a large company about whether to adopt a service or product that it will rely upon across dozens or hundreds or thousands of employees? What if the product has been customized or custom built specifically for them? That’s when acceptance testing becomes important.

Doesn’t the Service Level Agreement guarantee that the product will work?

There are a couple of problems with relying on vendor promises. First, the vendor probably isn’t promising total satisfaction. The service “levels” in the contract are probably narrowly and specifically drawn. That means if you don’t think of everything that matters and put that in the contract, it’s not covered. Testing is a process that helps reveal the dimensions of the service that matter.

Second, there’s an issue with timing. By the time you discover a problem with the vendor’s product, you may already be relying on it. You may already have deployed it widely. It may be too late to back out or switch to a different solution. Perhaps your company negotiated remedies in that case, but there are practical limitations to any remedy. If your vendor is very small, they may not be able to afford to fix their product quickly. If your vendor is very large, they may be able to afford to drag their feet on the fixes.

Acceptance testing protects you and makes the vendor take quality more seriously.

Acceptance testing should never be handled by the vendor. I was once hired by a vendor to do penetration testing on their product in order to appease a customer. But the vendor had no incentive to help me succeed in my assignment, nor to faithfully report the vulnerabilities I discovered. It would have been far better if the customer had hired me.

Only the accepting party has the incentive to test well. Acceptance testing should not be pre-agreed or pre-planned in any detail– otherwise the vendor will be sure that the product passes those specific tests. It should be unpredictable, so that the vendor has an incentive to make the product truly meet its requirements in a general sense. It should be adaptive (exploratory) so that any weakness you find can be examined and exploited.

The vendor wants your money. If your company is large enough, and the vendor is hungry, they will move mountains to make the product work well if they know you are paying attention. Acceptance testing, done creatively by skilled testers on a mission, keeps the vendor on its toes.

By explicitly testing in advance of your decision to accept the product, you have a fighting chance to avoid the disaster of discovering too late that the product is a lemon.

My management doesn’t think acceptance testing matters. What do I do?

1. Make the argument, above.
2. Communicate with management, formally, about this. (In writing, so that there is a record.)

It is up to management to make decisions about business risk. They may feel the risk is not worth worrying about. In that case, you must wait and watch. People are usually more persuaded by vivid experiences than by abstract principles, so:

1. Collect specific examples of the problems you are talking about. What bugs have you experienced in vendor products?
2. Collect news reports about bugs in products of your vendors (or other vendors) that have been disruptive.
3. In the event you get to do even a little acceptance testing, make a record of the problems you find and be ready to remind management of that history.

 

Quality is Dead #2: The Quality Creation Myth

One of the things that makes it hard to talk about quality software is that we first must overcome the dominating myth about quality, which goes like this: The quality of a product is built into it by its development team. They create quality by following disciplined engineering practices to engineer the source code so that it will fulfill the requirements of the user.

This is a myth, not a lie. It’s a simplified story that helps us make sense of our experience. Myths like this can serve a useful purpose, but we must take care not to believe in them as if they were the great and hoary truth.

Here are some of the limitations of the myth:

  1. Quality is not a thing and it is not built. To think of it as a thing is to commit the “reification fallacy” that my colleague Michael Bolton loves to hate. Instead, quality is a relationship. Excellent quality is a wonderful sort of relationship. Instead of “building” quality, it’s more coherent to say we arrange for it. Of course you are thinking “what’s the difference between arrange and build? A carpenter could be said to arrange wood into the form of a cabinet. So what?” I like the word arrange because it shifts our attention to relationships and because arrangement suggests less permanence. This is important because in technology we are obliged to work with many elements that are subject to imprecision, ambiguity and drift.
  2. A “practice” is not the whole story of how things get done. To say that we accomplish things by following “practices” or “methods” is to use a figure of speech called a synecdoche– the substitution of a part for the whole. What we call practices are the public face of a lot of shadowy behavior that we don’t normally count as part of the way we work. For instance, joking around, or eating a salad at your desk, or choosing which email to read next, and which to ignore. A social researcher examining a project in progress would look carefully at who talks to whom, how they talk and what they talk about. How is status gained or lost? How do people decide what to do next? What are the dominant beliefs about how to behave in the office? How are documents created and marketed around the team? In what ways do people on the team exert or accept control?
  3. Source code is not the product. The product is the experience that the user receives. That experience comes from the source code in conjunction with numerous other components that are outside the control and sometimes even the knowledge of product developers. It also comes from documentation and support. And that experience plays out over time on what is probably a chaotic multi-tasking computing environment.
  4. “Requirements” are not the requirements, and the “users” are not the users. I don’t know what my requirements are for any of the software I have ever used. I mean, I do know some things. But for anything I think I know, I’m aware that someone else may suggest something that is different that might please me better. Or maybe they will show me how something I thought was important is actually harmful. I don’t know my own requirements for certain. Instead, I make good guesses. Everyone tries to do that. People learn, as they see and work with products, more about what they want. Furthermore, what they want actually changes with their experiences. People change. The users you think you are targeting may not be the users you get.
  5. Fulfillment is not forever and everywhere. The state of the world drifts. A requirement fulfilled today may no longer be fulfilled tomorrow, because of a new patch to the operating system, or because a new competing product has been released. Another reason we can’t count on a requirement being fulfilled is that can does not mean will. What I see working with one data set on one computer may not work with other data on another computer.

These factors make certain conversations about quality unhelpful. For instance, I’m impatient when someone claims that unit testing or review will guarantee a great product, because unit testing and review do not account for system level effects, or transient data occurring in the field, or long chains of connected transactions, or intermittent failure of third-party components. Unit testing and review focus on source code. But source code is not the product. So they can be useful, but they are still mere heuristic devices. They provide no guarantee.

Once in a while, I come across a yoho who thinks that a logical specification language like “Z” is the great solution, because then your specification can be “proven correct.” The big problem with that, of course, is that correctness in this case simply means self-consistency. It does not mean that the specification corresponds to the needs of the customer, nor that it corresponds to the product that is ultimately built.

I’m taking an expansive view of products and projects and quality, because I believe my job is to help people get what they want. Some people, mainly those who go on and on about “disciplined engineering processes” and wish to quantify quality, take a narrower view of their job. I think that’s because their overriding wish is that any problems not be “their fault” but rather YOUR fault. As in, “Hey, I followed the formal spec. If you put the wrong things in the formal spec, that’s YOUR problem, stupid.”

My Take on the Quality Story

Let me offer a more nuanced version of the quality story– still a myth, yes– but one more useful to professionals:

A product is a dynamic arrangement, like a garden that is subject to the elements. A high quality product takes skillful tending and weeding over time. Just like real gardeners, we are not all powerful or all knowing as we grow our crop. We review the conditions and the status of our product as we go. We try to anticipate problems, and we react to solve the problems that occur. We try to understand what our art can and cannot do, and we manage the expectations of our customers accordingly. We know that our product is always subject to decay, and that the tastes of our customers vary. We also know that even the most perfect crop can be spoiled later by a bad chef. Quality, to a significant degree, is out of our hands.

After many years of seeing things work and fail (or work and THEN fail), I think of quality as ephemeral. It may be good enough, at times. It may be better than good enough. But it fades; it always fades, like something natural.

Or like sculpture by Andy Goldsworthy.  (Check out this video.)

This is true for all software, but the degree to which it is a problem will vary. Some systems have been built that work well over time. That is the result of excellent thinking and problem solving on the part of the development team. But I would argue it is also the result of favorable conditions in the surrounding environment. Those conditions are subject to change without notice.

The Future Will Need Us to Reboot It

I’ve been reading a bit about the Technological Singularity. It’s an interesting and chilling idea conceived by people who aren’t testers. It goes like this: the progress of technology is increasing exponentially. Eventually the A.I. technology will exist that will be capable of surpassing human intelligence and increasing its own intelligence. At that point, called the Singularity, the future will not need us… Transhumanity will be born… A new era of evolution will begin.

I think a tester was not involved in this particular project plan. For one thing, we aren’t even able to define intelligence, except as the ability to perform rather narrow and banal tasks super-fast, so how do we get from there to something human-like? It seems to me that the efforts to create machines that will fool humans into believing that they are smart are equivalent to carving a Ferrari out of wax. Sure you could fool someone, but it’s still not a Ferrari. Wishing and believing doesn’t make it a Ferrari.

Because we know how a Ferrari works, it’s easy to understand that a wax Ferrari is very different from a real one. Since we don’t know what intelligence really is, even smart people will easily confuse wax intelligence with real intelligence. In testing terms, however, I have to ask “What are the features of artificial intelligence? How would you test them? How would you know they are reliable? And most importantly, how would you know that human intelligence doesn’t possess secret and subtle features that have not yet been identified?” Being beaten in chess by a chess computer is no evidence that such a computer can help you with your taxes, or advise you on your troubles with girls. Impressive feats of “intelligence” simply do not encompass intelligence in all the forms that we routinely experience it.

The Google Grid

One example is the so-called Google Grid. I saw a video, the other day, called Epic 2014. It’s about the rise of a collection of tools from Google that create an artificial mass intelligence. One of the features of this fantasy is an “algorithm” that automatically writes news stories by cobbling pieces from other news stories. The problem with that idea is that it seems to know nothing about writing. Writing is not merely text manipulation. Writing is not snipping and remixing. Writing requires modeling a world, modeling a reader’s world, conceiving of a communication goal, and finding a solution to achieve that goal. To write is to express a point of view. What the creators of Epic 2014 seemed to be imagining is a system capable of really really bad writing. We already have that. It’s called Racter. It came out years ago. The Google people are thinking of creating a better Racter, essentially. The chilling thing about that is that it will fool a lot of people, whose lives will be a little less rich for it.

I think the only way we can get to an interesting artificial intelligence is to create conditions for certain interesting phenomena of intelligence to emerge and self-organize in some sort of highly connectionist networked soup of neuron-like agents. We won’t know if it really is “human-like”, except perhaps after a long period of testing, but growing it will have to be a delicate and buggy process, for the same reason that complex software development is complex and buggy. Just like Hal in 2001, maybe it’s really smart, or maybe it’s really crazy and tells lies. Call in the testers, please.

(When Hal claimed in the movie that no 9000 series computers had ever made an error, I was ready to reboot him right then.)

No, you say? You will assemble the intelligence out of trillions of identical simple components and let nature and data stimulation build the intelligence automatically? Well, that’s how evolution works, and look how buggy THAT is! Look how long it takes. Look at how narrow the intelligences are that it has created. And if we turn a narrow and simplistic intelligence to the task of redesigning itself, why suppose that it is more likely to do a good job than a terrible job?

Although humans have written programs, no program yet has written a human. There’s a reason for that. Humans are oodles more sophisticated than programs. So, the master program that threatens to take over humanity would require an even more masterful program to debug itself with. But there can’t be one, because THAT program would require a program to debug itself… and so on.

The Complexity Barrier

So, I predict that the singularity will be drowned and defeated by what might be called the Complexity Barrier. The more complex the technology, the more prone to breakdown. In fact much of the “progress” of technology seems to be accompanied by a process of training humans to accept increasingly fragile technology. I predict that we will discover that the amount of energy and resources needed to surmount the complexity barrier will approach infinity.

In the future, technology will be like weather. We will be able to predict it somewhat, but things will go mysteriously wrong on a regular basis. Things fall apart; the CPU will not hold.

Until I see a workable test plan for the Singularity, I can’t take it seriously.

A Question About Test Strategy

Maria writes:

A) Your presentation Test Strategy: “What is it? What does it look like?” applies to creating a test strategy for a specific application (I’ve also read Ingrid B. Ottevanger’s article on “A Risk-Based Test Strategy”). How can I apply the idea to an overall test strategy for the company that I’m working for? Is it possible to create a really good test strategy for a company so that it covers several diverse applications? Having difficulty finding a way to create a well-stated strategy from which we can create an efficient test process, I am left with another question: “How do we make a clear line between the overall test strategy and the company test process?”

B) The precondition for this activity is unfortunately not the best, leaving us with a tight time schedule and very little time to do thorough work. My concern is that neither the strategy nor the test process will actually be something possible to use, leaving us with, as you say, “A string of test technique buzzwords.” So how can I argue that the test strategy and the test process are not just two documents that we have to create, but that it’s the thoughts behind the documents that are important?

Test strategy is an important yet little-described aspect of test methodology. Let me introduce three definitions:

Test Plan: the set of ideas that guide a test project

Test Strategy: the set of ideas that guide test design

Test Logistics: the set of ideas that guide the application of resources to fulfill a test strategy

I find these ideas to be a useful jumping off point. Here are some implications:

  • The test plan is the sum of test strategy and test logistics.
  • The test plan document does not necessarily contain a test plan. This is because many test plan documents are created by people who are following templates without understanding them, or writing things to please their bosses, without knowing how to fulfill their promises, or simply because it once was a genuine test plan but now is obsolete.
  • Conversely, a genuine test plan is not necessarily documented. This is because new ideas may occur to you each day that change how you test. In my career, I have mostly operated without written test plans.

One quick way to think about test strategy is to realize that testing is (usually) a process of constructing an explanation of the status of the product. Therefore, the ideas that should guide our testing are those that relate to the marshalling of evidence for that explanation.

Here’s an example: Let’s say I need to test Inkscape, an open source painting program. The people I work for want this product to be a viable alternative to Adobe Photoshop.

This leads to an overarching question for the testing: “Given the general capabilities of Inkscape, is the program sufficiently reliable, and are its capabilities well enough deployed, that a serious user would consider Inkscape a viable alternative to Photoshop?” This is a question about the status of Inkscape. Answering it is not merely a process of determining yes or no, because, as a tester, I must supply an explanation that justifies my answer.

Working backwards, I would have to do something like the following:

  1. Catalog the capabilities of Inkscape and match them to Photoshop.
  2. Determine the major objectives users might have in using a paint program, as well as various kinds of users.
  3. Learn about the product. Get a feel for it in terms of its structures, functions, data, and platforms.
  4. List the general kinds of problems that might occur in the product, based on my knowledge of the technology and the users.
  5. Decide which parts of the product are more likely to fail and/or are more important. Focus more attention on those areas.
  6. Determine what kinds of operations I need to do and which systematic observations I need to make in order to detect problems in the product (area by area and capability by capability) and compare it to Photoshop. (Here’s where I would also apply a variety of test techniques.)
  7. Carry out those test activities and repeat as necessary.
  8. Consider testability and automation as I go.

In doing these things, I would be gathering the evidence I need to argue for the specific ways in which Inkscape does or does not stand up to Photoshop.

Company-wide Test Strategy

In my way of thinking, a good test strategy is product specific. You can have a generic test strategy, but since you test specific products rather than generic ones, the strategy will become better when it is made specific to what you are testing at the moment.

Perhaps what you are talking about is a strategy that relates to what you need for a test lab infrastructure, or for developing the appropriate product-specific test skills? Or perhaps you are thinking of creating materials to aid test leads in producing specific test strategies?

If so, one thing to consider is a risk catalog (aka bug taxonomy). A risk catalog is an outline of the kinds of problems that typically occur in products that use a particular technology. You can create one of these based on your experience testing a product, then reuse it for any other similar product.
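
For illustration only, such a catalog can be as lightweight as a structured outline that test leads copy and prune for each product. The categories and entries below are generic examples, not a complete taxonomy.

    # A minimal risk catalog sketch for a web-technology product.
    # The entries are generic placeholders; a real catalog grows out of
    # your own bug history with that technology.
    RISK_CATALOG = {
        "input handling": [
            "overlong or empty fields accepted without validation",
            "pasted rich text breaks formatting",
        ],
        "state management": [
            "session expires silently and work is lost",
            "back button resubmits a completed transaction",
        ],
        "platform": [
            "layout breaks at uncommon window sizes",
            "behavior differs between supported browsers",
        ],
    }

    for area, risks in RISK_CATALOG.items():
        print(area.upper())
        for risk in risks:
            print("  -", risk)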

Company Test Process

I suggest using the term methodology instead of process. “Process” is the way things happen. “Methodology” is a system of methods. You use a methodology; but you participate in a process. When you have an idea and say, “I think I’ll try doing that” the idea itself is probably a method. When you do it, it becomes a practice (a practice is something that you actually do), and therefore it influences the process.

I use these words carefully because process is a very rich and complex reality that I do not want to oversimplify. For instance, it may be a part of your methodology to create a test plan, but at the same time it may be a genuine part of your process that your test plan document is ignored. “Ignore the test plan document” is not going to be written down in anyone’s methodology, yet it can be an important part of the process, especially if the test plan document is full of bad ideas.

The dividing line between test strategy and test methodology is not hard to find, I think. A test strategy is product specific, and a test methodology is not. Another important element you haven’t mentioned is test skill. Your methodology is useless without skilled testers to apply it.

I would suggest that a more important dividing line for you to consider is the line between skill and method. How much do you rely on skilled people to select the right thing to do next, and how much are you trying to program them with methodology? Many process people get this all mixed up. They treat testers as if they are children, or idiots, trying to dictate solutions to problems instead of letting the testers solve the problems for themselves. What the process people then get is either bad testing, or, hopefully, their methodology is ignored and the testers do a good job anyway.

When I develop a test methodology, as I have done for a number of companies, I focus on training the testers and on creating a systematic training and coaching mechanism. That way the methodology documentation is much thinner and less expensive to maintain.

Surprise Heuristic

At the recent Workshop on Training Software Testers, Morven Gentleman showed us a chart of some test results. I was surprised to see a certain pattern in the results. I began to think of new and better tests to probe the phenomenon.

Morven told us that the tester who produced that chart did not see anything strange about it. This intrigued me. Why did Morven and I see something worth investigation when the tester did not?

Then I stopped myself and tried to discover my own thought process on this. A few minutes later this exploratory testing heuristic came to mind:

I MAKE AN OBSERVATION DURING A TEST…

1. I experience surprise associated with a pattern within the observation.

That triggers REFLECTION about PLAUSIBILITY…

2. The pattern seems implausible relative to my current model of the phenomenon.

That triggers REFLECTION about RISK…

3. I can bring to mind a risk associated with that implausible pattern.

That triggers REFLECTION on MAGNITUDE OF RISK…

4. The risk seems important.

That triggers TEST REDESIGN…

Now, I don’t really know if this is my thought process, but it’s a pattern I might be able to use to explain to new testers how surprise can be a test tool.

How to Investigate Intermittent Problems

The ability and the confidence to investigate an intermittent bug is one of the things that marks an excellent tester. The most engaging stories about testing I have heard have been stories about hunting a “white whale” sort of problem in an ocean of complexity. Recently, a thread on the SHAPE forum made me realize that I had not yet written about this fascinating aspect of software testing.

Unlike a mysterious non-intermittent bug, an intermittent bug is more of a testing problem than a development problem. A lot of programmers will not want to chase that white whale, when there’s other fishing to do.

Intermittent behavior itself is no big deal. It could be said that digital computing is all about the control of intermittent behavior. So, what are we really talking about?

We are not concerned about intermittence that is both desirable and non-mysterious, even if it isn’t exactly predictable. Think of a coin toss at the start of a football game, or a slot machine that comes up all 7’s once in a long while. We are not even concerned about mysterious intermittent behavior if we believe it can’t possibly cause a problem. For the things I test, I don’t care much about transient magnetic fields or minor random power spikes, even though they are happening all the time.

Many intermittent problems have not yet been observed at all, perhaps because they haven’t manifested yet, or perhaps because they have manifested and not yet been noticed. The only thing we can do about that is to get the best test coverage we can and keep at it. No algorithm can exist for automatically detecting or preventing all intermittent problems.

So, what we typically call an intermittent problem is: a mysterious and undesirable behavior of a system, observed at least once, that we cannot yet manifest on demand.

Our challenge is to transform the intermittent bug into a regular bug by resolving the mystery surrounding it. After that it’s the programmer’s headache.

Some Principles of Intermittent Problems:

  • Be comforted: the cause is probably not evil spirits.
  • If it happened once, it will probably happen again.
  • If a bug goes away without being fixed, it probably didn’t go away for good.
  • Be wary of any fix made to an intermittent bug. By definition, a fixed bug and an unfixed intermittent bug are indistinguishable over some period of time and/or input space.
  • Any software state that takes a long time to occur, under normal circumstances, can also be reached instantly, by unforeseen circumstances.
  • Complex and baffling behavior often has a simple underlying cause.
  • Complex and baffling behavior sometimes has a complex set of causes.
  • Intermittent problems often teach you something profound about your product.
  • It’s easy to fall in love with a theory of a problem that is sensible, clever, wise, and just happens to be wrong.
  • The key to your mystery might be resting in someone else’s common knowledge.
  • An intermittent problem in the lab might be easily reproducible in the field.
  • The Pentium Principle of 1994: an intermittent technical problem may pose a *sustained and expensive* public relations problem.
  • The problem may be intermittent, but the risk of that problem is ever present.
  • The more testability is designed into a product, the easier it is to investigate and solve intermittent problems.
  • When you have eliminated the impossible, whatever remains, however improbable, could have done a lot of damage by then! So, don’t wait until you’ve fully researched an intermittent problem before you report it.
  • If you ever get in trouble over an intermittent problem that you could not lock down before release, you will fare a lot better if you made a faithful, thoughtful, vigorous effort to find and fix it. The journey can be the reward, you might say.

Some General Suggestions for Investigating Intermittent Problems:

  • Recheck your most basic assumptions: are you using the computer you think you are using? are you testing what you think you are testing? are you observing what you think you are observing?
  • Eyewitness reports leave out a lot of potentially vital information. So listen, but DO NOT BECOME ATTACHED to the claims people make.
  • Invite more observers and minds into the investigation.
  • Create incentives for people to report intermittent problems.
  • If someone tells you what the problem can’t possibly be, consider putting extra attention into those possibilities.
  • Check tech support websites for each third party component you use. Maybe the problem is listed.
  • Seek tools that could help you observe and control the system.
  • Improve communication among observers (especially with observers who are users in the field).
  • Establish a central clearinghouse for mystery bugs, so that patterns among them might be easier to spot.
  • Look through the bug list for any other bug that seems like the intermittent problem.
  • Make more precise observations (consider using measuring instruments).
  • Improve testability: Add more logging and scriptable interfaces.
  • Control inputs more precisely (including sequences, timing, types, sizes, sources, iterations, combinations).
  • Control state more precisely (find ways to return to known states).
  • Systematically cover the input and state spaces.
  • Save all log files. Someday you’ll want to compare patterns in old logs to patterns in new ones.
  • If the problem happens more often in some situations than in others, consider doing a statistical analysis of the variance between input patterns in those situations.
  • Consider controlling things that you think probably don’t matter.
  • Simplify. Try changing only one variable at a time; try subdividing the system. (This helps you understand and isolate the problem when it occurs.)
  • Complexify. Try changing more variables at once; let the state get “dirty”. (This helps you make a lottery-type problem happen.)
  • Inject randomness into states and inputs (possibly by loosening controls) in order to reach states that may not fit your typical usage profile.
  • Create background stress (high loads; large data).
  • Set a trap for the problem, so that the next time it happens, you’ll learn much more about it. (A sketch of one such trap appears after this list.)
  • Consider reviewing the code.
  • Look for interference among components created by different organizations.
  • Celebrate and preserve stories about intermittent problems and how they were resolved.
  • Systematically consider the conceivable causes of the problem (see below).
  • Beware of burning huge time on a small problem. Keep asking, is this problem worth it?
  • When all else fails, let the problem sit a while, do something else, and see if it spontaneously recurs.
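
To make the “set a trap” suggestion concrete, here is one small sketch in Python: a wrapper that records the inputs, the result, and any traceback around a suspect operation, so that the next occurrence of the intermittent failure leaves evidence behind. The operation being wrapped is a hypothetical stand-in for whatever you are investigating.

    # Sketch of a "trap" for an intermittent problem: log everything
    # around the suspect call so the next occurrence leaves evidence.
    import json, logging, time, traceback

    logging.basicConfig(filename="trap.log", level=logging.DEBUG)

    def trapped(suspect_operation, *args, **kwargs):
        record = {"time": time.time(), "args": repr(args), "kwargs": repr(kwargs)}
        try:
            result = suspect_operation(*args, **kwargs)
            record["result"] = repr(result)
            return result
        except Exception:
            record["traceback"] = traceback.format_exc()
            raise
        finally:
            logging.debug(json.dumps(record))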

Considering the Causes of Intermittent Problems

When investigating an intermittent problem, it may be worth considering the kinds of things that cause such problems. The list of guideword heuristics below may help you systematically do that analysis. There is some redundancy among the items in the list, because causes can be viewed from different perspectives.

Possibility 1: The system is NOT behaving differently. The apparent intermittence is an artifact of the observation.

  • Bad observation: The observer may have made a poor observation. (e.g. “Inattentional Blindness” is a phenomenon whereby an observer whose mind is occupied may not see things that are in plain view. When presented with the scene a second time, the observer may see new things in the scene and assume that they weren’t there before. Also, certain optical illusions cause apparently intermittent behavior in an unchanging scene. See “the scintillating grid.”)
  • Irrelevant observation: The observer may be looking at differences that don’t matter. The things that matter may not be intermittent. This can happen when an observation is too precise for its purpose.
  • Bad memory: The observer may have mis-remembered the observation, or records of the observation could have been corrupted. (There’s a lot to observe when we observe! Our minds immediately compact the data and relate it to other data. Important data may be edited out. Besides, a lot of system development and testing involves highly repetitive observations, and we sometimes get them mixed up.)
  • Misattribution: The observer may have mis-attributed the observation. (“Microsoft Word crashed” might mean that *Windows* crashed for a reason that had nothing whatsoever to do with Word. Word didn’t “do” anything. This is a phenomenon also known as “false correlation” and often occurs in the mind of an observer when one event follows hard on the heels of another event, making one appear to be caused by the other. False correlation is also chiefly responsible for many instances whereby an intermittent problem is mistakenly construed to be a non-intermittent problem with a very complex and unlikely set of causes)
  • Misrepresentation: The observer may have misrepresented the observation. (There are various reasons for this. An innocent reason is that the observer is so confident in an inference that they have the honest impression that they did observe it and report it as such. I once asked my son if his malfunctioning Playstation was plugged in. “Yes!” he said impatiently. After some more troubleshooting, I had just concluded that the power supply was shot when I looked down and saw that it was obviously not plugged in.)
  • Unreliable oracle: The observer may be applying an intermittent standard for what constitutes a “problem.” (We may get the impression that a problem is intermittent only because some people, some of the time, don’t consider the behavior to be a problem, even if the behavior is itself predictable. Different observers may have different tolerances and sensitivities; and the same observer may vary in that way from one hour to the next.)
  • Unreliable communication: Communication with the observer may be inconsistent. (We may get the impression that a problem is intermittent simply because reports about it don’t consistently reach us, even if the problem is itself quite predictable. “I guess people aren’t seeing the problem anymore” may simply mean that people no longer bother to complain.)

Possibility 2: The system behaved differently because it was a different system.

  • Deus ex machina: A developer may have changed it on purpose, and then changed it back. (This can occur easily when multiple developers or teams are simultaneously building or servicing different parts of an operational server platform without coordinating with each other. Another possibility, of course, is that the system has been modified by a malicious hacker.)
  • Accidental change: A developer may be making accidental changes. (The changes may have unanticipated side effects, leading to the intermittent behavior. Also, a developer may be unwittingly changing a live server instead of a sandbox system.)
  • Platform change: A platform component may have been swapped or reconfigured. (An administrator or user may have changed, intentionally or not, a component on which the product depends. Common sources of these problems include Windows automatic updates and memory or disk space reconfigurations.)
  • Flakey hardware: A physical component may have transiently malfunctioned. (Transient malfunctions may be due to factors such as inherent natural variation, magnetic fields, excessive heat or cold, battery low conditions, poor maintenance, or physical shock.)
  • Trespassing system: A foreign system may be intruding. (For instance, in web testing, I might get occasionally incorrect results due to a proxy server somewhere at my ISP that provides a cached version of pages when it shouldn’t. Other examples are background virus scans, automatic system updates, other programs, or other instances of the same program.)
  • Executable corruption: The object code may have become corrupted. (One of the worst bugs I ever created in my own code (in terms of how hard it was to find) involved machine code in a video game that occasionally wrote data over a completely unrelated part of the same program. Because of the nature of that data, the system didn’t crash, but rather the newly corrupted function passed control to the function that immediately followed it in memory. Took me days (and a chip emulator) to figure it out.)
  • Split personality: The “system” may actually be several different systems that perform as one. (For instance, I may get inconsistent results from Google depending on which Google server I happen to get; or I might not realize that different machines in the test lab have different versions of some key component; or I might mistype a URL and accidentally test on the wrong server some of the time.)
  • Human element: There may be a human in the system, making part of it run, and that human is behaving inconsistently.

Possibility 3: The system behaved differently because it was in a different state.

  • Frozen conditional: A decision that is supposed to be based on the status of a condition may have stopped checking that condition. (It could be stuck in an “always yes” or “always no” state.)
  • Improper initialization: One or more variables may not have been initialized. (The starting state of a computation would therefore depend on the state of some previous computation of the same or other function.)
  • Resource denial: A critical file, stream, or other variable may not be available to the system. (This could happen either because the object does not exist, has become corrupted, or is locked by another process.)
  • Progressive data corruption: A bad state may have slowly evolved from a good state by small errors propagating over time. (Examples include timing loops that are slightly off, or rounding errors in complicated or reflexive calculations; see the sketch after this list.)
  • Progressive destabilization: There may be a classic multi-stage failure. (The first part of the bug creates an unstable state– such as a wild pointer– when a certain event occurs, but without any visible or obvious failure. The second part precipitates a visible failure at a later time based on the unstable state in combination with some other condition that occurs down the line. The lag time between the destabilizing event and the precipitating event makes it difficult to associate the two events to the same bug.)
  • Overflow: Some container may have filled to beyond its capacity, triggering a failure or an exception handler. (In an era of large memories and mass storage, overflow testing is often shortchanged. Even if the condition is properly handled, the process of handling it may interact with other functions of the system to cause an emergent intermittent problem.)
  • Occasional functions: Some functions of a system may be invoked so infrequently that we forget about them. (These include exception handlers, internal garbage collection functions, auto-save, and periodic maintenance functions. These functions, when invoked, may interact in unexpected ways with other functions or conditions of the system. Be especially wary of silent and automatic functions.)
  • Different mode or option setting: The system can be run in a variety of modes and the user may have set a different mode. (The new mode may not be obviously different from the old one.)
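
As a tiny illustration of progressive data corruption, here is the classic floating-point version in Python: each step looks harmless, but the accumulated state drifts away from what a naive model of the computation predicts.

    # Progressive corruption in miniature: adding 0.1 ten thousand times
    # does not give exactly 1000.0, because each binary rounding error is
    # tiny but they accumulate.
    total = 0.0
    for _ in range(10_000):
        total += 0.1

    print(total)            # something very close to, but not exactly, 1000.0
    print(total == 1000.0)  # False on typical IEEE-754 doubles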

Possibility 4: The system behaved differently because it was given different input.

  • Accidental input: User may have provided input or changed the input in a way that shouldn’t have mattered, yet did. (This might also be called the Clever Hans syndrome, after the mysteriously repeatable ability of Clever Hans, the horse, to perform math problems. It was eventually discovered by Oskar Pfungst that the horse was responding to subtle physical cues that its owner was unintentionally conveying. In the computing world, I once experienced an intermittent problem due to sunlight coming through my office window and hitting an optical sensor in my mouse. The weather conditions outside shouldn’t have constituted different input, but they did. Another more common example is different behavior that may occur when using the keyboard instead of mouse to enter commands. The accidental input might be invisible unless you use special tools or recorders. For instance, two identical texts, one saved in RTF format from Microsoft Word and one saved in RTF format from Wordpad, will be very similar on the disk but not exactly identical.)
  • Secret boundaries and conditions: The software may behave differently in some parts of the input space than it does in others. (There may be hidden boundaries, or regions of failure, that aren’t documented or anticipated in your mental model of the product. I once tested a search routine that invoked different logic when the total returned hits crossed 1,000 and again at 50,000. Only by accident did I discover these undocumented boundaries.)
  • Different profile: Some users may have different profiles of use than other users. (Different biases in input will lead to different experiences of output. Users with certain backgrounds, such as programmers, may be systematically more or less likely to experience, or notice, certain behaviors.)
  • Ghost input: Some other machine-based source than the user may have provided different input. (Such input is often invisible to the user. This includes variations due to different files, different signals from peripherals, or different data coming over the network.)
  • Deus Ex Machina: A third party may be interacting with the product at the same time as the user. (This may be a fellow tester, a friendly user, or a malicious hacker.)
  • Compromised input: Input may have been corrupted or intercepted on its way into the system. (Especially a concern in client-server systems.)
  • Time as input: Intermittence over time may be due to time itself. (Time is the one thing that constantly changes, no matter whatever else you control. Whenever time and date, or time and date intervals, are used as input, bugs in that functionality may appear at some times but not others.)
  • Timing lottery: Variations in input that normally don’t matter may matter at certain times or at certain loads. (The Mars Rover suffered from a problem like this involving a three microsecond window of vulnerability when a write operation could write to a protected part of memory.)
  • Combination lottery: Variations in input that normally don’t matter may matter when combined in a certain way.

Possibility 5: The other possibilities are magnified because your mental model of the system and what influences it is incorrect or incomplete in some important way.

  • You may not be aware of each variable that influences the system.
  • You may not be aware of sources of distortion in your observations.
  • You may not be aware of available tools that might help you understand or observe the system.
  • You may not be aware of all the boundaries of the system and all the characteristics of those boundaries.
  • The system may not actually have a function that you think it has; or maybe it has extra functions.
  • A complex algorithm may behave in a surprising way, intermittently, that is entirely correct (e.g. mathematical chaos can look like random behavior).

Testing Heuristic: Rumble Strip

A rumble strip is a strip of corrugated pavement running alongside highways. If you fall asleep and drift off the road, the wheels of your car hit the rumble strip and the car vibrates with a loud BUR-R-R-R-R-R. A rumble strip doesn’t damage your car, but it’s an alarming event all the same, because it means that you’re about to have a major accident if you don’t do something to get back on the road.

Programs are like cars driven by notes left by an absent programmer. For testers, hearing the rumble strip means a good bug is probably just around the corner.

The rumble strip heuristic in testing says that when you’re testing and you see the product do strange things (especially when it wasn’t doing those strange things just before) that could indicate a big disaster is about to happen. By “strange” I mean behaviors that you are confident the programmer did not intend. For instance, I was testing an input field once and at 23,829 characters all the text turned white. This is a strange bug, but mainly it served as a rumble strip under the tires of the absent programmer. I knew the program was leaving the highway, so I kept pushing with my tests until I got it to crash.
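
Here is a rough sketch of that kind of pushing, assuming hypothetical submit_text() and looks_strange() hooks into the product under test: ramp up the input until the product starts to rumble, then keep going.

    # Rumble-strip probing sketch. submit_text() and looks_strange() are
    # hypothetical hooks into the product under test.
    def probe_input_field(submit_text, looks_strange, start=1, limit=1_000_000):
        length = start
        while length <= limit:
            response = submit_text("x" * length)
            if looks_strange(response):
                print("rumble strip at", length, "characters:", repr(response))
                # Don't stop here; the crash may be just past the rumble.
            length *= 2
        return "no strangeness observed up to the limit"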

Obvious/Oblivious

(Revised: 1/2015)

As a process guy, I often hear the term “common sense”. It’s a strange term. Like “luck” it’s a concept that generally brings an end to inquiry, because it’s a reference to magic. “That’s just common sense” seems to be a signal that “I haven’t thought about that, and I’m not prepared to talk about it, either.”

Same with the word “obvious”, sometimes. When you ask a technical question about a product and a developer says that the answer is obvious, you can generally count on that to mean that he hasn’t discussed the matter with anyone. Not discussing it, the team is likely to be oblivious to risks associated with it, if there are any.

It is the tester’s duty to keep an eye on things that seem unimportant, yet might be important after all.

Just because something about a product is obvious to one person, doesn’t mean it’s obvious in the same way to everyone. It may be especially non-obvious to a user.

The obvious leads to the oblivious, and that makes me nervious.