Rethinking Equivalence Class Partitioning, Part 1

Wikipedia’s article on equivalence class partitioning (ECP) is a great example of the poor thinking and teaching and writing that often passes for wisdom in the testing field. It’s narrow and misleading, serving to imply that testing is some little game we play with our software, rather than an open investigation of a complex phenomenon.

(No, I’m not going to edit that article. I don’t find it fun or rewarding to offer my expertise in return for arguments with anonymous amateurs. Wikipedia is important because it serves as a nearly universal reference point when criticizing popular knowledge, but just like popular knowledge itself, it is not fixable. The populace will always prevail, and the populace is not very thoughtful.)

In this post I will comment on the Wikipedia article. In a subsequent post I will describe ECP my way, and you can decide for yourself whether that is better than Wikipedia’s version.

“Equivalence partitioning or equivalence class partitioning (ECP)[1] is a software testing technique that divides the input data of a software unit into partitions of equivalent data from which test cases can be derived.”

Not exactly. There’s no reason why ECP should be limited to “input data” as such. The ECP thought process may be applied to output, or even versions of products, test environments, or test cases themselves. ECP applies to anything you might consider doing that involves variations that may influence the outcome of a test.

Yes, ECP is a technique, but a better word for it is “heuristic.” A heuristic is a fallible method of solving a problem. ECP is extremely fallible, and yet useful.

“In principle, test cases are designed to cover each partition at least once. This technique tries to define test cases that uncover classes of errors, thereby reducing the total number of test cases that must be developed.”

This text is pretty good. Note the phrase “In principle” and the use of the word “tries.” These are softening words, which are important because ECP is a heuristic, not an algorithm.

Speaking in terms of “test cases that must be developed,” however, is a misleading way to discuss testing. Testing is not about creating test cases. It is for damn sure not about the number of test cases you create. Testing is about performing experiments. And the totality of experimentation goes far beyond such questions as “what test case should I develop next?” The text should instead say “reducing test effort.”
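The idea in the quoted passage, covering each partition at least once, can be sketched in a few lines of Python. This is only an illustration: the "age" field, its 0 to 120 range, and the three partitions are hypothetical, and picking the middle member of each partition is an arbitrary guess, which is exactly why the softening words matter.

```python
# Hypothetical partitions for an "age" input field that accepts 0 through 120.
partitions = {
    "below range": range(-50, 0),
    "in range": range(0, 121),
    "above range": range(121, 200),
}

def pick_representatives(partitions):
    """Choose one member (here, the middle one) from each partition."""
    return {name: members[len(members) // 2] for name, members in partitions.items()}

representatives = pick_representatives(partitions)
print(representatives)
# → {'below range': -25, 'in range': 60, 'above range': 160}
```

Three values stand in for 250 possible ones. Whether they stand in well depends entirely on whether the partitions were guessed well.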

“An advantage of this approach is reduction in the time required for testing a software due to lesser number of test cases.”

Sorry, no. The advantage of ECP is not in reducing the number of test cases. Nor is it even about reducing test effort, as such (even though it is true that ECP is “trying” to reduce test effort). ECP is just a way to systematically guess where the bigger bugs probably are, which helps you focus your efforts. ECP is a prioritization technique. It also helps you explain and defend those choices. Better prioritization does not, by itself, allow you to test with less effort, but we do want to stumble into the big bugs sooner rather than later. And we want to stumble into them with more purpose and less stumbling. And if we do that well, we will feel comfortable spending less effort on the testing. Reducing effort is really a side effect of ECP.

“Equivalence partitioning is typically applied to the inputs of a tested component, but may be applied to the outputs in rare cases. The equivalence partitions are usually derived from the requirements specification for input attributes that influence the processing of the test object.”

Typically? Usually? Has this writer done any sort of research that would substantiate that? No.

ECP is a process that we all do informally, not only in testing but in our daily lives. When you push open a door, do you consciously decide to push on a specific square centimeter of the metal push plate? No, you don’t. You know that for most doors it doesn’t matter where you push. All pushable places are more or less equivalent. That is ECP! We apply ECP to anything that we interact with.

Yes, we apply it to output. And yes, we can think of equivalence classes based on specifications, but we also think of them based on all other learning we do about the software. We perform ECP based on all that we know. If what we know is wrong (for instance if there are unexpected bugs) then our equivalence classes will also be wrong. But that’s okay, if you understand that ECP is a heuristic and not a golden ticket to perfect testing.

“The fundamental concept of ECP comes from equivalence class which in turn comes from equivalence relation. A software system is in effect a computable function implemented as an algorithm in some implementation programming language. Given an input test vector some instructions of that algorithm get covered, ( see code coverage for details ) others do not…”

At this point the article becomes Computer Science propaganda. This is why we can’t have nice things in testing: as soon as the CS people get hold of it, they turn it into a little logic game for gifted kids, rather than a pursuit worthy of adults charged with discovering important problems in technology before it’s too late.

The fundamental concept of ECP has nothing to do with computer science or computability. It has to do with logic. Logic predates computers. An equivalence class is simply a set. It is a set of things that share some property. The property of interest in ECP is utility for exploring a particular product risk. In other words, an equivalence class in testing is an assertion that any member of that particular group of things would be more or less equally able to reveal a particular kind of bug if it were employed in a particular kind of test.

If I define a “test condition” as something about a product or its environment that could be examined in a test, then I can define equivalence classes like this: An equivalence class is a set of tests or test conditions that are equivalent with respect to a particular product risk, in a particular context. 

This implies that two inputs which are not equivalent for the purposes of one kind of bug may be equivalent for finding another kind of bug. It also implies that if we model a product incorrectly, we will also be unable to know the true equivalence classes. Actually, considering that bugs come in all shapes and sizes, to have the perfectly correct set of equivalence classes would be the same as knowing, without having tested, where all the bugs in the product are. This is because ECP is based on guessing what kind of bugs are in the product.
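Here is a minimal Python sketch of that first implication. The two risks (`sign_risk` and `length_risk`), the five-character threshold, and the sample inputs are all hypothetical, chosen only to show the same inputs falling into different classes depending on which bug you are guessing at.

```python
from collections import defaultdict

def partition(values, classify):
    """Group values by label; values sharing a label are treated as equivalent."""
    classes = defaultdict(list)
    for value in values:
        classes[classify(value)].append(value)
    return dict(classes)

# Hypothetical risk 1: the product mishandles the sign of a number.
def sign_risk(text):
    n = int(text)
    return "negative" if n < 0 else "zero" if n == 0 else "positive"

# Hypothetical risk 2: the product mishandles long input strings.
def length_risk(text):
    return "long" if len(text) > 5 else "short"

inputs = ["7", "-7", "0", "1000000", "-1000000"]

by_sign = partition(inputs, sign_risk)
by_length = partition(inputs, length_risk)

print(by_sign)
# → {'positive': ['7', '1000000'], 'negative': ['-7', '-1000000'], 'zero': ['0']}
print(by_length)
# → {'short': ['7', '-7', '0'], 'long': ['1000000', '-1000000']}
```

With respect to the sign risk, "7" and "1000000" are interchangeable. With respect to the length risk, they are not. Neither partitioning is the true one; each is only as good as the guess about the bug behind it.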

If you read the technical stuff about Computer Science in the Wikipedia article, you will see that the author has decided that two inputs which cover the same code are therefore equivalent for bug finding purposes. But this is not remotely true! This is a fantasy propagated by people who I suspect have never tested anything that mattered. Off the top of my head, code-coverage-as-gold-standard ignores performance bugs, requirements bugs, usability bugs, data type bugs, security bugs, and integration bugs. Imagine two tests that cover the same code, and both involve input that is displayed on the screen, except that one includes an input which is so long that when it prints it goes off the edge of the screen. This is a bug that the short input didn’t find, even though both inputs are “valid” and “do the same thing” functionally.
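The screen example can be sketched in Python. Everything here is hypothetical: the `render_greeting` function, the 20-character `SCREEN_WIDTH`, and the two names. The function has no branches, so every call executes exactly the same statements.

```python
SCREEN_WIDTH = 20  # hypothetical display width, in characters

def render_greeting(name):
    # Straight-line code with no branches: every call executes the same
    # statements, so any two names give identical code coverage.
    return "Hello, " + name + "!"

def fits_on_screen(line, width=SCREEN_WIDTH):
    """Report whether a rendered line fits within the display width."""
    return len(line) <= width

short_line = render_greeting("Ann")
long_line = render_greeting("Wolfeschlegelsteinhausenbergerdorff")

print(fits_on_screen(short_line))  # → True
print(fits_on_screen(long_line))   # → False
```

By the coverage criterion these two inputs are equivalent. By the "does it fit on the screen" criterion they obviously are not.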

The Fundamental Problem With Most Testing Advice Is…

The problem with most testing advice is that it is either uncritical folklore that falls apart as soon as you examine it, or else it is misplaced formalism that doesn’t apply to realistic open-ended problems. Testing advice is better when it is grounded in a general systems perspective as well as a social science perspective. Both of these perspectives understand and use heuristics. ECP is a powerful, ubiquitous, and rather simple heuristic, whose utility comes from and is limited by your mental model of the product. In my next post, I will walk through an example of how I use it in real life.

Accountability for What You Say is Dangerous and That’s Okay

[Note: I offered Maaret Pyhäjärvi the right to review this post and suggest edits to it before I published it. She declined.]

A few days ago I was keynoting at the New Testing Conference, in New York City, and I used a slide that has offended some people on Twitter. This blog post is intended to explore that and hopefully improve the chances that if you think I’m a bad guy, you are thinking that for the right reasons and not making a mistake. It’s never fun for me to be a part of something that brings pain to other people. I believe my actions were correct, yet still I am sorry that I caused Maaret hurt, and I will try to think of ways to confer better in the future.

Here’s the theme of this post: Getting up in front of the world to speak your mind is a dangerous process. You will be misunderstood, and that will feel icky. Whether or not you think of yourself as a leader, speaking at a conference IS an act of leadership, and leadership carries certain responsibilities.

I long ago learned to let go of the outcome when I speak in public. I throw the ideas out there, and I do that as an American Aging Overweight Left-Handed Atheist Married Father-And-Father-Figure Rough-Mannered Bearded Male Combative Aggressive Assertive High School Dropout Self-Confident Freedom-Loving Sometimes-Unpleasant-To-People-On-Twitter Intellectual. I know that my ideas will not be considered in a neutral context, but rather in the context of how people feel about all that. I accept that. But I have been popular and successful as a speaker in the testing world, so maybe, despite all the difficulties, enough of my message and intent gets through, overall.

What I can’t let go of is my responsibility to my audience and the community at large to speak the truth and to do so in a compassionate and reasonable way. Regardless of what anyone else does with our words, I believe we speakers need to think about how our actions help or harm others. I think a lot about this.

Let me clarify. I’m not saying it’s wrong to upset people or to have disagreement. We have several different culture wars (my reviewers said “do you have to say wars?”) going on in the software development and testing worlds right now, and they must continue or be resolved organically in the marketplace of ideas. What I’m saying is that anyone who speaks out publicly must try to be cognizant of what words do and accept the right of others to react.

Although I’m surprised and certainly annoyed by the dark interpretations some people are making of what I did, the burden of such feelings is what I took on when I first put myself forward as a public scold about testing and software engineering, a quarter century ago. My annoyance about being darkly interpreted is not your problem. Your problem, assuming you are reading this and are interested in the state of the testing craft, is to feel what you feel and think what you think, then react as best fits your conscience. Then I listen and try to debug the situation, including helping you debug yourself while I debug myself. This process drives the evolution of our communities. Jay Philips, Ash Coleman, Mike Talks, Ilari Henrik Aegerter, Keith Klain, Anna Royzman, Anne-Marie Charrett, David Greenlees, Aaron Hodder, Michael Bolton, and my own wife all approached me with reactions that helped me write this post. Some others approached me with reactions that weren’t as helpful, and that’s okay, too.

Leadership and The Right of Responding to Leaders

In my code of conduct, I don’t get to say “I’m not a leader.” I can say no one works for me and no one has elected me, but there is more to leadership than that. People with strong voices and ideas gain a certain amount of influence simply by virtue of being interesting. I made myself interesting, and some people want to hear what I have to say. But that comes with an implied condition that I behave reasonably. The community, over time, negotiates what “reasonable” means. I am both a participant and a subject of those negotiations. I recommend that we hold each other accountable for our public, professional words. I accept accountability for mine. I insist that this is true for everyone else. Please join me in that insistence.

People who speak at conferences are tacitly asserting that they are thought leaders, that they deserve to influence the community. If that influence came with a rule that “you can’t talk about me without my permission,” it would have a chilling effect on progress. You can keep to yourself, of course; but if you exercise your power of speech in a public forum you cannot cry foul when someone responds to you. Please join me in my affirmation that we all have the right of response when a speaker takes the microphone to keynote at a conference.

Some people have pointed out that it’s not okay to talk back to performers in a comedy show or Broadway play. Okay. So is that what a conference is to you? I guess I believe that conferences should not be for show. Conferences are places for conferring. However, I can accept that some parts of a conference might be run like infomercials or circus acts. There could be a place for that.

The Slide

Here is the slide I used the other day:


Before I explain this slide, try to think what it might mean. What might its purposes be? That’s going to be difficult, without more information about the conference and the talks that happened there. Here are some things I imagine may be going through your mind:

  • There is someone whose name is Maaret who James thinks he’s different from.
  • He doesn’t trust nice people. Nice people are false. Is Maaret nice and therefore he doesn’t trust her, or does Maaret trust nice people and therefore James worries that she’s putting herself at risk?
  • Is James saying that niceness is always false? That seems wrong. I have been nice to people whom I genuinely adore.
  • Is he saying that it is sometimes false? I have smiled and shaken hands with people I don’t respect, so, yes, niceness can be false. But not necessarily. Why didn’t he put qualifying language there?
  • He likes debate and he thinks that Maaret doesn’t? Maybe she just doesn’t like bad debate. Did she actually say she doesn’t like debate?
  • What if I don’t like debate, does that mean I’m not part of this community?
  • He thinks excellence requires attention and energy and she doesn’t?
  • Why is James picking on Maaret?

Look, if all I saw was this slide, I might be upset, too. So, whatever your impression is, I will explain the slide.

Like I said, I was speaking at a conference in NYC. Also keynoting was Maaret Pyhäjärvi. We were both speaking about the testing role. I have some strong disagreements with Maaret about the social situation of testers. But as I watched her talk, I was a little surprised at how I agreed with the text and basic concepts of most of Maaret’s actual slides, and a lot of what she said. (I was surprised because Maaret and I have a history. We have clashed in person and on Twitter.) I was a bit worried that some of what I was going to say would seem like a rehash of what she just did, and I didn’t want to seem like I was papering over the serious differences between us. That’s why I decided to add a contrast slide to make sure our differences weren’t lost in the noise. This means a slide that highlights differences, instead of points of connection. There were already too many points of connection.

The slide was designed specifically:

  • for people to see who were in a specific room at a specific time.
  • for people who had just seen a talk by Maaret which established the basis of the contrast I was making.
  • about differences between two people who are both in the spotlight of public discourse.
  • to express views related to technical culture, not general social culture.
  • to highlight the difference between two talks for people who were about to see the second talk that might seem similar to the first talk.
  • for a situation where both I and Maaret were present in the room during the only time that this slide would ever be seen (unless someone tweeted it to people who would certainly not understand the context).
  • as talking points to accompany my live explanation (which is on video and I assume will be public, someday).
  • for a situation where I had invited anyone in the audience, including Maaret, to ask me questions or make challenges.

These people had just seen Maaret’s talk and were about to see mine. In the room, I explained the slide and took questions about it. Maaret herself spoke up about it, for which I publicly thanked her. It wasn’t something I was posting with no explanation or context. Nor was it part of the normal slides of my keynote.

Now I will address some specific issues that came up on Twitter:

1. On Naming Maaret

Maaret has expressed the belief that no one should name another person in their talk without getting their permission first. I vigorously oppose that notion. It’s completely contrary to the workings of a healthy society. If that principle is acceptable, then you must agree that there should be no free press. Instead, I would say if you stand up and speak in the guise of an expert, then you must be personally accountable for what you say. You are fair game to be named and critiqued. And the weird thing is that Maaret herself, regardless of what she claims to believe, behaves according to my principle of freedom to call people out. She, herself, tweeted my slide and talked about me on Twitter without my permission. Of course, I think that is perfectly acceptable behavior, so I’m not complaining. But it does seem to illustrate that community discourse is more complicated than “be nice” or “never cause someone else trouble with your speech” or “don’t talk about people publicly unless they gave you permission.”

2. On Being Nice

Maaret had a slide in her talk about how we can be kind to each other even though we disagree. I remember her saying the word “nice” but she may have said “kind” and I translated that into “nice” because I believed that’s what she meant. I react to that because, as a person who believes in the importance of integrity and debate over getting along for the sake of appearances, I observe that exhortations to “be nice” or even to “be kind” are often used when people want to quash disturbing ideas and quash the people who offer them. “Be nice” is often code for “stop arguing.” If I stop arguing, much of my voice goes away. I’m not okay with that. No one who believes there is trouble in the world should be okay with that. Each of us gets to have a voice.

I make protests about things that matter to me, you make protests about things that matter to you.

I think we need a way of working together that encourages debate while fostering compassion for each other. I use the word compassion because I want to get away from ritualized command phrases like “be nice.” Compassion is a feeling that you cultivate, rather than a behavior that you conform to or simulate. Compassion is an antithesis of “Rules of Order” and other lists of commandments about courtesy. Compassion is real. Throughout my entire body of work you will find that I promote real craftsmanship over just following instructions. My concern about “niceness” is the same kind of thing.

Look at what I wrote: I said “I don’t trust nice people.” That’s a statement about my feelings and it is generally true, all things being equal. I said “I’m not nice.” Yet, I often behave in pleasant ways, so what did I mean? I meant I seek to behave authentically and compassionately, which looks like “nice” or “kind”, rather than to imagine what behavior would trick people into thinking I am “nice” when indeed I don’t like them. I’m saying people over process, folks.

I was actually not claiming that Maaret is untrustworthy because she is nice, and my words don’t say that. Rather, I was complaining about the implications of following Maaret’s dictum. I was offering an alternative: be authentic and compassionate, then “niceness” and acts of kindness will follow organically. Yes, I do have a worry that Maaret might say something nice to me and I’ll have to wonder “what does that mean? is she serious or just pretending?” Since I don’t want people to worry about whether I am being real, I just tell them “I’m not nice.” If I behave nicely it’s either because I feel genuine good will toward you or because I’m falling down on my responsibility to be honest with you. That second thing happens, but it’s a lapse. (I do try to stay out of rooms with people I don’t respect so that I am not forced to give them opinions they aren’t willing or able to process.)

I now see that my sentence “I want to be authentic and compassionate” could be seen as an independent statement connected to “how I differ from Maaret,” implying that I, unlike her, am authentic and compassionate. That was an errant construction and does not express my intent. The orange text on that line indicated my proposed policy, in the hope that I could persuade her to see it my way. It was not an attack on her. I apologize for that confusion.

3. Debate vs. Dialogue

Maaret had earlier said she doesn’t want debate, but rather dialogue. I have heard this from other Agilists and I find it disturbing. I believe this is code for “I want the freedom to push my ideas on other people without the burden of explaining or defending those ideas.” That’s appropriate for a brainstorming session, but at some point, the brainstorming is done and the judging begins. I believe debate is absolutely required for a healthy professional community. I’m guided in this by dialectical philosophy, the history of scientific progress, the history of civil rights (in fact, all of politics), and the modern adversarial justice system. Look around you. The world is full of heartfelt disagreement. Let’s deal with it. I helped create the culture of small invitational peer conferences in our industry which foster debate. We need those more than ever.

But if you don’t want to deal with it, that’s okay. All that means is that you accept that there is a wall between your friends and those other people whom you refuse to debate with. I will accept the walls if necessary but I would rather resolve the walls. That’s why I open myself and my ideas for debate in public forums.

Debate is not a process of sticking figurative needles into other people. Debate is the exchange of views with the goal of resolving our differences while being accountable for our words and actions. Debate is a learning process. I have occasionally heard from people I think are doing harm to the craft that they believe I debate for the purposes of hurting people instead of trying to find resolution. This is deeply insulting to me, and to anyone who takes his vocation seriously. What’s more, considering that these same people express the view that it’s important to be “nice,” it’s not even nice. Thus, they reveal themselves to be unable to follow their own values. I worry that “Dialogue not debate” is a slogan for just another power group trying to suppress its rivals. Beware the Niceness Gang.

I understand that debating with colleagues may not be fun. But I’m not doing it for fun. I’m doing it because it is my responsibility to build a respectable craft. All testing professionals share this responsibility. Debate serves another purpose, too, managing the boundaries between rival value systems. Through debate we may discover that we occupy completely different paradigms; schools of thought. Debate can’t bridge gaps between entirely different world views, and yet I have a right to my world view just as you have a right to yours.

Jay Philips said on Twitter:

I admire Jay. I called her and we had a satisfying conversation. I filled her in on the context and she advised me to write this post.

One thing that came up is something very important about debate: the status of ideas is not the only thing that gets modified when you debate someone; what also happens is an evolution of feelings.

Yes, I think “I’m right.” I acted according to principles I think are eternal and essential to intellectual progress in society. I’m happy with those principles. But I also have compassion for the feelings of others, and those feelings may hold sway even though I may be technically right. For instance, Maaret tweeted my slide without my permission. That is a copyright violation. She’s objectively “wrong” to have done that. But that is irrelevant.

[Note: Maaret points out that this is legal under the fair use doctrine. Of course, that is correct. I forgot about fair use. Of course, that doesn’t change the fact that though I may feel annoyed by her selective publishing of my work, that is irrelevant, because I support her option to do that. I don’t think it was wise or helpful for her to do that, but I wouldn’t seek to bar her from doing so. I believe in freedom to communicate, and I would like her to believe in that freedom, too.]

I accept that she felt strongly about doing that, so I [would] choose to waive my rights. I feel that people who tweet my slides, in general, are doing a service for the community. So while I appreciate copyright law, I usually feel okay about my stuff getting tweeted.

I hope that Jay got the sense that I care about her feelings. If Maaret were willing to engage with me she would find that I care about her feelings, too. This does not mean she gets whatever she wants, but it’s a factor that influences my behavior. I did offer her the chance to help me edit this post, but again, she refused.

4. Focus and Energy

Maaret said that eliminating the testing role is a good thing. I worry it will lead to the collapse of craftsmanship. She has a slide that says “from tester to team member” which is a sentiment she has expressed on Twitter that led me to say that I no longer consider her a tester. She confirmed to me that I hurt her feelings by saying that, and indeed I felt bad saying it, except that it is an extremely relevant point. What does it mean to be a tester? This is important to debate. Maaret has confirmed publicly (when I asked a question about this during her talk) that she didn’t mean to denigrate testing by dismissing the value of a testing role on projects. But I don’t agree that we can have it both ways. The testing role, I believe, is a necessary prerequisite for maintaining a healthy testing craft. My key concern is the dilution of focus and energy that would otherwise go to improving the testing craft. This is lost when the role is lost.

This is not an attack on Maaret’s morality. I am worried she is promoting too much generalism for the good of the craft, and she is worried I am promoting too much specialism. This is a matter of professional judgment and perspective. It cannot be settled, I think, but it must be aired.

The Slide Should Not Have Been Tweeted But It’s Okay That It Was

I don’t know what Maaret was trying to accomplish by tweeting my slide out of context. Suffice it to say what is right there on my slide: I believe in authenticity and compassion. If she was acting out of authenticity and compassion then more power to her. But the slide cannot be understood in isolation. People who don’t know me, or who have any axe to grind about what I do, are going to cry “what a cruel man!” My friends contacted me to find out more information.

I want you to know that the slide was one part of a bigger picture that depicts my principled objection to several matters involving another thought leader. That bigger picture is: two talks, one room, all people present for it, a lot of oratory by me explaining the slide, as well as back and forth discussion with the audience. Yes, there were people in the room who didn’t like hearing what I had to say, but “don’t offend anyone, ever” is not a rule I can live by, and neither can you. After all, I’m offended by most of the talks I attend.

Although the slide should not have been tweeted, I accept that it was, and that doing so was within the bounds of acceptable behavior. As I announced at the beginning of my talk, I don’t need anyone to make a safe space for me. Just follow your conscience.

What About My Conscience?

  • My conscience is clean. I acted out of true conviction to discuss important matters. I used a style familiar to anyone who has ever seen a public debate, or read an opinion piece in the New York Times. I didn’t set out to hurt Maaret’s feelings and I don’t want her feelings to be hurt. I want her to engage in the debate about the future of the craft and be accountable for her ideas. I don’t agree that I was presuming too much in doing so.
  • Maaret tells me that my slide was “stupid and hurtful.” I believe she and I do not share certain fundamental values about conferring. I will no longer be conferring with her, until and unless those differences are resolved.
  • Compassion is important to me. I will continue to examine whether I am feeling and showing the compassion for my fellow humans that they are due. These conversations and debates I have with colleagues help me do that.
  • I agree that making a safe space for students is important. But industry consultants and pundits should be able to cope with the full spectrum of authentic, principled reactions by their peers. Leaders are held to a higher standard, and must be ready and willing to defend their ideas in public forums.
  • The reaction on Twitter gave me good information about a possible trend toward fragility in the Twitter-facing part of the testing world. There seems to be a significant group of people who prize complete safety over the value that comes from confrontation. In the next conference I help arrange, I will set more explicit ground rules, rather than assuming people share something close to my own sense of what is reasonable to do and expect.
  • I will also start thinking, for each slide in my presentation: “What if this gets tweeted out of context?”

(Oh, and to those who compared me to Donald Trump… Can you even imagine him writing a post like this in response to criticism? BELIEVE ME, he wouldn’t.)

We. Use. Tools.

Context-Driven testers use tools to help ourselves test better. But, there is no such thing as test automation.

Want details? Here’s the 10,000 word explanation that Michael Bolton and I have been working on for months.

Editor’s Note: I have just posted version 1.03 of this article. This is the third revision we have made due to typos. Isn’t it interesting how hard it is to find typos in your own work before you ship an article? We used automation to help us with spelling, of course, but most of the typos are down to properly spelled words that are in the wrong context. Spelling tools can’t help us with that. Also, Word’s spell-checker still thinks there are dozens of misspelled words in our article, because of all the proper nouns, terms of art, and neologisms. Of course there are the grammar checking tools, too, right? Yeah… not really. The false positive rate is very high with those tools. I just did a sweep through every grammar problem the tool reported. Out of the five it thinks it found, only one, a missing hyphen, is plausibly a problem. The rest are essentially matters of writing style.

One of the lines it complained about is this: “The more people who use a tool, the more free support will be available…” The grammar checker thinks we should not say “more free” but rather “freer.” This may be correct, in general, but we are using parallelism, a rhetorical style that we feel outweighs the general rule about comparatives. Only humans can make these judgments, because the rules of grammar are sometimes fluid.
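The earlier point about properly spelled words in the wrong context is easy to demonstrate. Below is a minimal Python sketch of a dictionary-lookup spell check. The tiny word list and the sample sentences are hypothetical, and real spelling tools are more sophisticated, but the basic blind spot is the same: a typo that happens to be a real word passes.

```python
import re

# Tiny hypothetical word list standing in for a real spelling dictionary.
DICTIONARY = {"we", "moved", "the", "test", "from", "form", "one", "tool", "to", "another"}

def misspelled_words(text, dictionary=DICTIONARY):
    """Return the words in text that are not found in the dictionary."""
    words = re.findall(r"[a-z']+", text.lower())
    return [word for word in words if word not in dictionary]

intended = "We moved the test from one tool to another"
typo_real_word = "We moved the test form one tool to another"  # "form" typed for "from"
typo_nonword = "We moved the tset from one tool to another"    # "tset" typed for "test"

print(misspelled_words(intended))        # → []
print(misspelled_words(typo_real_word))  # → []
print(misspelled_words(typo_nonword))    # → ['tset']
```

The check catches "tset" and is silent about "form," which is the kind of typo a human reader has to find.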

How Not to Standardize Testing (ISO 29119)

Many years ago I took a management class. One of the exercises we did was on achieving consensus. My group did not reach an agreement because I wouldn’t lower my standards. I wanted to discuss the matter further, but the other guys grew tired of arguing with me and declared “consensus” over my objections. This befuddled me, at first. The whole point of the exercise was to reach a common decision, and we had failed, by definition, to do that– so why declare consensus at all? It’s like getting checkmated in chess and then declaring that, well, you still won the part of the game that you cared about… the part before the checkmate.

Later I realized this is not so bizarre. What they had effectively done is ostracize me from the team. They had changed the players in the game. The remaining team did come to consensus. In the years since, I have found that changing the boundaries or membership of a community is indeed an important pillar of consensus building. I have used this tactic many times to avoid unhelpful debate. It is one reason why I say that I’m a member of the Context-Driven School of Testing. My school does not represent all schools, and the other schools do not represent mine. Therefore, we don’t need consensus with them.

Then what about ISO 29119?

The ISO organization claims to have a new standard for software testing. But ISO 29119 is not a standard for testing. It cannot be a standard for testing.

A standard for testing would have to reflect the values and practices of the world community of testers. Yet the concerns of the Context-Driven School of thought, which has been in development for at least 15 years, have been ignored and our values shredded by this so-called standard and the process used to create it. They have done this by excluding us. There are two organizations explicitly devoted to Context-Driven values (AST and ISST) and our community holds several major conferences a year. Members of our community speak at all the major practitioner conferences, and our ideas are widely cited. Some of the most famous testers in the world, including me, are Context-Driven testers. We exist, and together with the Agilists, we are the source of nearly every new idea in testing in the last decade.

The reason they have excluded us is that they know we won’t agree to any simplistic standard based on templates or simple formulae. We know those things look pretty but they don’t help. If ISO doesn’t exclude us, they worry they will never finish. They know we will challenge their evidence, and even their ethics and basic competence. This is why I say the craft is not ready for standards. It will be years before all the recognized experts in testing can come together and agree on anything substantial.

The people running the ISO effort know exactly who we are. I personally have had multiple public debates with Stuart Reid, on stage. He cannot pretend we don’t exist. He cannot pretend we are some sort of lunatic fringe. Tens of thousands of testers have watched my video lectures or bought my books. This is not a case where ISO can simply declare us to be outsiders.

The Burden of Proof

The Context-Driven community stands for excellence in testing. This is why we must reject this depraved attempt by ISO to grab power and assert control over our craft. Our craft is still an open marketplace of ideas, and it is full of strong debates. We must protect that marketplace and allow it to evolve. I want the fair chance to put my competitors out of business (or get them to change their business) with the high quality of my work. Context-Driven testing has been growing in strength and numbers over the years, whereas this ISO effort appears to be a job protection program for people who can’t stomach debate. They can’t win the debate, so they want to remake the rules.

The burden of proof is not on me or any of us to show that the standard is wrong, nor is it our job to make it right. The burden is on those who claim that the craft can be standardized to study the craft and recognize and resolve the deep differences among us. Failing that, there can be no ethical or rational basis for standardization.

This blog post puts me on record as opposing the ISO 29119 standard. Together, my colleagues and I constitute a determined and sustained and principled opposition.

Benjamin Mitchell and the Trap of False Hypocrisy

One of the puzzles of intellectual life is how to criticize something you admire without sounding like you don’t admire it. Benjamin Mitchell has given an insightful talk about social dynamics in Agile projects. You should see it. I enjoyed it, but I also felt pricked by several missed opportunities where he could have done an even deeper analysis. This post is about one example of that.

Benjamin offers an example of feedback he got about feedback he gave to a member of his team:

“Your feedback to the team member was poor because:
it did not focus on any positive actions, and
it didn’t use any examples”

Benjamin immediately noticed that this statement appears to violate itself. Obviously, it doesn’t focus on positive actions and it doesn’t use any examples. To Benjamin this demonstrates hypocrisy and a sort of incompetence and he got his reviewer (who uttered the statement) to agree with him about that. “It’s incompetent in the sense that it has a theory of effectiveness that it violates,” Benjamin says. From his tone, he clearly doesn’t see this as the product of anything sinister, but more as an indicator of how hard it is to deeply walk our talk. Let’s try harder not to be hypocrites, I think he’s saying.

Except this is not an example of hypocrisy.

In this case, the mistake lies with Benjamin, and then with the reviewer for not explaining and defending himself when challenged.

It’s worth dwelling on this because methodologists, especially serious professional ones like Benjamin and me, are partly in the business of listening to people who have trouble saying what they mean (a population that includes all of humanity), then helping them say it better. He and I need to be very very good at what social scientists call “verbal protocol analysis.” So, let’s learn from this incident.

In order to demonstrate my point, I’d like to see if you agree to two principles:

  1. Context Principle: Everything that we ever do, we do in some particular situation, and that context has a large impact on what, how, and why we do things. For instance, I’m writing this in the situation of a quiet afternoon on Orcas Island, purely by choice, and not because I’m paid or forced to write it by a shadowy client with a sinister agenda.
  2. Enoughness Principle: Anything we do that is good or bad could have been even better, or even worse. Although it makes sense to try to do good work, that comes at a cost, and therefore in practice we stop at whatever we consider to be “good enough” and not necessarily the best we can do.

Assuming you accept those principles, see what happens when I slightly reword the offending comment:

“In that situation, your feedback to the team member was poor compared to what you could easily have achieved because:
it did not focus on any positive actions, and
it didn’t use any examples”

Having added the words, what happens if Benjamin tells me that this statement doesn’t focus on positive actions and doesn’t cite an example? I reply like this:

“That’s a reasonable observation, but I think it’s out of place here. My advice pertains to giving feedback to people who feel frightened or threatened or may not have the requisite skills to comprehend the feedback or in a situation where I am not seen as a credible reviewer. And my advice pertains to situations where you want to invest in giving vivid, powerful advice– advice that teaches. However, in this case, I felt it was good enough (not perfect but good at a reasonable investment of my time) to ignore the positive (because, Benjamin, you already know you’re good, and you know that I know that you are good– so you don’t need me to give you a swig of brandy before telling you the “bad news”) and I thought that investing in careful phrasing of a vivid example might actually sound patronizing to you, because you already know what I’m talking about, man.”

In other words, with the added words (“In that situation” and “compared to what you could easily have achieved”), it becomes a little clearer that the situation of him advising his client, and us advising him, are different in important ways.

Imagine that Benjamin spots a misspelled word in my post. Does he need to give me an example of how to spell it? Does he need to speak about the potential benefits of good spelling? Does he need to praise my use of commas before broaching the subject of spelling? No. He just needs to point and say “that’s spelled wrong.” He can do that without being a hypocrite, don’t you think?

(Of course, if the situations are not different and the quality of the comment made to Benjamin is clearly not good enough, then it is fair to raise the issue that the feedback does not meet its own implied standard.)

Finally: I added those words, but if I’m in a community that assumes them, I don’t need to add them. They are there whether I say them or not. We don’t need to make explicit that which is already a part of our culture. Perhaps the person who offered this feedback to Benjamin was assuming that he understood that advice is situational, and that a summary form of feedback is better in this case than a lengthy ritual of finding something to praise about Benjamin and then citing at least three examples.

…unless Benjamin is a frightened student… which he isn’t. Look at him in that video. He exudes self-confidence. That man is a responsible adult. He can take a punch.

Who’s the Real Monster?

“Best practice” thinking itself causes these misunderstandings. Many people seek to memorize protocols such as “how to give feedback… always do this… step 1: always say something nice; step 2: always focus on solutions, not problems… etc.” instead of understanding the underlying dynamics of communication and relationships. Then when they slip and accidentally behave in an insightful and effective way instead of following their silly scripts, their friends accuse them of being hypocrites.

When the explicit parts of our procedures are at war with the tacit parts, we chronically fall into such traps.

There is a silver lining here: it’s good to be a hypocrite if you are preaching the wrong things. Watch yourself. The next time you fail in your discipline to do X, seriously consider if your discipline is actually wrong, and your “failure” is actually success of some kind.

This is why when I talk about procedures, I speak of heuristics (which are fallible) and skills (which are plastic) and context (which varies). There are no best practices.

I’m going to wrap this up with some positive feedback, because he doesn’t know me very well, yet. Benjamin, I appreciate how, in your work, you question what you are told and reflect on your own thought processes in a spirit of both humility and confidence. YOU don’t seem infected by “best practice” folklore. Thank you for that.



Context-Driven Testing at a Crossroads

Cem Kaner, who controls, has announced an interesting change in his view of the Context-Driven School. He says he prefers to think of it in terms of the Context-Driven approach, not a school of thought. This is a significant change from his original view, which was that CDT is a different paradigm.

That means I’m the last of the founders of the Context-Driven School, as such, who remains true to the original vision. I will bear its torch along with any fellow travelers who wish to pursue a similar program.

Polarization? No. Paradigm!

One of the things that concerns Cem is the polarization of the craft. He doesn’t like it anymore. I suppose he wants more listening to people who have different views about whether there are best practices or not. To me, that’s unwise. It empties the concept of much of its power. And frankly, it makes a mockery of what we have stood for. To me, that would be like a Newtonian physicist in the 1690s wistfully wishing to “share ideas” with the Aristotelians. There’s no point. The Aristotelians were on a completely different path.

For me, Context-Driven thinking is delightfully about listening to people and talking to people about practices and dynamics of software testing. But this must happen within the humanist framework that we laid out in the seven principles of the Context-Driven school. That’s our world.

Polarization is beside the point. Polarization is a natural consequence of the fact that our world view is simply different. We are a different paradigm. Our paradigm cannot be explained or contained by any other testing paradigm, such as the Factory School, or the Analytical School. We must have the stomach to keep moving along with our program regardless of the huddled masses who Don’t Get It.

Why Is This Division Happening Now?

Cem’s change of position is happening partly because, after 16 years, he and I are no longer collaborators. Due to a simmering personal dispute (nothing to do with testing) that blew up last year, we no longer can stand to be in the same room with each other. Alas, I don’t think this will change. What that means, professionally, is that the conversations that we once had– the passionate arguments– which led to mutual accommodations and syntheses, no longer happen. This is too bad, because the Factory schoolers, who greatly outnumber us, will make good rhetoric out of any appearance of confusion between Cem and me about our visions of testing.

Meanwhile, I will say this about Cem: He’s a great man. His contributions to testing have been enormous. I disagree with him on some aspects of testing, but by and large he does great work. I’m sure if he weren’t so furious with me and I were able to talk to him without feeling an overpowering urge to kick holes in walls (I mean that literally), we would still be able to develop testing ideas together. However, I trust that whatever he does will be worth looking at. And I do have many other bright collaborators, so I’m going to be fine.

The Context-Driven School continues, because I, and those like me, are compelled to pursue excellence wherever it leads us, even if that means breaking with “conventional” software testing thinkers. I wish Cem luck as he consorts with those guys, but I fear his time will be, for the most part, wasted.




Willful Ignorance on Parade

Michael Bolton is accused of hand-waving in this thread on LinkedIn. (See the comment by Peter).

Michael and I talk a lot about cognition and exploration. We speak in tropes that come from philosophy and various branches of science. Once in a while, some fellow who understands little of what we say assumes that we just made it all up to impress the ladies.

It brings to mind the Large Hadron Collider. Here’s an excerpt from one of their bulletins:

“The very smooth and fast transition to operation with ions was made possible by very good beam instrumentation performance with a relatively low number of charges per bunch, and magnetic behaviour very similar to operation with protons, as expected. These two factors combined allowed the setting-up operations to be completed very quickly, and stable beam operation, with 2 bunches per beam, was achieved in just a few days.”

Gee, if you know nothing about physics, or the LHC project, this might sound very much like hand-waving, too! In this case, however, if it sounds sketchy, it’s probably more about the receiver than the source. After all, there really is a $10 billion device sitting in the ground with 4000 physicists poring over the data it generates.

As ambitious professionals, we need to be able to speak about complex subjects without constantly going back to kindergarten to bring along the people who refuse to study their own craft.

I don’t mind if an earnest seeker, who happens to be ignorant, asks what seems to be a silly question. I will help everyone who wants to learn. And I don’t mind the assertive dissenter who has done the homework and yet has a different style and judgment from mine.

I’m talking about something different: the willfully ignorant blowhard.

Please don’t be like that.

What Exploratory Testing is Not

Michael Bolton has gone off like a volcano in Iceland, writing a series about what exploratory testing isn’t.

Another thing I would add to this:

Exploratory testing is not defined by any specific example of exploratory testing.

Just as tap dancing does not characterize ballroom dancing, you can’t take any one example of exploratory testing and treat that as representative of the entire concept of ET.

If you were to hear me singing an aria by Mozart, that would be an example of opera singing. It would be an example of BAD opera singing, but it would truly be an example of the style. Similarly, I regularly talk to testers who go “oh yeah I’ve seen that exploratory testing stuff but it’s not structured… not documented… not x… not y… not whatever.” And my reply is “you probably haven’t seen skilled exploratory testing. Would you like to hear me sing an opera now? OR, I could show you a good example of ET in practice.”

Exploratory testing can be done in an unskilled, slapdash, silly way. Just as an unskilled driver behind the wheel of a car is still a driver who is driving a car, a poor tester can still be doing ET– albeit probably not very well.

The cool thing about ET is that, even done badly, it’s still a great way to find some bugs. Michael and I try to help you do much, much better than that.

The core idea of ET remains as it always has been. It’s been expressed in many different ways, but boils down to this: test design and test execution and learning mixed together in a mutually supportive way. Whenever you see that, and to the degree that you see that, you are seeing exploratory testing.

Exploratory Testing is not “Experienced-Based” Testing

Prabhat Nayak is yet another thinking tester recently hired by the rising Indian testing powerhouse, Moolya. Speaking of the ISTQB syllabus, he writes:

One such disagreement of mine is they have put “Exploratory Testing” on purely experienced based testing. James, correct me if I have got ET wrong (and I am always ready to be corrected if I have misunderstood something), a novice tester who has got great cognizance and sapience and can explore things better, can think of different ways the product may fail to perform per requirement can always do a great job in ET than a 5 years experienced tester who has only learned to execute a set of test cases. That is probably one of the beauties of ET. There is of course, always an advantage of having some experience but that alone doesn’t suffice ET to be put under experienced based testing.

You are quite correct, Prabhat. Thank you for pointing this out.

The shadowy cabal known as the ISTQB insulates itself from debate and criticism. They make their decisions in secret (under NDA, if you can believe it!) and they don’t invite my opinion, nor anyone’s opinion who has made a dedicated study of exploratory testing. That alone would be a good reason to dismiss whatever they do or claim.

But this case is an especially sad example of incompetent analysis. Let me break it down:

What does “experience-based” mean?

When people in the technical world speak of something as “x-based” they generally mean that it is “organized according to a model of x” or perhaps “dominated by a consideration of x.” The “x”, whatever it is, plays a special role in the method compared to its role in some other “normal” or “typical” method.

What is a normal or typical method of software testing? I’m not aware that the ISTQB explicitly takes a position on that. But by calling ET an experience-based technique, they imply that no other test technique involves the application of experience to a comparable degree. If they have intended that implication, that would be a claim both remarkable and absurd. Why should any test technique not benefit from experience? Do they think that a novice tester and an experienced tester would choose the exact same tests when practicing other test techniques? Do they think there is no value to experience except when using ET? What research have they done to substantiate this opinion? I bet none.

If they have not intended this implication, then by calling ET experience-based it seems to me they are merely making impressive sounds for the sake of it. They might as well have called ET “breathing-based” on the theory that testers will have to breathe while testing, too.

Ah, but maybe there is another interpretation. They may have called ET “experience-based” not to imply that ET is any more experience-based than other techniques, but rather as a warning that expresses their belief that the ONLY way ET can be valuable is through the personal heroism and mastery of the individual tester. In other words, what they meant to say was that ET is “personal excellence-based” testing, rather than testing whose value derives from an explicit algorithm that is objective and independent of the tester himself.

I suspect that’s what’s really going on here: They think the other techniques are concrete and scientific, whereas ET is somehow mystical and perhaps based on the same sort of dodgy magic that you find in Narnia or Middle-earth. They say “experience-based” to refer to a dark and scary forest that some enter but from which none ever return… They say “experience-based” because they have no understanding of any other basis that ET can possibly have!

Why would it be difficult for Factory School testing thinkers (of which ISTQB is a product) to understand the basis of ET?

It’s difficult for them because Factory School people, by the force of their creed, seek to minimize the role of humanness in any technical activity. They are radical mechanizers. They are looking for algorithms instead of heuristics. They want to focus on artifacts, not thoughts or feelings or activities. They need to deny the role and value of tacit knowledge and skill. Their theory of learning was state of the art in the 18th century: memorization and mimicry. Then, when they encounter ET, they look for something to memorize or mimic, and find nothing.

Those of us who study ET, when we try to share it, talk a lot about cognitive science, epistemology, and modern learning theory. We talk about the importance of practice. This sounds to the Factory Schoolers like incomprehensible new agey incantations in High Elvish. They suspect we are being deliberately obscure just to keep our clients confused and intimidated.

This is also what makes them want to call ET a technique, rather than an approach. I have, since the late nineties, characterized exploratory testing as an approach that applies to any technique. It is a mindset and set of behaviors that occur, to some degree, in ALL testing. To say “Let’s use ET, now” is technically as incoherent as saying “Let’s use knowledge, now.” You are always using knowledge, to some degree, in any work that you do. “Knowledge” is not a technique that you sometimes deploy. However, knowledge plays more of a role in some situations and less of a role in others. Knowledge is not always and equally applicable, nor is it uniformly applied even when applicable.

For the Factory Schoolers to admit that ET is endemic to all testing, to some degree, would force them to admit that their ignorance of ET is largely ignorance of testing itself! They cannot allow themselves to do that. They have invested everything in the claim that they understand testing. No, we will have to wait until those very proud and desperately self-inflated personalities retire, dry up, and blow away. The salvation of our craft will come from recruiting smart young testers into a better way of thinking about things like ET. The brain drain will eventually cause the Factory School culture to sink into the sea like a very boring version of Atlantis.

Bottom Line: Most testing benefits from experience, but no special experience is necessary to do ET

Exploratory testing is not a technique, so it doesn’t need to be categorized alongside techniques. However, a more appropriate way to characterize ET, if you want to characterize it in some way, is to call it self-managed and self-structured (as opposed to externally managed and externally structured). It is testing wherein the design part of the process and the execution part of the process are parallel and interactive.

You know what else is self-managed and self-structured? Learning how to walk and talk. Does anyone suggest that only “experienced people” should be allowed to do that?

Avoiding My Curse on Tool Vendors

Adam Goucher noticed that I recently laid a curse upon commercial test tool vendors (with the exception of Hexawise, Blueberry Consultants, and Atlassian). He wondered to me how a tool vendor might avoid my curse.

First, I’m flattered that he would even care who I curse. But, it’s a good question. Here’s my answer:

Test tool vendors that bug me:

  • Any vendor who wants me to pay for every machine I use their tool upon. Guys, the nature of testing is that I need to work with a lot of machines. Sell me the tool for whatever you want to charge, but you are harming my testing by putting obstacles between me and my test lab.
  • Any vendor that sells tools conceived and designed by a goddamn developer who hates to goddamn test. How do I know about the developer of a test tool? Well, when I’m looking at a tool and I find myself asking “Have these vendor bozos ever actually had to test something in their lives? Did they actually want a tool like this to help them? I bet this tool will triple the amount of time and energy I have to put into testing, and make me hate every minute of it” then I begin to suspect there are no great lovers of testing in the house. This was my experience when I worked with Rational Test Manager, in 2001. I met the designer of that tool: a kid barely out of MIT with no testing or test management experience who informed me that I, a Silicon Valley test management veteran, wasn’t qualified to criticize his design.
  • Any vendor selling me the opportunity, at great cost, to simulate a dim-witted test executioner. Most tool vendors don’t understand the difference between testing and checking, and they think what I want is a way to “test while I sleep.” Yes, I do want the ability to extend my power as a tester, but that doesn’t mean I’m happy to continually tweak and maintain a brittle set of checks that have weak oracles and weak coverage.
  • Any vendor who designs tools by guessing what will impress top managers in large companies who know nothing about testing. In other words: tools to support ceremonial software testing. Cem and I once got a breathless briefing about a “risk-based test management” tool from Compuware. Cem left the meeting early, in disgust. I lingered and tried to tell them why their tool was worthless. (Have you ever said that to someone, and they reacted by saying “I know it’s not perfect” and you replied by saying “Yes, it’s not perfect. I said it’s worthless, therefore it would follow that it’s also not perfect. You could not pay me to use this tool. This tool further erodes my faith in the American public education system, and by extension the American experiment itself. I’m saying that you just ruined America with your stupid stupid tool. So yeah, it’s not perfect.”) I think what bugged Cem and me the most is that these guys were happy to get our endorsement, if we wanted to give it, but they were not at all interested in our advice about how the tool could be re-designed into being a genuine risk-based testing tool. Ugh, marketers.
  • Vendors who want to sell me a tool that I can code up in Perl in a day. I don’t see the value of Cucumber. I don’t need FIT (although to his credit, the creator of FIT also doesn’t see the big deal of FIT). But if I did want something like that, it’s no big deal to write a tool in Perl. And both of those tools require that you write code, anyway. They are not tools that take coding out of our hands. So why not DIY?

Tool vendors I like:

  • Vendors who care what working testers think of their tools and make changes to impress them. Blueberry, Hexawise, and Sirius Software have all done that.
  • Vendors who have tools that give me vast new powers. I love the idea of virtual test labs. VMware, for instance.
  • Vendors who don’t shackle me to restrictive licenses. I love ActivePerl, which I can use all over the place. And I happily pay for things like their development kit.
  • Vendors who enjoy testing. Justin Hunter, of Hexawise, is like that. He’s the only vendor speaking at CAST, this year, you know.

The Euthyphro Dilemma in New Zealand

I recently had the opportunity to converse about tester certification with Carol Cornelius, who’s on the board of the New Zealand version of the ISTQB. The discussion went well in one respect: she did not physically run away. (Oh, and she concurred with me on the subject of Stuart Reid, which was nice to hear.)

[Considering that they misrepresent my ideas, perhaps she should have run away. To give two examples, my work on SBTM has been plagiarized in their Test Management Syllabus (page 47 of the Advanced Level Syllabus) and of course, they have taken it out of context and gotten it wrong. And my definition of exploratory testing has also been plagiarized and corrupted (page 51). It’s mischaracterized as a cute little informal test technique when in fact it is an approach, universal to good testing, that applies to any technique, and the citation they give (to a chapter I wrote in an out-of-date book) directly contradicts what they say about it.]

Well, Carol stood her ground, at least physically. But as the debate developed, she made an odd-sounding claim. She said words to the effect that the ISTQB Syllabus is what it is, and is not subject to her criticism. This shocked me, especially since this admission was made in a rather off-handed way– somewhat like commenting that the moon is in the sky.

The reason for my shock was the sudden realization that I was putting a lot of energy into arguing with a person who was, essentially, behaving as a puppet. Oh dear. I was doing the equivalent of yelling at my television instead of engaging the guy ON the television.

If someone defends a principle that he has not originated and is not free to change, reject, or even criticize, then he is not defending it rationally. He cannot. No rational defense can be made under those circumstances. Rationality, in fact, loses its meaning. What he is doing is simply advertising his commitment. And that has no more weight in an argument than have the words of a baseball (or netball) fan who predicts his team will win the big game.

This triggered a niggling memory in me, and afterward it popped fully to mind: the Euthyphro Dilemma.

The dilemma occurs in the Platonic Dialogue of Euthyphro. Socrates is examining Euthyphro about the source of the notion of piety, or good behavior in humans. Euthyphro says that what is loved by the Gods (all of the Gods) is good. Then Socrates asks:

“Just consider this question: Is that which is holy loved by the gods because it is holy, or is it holy because it is loved by the gods?”

Now this isn’t just a question, but also an attack on the whole notion of appeal to authority. That’s why Carol’s offhanded comment triggered the memory.

Let me apply it to the case at hand: Is the ISTQB syllabus good because it’s a powerful, helpful set of ideas that define genuine testing professionalism, or is it good simply because the ISTQB organization says so?

This question is a dilemma for ISTQB supporters, because if they go with the first option, then they must:

  1. Avoid making any claim or behaving in any way that suggests they believe the syllabus just because they are unable or unwilling to study testing for themselves. (Saying that the ISTQB Syllabus was beyond her criticism violates this one.)
  2. Be prepared to explain and justify any aspect of the syllabus when challenged by a colleague, just as anyone is normally expected to do for a personal opinion in a professional context. (I felt that Carol did not do this. At one point I cited my definition of coverage, and she agreed with it. Then I began to point out implications of my view that contradicted ISTQB dogma about measuring coverage, and I didn’t hear a coherent response after that.)
  3. Be prepared to change their views (thus breaking with the ISTQB) in the face of compelling evidence or reasoning. (I did not witness Carol do this, but I can understand that she wouldn’t necessarily consider my word as a tester to be evidence. What I can’t understand is why she seemed unaware of the work that my community has done and published, over the years, that does comprise compelling argument and evidence. This is not a new or novel debate. The issues have been clearly and repeatedly and publicly established.)
  4. Admit that they don’t need the ISTQB to learn testing, nor to be recognized as a good tester. (I don’t think this applies to Carol, since my understanding is that she doesn’t consider herself to be a tester, strictly speaking.)

And if they go with the second option, then they are choosing to be zombie non-combatants and can be safely ignored in the Great Conversation of testing.

Ideological commitment can be a bitch. That’s why, in the Context-Driven world, we keep that part pretty simple. There are seven principles that define our program. Of course there are many other common patterns and beliefs beyond those seven, but there is tremendous flexibility, because our whole focus is on DENYING a One True Way of testing.

We see testing professionalism as a matter of vigorous personal study and development. We reject any universal syllabus of testing. That’s why it hardly matters whether some particular definition or claim in the syllabus is also one I hold– because the nature of my commitment and the way I understand is completely different from that of an ISTQB board member.

This is why I say that supporting the ISTQB is, in and of itself, inconsistent with the goal of being a testing professional. A professional tester must own his craft.

Shiva is Annoyed with My Questions

A person named Shiva contacted me on Skype back in May. Then he didn’t say anything for several months, until yesterday we had this exchange.


Hi James!




I wanted to set up some time with you to chat about an idea and interest you in it…what would be a good time to chat? Would 1130 AM PST Mon work?


What is it about?


Well, I am on the advisory board of a company in China that does phenomenal work and they are very good in testing and SW dev. wanted to see if you wanted to take advantage of their low rates and high quality and make some money? Especially if you are planning to enter that business?


I run a testing company.


I know.


So… Why would I need a testing company? I already have one!


To improve your margins. Scale.


I don’t see how that would be possible. I do a certain kind of testing that requires a high level of skill. I doubt that any other testing company in the world has that skill. Well, there are a couple, but they are expensive.


That is exactly the point I wanted to walk you through as this company has a phenomenal ability to learn, get resources due to their location and yet do high quality work. It is an idea. I am willing to explore it with you if you are interested. If not, I completely understand.


I would have to see examples of the quality work you say that your company does. Is it posted online? Most companies that say they do quality work don’t do quality work at all. They do terrible work. So, I would need to see what you can do.

Also, I would need to know your training practices. I’d want to see them in writing. If you forward your training materials to me, I could review them.


Sure. Before I go forward, it would be good to have a three-way chat with their president/co-founder, myself and you and I am happy to arrange for samples for you to review – work samples, training practices.


I won’t be interested in talking unless you can show me some basic evidence that we have anything to talk about.

Let me ask you just a few questions:

– Does your test lab document all of its testing in detail? Is every test procedure and action documented?

– Do you have complete expected results documented, too?

– Do you maintain statistics on passed/failed tests? Do you graph them?

– Are your testers ISTQB certified?

Your answers to these questions will allow me to quickly assess your capability. Otherwise, I worry you will waste your time.


James, all these are great questions and I have answers that you will like but I need to think about what you said first. In my opinion, business is not just some Q&A, it is also building cross-company relationships and getting to know each other. I need to think about whether we will be a fit that way at all with each other. Please don’t take this personally but I need to give this some careful thought, or else it may not even work even though we are able to deliver capabilities.


You are right. Business is about relationships. I want a relationship with people who can answer basic questions. These are questions that I routinely answer for my clients.

I’m a testing company, too, right? You know that, right? I know what my clients demand of me, and I’m demanding that of you, too. If what you want is an uncritical client who is easily impressed, you came to the wrong guy.

I’m not taking it personally. I’m taking it as an indication that you are a bit over your head.


There is no need to be rude, James.


I’m not being rude. I’m being honest. It’s not my problem if you can’t handle that.


(Skype indicates Shiva has gone offline.)


It is not rude for a potential customer to challenge your corporate capability. That’s normal due diligence. Your job is to speak honestly and forthrightly about what you can and can’t do. Don’t dodge questions.

The reason I’m skeptical is that almost no test lab actually knows what it is doing. And there’s no excuse. Test labs ought to know how to test, but mostly I see labs much better at faking testing than doing it.

Two “Scoops” of “Bugs”

I have often said something like “We found a hundred bugs!” Lots of people have heard me say it. Statements like that are very valuable to me. But we should ask some vital questions about them.

Consider Raisin Bran cereal. If you lived in America and weren’t in solitary confinement during the 80’s and 90’s, you would have seen this commercial for Raisin Bran at some point (or one like it):

Two scoops of raisins!

Huh? Two scoops of raisins? What does that mean?

Perhaps the conversation went like this:

“I want my Bran to have MANY raisins!” barked Boss Kellogg.

“But, Mr. Kellogg, we already include nearly one full standard scoop,” replied the Chief Cereal Mixer. “No one has more raisins than we do.”

“Increase to maximum scoop!”

“But sir! that would violate every–”

“TWO SCOOPS! And damn the consequences.”

“The skies will be black with raisins!”

“Then we shall eat in the shade.”

I doubt anything like that happened, though. I suspect what happened is that somebody mixed some raisins with some bran flakes until it tasted pretty good. Maybe he adjusted it a little to optimize cost of goods (and perhaps they adjust the bran/raisin ratio as cost of goods change). Later, I bet, and completely unrelated to the engineering and manufacturing process, Kellogg’s advertising agency decided to create the impression that customers are getting a lot of value for the money, so they invented a distinguishing characteristic that actually makes no sense at all: an absolute measurement called a “scoop”. And began to speak of it AS IF it were meaningful.

The reason the measurement makes no sense at all is that the “Two Scoops” slogan was pasted onto boxes of substantially different sizes. But even if the measurement makes no sense, the pretentious claim makes a lot of sense, because we humans don’t think through the rational basis of measurements like this unless we are A) rather well trained, and more importantly B) highly motivated. So our unconscious lizard brain says to itself “two means yummy. two means yummy. means two yummy. yummy two…”

At some point, someone (an intern, perhaps) may have asked “But are there actually two scoops of raisins in those boxes?” and the answer was much laughing. Because it could be argued that if there are at least two raisins in the box, then there are two scoops of raisins in the box. It could be argued that if there is one raisin in the box and you used two scoops to measure it (“measure twice and cut once”) then there are two scoops of raisins in the box. If you make up your own measuring unit, such as, say, “scoop”, you can go on to make any other claim you want. This is exactly the point of Jerry Weinberg’s famous dictum “If quality doesn’t matter, you can achieve any other goal you want.”

I was thinking about doing a scientific analysis of this, but someone beat me to it.

Oh What Silliness… OR IS IT?

We have a real problem in testing, and no good solution for it. We are supposed to report the ground truth. Concrete reality. But this turns out to be a very difficult matter. Apart from all the problems of observation and interpretation, we have to summarize our findings. When we do that we are tempted to use scientific tropes (such as nonsensical measurements) to bolster our reports, even when they are poorly founded. We are often encouraged to do this by managers who were raised on Kellogg’s commercials and therefore confuse numbers with food.

Let’s look once again at the Raisin Bran situation and consider what might be the reasonable communication hidden there:

Maybe “two scoops” is intended to mean “ample” or “amply supplied with raisins.” In other words they are saying “You won’t regret buying our Raisin Bran, which always has enough raisins for you. While you’re eating it, we predict you will hum the ‘two scoops of raisins!’ song instead of calling a lawyer or becoming a cereal killer.”

I think there’s a scale built into all of us. It’s a comparative scale. It goes like this:

  • Minimum Possible
  • Nothing
  • Hardly any
  • Some
  • Enough
  • Plenty
  • Remarkable
  • “OMG! That must be a record!”
  • Maximum Possible

This scale is a bit of a mess. The italicized values move around (e.g. maximum possible may be not enough in some situation). The others, although fixed relative to each other, aren’t fixed in any way more definite than their ordering. The scale is highly situational. It’s relative to our understanding of the situation. For instance, you might be impressed to learn that the Colonia cable ship, which was the largest cable ship in the world in 1925, could carry 300 miles of cable in her hold. If so, you would be very easily impressed, because I just lied to you… According to that article it actually could hold 3,000 miles of cable. (However, bonus points if you were thinking “what KIND of cable?”)

What I do with bug numbers, etc.

I want you to notice my first paragraph in this post. Notice that every sentence in that paragraph invokes an unspecified quantity.

  • “I have often…” Often compared to what?
  • “Lots of…” Lots compared to what?
  • “Very…” Very compared to what?
  • “Vital…” Vital compared to what?

You could say “He’s not saying anything definite in those sentences.” I agree, I’m not. I’m just giving an impression. My point is this: an impression is a start. An impression might be reasonable. An impression may make conversation possible. An impression may make conversation successful.

Most engineering statements like this don’t stand alone. Like flower buds, they blossom under the sunlight of questioning. And that’s why I can’t take any engineer seriously who gets offended when his facts are questioned. They cry: “Don’t you believe me?” I answer: “I don’t know what you mean, so belief has no meaning, yet.”

So, as a professional tester who prides himself on self-examination, I am ready for the probing perspective question that might follow my attempt to send an impression: “compared to what?” I am ready for the data question, too: “what did you see or hear that leads you to say this?”

I strive (meaning I consciously and consistently work on this) to be reasonable and careful in my use of qualifiers, quantifiers, quantities, and intensifiers. For instance, you will notice that I just used the word “reasonable”, by which I intend to invoke images of normal professional practice in your mind (A LOT like invoking the image of two healthy reasonable scoops of delicious raisins).

One important and definite thing that is accomplished by this relatively loose use of language is that it allows us to talk to each other without bogging down the conversation with ALL the specifics RIGHT NOW.

Kellogg’s used the method mostly to trick you into buying their bran-smothered raisin products. They didn’t have any reasoning behind “two scoops.” But we can use the same technique wisely and ethically, if we choose. We can be ready to back up our claims.

For Bugs: If I tell you I “found X bugs!!” in your product, the number of exclamation points indicates the true message. An exclamation point means “remarkable” or “lots.” If I tell you I found a lot of bugs in your product, I mean I found substantially more than I expected to find in the product, and more than a reasonable and knowledgeable person in this situation would consider acceptable. And by “more” I don’t mean quantity of bug reports, I mean the totality of diversity of problems, impact of problems, and frequency of occurrence of problems. The headline for that is “lots of bugs” or maybe I should say “two scoops of bugs!”

Stuart Reid’s Bizarre Plea

Stuart Reid is planning to do a talk on how we should use “evidence” in our debates about what works and doesn’t work in testing.

A funny thing about that is Stuart once spent 30 minutes trying to convince me that the number “35,000” was evidence of how great the ISEB certification is, as in “35,000 happy customers can’t be wrong.” Such a concept of “evidence” would not pass muster in a freshman course in logic and rhetoric. How does he know that the 35,000 people are happy? How does he know that they are qualified to judge the quality of the certification? How does he explain the easily checked fact that you can pick out any three ISEB or ISTQB certified testers, ask them if they think the certification has made them better testers or indicates that they are better testers, and at least two of them will do the equivalent of rolling their eyes and smirking? (Don’t believe me? I understand. So TRY IT, as I do on a regular basis in my classes.)

You might think Stuart is attempting a bold and classic rhetorical move: attempting to control the terms of the debate. The problem he has is that he will lose the debate even faster if he actually engages on the question of evidence. This is because there is plenty of evidence from other fields and the history of thought itself to justify the positions of the Context-Driven School of testing. We are winning the debates because we are better informed and better educated than the Factory Schoolers, as represented by Reid. For instance, Rikard Edgren (who says he’s not in the Context-Driven School, but looks like a duck to me) wrote about applying Grounded Theory to testing. I wonder if Stuart Reid has ever heard of Grounded Theory. He probably has, because I probably mentioned it at least once in the hours of debate that Stuart and I have had. He didn’t respond or react. My impression was that he wasn’t listening.

There’s something far more important than evidence that we need in our industry: engagement. People need to listen to and respond to the arguments and evidence that are already out there.

Here’s one sort of evidence I put in front of Stuart, in a debate. I claimed that my school of testing represents a different paradigm of thinking about testing than his does. After giving him examples of specific words that we define differently and concepts that we arrange differently, it became clear that the deeper problem is that he thought I was pretending to believe things that I don’t believe, just to be difficult. He actually said that to me!

This is the last resort of the determined ideologue: poke your own eyes out so that you don’t risk seeing contrary evidence. Stuart’s case rests on pretending that no one else is making a case! His demand for evidence is meant to give the impression that the evidence is not already sitting in front of him being ignored.

Cem Kaner, Michael Bolton, and I have been marshaling evidence, pointing out the lack of evidence against our ideas, and demonstrating our methods for many years. Next week it will be exactly 23 years since I first became a full-time software tester, and nearly 17 years since the first time I stood up at a conference and pointed out the absurdity of “traditional” testing methods.

BTW, here are some of the kinds of evidence I offer when challenged about my work:

  • The Sciences of the Artificial, by Herbert Simon (this establishes, based on a body of research for which he won the Nobel Prize in 1978, the heuristic nature of engineering)
  • Collaborative Discovery in a Scientific Domain, Takeshi Okada, Herbert Simon, 1997, (this is an experiment that observed the behaviors of scientists attempting to create and perform experiments together in an exploratory way)
  • The Processes of Scientific Discovery: The Strategy of Experimentation, Deepak Kulkarni, Herbert Simon, 1988 (this study analyzes the basic exploratory processes of science)

The first item here is a book, the next two are papers published in the journal Cognitive Science. See, if Stuart wants evidence, he has to look beyond the desert that is Computer Science. He needs to get serious about his scholarship. That will require him to find, in his heart, a passion to learn about testing.

CNN Believes Whatever Computers Say

(CNN) — Early evidence points to driver error as the reason a 2005 Prius sped into a stone wall on March 9, according to federal investigators.

“Information retrieved from the vehicle’s onboard computer systems indicated there was no application of the brakes and the throttle was fully open,” according to a statement from the National Highway Traffic Safety Administration.

The statement suggests the driver may have been stepping on the accelerator, instead of the brake, as she told police.

Note to CNN: Information retrieved from an onboard computer is not reliable in cases where the computer itself is a suspect in the crime. No one is claiming that ghosts are causing the cars to go out of control. And there is no evidence yet that a mechanical failure is the culprit, despite the NHTSA looking hard for that evidence. Alcohol is also not a factor in these specific cases (or else we wouldn’t even be talking about them). That leaves two big possibilities: sober experienced people are suddenly forgetting how to drive OR something is wrong with the software or electronics that controls the cars.

If the software that reads the control inputs has the right kind of fault in it, it may occasionally lose the ability to read and react to control inputs. It is easy for a computer programmer to imagine a situation where the part of the system that records the control settings is working fine, while the part that acts on them is failing. That would be consistent with the facts of this case. Moreover, the problem may be transient, leaving no evidence that it happened.
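To make that concrete, here is a minimal sketch in Python of one such fault. Everything in it is hypothetical: the names `Pedals`, `InputReader`, and `run` are invented for illustration, and nothing here describes how any real vehicle's software works. It shows only one of the possibilities mentioned above, a transient fault in the layer that reads the pedals, with the event recorder logging what the software saw and not what the driver did.

```python
from dataclasses import dataclass


@dataclass
class Pedals:
    brake: bool
    throttle: float  # 0.0 (released) to 1.0 (fully open)


class InputReader:
    """Hypothetical input layer. While `stuck` is True it keeps
    returning the last value it read (a transient, stale-read fault)."""

    def __init__(self):
        self.last = Pedals(brake=False, throttle=0.0)
        self.stuck = False

    def read(self, actual: Pedals) -> Pedals:
        if not self.stuck:
            self.last = actual
        return self.last


def run(ticks):
    """ticks: list of (what the driver actually did, fault active?).
    Returns the event log, which records what the software saw."""
    reader = InputReader()
    log = []
    for actual, fault_active in ticks:
        reader.stuck = fault_active
        seen = reader.read(actual)
        log.append(seen)  # the recorder and the actuators share `seen`
    return log


ticks = [
    (Pedals(brake=False, throttle=1.0), False),  # accelerating, healthy
    (Pedals(brake=True, throttle=0.0), True),    # driver brakes, fault active
    (Pedals(brake=True, throttle=0.0), True),    # still braking, still stale
]
log = run(ticks)
```

In this toy run every entry in `log` shows no brake and a fully open throttle, even though the simulated driver was braking for the last two ticks. The log is faithful to what the software believed, which is exactly why it cannot serve as independent evidence about what the driver did.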

We don’t know what happened. We may never know. But there is no reason to assume that the computer is infallible. Computers are designed by fallible people.