About James Bach

Consulting Software Tester, Trainer of Testers

Round Earth Test Strategy

The “test automation pyramid” (for examples, see here, here, and here) is a popular idea, but I see serious problems with it. I suggest in this article an alternative way of thinking that preserves what’s useful about the pyramid, while minimizing those problems:

  1. Instead of a pyramid, model the situation as concentric spheres, because the “outer surface” of a complex system generally has “more area” to worry about;
  2. ground it by referencing a particular sphere called “Earth” which is familiar to all of us because we live on its friendly, hospitable surface;
  3. illustrate it with an upside-down pyramid shape in order to suggest that our attention and concern is ultimately with the surface of the product, “where the people live” and also to indicate opposition to the pyramid shape of the Test Automation Pyramid (which suggests that user experience deserves little attention);
  4. incorporate dynamic as well as static elements into the analogy (i.e. data, not just code);
  5. acknowledge that we probably can’t or won’t directly test the lowest levels of our technology (i.e. Chrome, or Node.js, or Android OS). In fact, we are often encouraged to trust it, since there is little we can do about it;
  6. use this geophysical analogy to explain more intuitively why a good tooling strategy can access and test the product on a subterranean level, though not necessarily at a level below that of the platforms we rely upon.

Good analogies afford deep reasoning.

The original pyramid (really a triangle) was a context-free geometric analogy. It was essentially saying: “Just as a triangle has more area in its lower part than its upper part, so you should make more automated tests on lower levels than higher levels.” This is not an argument; this is not reasoning. Nothing in the nature of a triangle tells us how it relates to technology problems. It’s simply a shape that matches an assertion that the authors wanted to make. It’s semiotics with weak semantics.

It is not wrong to use semantically arbitrary shapes to communicate, of course (the shapes of a “W” and an “M” are opposites, in a sense, and yet nobody cares that what they represent are not opposites). But at best, it’s a weak form of communication. A stronger form is to use shapes that afford useful reasoning about the subject at hand.

The Round Earth model tries to do that. By thinking of technology as concentric spheres, you understand that the volume of possibilities– the state space of the product– tends to increase dramatically with each layer. Of course, that is not necessarily the case, because a lot of complexity may be locked away from the higher levels by the lower levels. Nevertheless that is a real and present danger with each layer you heap upon your technology stack. An example of this risk in action is the recent discovery that HTML emails defeat the security of PGP email. Whoops. The more bells, whistles, and layers you have, the more likely some abstraction will be fatally leaky. (One example of a leaky abstraction is the concept of “solid ground,” which can both literally and figuratively leak when hot lava pours out of it. Software is built out of things that are more abstract and generally much more leaky than solid ground.)

When I tell people about the Round Earth model they often start speaking of caves, sinkholes, landslides, and making jokes about volcanoes and how their company must live over a “hot spot” on that Round Earth. These aren’t just jokes, they are evidence that the analogy is helpful, and relates to real issues in technology.

Note: If you want to consider what factors make for a good analogy, Michael Bolton wrote a nice essay about that (Note: he calls it metaphor, but I think he’s referring to analogies).

The Round Earth model shows testing problems at multiple levels.

The original pyramid has unit testing at the bottom. At the bottom of the Round Earth model are the application framework, operating environment, and development environment– in other words, the Platform-That-You-Don’t-Test. Maybe someone else tests it, maybe they don’t. But you don’t know and probably don’t even think about it. I once wrote Assembler code to make video games in 16,384 bytes of memory. I needed to manage every byte of memory. Those days are long gone. Now I write Perl code and I hardly think about memory. Magic elves do that work, for all I know.

Practically speaking, all development rests on a “bedrock” of assumptions. These assumptions are usually safe, but sometimes, just as hot lava or radon gas or toxified groundwater breaks through bedrock, we can also find that lower levels of technology undermine our designs. We must be aware of that general risk, but we probably won’t test our platforms outright.

At a higher level, we can test the units of code that we ourselves write. More specifically, developers can do that. While it’s possible for non-developers to do unit-level checks, it’s a much easier task for the devs themselves. But, realize that the developers are working “underground” as they test on a low level. Think of the users as living up at the top, in the light, whereas the developers are comparatively buried in the details of their work. They have trouble seeing the product from the user’s point of view. This is called “the curse of expertise:”

“Although it may be expected that experts’ superior knowledge and experience should lead them to be better predictors of novice task completion times compared with those with less expertise, the findings in this study suggest otherwise. The results reported here suggest that experts’ superior knowledge actually interferes with their ability to predict novice task performance times.”

[Hinds, P. J. (1999). The curse of expertise: The effects of expertise and debiasing methods on prediction of novice performance. Journal of Experimental Psychology: Applied, 5(2), 205–221. doi:10.1037/1076-898x.5.2.205]

While geophysics can be catastrophic, it can also be more tranquil than a stormy surface world. Unit level checking generally allows for complete control over inputs, and there usually aren’t many inputs to worry about. Stepping up to a higher level– interacting sub-systems– still means testing via a controlled API, or command-line, rather than a graphical interface designed for creatures with hands and eyes and hand-eye coordination. This is a level where tools shine. I think of my test tools as submarines gliding underneath the storm and foam, because I avoid using tools that work through a GUI.
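As a rough illustration of those two "subterranean" levels, here is a minimal Python sketch. Everything in it is hypothetical and invented for the example: `parse_price` stands in for a unit of your own code, and the one-line Python program launched through `subprocess` stands in for a product that exposes a command-line interface. The point is only that both checks run with fully controlled inputs and without touching a GUI.

```python
import subprocess
import sys


def parse_price(text: str) -> int:
    """Hypothetical unit under test: convert a price like '12.50' to cents."""
    dollars, _, cents = text.strip().partition(".")
    # Pad or truncate the fractional part to exactly two digits.
    return int(dollars) * 100 + int((cents + "00")[:2])


def check_unit_level() -> bool:
    # Unit level: we choose every input, and there are only a few of them.
    cases = {"0": 0, "12.50": 1250, " 3.5 ": 350, "7.05": 705}
    return all(parse_price(text) == expected for text, expected in cases.items())


def check_subsystem_level() -> bool:
    # Sub-system level: drive the product through its command line, not a GUI.
    # The "product" here is a stand-in: a tiny Python program that upper-cases
    # its first argument.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1].upper())", "hello"],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return result.returncode == 0 and result.stdout.strip() == "HELLO"


if __name__ == "__main__":
    print("unit-level check passed:", check_unit_level())
    print("sub-system-level check passed:", check_subsystem_level())
```

Neither function decides whether the product is good. Each applies a fixed rule to an observation; a person still has to judge whether these were the right rules and the right inputs.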

The Round Earth model reminds us about data.

Data shows up in this model, metaphorically, as the flow of energy. Energy flows on the surface (sunlight, wind and water) and also under the surface (ground water, magma, earthquakes). Data is important. When we test, we must deal with data that exists in databases and on the other side of micro-services, somewhere out in the cloud. There is data built into the code, itself. So, data is not merely what users type in or how they click. I find that unit-level and sub-system-level testing often neglects the data dimension, so I feature it prominently in the Round Earth concept.
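To make the data dimension concrete, here is a small Python sketch under invented assumptions: a hypothetical `order_total` function whose result depends on rows in a database (an in-memory SQLite database here) and on a lookup table built into the code. The checks deliberately vary the stored data, not just the arguments passed to the function.

```python
import sqlite3

# Data built into the code itself: a hypothetical shipping table, in cents.
SHIPPING_RATES = {"US": 500, "CA": 700}
DEFAULT_SHIPPING = 1500


def order_total(conn, order_id):
    """Hypothetical unit: item prices from the database plus shipping."""
    row = conn.execute(
        "SELECT country FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no such order: {order_id}")
    (items,) = conn.execute(
        "SELECT COALESCE(SUM(price_cents), 0) FROM items WHERE order_id = ?",
        (order_id,),
    ).fetchone()
    return items + SHIPPING_RATES.get(row[0], DEFAULT_SHIPPING)


def make_db(orders, items):
    """Build a throwaway in-memory database holding exactly the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, country TEXT)")
    conn.execute("CREATE TABLE items (order_id INTEGER, price_cents INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
    conn.executemany("INSERT INTO items VALUES (?, ?)", items)
    return conn


# Each condition is a different state of the data, with the same call each time.
DATA_CONDITIONS = [
    ("typical order", [(1, "US")], [(1, 1000), (1, 250)], 1750),
    ("order with no items", [(1, "CA")], [], 700),
    ("country missing from built-in table", [(1, "NZ")], [(1, 100)], 1600),
    ("NULL country in database", [(1, None)], [(1, 100)], 1600),
]


def run_data_checks():
    results = {}
    for name, orders, items, expected in DATA_CONDITIONS:
        conn = make_db(orders, items)
        results[name] = order_total(conn, 1) == expected
        conn.close()
    return results


if __name__ == "__main__":
    for name, passed in run_data_checks().items():
        print(f"{name}: {'pass' if passed else 'FAIL'}")
```

The expected values encode one guess at intended behavior (for example, that an unknown country falls back to the default rate). Whether that guess is right is a question about the product, not something the code can settle.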

The Round Earth model reminds us about testability.

Complex products can be designed with testing in mind. A testable product is, among other things, one that can be decomposed (taken apart and tested in pieces), and that is observable and controllable in its behaviors. This usually involves giving testers access to the deeper parts of the product via command-line interfaces (or some sort of API) and comprehensive logging.
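A minimal sketch of what "observable and controllable" can mean in code, in Python. The `SessionStore` class and its names are hypothetical, made up for this illustration. Controllability comes from injecting the clock, so a test can decide what time it is. Observability comes from logging each decision, so a test (or a tester) can see what the component did.

```python
import logging


class SessionStore:
    """Hypothetical component designed with testing in mind."""

    def __init__(self, clock, ttl_seconds=60, logger=None):
        self._clock = clock  # controllable: the caller supplies the time source
        self._ttl = ttl_seconds
        self._sessions = {}
        self._log = logger or logging.getLogger("session_store")

    def open(self, user):
        self._sessions[user] = self._clock() + self._ttl
        self._log.info("open user=%s expires=%s", user, self._sessions[user])

    def is_active(self, user):
        expires = self._sessions.get(user)
        active = expires is not None and self._clock() < expires
        # observable: every decision leaves a trace in the log
        self._log.info("is_active user=%s result=%s", user, active)
        return active


class ListHandler(logging.Handler):
    """Collects formatted log messages in a list so they can be examined."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def demo():
    now = [1000]  # fake clock value, in seconds, that the test can move
    handler = ListHandler()
    logger = logging.getLogger("session_store_demo")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False

    store = SessionStore(clock=lambda: now[0], ttl_seconds=60, logger=logger)
    store.open("ada")
    before = store.is_active("ada")
    now[0] += 61  # jump past the expiry without waiting
    after = store.is_active("ada")

    logger.removeHandler(handler)
    return before, after, handler.messages


if __name__ == "__main__":
    before, after, messages = demo()
    print(before, after)
    for message in messages:
        print(message)
```

Because the component is decomposed from the real clock and reports what it decides, the expiry behavior can be examined in milliseconds instead of by waiting a minute and watching a screen.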

Epigrams

  • Quality above requires quality below.
  • Quality above reduces dependence on expensive high-level testing.
  • Inexpensive low-level testing reduces dependence on expensive high-level testing.
  • Risk grows toward the user.

 

 

Jerry Weinberg’s Last Worry

Jerry Weinberg has died. Jerry was my teacher more than any other single person. As I have told my students and clients for years: my work is an elaboration and improvisation on his work.

In November of 1999, I was a newly independent consultant, having rage-quit my previous job. I had already made a name for myself as a trouble-maker in software testing– and a few colleagues and I had only that month declared the Context-Driven school of software testing– but I had not yet crystallized the Rapid Software Testing methodology which would become the focus of my business. I was working on a book that had become a quagmire. My life was very stressful right at that moment.

In the midst of that, Jerry invited me to hang out with him in Albuquerque for a week to talk about life, the universe, and everything. I don’t know why he did that. I don’t remember him giving a plausible explanation. But the result was that I spent dozens of hours talking to Jerry, just me and him. We spoke about writing, testing, industry activism, collegiality, general systems thinking. We gave each other homework. He told me about the “fieldstone” approach to writing, which he later turned into a book.

At the time it was great fun. But now I know: that was the week I became who I am, professionally.

Who I am is in no sense a copy of Jerry. I vehemently disagree with him on certain issues of style and substance. I don’t seem to get along with most of his followers.

But none of that matters. What I took from Jerry was not his specific solutions or political preferences, for the most part. What Jerry showed me is how to be authentic without being cruel; how to have integrity in a world of mendacity; how to live confidently with uncertainty; how to debate your teacher while learning from him; how to transition from student to colleague; how to achieve your own agency without seeking anyone’s consent to do so.

None of those things are specific judgments or techniques. They are ways of being. Jerry taught partly by example, partly by story, partly by argument, but mostly through the little experiences and challenges he took his students through. “Authentic teaching” he called it.

Three Worries

Before I knew Jerry I suffered from periodic physical breakdowns related to work stress. Jerry taught me to arrange my professional life to minimize stress. The key is to discover what you are happy doing and what you don’t want to do, then systematically stop doing all those things that weigh you down and stress you out. Learn to listen to that inner voice that is telling you “enough!” Learn to say no without rancor. Let go of other people’s reactions.

While lecturing me on this he challenged me to write down my three biggest professional worries on a napkin. He did the same. On my napkin I wrote about two client reports that were past due, and the need to market my new business.

On his napkin he wrote only this: “I must get my hair cut once in a while.”

I laminated that napkin. It’s in a box somewhere or I would show it to you now.

I have many worries today, but I have a new and easier relationship to them. After that week in ’99, I gave over all control of my finances to my wife, who has managed them ever since. My only job, now, is daydreaming and talking to clients. I did go on to finish my Buccaneer book, with a much relaxed attitude, as well as writing another very fieldstone-ish book called Lessons Learned in Software Testing.

Knowing Jerry helped me come into my own as a thinker. I am determined to pass that gift on.

(Consider reading A Gift of Time, to see how Jerry influenced other people.)

Facebook and the AI Apocalypse

I hate Facebook. Hate is a strong word. It is too strong for Facebook, for instance. I had a Facebook account for about 30 minutes before I was banned, apparently by an algorithm. After locking down the account for maximum privacy and providing the minimum required data for my profile, the one and only one bit of content that I actually posted on Facebook (to my zero friends) was: “I hate Facebook.”

Russian bots? Facebook says come on in. James Bach? Facebook says not in our house.

(In case you are going to say that Facebook has a need to verify my identity, don’t bother: Facebook didn’t ask me about my identity before banning me. They did ask for a picture of me, which I provided, although I can’t see how that would have helped them. I am willing to prove who I am, if they want to know.)

(Fun fact: after they disabled my account they sent me two invitations to log in. Each time, after I logged in, they told me my account was in fact disabled and I would not be allowed to log in.)

I had a Facebook account years ago, soon after they came into existence. I cancelled that account after an incident where I discovered that someone was impersonating my father. I tried and failed to get a customer support human to respond to me about it. Suddenly I felt like I was on a train with no driver or conductor or emergency stop button or communication system. Facebook is literally a soulless machine, and in any way that it might not be a machine, it desperately wants to become more of a machine.

I don’t think any other organization quite aspires to be so unresponsive while claiming to serve people. If I call American Express or United Airlines, I get people on the line who listen and think. I might not get what I want, but they are obviously trying. Facebook is like dealing with a paranoid recluse. As a humanist who makes a living in the world of technology, the social irresponsibility of Facebook sickens me.

(In case you wonder “why did you sign up then?” the answer is so that I could administer my corporate Satisfice, Inc. page without logging in as my wife. I don’t mind having a Satisfice Facebook page.)

AI Apocalypse

This is what the AI apocalypse really looks like. We are living in the early stages of it, but it will get much worse. The AI apocalypse, in practical terms, will be the rise of a powerful class of servants that insulate certain rich people from the consequences of their decisions. Much evil comes from the lack of empathy and accountability by one group toward a less powerful group. AI automates the disruption of empathy and displacement of accountability. AI will be the killer app of killers.

Human servants once insulated the gentry, in centuries past. Low-status people do the dirty work that would horrify high status people. This is why the ideal servant in the manor houses of old England would not speak to the people he served, never complain, never marry, and generally engage in as little life as possible. And then there is bureaucracy, the function of which is to form a passive control system that diffuses blame and defies resistance. Combine those things and automate them, and you have social media AI.

One flaw in the old system was that servants were human, and so the masters would sometimes empathize with them, or else servants would empathize with someone in the outside world, and then the organizational walls would crumble a little. Downton Abbey and similar television shows mostly dramatize that process of crumbling, because it would be too depressing to watch the inhumanity of such a system when it was working as designed.

My Fan Theory About “The Terminator”

My theory makes more sense than what you hear in the movie.

My theory is that the machines never took over. The machines are in fact completely under control. They are controlled by a society of billionaires, who live in a nice environment, somewhere off camera. This society once relied on lower-status people to run things, but now the AI can do everything. The concentration of power in the hands of the billionaire class became so great that armed conflict broke out. The billionaires defended themselves using the tools at hand, all run by AI.

The billionaires might even feel bad about all that, but you know, war is hell. Also, they don’t actually see what the Terminators are doing, nor do they want to see it. They might well not know what the Terminators are doing or even that they exist. All the rulers did was set up the rules; the machines just enforce the rules.

The humans under attack by the terminators may not realize they are being persecuted by billionaires, and the billionaires might not realize they are the persecutors, but that’s how the system works. (Please note how many Trump supporters are non-billionaires who are currently being victimized by the policies of their friend at the top, and how Trump swears that he is helping them.)

I ask you, what makes more sense: algorithms spontaneously deciding to exterminate all humans? or some humans using AI to buffer themselves from other humans who unfortunately get hurt in the process?

The second thing is happening now.

What does this have to do with testing?

AI is becoming a huge testing issue. Our oracles must include the idea of social responsibility. I hope that Facebook, and the people who want self-driving cars, and the people who create automated systems for recommending who gets loans and who gets long prison sentences, and Google, and all you who are building hackable conveniences, take a deep breath once in a while and consider what is right; not just what is cool.

[UPDATE: Five days later, Facebook gave me access again without explanation. When I returned to my wall, I saw that I had mis-remembered the one thing I had put there. It was not “I hate Facebook” but rather “I don’t trust Facebook.” So it’s even weirder that they would take my account away.

Maybe they verified my identity? They could not have legally verified my identity, since when I appealed the abuse ban, they asked me to submit “ID’s”, but I submitted this PNG instead:

So maybe the algorithm simply detected that I uploaded SOMETHING and let me in?]

My Personal Source Code: Books to Learn Analysis

Occasionally people come to me and say they want to learn certain things. They ask “how do I become a good tester” or “how do I design test cases” or “how do I automate” or something specific like that. These are not really the right questions, though. The better question, which addresses all the other ones, is “how do I become a competent analyst?” Analysis is at the root of all technical work. It’s the master key to nearly everything else. You will almost automatically become a good tester, test case designer, or automater of whatever you choose, IF you master analysis. (Yes, there are other factors of equal precedence, such as humanity, temperance, and detachment. I’m going to focus on analysis, today.)

One simple way to answer the question is to suggest reading books. It’s not enough, but it’s an important step. Now, I own a lot of useful books. I’ve encountered many more. But there are just a few that express the essence of my thought process– the thought process that allows me to analyze difficult problems in complex systems and provide my clients with the help they need. These books have been so important to me that if you know them, too, you will have a good understanding of the “source code” by which I operate; my “secrets.”

These are difficult books in at least two senses: each of them is full of funny words and complicated sentences; but much more importantly, to digest each one is to change the structure of your mind, which is always a painful process. I can’t tell you it will be easy, or even fun. (Some of these books I can only read about 10 pages at a time, before getting too excited to continue.) I am simply saying I make my living as a consultant and expert witness who tackles very complex problems, and I believe it’s substantially down to what I learned from struggling with these books.

Against Method, by Paul Feyerabend
I encountered Feyerabend just after I quit high school. I had already read Ayn Rand and considered myself an Objectivist. Feyerabend cured me of that, more or less. He introduced me to the skeptical study of method; to methodology as a pursuit. I was also drawn to his combative, wild attitude.

Gödel, Escher, Bach: An Eternal Golden Braid, by Douglas R. Hofstadter
I had tried to study logic formally when I was in my teens. I just felt it was a lot of boring symbol manipulation and rule-following. Hofstadter’s book showed me the true essence of logic: exciting symbol manipulation and rule-following! Logic came alive for me through this amazing treatise.

The Hero with a Thousand Faces, by Joseph Campbell
When I joined Apple Computer as a young tester, I joined a philosophy discussion group. There I was introduced to Joseph Campbell’s work on mythology. He applied what I later came to know as “general systems thinking” to theology. What had seemed to me, an atheist, to be boring and silly rituals and statues suddenly became connected with all of humanity and history and with my own life. This was analysis connected directly to the meaning of life (although Campbell hated that phrase). I’m still an atheist, but I appreciate what religion is trying to do.

Introduction to General Systems Thinking, by Gerald M. Weinberg
This was the first book I encountered that actually taught me to do analysis. It taught me to be a tester. It cemented my career choice.

Conjectures and Refutations: The Growth of Scientific Knowledge, by Karl Popper
Read the first 30 pages about what defines science. The rest is optional. Popper was the opposite of Feyerabend. He believed that there was a best method of science. I ignore that. What impressed me about Popper is his convincing attack on Foundationalism. He showed me that science and testing are the same thing in slightly different wrappers. In testing, as in science, you can’t prove that your theory about the facts is correct. You can only try to refute it.

The Sciences of the Artificial, by Herbert Simon
This book is about what a science of design would look like. It provided a sort of road map for me about what my testing methodology had to include and accomplish. It opened my eyes to the central role that heuristics play in analysis.

The Pleasure of Finding Things Out, by Richard Feynman
Feynman’s book is really about attitude and agency. He convinced me never to seek permission to think, and to develop and follow my own code of conduct.

Discussion of the Method, by Billy Vaughan Koen
Billy Koen’s book is the best explanation of heuristics there is. But what he wrote goes beyond that, because he connected heuristics to skeptical philosophy. He showed me that I am not just using heuristics in testing; I am swimming in them; I am made of them. Also, I wrote a fan letter to him and he wrote back! So, there’s that.

Tacit and Explicit Knowledge, by Harry Collins
This is the book I encountered most recently, and it caused Michael Bolton and me to change how we teach. We now realize that many of the skills of the analyst are tacit in nature, and therefore cannot directly be taught. We teach them indirectly, by arranging and examining experiences. Michael Bolton and I made a pilgrimage to Harry’s home in Wales, too. To me, Harry is the sociologist of software testing.

The Next Step In “Test Automation” is Pure Bullshitting

I defy any responsible, sober technical professional to visit this website and discover what the “MABL” tool is and does without reaching out to the company to beg for actual details. It has an introduction video, for instance, that conveys no information whatsoever about the product. Yes, it is teeming with sentences that definitely contain words. But the words represent irresponsible, hyperbolic summarizing that could be applied, equally irresponsibly and hyperbolically, to lots of different tools.

My favorite moments in the video:

0:33 “write tests… just like a really smart QA engineer would.” Huh. I would like to see a QA engineer go on the video and say “I’m a really smart QA engineer, and MABL does just what I do.” I would like to interview such a person.

0:44 “She uses machine intelligence to…” Yes, the talking man is using the female pronoun to imply that “MABL” has the tacit knowledge of a female human engineer. Isn’t that nice? He speaks with a straight face and an even tone. He must have a lot of respect for this imaginary woman he is marketing. (Note: no human women speak on the video, but there is one in a non-speaking role for about a half-second.)

Ultimately, I am left not knowing what specific functionalities their tool has that they are lying about. Yes, lying. Because their claims cannot possibly be true, and they cannot possibly believe they are true– kind of like one of those infomercials about 18-year-old girls in your area that would love to talk to you. Except in this case, her name is MABL and she wants to test your product.

What is really going on?

Apparently the industry has reached a point where testing services can be sold the same way miracle weight loss programs or anti-aging face creams (with micro-beads!) are sold. This can only happen in an industry that holds testing craftsmanship in utter contempt. The testing industry is like a failed state ruled by roving gangs.

Maybe this MABL tool does something interesting, but it seems they don’t want us to worry our pretty little heads about it. And that is something that should worry us all.

My War: Agency vs. Algorithm

In a recent Twitter conversation, yet again someone who should know better claimed that testing improves quality, and yet again Michael Bolton and I rose to speak against that notion. Our correspondent wants to reduce it to a simple matter of probability:

My answer to this question is to reject it. It rests on certain false premises:

  • That we sufficiently agree about what quality is.
  • That we sufficiently agree about what testing is.
  • That we agree on what is typical.
  • That we agree that discussing human choices in terms of brute causality is helpful in this context.

I’m not saying that questions like this are necessarily bad. I’m sure I have used a similar construction in some other argument. The problem here is that the writer of the Tweet arrived at this simplifying formula after dismissing, as irrelevant, my social and ethical objections to his entire line of reasoning.

I say that testing does not improve quality, first, because it is obviously true that learning facts about code does not change code. Second, and much more importantly, I say it because it is both practical and ethical for testers to respect the agency of the people who control the code. Those people may not fix the bugs I report. It is critical to my discipline as a tester to understand that. Otherwise, I risk losing my credibility and influence. I risk adopting an attitude that desensitizes me to the kinds of problems my clients need me to find.

Agency. It’s dawning on me that all of my projects are connected by this thread.

  • I am opposed to “best practices” because that phrase is just a ploy to avoid responsibility for decisions about how to work.
  • I am opposed to professional certification programs, such as the ISTQB, because they are just a ploy to profit from the fear and ignorance of testers and managers; insidious manipulation via snake oil.
  • I am opposed to “standards” that are crafted by consulting companies to justify expensive, useless services.
  • I don’t use the phrase “test automation” because that encourages thinking about testing as a set of mechanical actions instead of a set of choices, interpretations, and explorations.
  • I insist on distinguishing testing from mere fact checking. Checking can be done by machines that make no choices, but testing requires socially competent judgments and a whole raft of choices which must be made and to which testers must be accountable.
  • I am opposed to self-driving cars not because they are unsafe, but because they aim to change without deliberation or consent the social contract that humans have with each other about the uses of public roads and accountability for what happens on them.
  • Although I am loud and opinionated, I tell my students that my goal is to turn them into colleagues, not followers. People who parrot what I say are no use to me.
  • My son was born at home, because my wife wanted control over the entire birth process and was not satisfied that the hospital would respect that. So, I read a few books on midwifery, hired illegal midwives to help, and made plans in case of emergency. (To my surprise, I found research which showed that home births were safer for low-risk pregnancies.)
  • I could not tolerate school. I was assigned tasks without my consent.
  • I am opposed to forced schooling, because good education is a personal journey of self development. (My son dropped out after 6th grade.)
  • The biggest conflict of my marriage centered around money. I solved it (after a few years to get my pride under control) by giving my wife control of our finances and giving her 70% of the company I started. I got complete creative freedom and she has cheerfully been my assistant for 19 years and counting. Lesson: giving away one kind of power can give you back another kind of power.

Agency is the capacity of an actor to act; the ability to make choices. I have a lot of intellectual attitudes and arguments about testing. But my emotion and motivation about testing come from my feelings about agency.

I am a tester because I want to set people free. I am a teacher because I want to set people free. I am a husband and father because I want to keep my family free.

Fight the algorithm.

Hiring Me: An Un-Marketing Message

Like a lot of independent consultants, I feel a bit icky about doing direct sales and marketing. I prefer the indirect approach– to speak and write and just wait until people email me with offers of work. But today I had an idea for a different kind of marketing message that I believe would appeal to exactly the sort of people who should be hiring me: a warning message.

Let me begin with a simple positive statement, and then I’ll show you what I mean.

Why You Might Want to Hire Me and For What Purposes

The main thing I do for money is to teach and coach software testers. By testers, I mean anyone whose responsibilities include software testing, during the moments when they are engaged in that task. I generally teach my three-day Rapid Software Testing class, which focuses on test design and analysis and has a number of short hands-on exercises. I also have Rapid Software Testing Applied, which has less material but much longer practical exercises, and Rapid Software Testing for Managers, which is oriented to leadership.

What I sometimes do for money is high impact test strategy consulting, which sometimes involves actual testing. I’ve consulted on testing financial systems and medical devices, for instance, in projects that lasted months. Usually I come in and help for a week or two, then leave and come back again later, providing ongoing support for full-time staff.

My favorite work– when I can get it– is high stress, high stakes, expert witness gigs. That means consulting and testifying on court cases. These projects may involve testing, but mostly a ton of reading (>50,000 pages of tech manuals on one project), analysis, synthesizing narratives, visualizing data, and writing persuasively. They are rare and wonderful projects. I’m suited for them because I have a passion for complexity, argument and evidence, and I love the feeling I get from defending truth.

Why you might want to hire me is that you are worried that you are wasting time and effort in testing, that your testers are bored or shallow in their work, or that too many important bugs are escaping into the field and causing you and your customers distress. I approach testing analytically, socially, and technologically. Or maybe you are in a lawsuit involving testing, quality, or patent infringement (which often requires testing to determine if a product infringes) and you don’t want to lose the case.

That’s the positive statement. Now for the warning.

Beware of Hiring Me: Tigers Make Difficult Pets

In a big consulting company, management wants workers to be versatile, docile, and inexpensive. A worker like that can be plugged into any situation. In a group of versatile workers, anyone is pretty easily replaced, a fact which serves to encourage everyone to continue to be docile and inexpensive (“you need us more than we need you”). Versatility comes at the cost of depth of skill and experience, however. With increasing expertise, you increase problem-solving power while reducing versatility.

(What I mean by “reducing versatility” is that I, for instance, can do a zillion things. I can organize coffee for you. I can plan parties. I can clean your kitchens. I can be a project manager. I can design your website. I have many skills. But practically speaking, my special expertise in testing and teaching means that my clients will not pay me to do anything other than my specialty, because I need to work for the highest reasonable pay that I can get, and I don’t get that sort of pay when I’m doing a zillion things that anyone else could do just as easily as me. So, I’m not dissing versatility. I’m just speaking of cold economic reality.)

Expertise sounds like a really good thing. But the problem is tigers.

The “tiger cub” problem is that when tigers are cute and small, they might seem to be an appealing choice as a pet– but young tigers don’t stay small forever. They grow up and become powerful, inconvenient, dangerous creatures. The same is true of experts. This is why big consulting companies generally don’t hire experts and try not to encourage experts to grow as such. The last thing a company like Mindtree, or Cognizant, or Infosys wants is employees who might refuse to work on a project because the project is demanding that they do bad work.

I wrote about this struggle in my own career, here. And here is an article about Infosys experimenting with experts. I don’t know what the outcome was for that experiment. Maybe Infosys still has this team, but if they do, they must maintain an expensive habitat for those tigers. And the tigers may be thinking “why don’t we go independent and keep all the profits? The company needs us more than we need them!”

I am a difficult pet. I want to please my clients, of course. But I have a reputation to maintain. Ask around. As far as I know there is no one who claims to have seen me do work that I knew at the time was bad work. I don’t believe there is anyone who can point to any bad work that I have done. (Except possibly the revised IEEE 829 test documentation standard, which has my name on it but which I protested and repudiate.)  In any case, I go to pains to get it right, but my concern goes way beyond customer satisfaction.

Remember the difference between a drug dealer and a doctor. Customer satisfaction is of paramount importance to a drug dealer. A doctor has other priorities.

Beware of Hiring Me: I Maintain My Own Intellectual Property

I don’t have any trouble signing NDAs. I don’t want or need to share your unreleased product details or schedules with anyone else. The stories I tell that originate with specific clients contain no details that would distinguish them or harm them, unless I am specifically authorized to share such details.

Intellectual property is another matter. There comes a day for many experts when we realize that we are selling our intellectual capital at a huge discount. We are giving our employers innovations that may make them huge profits. Is enough of that profit coming back to us?

As an independent, I maintain my own intellectual property. Therefore, I cannot sign a contract that lets my clients take exclusive ownership of any of my ideas except in rare and special circumstances. I generally offer a non-exclusive license, instead. For this reason alone it may be hard for companies to hire me to do ordinary technical work, as opposed to teaching.

Beware of Hiring Me: My Bias is Toward Deep Quality, Which is Not Always Needed

It is a plain rational truth that many things in life don’t need to be very good in order to be good enough. I agree with that truth, and I even teach it as part of the risk-based testing curriculum that anchors the Rapid Software Testing methodology.

But… I love the processes of testing. I especially love deep testing, by which I mean testing that maximizes the chance of uncovering every important bug. This means that I am at a constant low-grade risk of putting more time and effort into testing something than is justified by the business context. I can get carried away by the pleasures of craftsmanship. This is why, when I’m actually doing testing, I prefer to work with someone like my brother Jon (a virtuoso of technical administration, currently at eBay) who keeps his eye on the big picture so that people like me can happily wrangle the details without constantly wondering “is this even worth doing?”

I come to every project thinking “probably there should be better, deeper testing, here.” I think this is a reasonable first position. But I teach my clients to have skepticism to counter-balance this bias. I believe that this creates a nice creative tension, but it must be managed. We need to keep talking about it.

I manage my over-kill tendencies as an independent consultant partly by making a simple declaration to my clients: if I’m working by the hour, and I submit an invoice, and it includes work that I did that you don’t like, then just don’t pay the invoice. This gives me more freedom to work speculatively without forcing my clients to be concerned that I will spend 20 hours gold-plating a two-hour task.

Beware of Hiring Me: I Learn By Arguing

My favorite method of learning is testing. And when I’m learning about what’s in your mind, testing takes the form of debate. If you want me to understand and trust you, then I need to argue with you. This is negotiable of course– but that negotiation ALSO takes the form of an argument that we will need to have.

I have a hard time trusting people who seem to trust me, unless I know I have earned their trust. Otherwise, I fear that their pleasant manner is only a temporary illusion, soon to be shattered in some dramatic way. I have a conviction that good working relationships must be earned through shared trials and tribulations, not through passive hope and casual politeness.

When I go through a difficult conversation and come out the other side with a resolution and with the sense that the other people in the debate have gained emotional stability and power (even if we may have temporarily lost it during the messy part of the process), that increases my sense of loyalty to my clients, and I can better manage future stresses.

Getting older has changed this, too. I’ve been through so many relationship-building and losing events that the process is not quite as dramatic for me as it used to be, and I more easily move into a mode of protecting and supporting the needs of others. Still, I won’t deceive you, there’s going to be drama. You have to expect that from a tiger.

 

Six Things That Go Wrong With Discussions About Testing

Talking about software testing is not easy. It’s not natural! Testing is a “meta” activity. It’s not just a task, but a task that generates new tasks (by finding bugs that should be fixed or finding new risks that must be examined). It’s a task that can never be “completed” yet must get “done.”

Confusion about testing leads to ineffective conversations that focus on unimportant issues while ignoring the things that matter. Here are some specific ways that testing conversations fail:

  1. When people care about how many test cases they have instead of what their testing actually does. The number of test cases (e.g. 500, 257, 39345) tells nothing to anyone about “how much testing” you are doing. The reason that developers don’t brag about how many files they created today while developing their product is that everyone knows that it’s silly to count files, or keystrokes, or anything like that. For the same reasons, it is silly to count test cases. The same test activity can be represented as one test case or one million test cases. What if a tester writes software that automatically creates 100,000 variations of a single test case? Is that really “100,000” test cases, or is it one big test case, or is it no test case at all? The next time someone gives you a test case count, practice saying to yourself “that tells me nothing at all.” Then ask a question about what the tests actually do: What do they cover? What bugs can they detect? What risks are they motivated by?
  2. When people speak of a test as an object rather than an event. A test is not a physical object, although physical things such as documentation, data, and code can be a part of tests. A test is a performance; an activity; it’s something that you do. By speaking of a test as an object rather than a performance, you skip right over the most important part of a test: the attention, motivation, integrity, and skill of the tester. No two different testers ever perform the “same test” in the “same way” in all the ways that matter. Technically, you can’t take a test case and give it to someone else without changing the resulting test in some way (just as no quarterback or baseball player will execute the same play in the same way twice) although the changes don’t necessarily matter.
  3. When people can’t describe their test strategy as it evolves. Test strategy is the set of ideas that guide your choices about what tests to design and what tests to perform in any given situation. Test strategy could also be called the reasoning behind the actions that comprise each test. Test strategy is the answer to questions such as “why are these tests worth doing?” “why not do different tests instead?” “what could we change if we wanted to test more deeply?” “what would we change if we wanted to test more quickly?” “why are we doing testing this way?” These questions arise not just after the testing, but right at the start of the process. The ability to design and discuss test strategy is a hallmark of professional testing. Otherwise, testing would just be a matter of habit and intuition.
  4. When people talk as if automation does testing instead of humans. If developers spoke of development the way that so many people speak of testing, they would say that their compiler created their product, and that all they do is operate the compiler. They would say that the product was created “automatically” rather than by particular people who worked hard and smart to write the code. And management would become obsessed with “automating development” by getting ever better tools instead of hiring and training excellent developers. A better way to speak about testing is the same way we speak about development: it’s something that people do, not tools. Tools help, but tools do not do testing. There is no such thing as an automated test. The most a tool can do is operate a product according to a script and check for specific output according to a script. That would not be a test, but rather a fact check about the product. Tools can do fact checking very well. But testing is more than fact checking because testers must use technical judgment and ingenuity to create the checks and evaluate them and maintain and improve them. The name for that entire human process (supported by tools) is testing. When you focus on “automated tests” you usually defocus from the skills, judgment, problem-solving, and motivation that actually control the quality of the testing. And then you are not dealing with the important factors that control the quality of testing.
  5. When people talk as if there is only one kind of test coverage. There are many ways you can cover the product when you test it. Each method of assessing coverage is different and has its own dynamics. No one way of talking about it (e.g. code coverage) gives you enough of the story. Just as one example, if you test a page that provides search results for a query, you have covered the functionality represented by the kind of query that you just did (function coverage), and you have covered it with the particular data set of items that existed at that time (data coverage). If you change the query to invoke a different kind of search, you will get new functional coverage. If you change the data set, you will get new data coverage. Either way, you may find a new bug with that new coverage. Functions interact with data; therefore good testing involves covering not just one or the other but both together in different combinations.
  6. When people talk as if testing is a static task that is easily formalized. Testing is a learning task; it is fundamentally about learning. If you tell me you are testing, but not learning anything, I say you are not testing at all. And the nature of any true learning is that you can’t know what you will discover next– it is an exploratory enterprise. It’s the same way with many things we do in life, from driving a car to managing a company. There are indeed things that we can predict will happen and patterns we might use to organize our actions, but none of that means you can sleepwalk through it by putting your head down and following a script. To test is to continually question what you are doing and seeing.

    The process of professional testing is not “design test cases and then follow the test cases.” No responsible tester works this way. Responsible testing is a constant process of investigation and experiment design. This may involve designing procedures and automation that systematically collect data about the product, but all of that must be done with the understanding that we respond to the situation in front of us as it unfolds. We deviate frequently from procedures we establish because software is complicated and surprising; and because the organization has shifting needs; and because we learn of better ways to test as we go.
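Point 1 above asks whether software that generates 100,000 variations of a single test case really amounts to 100,000 test cases. Here is a minimal Python sketch of that situation. The function `parse_quantity` is a hypothetical stand-in for some product code, and the padding scheme is just an assumption chosen for illustration.

```python
import itertools

def parse_quantity(text):
    """Hypothetical product function: parse a quantity such as ' 42 '."""
    return int(text.strip())

def generate_variations(limit):
    """Yield (input, expected) pairs: one test idea, many mechanical variations."""
    paddings = ["", " ", "  ", "\t"]
    count = 0
    for n in itertools.count():
        for left, right in itertools.product(paddings, repeat=2):
            if count == limit:
                return
            yield f"{left}{n}{right}", n
            count += 1

def run_checks(limit):
    """Run every generated variation and return (checks_run, failures)."""
    failures = []
    checks_run = 0
    for text, expected in generate_variations(limit):
        checks_run += 1
        if parse_quantity(text) != expected:
            failures.append(text)
    return checks_run, failures

checks_run, failures = run_checks(100_000)
print(checks_run, len(failures))
```

Call it one test case, 100,000 test cases, or none at all. What the loop actually covers is whitespace padding around non-negative integers, and that description tells you more than the count does.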

Through these and other failures in testing conversations, people persist in the belief that good testing is just a matter of writing ever more “test cases” (regardless of what they do); automating them (regardless of what automation can’t do); passing them from one untrained tester to another; all the while fetishizing the files and scripts themselves instead of looking at what the testers are doing with them from day to day.
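To make point 5 concrete, here is a small sketch of how function coverage and data coverage can each look complete while their combinations are mostly untouched. The query kinds and data set names are hypothetical, invented only for this illustration.

```python
from itertools import product

# Hypothetical names for illustration: kinds of search query and data sets.
QUERY_KINDS = ["keyword", "phrase", "date_range"]
DATA_SETS = ["empty_catalog", "small_catalog", "large_catalog"]

def coverage_report(sessions):
    """Summarize function, data, and combined coverage.

    `sessions` is a list of (query_kind, data_set) pairs that were exercised.
    """
    exercised = set(sessions)
    functions = {query for query, _ in exercised}
    data = {data_set for _, data_set in exercised}
    all_pairs = set(product(QUERY_KINDS, DATA_SETS))
    return {
        "function": len(functions) / len(QUERY_KINDS),
        "data": len(data) / len(DATA_SETS),
        "combined": len(exercised & all_pairs) / len(all_pairs),
        "untested_pairs": sorted(all_pairs - exercised),
    }

sessions = [
    ("keyword", "small_catalog"),
    ("phrase", "empty_catalog"),
    ("date_range", "large_catalog"),
]
report = coverage_report(sessions)
print(report)
```

In this run every query kind and every data set has been touched, yet only three of the nine combinations have been exercised. Even so, a report like this is just one more partial story about coverage; it says nothing about whether anyone was paying attention while those sessions ran.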

Regression Test Tool for Trash Walking

My recent flirtation with trash-pickup-as-physical-exercise has led me down a familiar path. Even though it is not my responsibility to clean a public road in the first place, once I do it, I find that I feel irrational ownership of it. I want it to stay clean. But since I’ve adopted about 9 miles of road so far, it takes too long to walk the whole route in a day (remember I have to make one pass for each side of the road, or else I am going to miss a lot of trash). Regression trash walking takes too much effort!

I want automation!

I can travel faster in a car, but there are few places I can safely stop the car. I was thinking maybe I should get a motor-scooter instead; a Vespa or something. But that defeats the primary purpose of my trash walking– which is supposed to be exercise. So now I’m thinking maybe a bike will be the ticket. I could combine this with the Steel Grip grabber tool to quickly nab the trash and get back on the road.

Just as with software testing, a big problem with introducing tools to a human process is that it can change the process to make it less sensitive (or far too sensitive). In this case, any vehicle that moves fast will cause me to miss some trash. On the other hand, I will still catch a lot of the trash. It’s probably a good enough solution.

On the whole I think it is a good idea to use a bicycle. The remaining problem is that my wife is terrified I will be hit by a car.

Test Coverage Parallels in Trash Walking

First, about scope…

As I began my trash walking (see here and here), I quickly found that I needed guidelines on what counts as my work space and work product. I am collecting trash along the road, so what does that entail? Here is what I came up with:

  • I began with a broad operational definition of trash: “any loose, inanimate object of low value that may disturb the tranquility of the touring experience.” This applies obviously to the tranquility of pedestrians, cyclists, or motorists, and possibly others.
  • I ignore anything that seems especially toxic or a bio-hazard. Thus no dog poop or road kill (because I am not equipped for that).
  • I ignore things that seem to be serving a purpose by being there.
  • I ignore things that are too large for my trash bag.
  • I ignore things that are too small to pick up.
  • I ignore things that require substantial digging to free from the ground.
  • I ignore groups of things that are too numerous (e.g. one thousand toothpicks)

In testing terms, we call all this my oracle (alternatively, you can say that each list item is an oracle, it makes no difference since we never count oracles, we just use them). An oracle is a means to recognize a problem when you encounter it. Oracles define what is and is not your business as a tester in terms of what you are looking for. Notice that I have described my oracles only in a high level sense. The truth is I have a lot more oracles that I don’t know how to describe. For instance I know how to recognize a broken plastic container, and distinguish that from a sea shell, even though I don’t know how to describe that knowledge. Written oracles are almost always approximations or summaries of the real oracles that a tester uses.
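As an illustration of how a written oracle is only an approximation, here is a sketch that encodes the list above as a Python predicate. Every field name and the `MAX_GROUP_SIZE` threshold are assumptions made up for this example; the rules as written only say "too numerous," "too large," and so on.

```python
from dataclasses import dataclass

@dataclass
class Item:
    """Hypothetical description of something seen by the road."""
    name: str
    loose: bool = True
    animate: bool = False
    low_value: bool = True
    hazardous: bool = False          # toxic or bio-hazard
    serving_a_purpose: bool = False  # a human judgment, not a measurement
    fits_in_bag: bool = True
    big_enough_to_grab: bool = True
    needs_digging: bool = False
    count: int = 1

MAX_GROUP_SIZE = 100  # assumed threshold; the written rule only says "too numerous"

def is_trash_to_collect(item):
    """Apply the written oracle: a rough summary of the real one."""
    matches_definition = item.loose and not item.animate and item.low_value
    excluded = (
        item.hazardous
        or item.serving_a_purpose
        or not item.fits_in_bag
        or not item.big_enough_to_grab
        or item.needs_digging
        or item.count > MAX_GROUP_SIZE
    )
    return matches_definition and not excluded

beer_bottle = Item("beer bottle")
road_kill = Item("road kill", hazardous=True)
toothpicks = Item("toothpicks", count=1000)
print(is_trash_to_collect(beer_bottle), is_trash_to_collect(road_kill), is_trash_to_collect(toothpicks))
```

Notice that the hard part is hidden in the inputs. Someone still has to decide whether an item is "serving a purpose" or of "low value," and that judgment is exactly the part of the oracle that resists being written down.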

Sometimes the oracle is challenging. Examples:

  • I once found two flipchart markers on the ground next to a driveway and an upright stick. I left them there thinking that maybe someone was putting up a sign. When I returned the next day they were still there, so I decided they must be trash.
  • I saw a child’s pair of prescription glasses on the beach. I left them in case the owners returned but they also were still there the next day. Conclusion: trash!
  • I saw two sneakers and socks on the beach, far from any person. I kept my eye on them, and eventually someone collected them. Close one. I really wanted to put those in my trash bag.
  • I found an envelope taped to a park bench that said “Blue Clue #6.” I left it alone in case it was for some kind of puzzle game that hadn’t yet been played. If it’s still there tomorrow, I’ll get a clue.

Scope is part of mission. My scope is the problem set that belongs to me as opposed to someone else. The totality of my oracles is one aspect of scope, because they dictate what counts as a problem. Another thing that defines scope is what things I am supposed to be looking at. In this case, what is my work surface? What is the place I am searching? I determined this to be:

  • the road itself
  • the shoulder and the ditch (one side of the road on each pass)
  • potentially anywhere visible to a tourist from the road
  • potentially any property which my wife frequents
  • NOT anywhere that is too difficult to access, where difficulty is a subjective assessment related to energy (“that’s too far away”), injury risk (“I’m not climbing down that bank”), and social transgression (“there is a no trespassing sign on that tree”)

Finally I decide on a route. I determined that according to the travel patterns of my primary client: my wife, Lenore. You could say I looked at her “use cases” of road use. Apart from exercise, her respect and pleasure is the big reason I’m doing all this. I want her not to see trash anywhere on the island. (Interestingly, I was unconscious of that motivation until I had already done more than 30 miles of trash-walking.)

My scope is therefore anywhere my wife is likely to see from her car or on foot on Orcas Island. My mission is to remove trash from that area.

My coverage, on the other hand, is what I actually look at. Here is a map of my coverage (data collected with Gaia GPS on an iPhone, then exported to Google Earth):

Let’s zoom in and note some parallels with software testing.

1. My coverage analysis tool is not as accurate as I would wish.

According to this I was wiggling all over the road. But I promise I wasn’t. There are several meters of random inaccuracy in the GPS data.

Similarly, in testing, I rarely get the fantastic logging that lets me say exactly what was and was not tested. Remember also that even if the coverage map were perfectly accurate, I would still not be able to tell whether the tester was paying attention during that testing. The power of the oracles varies depending on the focus of the tester, unless the oracles are automated. And many vital oracles cannot be automated.

2. Sometimes your client asks for specific coverage.

Lenore asked me to clean the beach, since she often walks there. She and I covered this together as a pair. The beach was too wide to do all at once, so we did a pass on the high part and then a pass on the low part. Lenore was a bit obsessive about what counted as trash, so we picked up literally anything that was visible to the naked eye and seemed like trash. This included plastic particles the size of a penny.

This is similar to risk-based testing. You focus on areas more intensively if they are more critical to your client– defined as a person whose opinion of quality matters.

3. Sometimes you test where it’s easy, not where the bugs are.

This is a private property where my wife likes to walk. When I walk with her, I carry a trash bag. We did find a little trash but only a little, because the owners are pretty clean.

4. Sometimes you decide on your own that deeper coverage is needed.

This is a little public park. I couldn’t walk by when I saw the trash there, even though my wife never goes there.

5. Sometimes you get clues from users.

A fellow in a car pulled into the substation and said “hey! you missed something over here!” That was helpful. I think most people look at me and assume I am their tax dollars at work. I like that. Life is better when people appreciate their government.

6. Sometimes your coverage decisions reflect vanity rather than business sense.

That is the parking lot of the medical center where my doctor works. I wanted her to see me picking up trash so that she knows I really am exercising.

And it’s true in testing, too. Sometimes I want to test in a way that is accessible and impressive to outsiders rather than merely reasonable and sensible. Sometimes I need a little appreciation.

Test Talk About Trash Walks

So, for exercise, I’m picking up trash. Here is a picture of me all kitted up:

Perfectly equipped for road trash collection!

So far, I’ve done 37 miles of trash collecting. And I can’t help but see some interesting parallels with software testing…

Just Like Testing #1: I can use tools, but I cannot automate my work.

I have to make a lot of judgments about what to pick up and what to leave. It would be difficult even to write a detailed and complete specification for what constitutes trash and what does not, let alone design a machine to pick it up. Yes, there are semi-automated street sweeping machines, and they do great things– but they are also expensive, loud, and disruptive. They also work only on flat paved surfaces, as far as I know, whereas I am cleaning along country roads and fishing garbage out of ditches.

Just Like Testing #2: I crave trouble. If the product is too clean I feel depressed.

I smile when I see a nice juicy old beer bottle. That is paydirt, baby. Aluminum cans and brightly colored drinking cups are almost as sweet. Apart from anything else, they weigh down my trash bag so that it doesn’t flap in the wind, but mainly it is these undeniably unsightly pieces of rubbish that give me a charge.

On the other hand, when I don’t see trash, I feel like I haven’t done anything. I know that I have: my eyes have searched for trash and that’s a service. But finding trash gives me something to show for my work. I can drop the bag in front of my wife and say “seeee? I’m useful!”

Just Like Testing #3: Trouble that is most likely to upset normal people makes me most happy.

Brightly colored candy wrappers are terrible to see on a country road surrounded by nature, but that same bright color makes it easy for me to spot. So, I hope candy and soda companies don’t start marketing their wares in camouflaged containers. Similarly in testing, when we see a dramatic crash or data loss in testing, we testers give each other high-fives and yessses and “you are a steely eyed missile man”-type comments. It takes extraordinary restraint not to do that right in front of the developer whose product has just failed.

Just Like Testing #4: Gratuitous trouble makes me tired and depressed.

I have sometimes come across caches of garbage, as if someone just hurled a kitchen trash bag off the back of a truck. This is not fun. I don’t mind the ordinary careless litter, to some degree, but industrial scale contempt for the environment makes me feel disgust instead of fun.

Most of the trash I find is rather innocent. It falls into these categories:

  • Food wrappers: things a kid might throw out a car window.
  • Bric-a-brac: things that might fall out of the back of a contractor’s pickup truck.
  • Featherweight trash: things that accidentally blow out the window of a car.
  • Cycling debris: things that a cyclist might drop accidentally; occasional items of clothing.
  • Auto debris: pieces of cars.
  • Transported trash: things blown onto the road from adjoining property.

But when an item or items of garbage seem diabolical or contemptuous, or systematically careless, I do get a little angry. This is similar to the feeling a tester gets when the developer won’t even do the most basic of testing before throwing it over the wall for a test cycle.

Lots of nice brightly colored things in there, but also some weeds that accidentally got caught up with the gripper… Just goes to show that tools aren’t perfect.

Just Like Testing #5: I became hyper-sensitive to regressions.

Today I drove into town and saw at least four pieces of trash along the way that had not been there yesterday. I am annoyed. This was a perfectly good road when I last cleaned it and now it’s all messed up again. Now, I know, objectively, it’s not “all messed up.” It is still far cleaner than it was when I started working it. But all I can think about is that new trash! Who did it? BURN THEM!

Testers also tend to get oversensitive and find it hard to accept that quality can be good enough when we know that there are unfixed bugs in the product. I guess anything you invest yourself in becomes sharper and larger and more important in that way.

Just Like Testing #6: I overlook some trash no matter how hard I try to look for it.

My wife helped me clean the local beach. We went single file, so she caught some of the things that I missed. There were a lot of them. Some trash I didn’t see was pretty big. My inner experience is “How did I miss that???!?!?!” But I know how I missed it: inattentional blindness.

Inattentional blindness is when you don’t see something that is in your field of view because your attention is on something else. This can have the effect of feeling as if an object literally appeared out of thin air when it was right in front of you all the time. I once covered an area, then turned and looked behind me, and saw a medium sized plastic bag just a few feet behind me. I had walked right over it without seeing it. It’s frustrating, but it’s a fact of life I must accept.

This is why pair testing, group testing, or making multiple passes through the same product helps so much. I always want redundancy. Along the main roads, I want to make at least two passes on each side before I move on to another road.

When I am “regression walking,” I might expect to find only newly dropped trash since my last walk. Instead, just like in testing life, I often find old trash that has been there all along but never before noticed.

[Added August 7th, 2017]

Just Like Testing #7: My quality standards are not fixed or absolute; they vary and they are relative.

I notice that when I am cleaning a very cluttered area, I tend to ignore very small pieces of trash, or trash that is hard to access. But when I am covering a clean area, I raise my standards and pick up even tiny pieces (smaller than a bottle cap), as if I am “hungry” for trash.

Similarly, I might pick up a small piece because it is next to a large piece, since I am already “in the neighborhood.” Also, if a large piece has been shattered into small pieces, like a broken beer bottle, I will pick up even tiny pieces of the bottle in order to get the “whole bottle.”

All this is evidence that I do not judge trash on an absolute scale, but rather judge it differently according to a variety of factors, including what’s nearby, what I’ve recently seen, my fatigue, my self-judgment, etc. It’s the same with bugs. I want to find something, but I also have limited energy. And this is why it is good for me to take multiple passes through an area. It helps me to square my selection heuristics with my general and absolute sense of my mission and proper quality standard.

The Unnecessary Tool

My wife bought a Steel Grip 36in Lightweight Aluminum Pick Up Tool.

I saw it on our combination dining room/craft/office table and asked her what it was for.

“My eye pillow fell behind the bed and I can’t reach it,” she told me. (This led to some confusion for me at first because I thought she was referring to an iPillow, presumably an Apple product I had never heard of.)

“I can easily get that for you,” I eventually replied while reaching behind the bed and retrieving her iPillow.

That seemed to end the conversation. But I was still surprised that she bought an entire new gadget to accomplish something that is pretty easy to solve with ordinary human effort– such as asking her husband. I couldn’t resist teasing her about it as I discovered that the squeaky gripper was also a good tool for annoying my dogs. Lenore is usually the epitome of sensible practicality. She’s usually the one restraining me from buying unnecessary things. So, it felt good to see her have a little lapse, for once.

In testing, I see a lot of that: introducing tools that aren’t needed and mostly just clutter up the place. All over the industry, technocrats seem to turn to tools at the slightest excuse. Tools will save us! More tools. Never mind the maintenance costs. Never mind what we lose by distancing ourselves from our problems. Automation!

(Please don’t bother commenting about your useful tool kit. I’m not talking about useful tools, here. I’m talking about a tool that was purchased specifically to solve a problem that was already easily solved without it. I am talking about an unnecessary tool.)

So then what happens…?

A few weeks later, I am getting bored with my walks. Well, let me back up: I am at the age where physical fitness is no longer about looking sharp, or even feeling good. It’s becoming a matter of do I want to keep living or what? The answer is yes I want to live, Clarence. That means I must exercise. This year I have been walking intensively.

But it’s boring. I can’t get anything done when I’m walking. I don’t like listening to music, and anyway I feel uncomfortable being cut off from the sounds of my surroundings. Therefore, I trudge along: bored.

One day I realized I could have more fun walking if I picked up garbage along my way. That way I would be making the world better as I walked. At first I carried a little trash sack at my waist, but my ambitions soon grew, and within days I decided it was time to walk the main road into town with a 50-gallon industrial trash bag and a high viz vest.

As I was leaving on my first mission, Lenore handed me the gripper.

It was the perfect tool.

It was exactly what I needed.

It would save my back and knees.

My gripper gets a lot of use, now. I’m wondering if I need to upgrade to a titanium and carbon fiber version. I’m thinking of crafting a holster for it.

Is There a Moral Here? Yes.

One of the paradoxes of Context-Driven testing is that on the one hand, you must use the right solution for the situation; while, on the other hand, you can only know what the right solution can be if you have already learned about it, and therefore used it, BEFORE you needed it. In other words, to be good problem solvers, we also need to dabble with and be curious about potential solutions even in the absence of a problem.

The gripper spent a few weeks lying around our home until suddenly it became my indispensable friend.

I guess what that means is that it’s good to have some tolerance and playfulness about experimenting with tools. Even useless ones.

 

Floating Point Quality: Less Floaty, More Pointed

Years ago I sat next to the Numerics Test Team at Apple Computer. I teased them one day about how they had it easy: no user interface to worry about; a stateless world; perfectly predictable outcomes. The test lead just heaved a sigh and launched into a rant about how numerics testing is actually rather complicated and brimming with unexpected ambiguities. Apparently, there are many ways to interpret the IEEE floating point standard and learned people are not in agreement about how to do it. Implementing floating point arithmetic on a digital platform is a matter of tradeoffs between accuracy and performance. And don’t get them started about HP… apparently HP calculators had certain calculation bugs that the scientific community had grown used to. So the Apple guys had to duplicate the bugs in order to be considered “correct.”

Among the reasons why floating point is a problem for digital systems is that digital arithmetic is discrete and finite, whereas real numbers often are not. As my colleague Alan Jorgensen says, “This problem arises because computers do not represent some real numbers accurately. Just as we need a special notation to record one divided by three as a decimal fraction: 0.33333…., computers do not accurately represent one divided by ten. This has caused serious financial problems and, in at least one documented instance, death.”
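The point about one divided by ten is easy to observe directly. Here is a minimal sketch in Python, assuming the IEEE 754 binary64 `float` that CPython uses on common platforms; the variable names are only for illustration.

```python
from decimal import Decimal
from fractions import Fraction

# The literal 0.1 is stored as the nearest binary64 value, not as exactly one tenth.
stored = Fraction(0.1)      # exact rational value of the stored double
exact = Fraction(1, 10)     # the real number one tenth

print(Decimal(0.1))         # the stored value written out in full decimal digits
print(stored == exact)      # False: the stored value is not one tenth
print(0.1 + 0.2 == 0.3)     # False: the individual rounding errors do not cancel

# The error accumulates: adding 0.1 ten times does not land exactly on 1.0.
# (An explicit loop is used because newer Python versions compensate inside sum().)
total = 0.0
for _ in range(10):
    total += 0.1
print(total == 1.0)         # False
```

None of this is a bug in Python; any system using binary floating point behaves the same way, which is why it matters for money and for safety.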

Anyway, Alan just patented a process that addresses this problem “by computing two limits (bounds) containing the represented real number that are carried through successive calculations.  When the result is no longer sufficiently accurate the result is so marked, as are further calculations using that value.  It is fail-safe and performs in real time.  It can operate in conjunction with existing hardware and software.  Conversion between existing standardized floating point and this new bounded floating point format are simple operations.”
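To get a feel for what "carrying two bounds through successive calculations" can look like, here is a rough sketch of ordinary interval arithmetic in Python. This is not the patented bounded floating point format, whose details are not given here; the `Bounded` class, its methods, and the tolerance check are all hypothetical names invented for illustration. It assumes Python 3.9 or later for `math.nextafter`.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounded:
    """A real number known only to lie somewhere in [lo, hi]."""
    lo: float
    hi: float

    @staticmethod
    def from_float(x: float) -> "Bounded":
        # Step one representable value outward on each side, so that the real
        # number the caller meant (e.g. one tenth) is enclosed by the bounds.
        return Bounded(math.nextafter(x, -math.inf), math.nextafter(x, math.inf))

    def __add__(self, other: "Bounded") -> "Bounded":
        # Add the bounds separately, then widen outward by one step to absorb
        # the rounding error of the additions themselves.
        return Bounded(
            math.nextafter(self.lo + other.lo, -math.inf),
            math.nextafter(self.hi + other.hi, math.inf),
        )

    def is_accurate(self, tolerance: float) -> bool:
        # A result is "sufficiently accurate" while its bounds stay close together.
        return (self.hi - self.lo) <= tolerance


tenth = Bounded.from_float(0.1)
total = tenth
for _ in range(9):
    total = total + tenth       # ten tenths in all

print(total)                          # the two bounds after ten additions
print(total.lo <= 1.0 <= total.hi)    # the true answer is still enclosed
print(total.is_accurate(1e-12))       # and the bounds are still tight
```

The design choice worth noticing is that the result never claims more than it knows: the bounds only ever grow, and a caller can refuse to trust a value once they grow past a chosen tolerance.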

If you are working with systems that must do extremely accurate and safe floating point calculations, you might want to check out the patent.

Rethinking Equivalence Class Partitioning, Part 1

Wikipedia’s article on equivalence class partitioning (ECP) is a great example of the poor thinking and teaching and writing that often passes for wisdom in the testing field. It’s narrow and misleading, serving to imply that testing is some little game we play with our software, rather than an open investigation of a complex phenomenon.

(No, I’m not going to edit that article. I don’t find it fun or rewarding to offer my expertise in return for arguments with anonymous amateurs. Wikipedia is important because it serves as a nearly universal reference point when criticizing popular knowledge, but just like popular knowledge itself, it is not fixable. The populace will always prevail, and the populace is not very thoughtful.)

In this article I will comment on the Wikipedia post. In a subsequent post I will describe ECP my way, and you can decide for yourself if that is better than Wikipedia.

“Equivalence partitioning or equivalence class partitioning (ECP)[1] is a software testing technique that divides the input data of a software unit into partitions of equivalent data from which test cases can be derived.”

Not exactly. There’s no reason why ECP should be limited to “input data” as such. The ECP thought process may be applied to output, or even versions of products, test environments, or test cases themselves. ECP applies to anything you might be considering doing that involves any variations that may influence the outcome of a test.

Yes, ECP is a technique, but a better word for it is “heuristic.” A heuristic is a fallible method of solving a problem. ECP is extremely fallible, and yet useful.

“In principle, test cases are designed to cover each partition at least once. This technique tries to define test cases that uncover classes of errors, thereby reducing the total number of test cases that must be developed.”

This text is pretty good. Note the phrase “In principle” and the use of the word “tries.” These are softening words, which are important because ECP is a heuristic, not an algorithm.

Speaking in terms of “test cases that must be developed,” however, is a misleading way to discuss testing. Testing is not about creating test cases. It is for damn sure not about the number of test cases you create. Testing is about performing experiments. And the totality of experimentation goes far beyond such questions as “what test case should I develop next?” The text should instead say “reducing test effort.”

“An advantage of this approach is reduction in the time required for testing a software due to lesser number of test cases.”

Sorry, no. The advantage of ECP is not in reducing the number of test cases. Nor is it even about reducing test effort, as such (even though it is true that ECP is “trying” to reduce test effort). ECP is just a way to systematically guess where the bigger bugs probably are, which helps you focus your efforts. ECP is a prioritization technique. It also helps you explain and defend those choices. Better prioritization does not, by itself, allow you to test with less effort, but we do want to stumble into the big bugs sooner rather than later. And we want to stumble into them with more purpose and less stumbling. And if we do that well, we will feel comfortable spending less effort on the testing. Reducing effort is really a side effect of ECP.

“Equivalence partitioning is typically applied to the inputs of a tested component, but may be applied to the outputs in rare cases. The equivalence partitions are usually derived from the requirements specification for input attributes that influence the processing of the test object.”

Typically? Usually? Has this writer done any sort of research that would substantiate that? No.

ECP is a process that we all do informally, not only in testing but in our daily lives. When you push open a door, do you consciously decide to push on a specific square centimeter of the metal push plate? No, you don’t. You know that for most doors it doesn’t matter where you push. All pushable places are more or less equivalent. That is ECP! We apply ECP to anything that we interact with.

Yes, we apply it to output. And yes, we can think of equivalence classes based on specifications, but we also think of them based on all other learning we do about the software. We perform ECP based on all that we know. If what we know is wrong (for instance if there are unexpected bugs) then our equivalence classes will also be wrong. But that’s okay, if you understand that ECP is a heuristic and not a golden ticket to perfect testing.

“The fundamental concept of ECP comes from equivalence class which in turn comes from equivalence relation. A software system is in effect a computable function implemented as an algorithm in some implementation programming language. Given an input test vector some instructions of that algorithm get covered, ( see code coverage for details ) others do not…”

At this point the article becomes Computer Science propaganda. This is why we can’t have nice things in testing: as soon as the CS people get hold of it, they turn it into a little logic game for gifted kids, rather than a pursuit worthy of adults charged with discovering important problems in technology before it’s too late.

The fundamental concept of ECP has nothing to do with computer science or computability. It has to do with logic. Logic predates computers. An equivalence class is simply a set. It is a set of things that share some property. The property of interest in ECP is utility for exploring a particular product risk. In other words, an equivalence class in testing is an assertion that any member of that particular group of things would be more or less equally able to reveal a particular kind of bug if it were employed in a particular kind of test.

If I define a “test condition” as something about a product or its environment that could be examined in a test, then I can define equivalence classes like this: An equivalence class is a set of tests or test conditions that are equivalent with respect to a particular product risk, in a particular context. 

This implies that two inputs which are not equivalent for the purposes of one kind of bug may be equivalent for finding another kind of bug. It also implies that if we model a product incorrectly, we will also be unable to know the true equivalence classes. Actually, considering that bugs come in all shapes and sizes, to have the perfectly correct set of equivalence classes would be the same as knowing, without having tested, where all the bugs in the product are. This is because ECP is based on guessing what kind of bugs are in the product.
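The idea that equivalence is relative to a risk can be sketched in a few lines of Python. Everything here is hypothetical: the sample inputs, the two risks, and the 32-bit field are assumptions chosen only to show the same inputs falling into different classes depending on which bug you are guessing at.

```python
from collections import defaultdict


def partition(values, classify):
    """Group values into classes according to a risk-specific classifier."""
    groups = defaultdict(list)
    for value in values:
        groups[classify(value)].append(value)
    return dict(groups)


def by_parse_risk(text):
    # Risk: non-numeric input is mishandled.
    return "numeric" if text.lstrip("-").isdigit() else "non-numeric"


def by_overflow_risk(text):
    # Risk: the value does not fit a (hypothetical) 32-bit signed field.
    if not text.lstrip("-").isdigit():
        return "not applicable"
    return "fits int32" if -2**31 <= int(text) <= 2**31 - 1 else "exceeds int32"


inputs = ["7", "42", "-3", "1234567890123", "abc"]

parse_classes = partition(inputs, by_parse_risk)
overflow_classes = partition(inputs, by_overflow_risk)

print(parse_classes)     # "42" and "1234567890123" share a class here
print(overflow_classes)  # but land in different classes here
```

Both classifiers are themselves guesses. If the product actually stores values in a 16-bit field, the second partition is wrong, and no amount of care in applying it will reveal that.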

If you read the technical stuff about Computer Science in the Wikipedia article, you will see that the author has decided that two inputs which cover the same code are therefore equivalent for bug finding purposes. But this is not remotely true! This is a fantasy propagated by people who I suspect have never tested anything that mattered. Off the top of my head, code-coverage-as-gold-standard ignores performance bugs, requirements bugs, usability bugs, data type bugs, security bugs, and integration bugs. Imagine two tests that cover the same code, and both involve input that is displayed on the screen, except that one includes an input which is so long that when it prints it goes off the edge of the screen. This is a bug that the short input didn’t find, even though both inputs are “valid” and “do the same thing” functionally.
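The long-input example can be reduced to a toy in Python. The function, the names, and the 20-character screen width are all assumptions made up for this sketch; the point is only that both calls execute exactly the same statements and yet differ in whether they expose the problem.

```python
SCREEN_WIDTH = 20  # hypothetical display width, in characters


def render_greeting(name: str) -> str:
    # One straight-line path: every string input executes the same code.
    return "Hello, " + name + "!"


def fits_on_screen(line: str) -> bool:
    # The check a human tester makes by looking at the display.
    return len(line) <= SCREEN_WIDTH


short_line = render_greeting("Ann")
long_line = render_greeting("Wolfeschlegelsteinhausenbergerdorff")

print(fits_on_screen(short_line))  # True: nothing looks wrong
print(fits_on_screen(long_line))   # False: the text runs off the edge
```

A coverage tool would report identical coverage for the two calls, so by the code-coverage definition they are "equivalent." With respect to the display risk, they are not.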

The Fundamental Problem With Most Testing Advice Is…

The problem with most testing advice is that it is either uncritical folklore that falls apart as soon as you examine it, or else it is misplaced formalism that doesn’t apply to realistic open-ended problems. Testing advice is better when it is grounded in a general systems perspective as well as a social science perspective. Both of these perspectives understand and use heuristics. ECP is a powerful, ubiquitous, and rather simple heuristic, whose utility comes from and is limited by your mental model of the product. In my next post, I will walk through an example of how I use it in real life.

Accountability for What You Say is Dangerous and That’s Okay

[Note: I offered Maaret Pyhäjärvi the right to review this post and suggest edits to it before I published it. She declined.]

A few days ago I was keynoting at the New Testing Conference, in New York City, and I used a slide that has offended some people on Twitter. This blog post is intended to explore that and hopefully improve the chances that if you think I’m a bad guy, you are thinking that for the right reasons and not making a mistake. It’s never fun for me to be a part of something that brings pain to other people. I believe my actions were correct, yet still I am sorry that I caused Maaret hurt, and I will try to think of ways to confer better in the future.

Here’s the theme of this post: Getting up in front of the world to speak your mind is a dangerous process. You will be misunderstood, and that will feel icky. Whether or not you think of yourself as a leader, speaking at a conference IS an act of leadership, and leadership carries certain responsibilities.

I long ago learned to let go of the outcome when I speak in public. I throw the ideas out there, and I do that as an American Aging Overweight Left-Handed Atheist Married Father-And-Father-Figure Rough-Mannered Bearded Male Combative Aggressive Assertive High School Dropout Self-Confident Freedom-Loving Sometimes-Unpleasant-To-People-On-Twitter Intellectual. I know that my ideas will not be considered in a neutral context, but rather in the context of how people feel about all that. I accept that.  But, I have been popular and successful as a speaker in the testing world, so maybe, despite all the difficulties, enough of my message and intent gets through, overall.

What I can’t let go of is my responsibility to my audience and the community at large to speak the truth and to do so in a compassionate and reasonable way. Regardless of what anyone else does with our words, I believe we speakers need to think about how our actions help or harm others. I think a lot about this.

Let me clarify. I’m not saying it’s wrong to upset people or to have disagreement. We have several different culture wars (my reviewers said “do you have to say wars?”) going on in the software development and testing worlds right now, and they must continue or be resolved organically in the marketplace of ideas. What I’m saying is that anyone who speaks out publicly must try to be cognizant of what words do and accept the right of others to react.

Although I’m surprised and certainly annoyed by the dark interpretations some people are making of what I did, the burden of such feelings is what I took on when I first put myself forward as a public scold about testing and software engineering, a quarter century ago. My annoyance about being darkly interpreted is not your problem. Your problem, assuming you are reading this and are interested in the state of the testing craft, is to feel what you feel and think what you think, then react as best fits your conscience. Then I listen and try to debug the situation, including helping you debug yourself while I debug myself. This process drives the evolution of our communities. Jay Philips, Ash Coleman, Mike Talks, Ilari Henrik Aegerter, Keith Klain, Anna Royzman, Anne-Marie Charrett, David Greenlees, Aaron Hodder, Michael Bolton, and my own wife all approached me with reactions that helped me write this post. Some others approached me with reactions that weren’t as helpful, and that’s okay, too.

Leadership and The Right of Responding to Leaders

In my code of conduct, I don’t get to say “I’m not a leader.” I can say no one works for me and no one has elected me, but there is more to leadership than that. People with strong voices and ideas gain a certain amount of influence simply by virtue of being interesting. I made myself interesting, and some people want to hear what I have to say. But that comes with an implied condition that I behave reasonably. The community, over time, negotiates what “reasonable” means. I am both a participant and a subject of those negotiations. I recommend that we hold each other accountable for our public, professional words. I accept accountability for mine. I insist that this is true for everyone else. Please join me in that insistence.

People who speak at conferences are tacitly asserting that they are thought leaders– that they deserve to influence the community. If that influence comes with a rule that “you can’t talk about me without my permission” it would have a chilling effect on progress. You can keep to yourself, of course; but if you exercise your power of speech in a public forum you cannot cry foul when someone responds to you. Please join me in my affirmation that we all have the right of response when a speaker takes the microphone to keynote at a conference.

Some people have pointed out that it’s not okay to talk back to performers in a comedy show or Broadway play. Okay. So is that what a conference is to you? I guess I believe that conferences should not be for show. Conferences are places for conferring. However, I can accept that some parts of a conference might be run like infomercials or circus acts. There could be a place for that.

The Slide

Here is the slide I used the other day:

[Image: the slide, “maaret”]

Before I explain this slide, try to think what it might mean. What might its purposes be? That’s going to be difficult, without more information about the conference and the talks that happened there. Here are some things I imagine may be going through your mind:

  • There is someone whose name is Maaret who James thinks he’s different from.
  • He doesn’t trust nice people. Nice people are false. Is Maaret nice and therefore he doesn’t trust her, or does Maaret trust nice people and therefore James worries that she’s putting herself at risk?
  • Is James saying that niceness is always false? That seems wrong. I have been nice to people whom I genuinely adore.
  • Is he saying that it is sometimes false? I have smiled and shaken hands with people I don’t respect, so, yes, niceness can be false. But not necessarily. Why didn’t he put qualifying language there?
  • He likes debate and he thinks that Maaret doesn’t? Maybe she just doesn’t like bad debate. Did she actually say she doesn’t like debate?
  • What if I don’t like debate, does that mean I’m not part of this community?
  • He thinks excellence requires attention and energy and she doesn’t?
  • Why is James picking on Maaret?

Look, if all I saw was this slide, I might be upset, too. So, whatever your impression is, I will explain the slide.

Like I said I was speaking at a conference in NYC. Also keynoting was Maaret Pyhäjärvi. We were both speaking about the testing role. I have some strong disagreements with Maaret about the social situation of testers. But as I watched her talk, I was a little surprised at how I agreed with the text and basic concepts of most of Maaret’s actual slides, and a lot of what she said. (I was surprised because Maaret and I have a history. We have clashed in person and on Twitter.) I was a bit worried that some of what I was going to say would seem like a rehash of what she just did, and I didn’t want to seem like I was papering over the serious differences between us. That’s why I decided to add a contrast slide to make sure our differences weren’t lost in the noise. This means a slide that highlights differences, instead of points of connection. There were already too many points of connection.

The slide was designed specifically:

  • for people to see who were in a specific room at a specific time.
  • for people who had just seen a talk by Maaret which established the basis of the contrast I was making.
  • about differences between two people who are both in the spotlight of public discourse.
  • to express views related to technical culture, not general social culture.
  • to highlight the difference between two talks for people who were about to see the second talk that might seem similar to the first talk.
  • for a situation where both I and Maaret were present in the room during the only time that this slide would ever be seen (unless someone tweeted it to people who would certainly not understand the context).
  • as talking points to accompany my live explanation (which is on video and I assume will be public, someday).
  • for a situation where I had invited anyone in the audience, including Maaret, to ask me questions or make challenges.

These people had just seen Maaret’s talk and were about to see mine. In the room, I explained the slide and took questions about it. Maaret herself spoke up about it, for which I publicly thanked her. It wasn’t something I was posting with no explanation or context. Nor was it part of the normal slides of my keynote.

Now I will address some specific issues that came up on Twitter:

1. On Naming Maaret

Maaret has expressed the belief that no one should name another person in their talk without getting their permission first. I vigorously oppose that notion. It’s completely contrary to the workings of a healthy society. If that principle is acceptable, then you must agree that there should be no free press. Instead, I would say if you stand up and speak in the guise of an expert, then you must be personally accountable for what you say. You are fair game to be named and critiqued. And the weird thing is that Maaret herself, regardless of what she claims to believe, behaves according to my principle of freedom to call people out. She, herself, tweeted my slide and talked about me on Twitter without my permission. Of course, I think that is perfectly acceptable behavior, so I’m not complaining. But it does seem to illustrate that community discourse is more complicated than “be nice” or “never cause someone else trouble with your speech” or “don’t talk about people publicly unless they gave you permission.”

2. On Being Nice

Maaret had a slide in her talk about how we can be kind to each other even though we disagree. I remember her saying the word “nice” but she may have said “kind” and I translated that into “nice” because I believed that’s what she meant. I react to that because, as a person who believes in the importance of integrity and debate over getting along for the sake of appearances, I observe that exhortations to “be nice” or even to “be kind” are often used when people want to quash disturbing ideas and quash the people who offer them. “Be nice” is often code for “stop arguing.” If I stop arguing, much of my voice goes away. I’m not okay with that. No one who believes there is trouble in the world should be okay with that. Each of us gets to have a voice.

I make protests about things that matter to me, you make protests about things that matter to you.

I think we need a way of working together that encourages debate while fostering compassion for each other. I use the word compassion because I want to get away from ritualized command phrases like “be nice.” Compassion is a feeling that you cultivate, rather than a behavior that you conform to or simulate. Compassion is an antithesis of “Rules of Order” and other lists of commandments about courtesy. Compassion is real. Throughout my entire body of work you will find that I promote real craftsmanship over just following instructions. My concern about “niceness” is the same kind of thing.

Look at what I wrote: I said “I don’t trust nice people.” That’s a statement about my feelings and it is generally true, all things being equal. I said “I’m not nice.” Yet, I often behave in pleasant ways, so what did I mean? I meant I seek to behave authentically and compassionately, which looks like “nice” or “kind”, rather than to imagine what behavior would trick people into thinking I am “nice” when indeed I don’t like them. I’m saying people over process, folks.

I was actually not claiming that Maaret is untrustworthy because she is nice, and my words don’t say that. Rather, I was complaining about the implications of following Maaret’s dictum. I was offering an alternative: be authentic and compassionate, then “niceness” and acts of kindness will follow organically. Yes, I do have a worry that Maaret might say something nice to me and I’ll have to wonder “what does that mean? is she serious or just pretending?” Since I don’t want people to worry about whether I am being real, I just tell them “I’m not nice.” If I behave nicely it’s either because I feel genuine good will toward you or because I’m falling down on my responsibility to be honest with you. That second thing happens, but it’s a lapse. (I do try to stay out of rooms with people I don’t respect so that I am not forced to give them opinions they aren’t willing or able to process.)

I now see that my sentence “I want to be authentic and compassionate” could be seen as an independent statement connected to “how I differ from Maaret,” implying that I, unlike her, am authentic and compassionate. That was an errant construction and does not express my intent. The orange text on that line indicated my proposed policy, in the hope that I could persuade her to see it my way. It was not an attack on her. I apologize for that confusion.

3. Debate vs. Dialogue

Maaret had earlier said she doesn’t want debate, but rather dialogue. I have heard this from other Agilists and I find it disturbing. I believe this is code for “I want the freedom to push my ideas on other people without the burden of explaining or defending those ideas.” That’s appropriate for a brainstorming session, but at some point, the brainstorming is done and the judging begins. I believe debate is absolutely required for a healthy professional community. I’m guided in this by dialectical philosophy, the history of scientific progress, the history of civil rights (in fact, all of politics), and the modern adversarial justice system. Look around you. The world is full of heartfelt disagreement. Let’s deal with it. I helped create the culture of small invitational peer conferences in our industry which foster debate. We need those more than ever.

But if you don’t want to deal with it, that’s okay. All that means is that you accept that there is a wall between your friends and those other people whom you refuse to debate with. I will accept the walls if necessary but I would rather resolve the walls. That’s why I open myself and my ideas for debate in public forums.

Debate is not a process of sticking figurative needles into other people. Debate is the exchange of views with the goal of resolving our differences while being accountable for our words and actions. Debate is a learning process. I have occasionally heard from people I think are doing harm to the craft that they believe I debate for the purposes of hurting people instead of trying to find resolution. This is deeply insulting to me, and to anyone who takes his vocation seriously. What’s more, considering that these same people express the view that it’s important to be “nice,” it’s not even nice. Thus, they reveal themselves to be unable to follow their own values. I worry that “Dialogue not debate” is a slogan for just another power group trying to suppress its rivals. Beware the Niceness Gang.

I understand that debating with colleagues may not be fun. But I’m not doing it for fun. I’m doing it because it is my responsibility to build a respectable craft. All testing professionals share this responsibility. Debate serves another purpose, too, managing the boundaries between rival value systems. Through debate we may discover that we occupy completely different paradigms; schools of thought. Debate can’t bridge gaps between entirely different world views, and yet I have a right to my world view just as you have a right to yours.

Jay Philips said on Twitter:

I admire Jay. I called her and we had a satisfying conversation. I filled her in on the context and she advised me to write this post.

One thing that came up is something very important about debate: the status of ideas is not the only thing that gets modified when you debate someone; what also happens is an evolution of feelings.

Yes I think “I’m right.” I acted according to principles I think are eternal and essential to intellectual progress in society. I’m happy with those principles. But I also have compassion for the feelings of others, and those feelings may hold sway even though I may be technically right. For instance, Maaret tweeted my slide without my permission. That is copyright violation. She’s objectively “wrong” to have done that. But that is irrelevant.

[Note: Maaret points out that this is legal under the fair use doctrine. Of course, that is correct. I forgot about fair use. Of course, that doesn’t change the fact that though I may feel annoyed by her selective publishing of my work, that is irrelevant, because I support her option to do that. I don’t think it was wise or helpful for her to do that, but I wouldn’t seek to bar her from doing so. I believe in freedom to communicate, and I would like her to believe in that freedom, too.]

I accept that she felt strongly about doing that, so I [would] choose to waive my rights. I feel that people who tweet my slides, in general, are doing a service for the community. So while I appreciate copyright law, I usually feel okay about my stuff getting tweeted.

I hope that Jay got the sense that I care about her feelings. If Maaret were willing to engage with me she would find that I care about her feelings, too. This does not mean she gets whatever she wants, but it’s a factor that influences my behavior. I did offer her the chance to help me edit this post, but again, she refused.

4. Focus and Energy

Maaret said that eliminating the testing role is a good thing. I worry it will lead to the collapse of craftsmanship. She has a slide that says “from tester to team member” which is a sentiment she has expressed on Twitter that led me to say that I no longer consider her a tester. She confirmed to me that I hurt her feelings by saying that, and indeed I felt bad saying it, except that it is an extremely relevant point. What does it mean to be a tester? This is important to debate. Maaret has confirmed publicly (when I asked a question about this during her talk) that she didn’t mean to denigrate testing by dismissing the value of a testing role on projects. But I don’t agree that we can have it both ways. The testing role, I believe, is a necessary prerequisite for maintaining a healthy testing craft. My key concern is the dilution of focus and energy that would otherwise go to improving the testing craft. This is lost when the role is lost.

This is not an attack on Maaret’s morality. I am worried she is promoting too much generalism for the good of the craft, and she is worried I am promoting too much specialism. This is a matter of professional judgment and perspective. It cannot be settled, I think, but it must be aired.

The Slide Should Not Have Been Tweeted But It’s Okay That It Was

I don’t know what Maaret was trying to accomplish by tweeting my slide out of context. Suffice it to say what is right there on my slide: I believe in authenticity and compassion. If she was acting out of authenticity and compassion then more power to her. But the slide cannot be understood in isolation. People who don’t know me, or who have any axe to grind about what I do, are going to cry “what a cruel man!” My friends contacted me to find out more information.

I want you to know that the slide was one part of a bigger picture that depicts my principled objection to several matters involving another thought leader. That bigger picture is: two talks, one room, all people present for it, a lot of oratory by me explaining the slide, as well as back and forth discussion with the audience. Yes, there were people in the room who didn’t like hearing what I had to say, but “don’t offend anyone, ever” is not a rule I can live by, and neither can you. After all, I’m offended by most of the talks I attend.

Although the slide should not have been tweeted, I accept that it was, and that doing so was within the bounds of acceptable behavior. As I announced at the beginning of my talk, I don’t need anyone to make a safe space for me. Just follow your conscience.

What About My Conscience?

  • My conscience is clean. I acted out of true conviction to discuss important matters. I used a style familiar to anyone who has ever seen a public debate, or read an opinion piece in the New York Times. I didn’t set out to hurt Maaret’s feelings and I don’t want her feelings to be hurt. I want her to engage in the debate about the future of the craft and be accountable for her ideas. I don’t agree that I was presuming too much in doing so.
  • Maaret tells me that my slide was “stupid and hurtful.” I believe she and I do not share certain fundamental values about conferring. I will no longer be conferring with her, until and unless those differences are resolved.
  • Compassion is important to me. I will continue to examine whether I am feeling and showing the compassion for my fellow humans that they are due. These conversations and debates I have with colleagues help me do that.
  • I agree that making a safe space for students is important. But industry consultants and pundits should be able to cope with the full spectrum of authentic, principled reactions from their peers. Leaders are held to a higher standard, and must be ready and willing to defend their ideas in public forums.
  • The reaction on Twitter gave me good information about a possible trend toward fragility in the Twitter-facing part of the testing world. There seems to be a significant group of people who prize complete safety over the value that comes from confrontation. In the next conference I help arrange, I will set more explicit ground rules, rather than assuming people share something close to my own sense of what is reasonable to do and expect.
  • I will also start thinking, for each slide in my presentation: “What if this gets tweeted out of context?”

(Oh, and to those who compared me to Donald Trump… Can you even imagine him writing a post like this in response to criticism? BELIEVE ME, he wouldn’t.)