Tyranny of the Innocents, Exhibit A: Gary Cohen

Gary Cohen, Deputy Administrator and Director, Center for Consumer Information and Insurance Oversight, Centers for Medicare and Medicaid Services, is not a computer guy. He’s a lawyer. He knows the insurance industry. And yet he was put in charge of a very large and important project: Healthcare.gov.

Here’s what BusinessWeek said about him on August 22nd:

“Gary Cohen seems awfully calm for a man whose job is to make sure Obamacare doesn’t flop. As head of CCIIO (awkward pronunciation: Suh-SIE-O), he oversees the complex, politically fraught system of state health insurance exchanges that will begin signing up uninsured Americans starting on Oct. 1. It hasn’t exactly been a smooth rollout. Many Americans still have no idea the exchanges exist, and the administration has struggled to explain who’s eligible for coverage under the Affordable Care Act and how they enroll. Cohen is convinced the confusion will clear up once things are up and running. “We’re going to get to the point where the discussions we’re having today will fade into the background,” he says.”

He should have known that the system wasn’t going to work, at that point. But he’s not a technology guy, so perhaps he thought some big-brained hacker from the movies was going to pull it together at the last minute?

Here’s what he was asked and what he answered at a House Committee on Oversight hearing on May 21st:

Ms. DUCKWORTH. …Could you speak a little bit on the Administration’s readiness to
reach out to this huge number of people so that they can enroll in
time? Basically, you say that you are going to be ready to go on
October 1st, and you need to be. If not, what do you need in order
to get ready and have a successful rollout of these provisions?

Mr. COHEN. So we have a plan in place that basically is timed
so that people are getting the information close to the time in
which there is something that they can do with it. So right now we
are in what we call the education phase, which began in January
and proceeds through June, where we are just putting out information.
We are in the process of re-purposing the HealthCare.gov site
to be really a consumer information site. Our call center will be
going live in June, where people will be able to call and get information
that way. And then starting in the summer we will begin
what we call the anticipation, or get ready phase. And I am not an
expert in these things, but what I understand is that if you start
too early and then people say, well, what do I do, and then there
is nothing that they can do because it is too soon, then you may
end up having people who get a little bit kind of frustrated or disappointed.
So we really are gearing towards making sure the people get the
information they need in time for October, when they actually can
take action and begin to get enrollment coverage.

Hmm. He was asked directly if he needed anything to make sure he was ready to go on October 1st. His answer was basically: no thank you.

Did he really think everything was on track? Why didn’t his people prepare him to set expectations better?

Mr. GOSAR. Mr. Cohen, how closely is HHS working with IRS on Obamacare
implementation?
Mr. COHEN. We are working closely with IRS on those aspects of
implementation where we have to work together, so, for example,
as you know, in determining whether a person is eligible for Medicaid
or CHIP on the one hand, or tax credits in the marketplaces
on the other, income is a test, and we are working with IRS on
verifying people’s income when they apply.
Mr. GOSAR. So the IRS is going to be gathering and sending this
enormous amount of taxpayer information to all the 50 exchanges.
All 50 exchanges are to be ready by October 1st, right?
Mr. COHEN. Yes.
Mr. GOSAR. So will there be any problems with this massive
amount of data sharing?
Mr. COHEN. No. And data sharing may not be exactly the right
way to look at it. Basically what will happen is people will put information
about their income in an application; that information
will be verified by data that comes from the IRS, but there is no
exchange of information from the IRS to the exchange; the information
goes out, it is verified, and it comes back.
Mr. GOSAR. But it is still from the exchange going to the IRS,
and that is where I am going.
Mr. COHEN. It is going to the data hub. Information is coming
from the IRS to the data hub and from the exchange to the data
hub, and there is a comparison and then there is an answer back.
But the tax information isn’t actually going to the exchange.

What a refreshingly blunt answer to the question of whether there will be any trouble with data exchange: No. Unfortunately, we now know there are massive problems with that. Why didn’t he give a more nuanced answer? Why didn’t he hedge? This is why I think he’s an innocent– a child put in charge of the chocolate factory. He didn’t need to be, but that’s how he played it. I guess he was distracted by other duties and trusted the technologists? Or maybe he dismissed the concerns of the technologists as mere excuses? I wonder.

Healthcare.gov and the Tyranny of the Innocents

The failure of Healthcare.gov is probably not because of sinister people. It’s probably because of the innocents who run the project. These well-intentioned people are truly as naive as little children. And they must be stopped.

They are, of course, normal intelligent adults. I’m sure they got good grades in school– if you believe in that sort of thing– and they can feed and clothe themselves. They certainly look normal, even stately and wise. It’s just that they are profoundly ignorant about technology projects while being completely oblivious to and complacent about that ignorance. That is the biggest proximal cause of this debacle. It’s called the Dunning-Kruger effect (which you can either look up or confidently assure yourself that you don’t need to know about): incompetence of a kind that makes you unable to assess your own lack of competence.

Who am I talking about? I’m talking to some extent about everyone above the first level of management on that project, but mostly I’m talking about anyone who was in the management chain for that project but who has never coded or tested before in their working lives. The non-technical people who created the conditions that the technical people had to work under.

I also blame the technical people in a different way. I’ll get to that, below.

How do I come to this conclusion? Well, take a look at the major possibilities:

Maybe it didn’t fail. Maybe it’s normal for projects to have a few glitches? Oh my, no. Project failures are not often clear cut. But among failures, this one is cut as clearly as the Hope Diamond. This is not a near miss. This is the equivalent of sending Jack away to sell the family cow and having him come back with magic beans. It’s the equivalent of going to buy a car and coming back with a shopping cart that has a cardboard sign on which someone has written “CAR” in magic marker. It’s a swing and a miss when the batter was not even holding a bat. It’s so bad, I hope criminal charges are being considered. Make no mistake, the people who ran this project scammed the US government.

Did it fail because it’s too hard a project to do? It’s a difficult project, for sure. It may have been too hard to do under the circumstances prescribed. If so, then we should have heard that message a year ago. Loudly and publicly. We didn’t hear that. Why? Could it have been that the technical people kept their thoughts and feelings carefully shrouded? That’s not what’s being reported. It’s come out that technical people were complaining to management. Management must have quashed those complaints.

Did politics prevent the project from succeeding? No doubt that created a terrible environment in which to produce the system. So what? If it’s too hard, just laugh and say “hey this is ridiculous, we can’t commit to creating this system” UNLESS, of course, you are hoping to hide the problem forever, like a child who has wet the bed and dumps the sheets out the back window. I suppose it’s possible that Republican operatives secretly conspired to make the project fail. If so, I hope that comes out. Doesn’t matter, though. Management could still have seen it coming, unless the whole development team was in on the fix.

Were the technical people incompetent? Probably. It’s likely that many of the programmers were little better than novices, from what I can tell by looking at the bug reports coming through. It was a Children’s Crusade, I guess. But again, so what? The purpose of management, at each of the contracting agencies and above them, is to assess and assure the general competence and performance of the people working on the job. That comes first. I’m sure there were good people mixed in there, somewhere. I have harsh feelings for them, however. I would say to them: Why didn’t you go public? Why didn’t you resign? You like money that much? Your integrity matters that little to you?

Management created the conditions whereby this project was “delivered” in a non-working state. Not like the Dreamliner. The 787 had some serious glitches, and Boeing needs to shape that up. What I’m talking about is boarding an aircraft for a long trip only to be told by the captain “Well, folks it looks like we will be stuck here at the gate for a little while. Maintenance needs to install our wings and engines. I don’t know much about aircraft building, but I promise we will be flying by November 30th. Have some pretzels while you wait.”

Management must bear the prime responsibility for this. I’m not sure that Obama himself is to blame. Everyone under him though? Absolutely.

What About Testing?

Little testing happened on the site. The testing that happened seems to have confirmed what everyone knew. Now this article has come out, about what’s happening behind the scenes. I sure hope they have excellent Rapid Testers working on that, because there is no time for TDD or much of any unit testing and certainly no time to write bloated nonsensical “test case specs” that usually infect government efforts like so much botfly larvae.

Notice the bit at the end?

“It’s a lot of work but people are committed to it. I haven’t heard anyone say it’s not a doable job,” the source said of the November 30th deadline to fix the online portal to purchase insurance on the federal exchange.

Exactly. That’s exactly the problem, Mr. Source. This is what I mean by the tyranny of the innocents. If no one is telling you that the November 30th deadline is not doable, and you think that’s a good sign, then you are an innocent. If you are managing to that expectation then you are a tyrant. It’s probably not doable. I already know that this can’t possibly leave enough time for reasonable testing of the system. Even if it is doable, only a completely dysfunctional project has no one on it speaking openly about whether it is doable.

What Can Be Done?

Politics will ruin everything. I have no institutional solution for this kind of problem. “Best practices” won’t help. Oversight committees won’t help. I can only say that each of us can and should foster a culture of personal ethical behavior. I was on a government project, briefly, years ago. I concluded it was an outlandish waste of taxpayer money and I resigned. I wanted the money. But I resigned anyway. It wasn’t easy. I had car payments and house payments to make. Integrity can be hard. Integrity can be lonely. I don’t always live up to my highest ideals for my own behavior, and when that happens I feel shame. The shame I feel spurs me to be better. That’s all I’m hoping for, really. I hope the people who knew better on this project feel shame. I hope they listen to that shame and go on to be better people.

I do have advice for the innocents. I’ll speak directly to you, Kathy Sebelius, since you are the most public example of who I am talking about…

Hi Kathy,

You’re not a technology person. You shouldn’t have to be. But you need people working for you who are, because technology is opaque. It may surprise you to know that unlike building bridges and monuments, the status of software can be effectively hidden from anyone more than one level above (or sideways from) the programmer or tester who is actually working on that particular piece of it. It’s like managing a gold mine without being able to go down into the mine yourself.

This means you are in a weak position, as an executive. You can pound the table and threaten to fire people, sure. It won’t help. The way in which an executive can use direct power will only make a late software project even later. Every use of direct power weakens your influence. Use indirect power, instead. Imagine that you are taming wild birds. I used to do that as a kid in Vermont. It requires quietness and patience. The first part is to stand for an hour holding birdseed in your hand. Stand quietly and eventually they will land in your hand.

To have managed this project well, you needed to have created an environment where people could speak without fear. You needed to work with your direct reports to make sure they weren’t filtering out too much of the bad news. You needed to visit the project on a regular basis, and talk to the lowest level people. Then you needed to forgive their managers for not telling you all the bad news. It’s a maddeningly slow process. If you notice, the Pope is currently doing something very similar. Hey, I’m an atheist and yet I find myself listening to that guy. He’s a master of indirect leadership.

You did have the direct power to set expectations. I’m sure you realize you could have done a much better job of that, but perhaps you felt fear, yourself. As your employer (a taxpaying citizen), I bear a little of that responsibility. The country is getting the Healthcare.gov site that it deserves, in a sense.

If you are going to continue in public service, please do yourself and all of us a favor and take a class on software project management. Attend a few lectures. Get smart about what kinds of dodges and syndromes contractors use.

Don’t be an innocent, marching to the slaughter, while millions of dollars line the pockets of the people who run CGI and all those other parasite companies.

— Sincerely, James

My Political Agenda

I have $200,000 of unpaid medical bills due to the crazy jacked up prices and terrible insurance situation for individual citizens in the United States. I am definitely a supporter of the concept of health care reform, even the flawed Obamacare system, if that’s the best we can do for now.

I was pleased to see the failure of the Healthcare.gov website, at first. A little failure helps me make my arguments about how hard it is to do technology well; how getting it right means striving to better ourselves, and no formula or textual incantation will do that for us.

This is too much failure! I want it to stop now. Still, I’m an adult, a software project expert and not in any way an innocent. I know it’s not going to be resolved soon. No Virginia, there won’t be a Healthcare.gov website this Christmas.

Addendum:

From cnn.com:

Summers wrote a memo to the President in 2010 suggesting that HealthCare.gov was not something the government could handle and he needed to bring in experts.

While Summers would not provide details about internal discussions, he said Tuesday, “You need experts. You need to trust but you need to verify. You can’t go rushing the schedule when you get behind or you end up making more errors.”

Damn straight. If this is true then I’m sure glad someone around Obama had basic wisdom. I guess nobody listened to him.

What Testers Find

While testing at eBay, recently, it occurred to me that we need a deeper account of what testers find. It’s not just bugs. Here’s my experimental list:

Testers find bugs. In other words, we look for anything that threatens the value of the product. (This ties directly into Jerry Weinberg’s famous dictum that quality means value to some person, at some time, who matters.) Some people like to say that testers find “defects.” That is also true, but I avoid that word. It tends to make programmers and lawyers upset, and I have trouble enough. Example: a list of countries in a form is missing “France.”

Testers also find risks. We notice situations that seem likely to produce bugs. We notice behaviors of the product that look likely to go wrong in important ways, even if we haven’t yet seen that happen. Example: A web form is using a deprecated HTML tag, which works fine in current browsers, but may stop working in future browsers. This suggests that we ought to do a validation scan. Maybe there are more things like that on the site.

Testers find issues. An issue is something that threatens the value of the project, rather than the product itself. Example: There’s a lot of real-time content on eBay. Ads and such. Am I supposed to test that stuff? How should I test it?

Testers find testability problems. It’s a kind of issue, but it’s worth highlighting. Testers should point out aspects of the product that make it hard to observe and hard to control. There may be small things that the developers can do (adding scriptable interfaces and log files, for instance) that can improve testability. And if you don’t ask for testability, it’s your fault that you don’t get it. Example: You’re staring at a readout that changes five times a second, wondering how to tell if it’s presenting accurate figures. For that, you need a log file.

Testers find artifacts, too. Also a kind of issue, but also worth highlighting, we sometimes see things that look like problems, but turn out to be manifestations of how we happen to be testing. Example: I’m getting certificate errors on the site, but it turns out to be an interaction between the site and Burp Proxy, which is my recording tool.

Testers find curios. We notice surprising and interesting things about our products that don’t threaten the value of the product, but may suggest hidden features or interesting ways of using the product. Some of them may represent features that the programmers themselves don’t know about. They may also suggest new ways of testing. Example: Hmm. I notice that a lot of complex content is stored in Iframes on eBay. Maybe I can do a scan for Iframes and systematically discover important scripts that I need to test.

Maybe there are other things you think should be added to this list. The point is that the outcomes of testing can be quite diverse. Keep your eyes and your mind open.
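
The two scanning examples above (the deprecated HTML tag and the Iframes) lend themselves to a small script. What follows is only a minimal sketch using Python's standard `html.parser` module. The `DEPRECATED_TAGS` set is an illustrative subset chosen for this example, not a complete list, and the `PageScanner` class, the `scan` function, and the sample page are all hypothetical names and data invented here.

```python
from html.parser import HTMLParser

# Illustrative subset of obsolete HTML elements; an assumption, not a full list.
DEPRECATED_TAGS = {"center", "font", "marquee", "big", "strike", "tt", "frame", "frameset"}


class PageScanner(HTMLParser):
    """Collects deprecated tags (risks) and iframe sources (curios) from one page."""

    def __init__(self):
        super().__init__()
        self.deprecated = []  # (tag, line number) pairs
        self.iframes = []     # src attribute of each iframe, or None if absent

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag in DEPRECATED_TAGS:
            line, _column = self.getpos()
            self.deprecated.append((tag, line))
        elif tag == "iframe":
            self.iframes.append(dict(attrs).get("src"))


def scan(html_text):
    """Return (deprecated tags found, iframe sources found) for one page."""
    scanner = PageScanner()
    scanner.feed(html_text)
    scanner.close()
    return scanner.deprecated, scanner.iframes


# Hypothetical page content, made up for the example.
sample = """<html><body>
<center><font color="red">Sale!</font></center>
<iframe src="/ads/banner.html"></iframe>
</body></html>"""

deprecated, iframes = scan(sample)
print("Deprecated tags:", deprecated)
print("Iframe sources:", iframes)
```

A scan like this only produces leads. Deciding whether a deprecated tag is a real risk, or whether an Iframe hides a script worth testing, is still the tester's job.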

Introducing Thread-Based Test Management

Most of the testing world is managed around artifacts: test cases, test documents, bug reports. If you look at any “test management” tool, you’ll see that the artifact-based approach permeates it. “Test” for many people is a noun.

For me test is a verb. Testing is something that I do, not so much something that I create. Testing is the act of exploration of an unknown territory. It is casting questions, like Molotov cocktails, into the darkness, where they splatter and burst into bright revealing fire.

How to Manage Such a Process?

My brother Jon and I created a way to control highly exploratory testing 10 years ago, called session-based test management (SBTM). I recently returned from an intense testing project in Israel, where I used SBTM. But I also experimented with a new idea: thread-based test management (TBTM).

Like many of my new ideas, it’s not really new. It’s the christening (with words) and sharpening (with analysis) of something many of us already do. The idea is this: organize management around threads of activity rather than test sessions or artifacts.

Thread-based testing is a generalized form of session-based testing, in that sessions are a form of thread, but a thread is not necessarily a session. In SBTM, you test in uninterrupted blocks of time that each have a charter. A charter is a mission for that session; a light sort of commitment, or contract. A thread, on the other hand, may be interrupted, it may go on and on indefinitely, and does not imply a commitment. Session-based testing can be seen as an extension of thread-based testing for especially high accountability and more orderly situations.

I define a thread as a set of one or more activities intended to solve a problem or achieve an objective. You could think of a thread as a very small project within a project.

Why Thread-Based Test Management?

Because it can work under even the most chaotic and difficult conditions. The only formalism required for TBTM is a list of threads. I use this form of test management when I am dropped into a project with as little as a day or two to get it done.

What Does Thread-Based Test Management Look Like?

It’s simple. Thread-based test management looks like a todo list, except that we organize the todo items into an outline that matches the structure of the testing process. Here’s a mocked-up example:

Test Facilities

  • Power meter calibration method
  • Backup test jig validation
  • Create standard test images

Test Strategy

  • Accuracy Testing
    • Sampling strategy
    • Preliminary-testing
    • Log file analysis program
  • Transaction Flow Testing
  • Essential Performance Testing
  • Safety Testing
    • warnings and errors FRS review
    • tool for forcing errors
  • Compliance Testing
  • Test Protocol V1.0 doc.

Test Management

  • Change protocol definition
  • Build protocol definition
  • Test cycle protocol definition
  • Bug reporting protocol definition
  • Bug triage
  • Fix verifications

This outline describes the high level threads that comprise the test project. I typically use a mind-mapping program like MindManager to organize and present them.
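
For anyone who would rather keep the thread list in a script than in a mind map, here is one possible way to hold such an outline as a tree. It is a sketch only: the `Thread` class, its `add` and `walk` methods, and the `render` helper are hypothetical names invented for this example, and only part of the mocked-up outline is reproduced.

```python
from dataclasses import dataclass, field


@dataclass
class Thread:
    """One thread of testing activity; sub-threads nest underneath it."""
    name: str
    children: list["Thread"] = field(default_factory=list)

    def add(self, name):
        """Attach a sub-thread and return it so nesting can continue."""
        child = Thread(name)
        self.children.append(child)
        return child

    def walk(self, depth=0):
        """Yield (depth, thread) pairs in outline order."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


def render(root):
    """Format the tree as an indented outline, two spaces per level."""
    return "\n".join("  " * depth + "- " + t.name for depth, t in root.walk())


# A few threads taken from the mocked-up outline above.
project = Thread("Test Project")
strategy = project.add("Test Strategy")
accuracy = strategy.add("Accuracy Testing")
accuracy.add("Sampling strategy")
accuracy.add("Log file analysis program")
strategy.add("Transaction Flow Testing")
management = project.add("Test Management")
management.add("Bug triage")

print(render(project))
```

Nothing about TBTM requires a tool like this. A tree is simply a convenient shape for an outline where threads split off from other threads.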

So, you should be thinking, “Is that it? Todo lists?” right about now. Well, no. That’s not it. But that’s one face of it.

What Else Does Thread-Based Test Management Look Like?

It looks like testers gathered around a todo list, talking about what they are going to work on that afternoon. Then they split up and go to work. Several times a day they might come together like that. If the team is not co-located, then this meeting is done over instant messaging, email, or perhaps through a wiki.

Is That All it Looks Like?

Well, there is also the status report. Whether written or spoken, the thread-based test management version of a status report lists the threads, who is working on the threads, and the outlook for each thread. It typically also includes an issues list.

Other documentation may be produced, of course. TBTM doesn’t tell you what documents to create. It simply tells you that threads are the organizing principle we use for managing the project.
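
As an illustration of the status report described above, the sketch below assembles one from three ingredients: the threads, who is working on each, and the outlook for each, followed by an issues list. The layout, the `ThreadStatus` class, the `status_report` function, and the tester names are all assumptions made for the example. TBTM itself prescribes no report format.

```python
from dataclasses import dataclass


@dataclass
class ThreadStatus:
    """One row of the report: a thread, who is on it, and how it looks."""
    name: str
    testers: list[str]
    outlook: str


def status_report(threads, issues):
    """Build a plain-text status report: threads first, then the issues list."""
    lines = ["THREADS"]
    for t in threads:
        who = ", ".join(t.testers) if t.testers else "unassigned"
        lines.append(f"- {t.name} [{who}]: {t.outlook}")
    lines.append("ISSUES")
    lines.extend(f"- {issue}" for issue in issues)
    return "\n".join(lines)


# Hypothetical testers and outlooks, invented for the example.
threads = [
    ThreadStatus("Accuracy Testing", ["Dana"], "sampling strategy drafted"),
    ThreadStatus("Safety Testing", ["Dana", "Lee"], "blocked on error-forcing tool"),
    ThreadStatus("Compliance Testing", [], "not started"),
]
issues = ["No log file yet for the fast-changing readout"]

print(status_report(threads, issues))
```

The same information works just as well spoken aloud at the todo list. The code only shows how little structure the report needs.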

Where Do Threads Come From?

Threads are first spawned from our model of the testing problem. The Satisfice Heuristic Test Strategy Model is an example of such a model. By working through those lists, we get an idea of the kinds of testing we might want to do: those are the first of the threads. After that, threads might be created in many ways, including splitting off of existing threads as we gain a deeper understanding of what testing needs to be done. Of course, in an Agile environment, each user story kicks off a new testing thread.

Which Threads Do We Work On?

Think priority and progress. We might frequently drop threads, switch threads, and pick them up again. In general, we work on the highest priority threads, but we also work on lower priority threads many times, when we see the possibility for quick and inexpensive progress. If I’m trying to finish a sanity check on the new build, I might interrupt that to discuss the status of a particular known bug if the developer happens to wander by.

Major ongoing threads often become attached to specific people. For instance “client testing” or “performance testing” often become full-time jobs. Testing itself, after all, can be thought of as a thread so challenging to do well, and so different from programming, that most companies have seen fit to hire dedicated testers.
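
The "priority and progress" heuristic can be caricatured in a few lines of code. This is a deliberately crude sketch, not a rule anyone should follow mechanically. The `Candidate` class, the `choose_next` function, and especially the 15-minute cutoff for what counts as quick, inexpensive progress are assumptions invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """A thread we could work on right now."""
    name: str
    priority: int  # 1 is the most important
    quick_win_minutes: Optional[int] = None  # set when a cheap opportunity appears


def choose_next(candidates, max_quick_win=15):
    """Prefer a cheap opportunity if one exists; otherwise take top priority."""
    quick = [
        c for c in candidates
        if c.quick_win_minutes is not None and c.quick_win_minutes <= max_quick_win
    ]
    if quick:
        # Cheapest opportunity first; priority breaks ties.
        return min(quick, key=lambda c: (c.quick_win_minutes, c.priority))
    return min(candidates, key=lambda c: c.priority)


sanity_check = Candidate("Sanity check on new build", priority=1)
bug_chat = Candidate("Discuss known bug with developer", priority=3)

# Normally the highest-priority thread wins.
print(choose_next([sanity_check, bug_chat]).name)

# The developer wanders by, so a five-minute conversation becomes possible.
bug_chat.quick_win_minutes = 5
print(choose_next([sanity_check, bug_chat]).name)
```

In practice this judgment is made by people in the moment, weighing far more than two numbers. The sketch only makes the shape of the trade-off visible.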

How Do Threads End?

A thread ends either in a cut or knot. Cutting a thread means to cancel that task. A knot, however, is a milestone; an achievement of some kind. This is exactly the meaning of the phrase “tying up the loose ends” and marks either the end of the thread (or group of threads) or a good place to drop it for a while.
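
The difference between a cut and a knot can be made concrete with a tiny state model. This is a sketch under stated assumptions: the `State` enum and the `ThreadRecord` class are hypothetical names, and the rule that a cut thread cannot be resumed is this example's reading of "cancel", not something the method spells out.

```python
from enum import Enum


class State(Enum):
    OPEN = "open"        # being worked on, or available to be worked on
    KNOTTED = "knotted"  # tied off at a milestone; may be picked up again
    CUT = "cut"          # cancelled


class ThreadRecord:
    """Tracks how one thread ends, or pauses."""

    def __init__(self, name):
        self.name = name
        self.state = State.OPEN
        self.history = []

    def knot(self, milestone):
        """Tie off the thread at an achievement of some kind."""
        self._require_not_cut()
        self.state = State.KNOTTED
        self.history.append(f"knot: {milestone}")

    def cut(self, reason):
        """Cancel the thread."""
        self.state = State.CUT
        self.history.append(f"cut: {reason}")

    def resume(self):
        """Pick a knotted thread back up."""
        self._require_not_cut()
        self.state = State.OPEN
        self.history.append("resumed")

    def _require_not_cut(self):
        # Assumption for this sketch: a cancelled thread stays cancelled.
        if self.state is State.CUT:
            raise ValueError(f"thread '{self.name}' was cut")


strategy = ThreadRecord("Test Strategy")
strategy.knot("Test Protocol V1.0 doc. delivered")
strategy.resume()  # a knot is a place to pause, not necessarily the end
print(strategy.state, strategy.history)

compliance = ThreadRecord("Compliance Testing")
compliance.cut("descoped")
print(compliance.state, compliance.history)
```

Note that there is no "done" state here. A knot is as close as a thread gets to being finished.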

How Do We Estimate Work?

In thread-based test management, there is no special provision or method for estimating work, except that this is done on a thread-by-thread basis. Session-based test management may be overlaid onto TBTM in order to estimate work in terms of sessions.

How Do We Evaluate Progress?

In thread-based test management, there is no special provision or method for evaluating progress, either, except that this is done on a thread-by-thread basis, and status reports may be provided frequently, perhaps at the end of each day. Session-based test management is also helpful for that.

So What?

This form of management is actually quite common. But, to my knowledge, no one has yet named and codified it. Without a convenient way to talk about it, we have a hard time explaining and justifying it. Then when the “process improvement” freaks come along, they act like there’s no management happening at all. This form of management has been “illegible” up to now (meaning that it’s there but no one notices it) and my brother and I are going to push to make it fully legible and respectable in the testing arena.

From now on, when asked about my approach to test management, I can say “I practice Rapid Testing methodology, which I track in either a thread-based or session-based manner, depending on the stage of the project, and in a risk-based manner at all times.”

How is TBTM Any Different From Using a TO-DO List?

Michel Kraaij questions the substance of TBTM by wondering how it’s different from the age-old idea of a “to-do” list? See his post here.

This is a good question. Yes, TBTM is different than just using a to-do list, but even so, I don’t think I’ve ever read an article about to-do list based test management (TDBTM?). Most textbooks focus on artifacts, not the activity of testing. Thread-based test management is trying to capture the essence of managing with to-do lists, plus some other things in addition to that.

The main additional element, beyond just making a to-do list, is that a traditional to-do list contains items that can be “done”, whereas many threads might not ever be “done.” They might be cut (abandoned) or knotted (temporarily parked at some level of completion). Some threads may be tied up with a bow and “done” like a normal task, but not the main ones that I’m thinking of. As I practice testing, for instance, I’m rarely “done” with test strategy. I tinker with the test strategy all the way through the project. That’s why it makes sense to call it a thread.

Once again: Thread-based management is not focused on getting things “done.” In this way it is different from Kanban, Scrum, ToDo lists, Session-based test management, etc., all of which are big into workflow management of definite units of work.

Another thing to recognize is that the main concern of TBTM is how to know what to put on your thread list. The answer to that invokes the entire framework of Rapid Software Testing. So, yeah, it’s more than having an outline of threads, which does look very much like a to-do list– it’s the activity (and skills) of making the list and managing it. If you want to talk about to-do list based test management, then you would have to invent that lore as well. You couldn’t just say “make a to-do list” and claim to have communicated the methodology.

[You can find Jonathan’s take on TBTM here.]

[I credit Sagi Krupetski, the test lead on my recent project, for helping me get this idea. His clockwork status reporting and regular morning question “Where are we on the project and what do you think you need to work on today?” caused me to see the thread structure of our project clearly for the first time. He’s back on the market now (Chicago area), in case you need a great tester or test manager.]

Putting Subtitles to Testing

I’ve released a new video, which is a whimsical look at a serious subject: explaining exploratory testing.

In the video, my brother and I independently test an “Easy Button” for 10 minutes. Neither of us had seen the other’s test session. Then I edited the 20 minutes of total testing down to a 4 minute highlight reel and added subtitles.

The subtitles are important. One of the core skills of excellent testing is being able to reflect upon, describe, explain, and defend your work. The rhetoric of testing is a big part of Rapid Testing methodology.

So, everything we did, we can explain. If someone stops me when I’m testing, I can give a report on the spot, in oral or written form, and I can put specific technical terminology to it. In my experience, most testers are not able to do that, and there’s one major reason– they don’t practice. It does take practice, friends. While you were enjoying your Sunday, my brother and I were challenging each other to a testing duel.

You might quibble with me about the specific terminology that I used in the video. Indeed, there is a great deal of leeway. One single test activity might simultaneously be a function test, a happy path test, a scenario test, a claims test, and a state-transition test! There’s no clean orthogonality to be found. And as you already know if you read my blog, I reject any “official” lexicon of testing. But I’m not just throwing these terms around, I can explain each one, and say what is and is not an example of it.

What about the Easy Button?

Our principal finding is that the Easy Button is extremely durable. I’m surprised at the high quality of the fit and finish. Also, it feels solid. (I discovered why when I disassembled it and found what appear to be lead weights inside. Plus, the button surface is amazingly resilient to repeated hard blows with a rock hammer.)

But I’m also surprised that it claims not to be a “toy.” Of course it’s a toy. Of course little kids will play with it.

If I were seriously consulting about testing it, I would probably suggest that its physical qualities were more important to validate than its functional qualities. There appears to be little risk associated with its functionality. On the other hand, there appears to be little risk with its physical qualities either.

I would suggest that it’s far more important to test the web version of the “Easy Button” than the physical version. I would move on to that.

Quality is Dead #2: The Quality Creation Myth

One of the things that makes it hard to talk about quality software is that we first must overcome the dominating myth about quality, which goes like this: The quality of a product is built into it by its development team. They create quality by following disciplined engineering practices to engineer the source code so that it will fulfill the requirements of the user.

This is a myth, not a lie. It’s a simplified story that helps us make sense of our experience. Myths like this can serve a useful purpose, but we must take care not to believe in them as if they were the great and hoary truth.

Here are some of the limitations of the myth:

  1. Quality is not a thing and it is not built. To think of it as a thing is to commit the “reification fallacy” that my colleague Michael Bolton loves to hate. Instead, quality is a relationship. Excellent quality is a wonderful sort of relationship. Instead of “building” quality, it’s more coherent to say we arrange for it. Of course you are thinking “what’s the difference between arrange and build? A carpenter could be said to arrange wood into the form of a cabinet. So what?” I like the word arrange because it shifts our attention to relationships and because arrangement suggests less permanence. This is important because in technology we are obliged to work with many elements that are subject to imprecision, ambiguity and drift.
  2. A “practice” is not the whole story of how things get done. To say that we accomplish things by following “practices” or “methods” is to use a figure of speech called a synecdoche: the substitution of a part for the whole. What we call practices are the public face of a lot of shadowy behavior that we don’t normally count as part of the way we work: joking around, for instance, or eating a salad at your desk, or choosing which email to read next and which to ignore. A social researcher examining a project in progress would look carefully at who talks to whom, how they talk and what they talk about. How is status gained or lost? How do people decide what to do next? What are the dominant beliefs about how to behave in the office? How are documents created and marketed around the team? In what ways do people on the team exert or accept control?
  3. Source code is not the product. The product is the experience that the user receives. That experience comes from the source code in conjunction with numerous other components that are outside the control and sometimes even the knowledge of product developers. It also comes from documentation and support. And that experience plays out over time on what is probably a chaotic multi-tasking computing environment.
  4. “Requirements” are not the requirements, and the “users” are not the users. I don’t know what my requirements are for any of the software I have ever used. I mean, I do know some things. But for anything I think I know, I’m aware that someone else may suggest something that is different that might please me better. Or maybe they will show me how something I thought was important is actually harmful. I don’t know my own requirements for certain. Instead, I make good guesses. Everyone tries to do that. People learn, as they see and work with products, more about what they want. Furthermore, what they want actually changes with their experiences. People change. The users you think you are targeting may not be the users you get.
  5. Fulfillment is not forever and everywhere. The state of the world drifts. A requirement fulfilled today may no longer be fulfilled tomorrow, because of a new patch to the operating system, or because a new competing product has been released. Another reason we can’t count on a requirement being fulfilled is that “can” does not mean “will.” What I see working with one data set on one computer may not work with other data on another computer.

These factors make certain conversations about quality unhelpful. For instance, I’m impatient when someone claims that unit testing or review will guarantee a great product, because unit testing and review do not account for system level effects, or transient data occurring in the field, or long chains of connected transactions, or intermittent failure of third-party components. Unit testing and review focus on source code. But source code is not the product. So they can be useful, but they are still mere heuristic devices. They provide no guarantee.

Once in a while, I come across a yoho who thinks that a logical specification language like “Z” is the great solution, because then your specification can be “proven correct.” The big problem with that, of course, is that correctness in this case simply means self-consistency. It does not mean that the specification corresponds to the needs of the customer, nor that it corresponds to the product that is ultimately built.

I’m taking an expansive view of products and projects and quality, because I believe my job is to help people get what they want. Some people, mainly those who go on and on about “disciplined engineering processes” and wish to quantify quality, take a narrower view of their job. I think that’s because their overriding wish is that any problems not be “their fault” but rather YOUR fault. As in, “Hey, I followed the formal spec. If you put the wrong things in the formal spec, that’s YOUR problem, stupid.”

My Take on the Quality Story

Let me offer a more nuanced version of the quality story, still a myth, yes, but one more useful to professionals:

A product is a dynamic arrangement, like a garden that is subject to the elements. A high quality product takes skillful tending and weeding over time. Just like real gardeners, we are not all-powerful or all-knowing as we grow our crop. We review the conditions and the status of our product as we go. We try to anticipate problems, and we react to solve the problems that occur. We try to understand what our art can and cannot do, and we manage the expectations of our customers accordingly. We know that our product is always subject to decay, and that the tastes of our customers vary. We also know that even the most perfect crop can be spoiled later by a bad chef. Quality, to a significant degree, is out of our hands.

After many years of seeing things work and fail (or work and THEN fail), I think of quality as ephemeral. It may be good enough, at times. It may be better than good enough. But it fades; it always fades, like something natural.

Or like sculpture by Andy Goldsworthy. (Check out this video.)

This is true for all software, but the degree to which it is a problem will vary. Some systems have been built that work well over time. That is the result of excellent thinking and problem solving on the part of the development team. But I would argue it is also the result of favorable conditions in the surrounding environment. Those conditions are subject to change without notice.

“Mipping”: A Strategy for Reporting Iffy Bugs

When I first joined ST Labs, years ago, we faced a dilemma. We had clients telling us what kind of bugs we should not report. “Don’t worry about the installer. Don’t report bugs on that. We have that covered.” No problem, dear customer, we cheerfully replied. Then after the project we would hear complaints about all the installation bugs we “missed”.

So, we developed a protocol called Mention In Passing, or “mipping”. All bugs shall be reported, without exception. Any bug that seems questionable or prohibited we will “mention in passing” in our status reports or emails. In an extreme case we mention it by voice, but I generally want to have a written record. That way we are not accused of wasting time investigating and reporting the bug formally, but we also can’t be accused of missing it entirely.

If a client tells me to stop bothering him about those bugs, even in passing, I might switch to batching them, or I might write a memo to all involved that I will henceforth not report that kind of problem. But if there is reasonable doubt in my mind that my client and I have a strong common understanding of what should and should not be reported, I simply tell them that I “mip” bugs as a periodic check on whether I have accidentally misconstrued the standard for reporting, or whether the standard has changed.