The Bestselling Software Intended for People Who Couldn’t Use It.

In 1983, my boss, Dale Disharoon, designed a little game called Alphabet Zoo. My job was to write the Commodore 64 and Apple II versions of that game. Alphabet Zoo is a game for kids who are learning to read. The child uses a joystick to move a little character through a maze, collecting letters to spell words.

We did no user testing on this game until the very day we sent the final software to the publisher. On that day, we discovered that our target users (five-year-olds) did not have the ability to use a joystick well enough to play the game. They just got frustrated and gave up.

We shipped anyway.

This game became a bestseller.


It was placed on a list of educational games recommended by the National Education Association.


Source: Information Please Almanac, 1986

So, how to explain this?

Some years later, when I became a father, I understood. My son was able to play a lot of games that were too hard for him because I operated the controls. I spoke to at least one dad who did exactly that with Alphabet Zoo.

I guess the moral of the story is: we don’t necessarily know the value of our own creations.

Tyranny of the Innocents, Exhibit A: Gary Cohen

Gary Cohen, Deputy Administrator and Director, Center for Consumer Information and Insurance Oversight, Centers for Medicare and Medicaid Services, is not a computer guy. He’s a lawyer. He knows the insurance industry. And yet he was put in charge of a very large and important project: Healthcare.gov.

Here’s what BusinessWeek said about him on August 22nd:

“Gary Cohen seems awfully calm for a man whose job is to make sure Obamacare doesn’t flop. As head of CCIIO (awkward pronunciation: Suh-SIE-O), he oversees the complex, politically fraught system of state health insurance exchanges that will begin signing up uninsured Americans starting on Oct. 1. It hasn’t exactly been a smooth rollout. Many Americans still have no idea the exchanges exist, and the administration has struggled to explain who’s eligible for coverage under the Affordable Care Act and how they enroll. Cohen is convinced the confusion will clear up once things are up and running. ‘We’re going to get to the point where the discussions we’re having today will fade into the background,’ he says.”

He should have known that the system wasn’t going to work, at that point. But he’s not a technology guy, so perhaps he thought some big-brained hacker from the movies was going to pull it together at the last minute?

Here’s what he was asked and what he answered at a House Committee on Oversight hearing on May 21st:

Ms. DUCKWORTH. …Could you speak a little bit on the Administration’s readiness to
reach out to this huge number of people so that they can enroll in
time? Basically, you say that you are going to be ready to go on
October 1st, and you need to be. If not, what do you need in order
to get ready and have a successful rollout of these provisions?

Mr. COHEN. So we have a plan in place that basically is timed
so that people are getting the information close to the time in
which there is something that they can do with it. So right now we
are in what we call the education phase, which began in January
and proceeds through June, where we are just putting out information.
We are in the process of re-purposing the HealthCare.gov site
to be really a consumer information site. Our call center will be
going live in June, where people will be able to call and get information
that way. And then starting in the summer we will begin
what we call the anticipation, or get ready phase. And I am not an
expert in these things, but what I understand is that if you start
too early and then people say, well, what do I do, and then there
is nothing that they can do because it is too soon, then you may
end up having people who get a little bit kind of frustrated or disappointed.
So we really are gearing towards making sure the people get the
information they need in time for October, when they actually can
take action and begin to get enrollment coverage.

Hmm. He was asked directly if he needed anything to make sure he was ready to go on October 1st. His answer was basically: no thank you.

Did he really think everything was on track? Why didn’t his people prepare him to set expectations better?

Mr. GOSAR. Mr. Cohen, how closely is HHS working with IRS on Obamacare
implementation?
Mr. COHEN. We are working closely with IRS on those aspects of
implementation where we have to work together, so, for example,
as you know, in determining whether a person is eligible for Medicaid
or CHIP on the one hand, or tax credits in the marketplaces
on the other, income is a test, and we are working with IRS on
verifying people’s income when they apply.
Mr. GOSAR. So the IRS is going to be gathering and sending this
enormous amount of taxpayer information to all the 50 exchanges.
All 50 exchanges are to be ready by October 1st, right?
Mr. COHEN. Yes.
Mr. GOSAR. So will there be any problems with this massive
amount of data sharing?
Mr. COHEN. No. And data sharing may not be exactly the right
way to look at it. Basically what will happen is people will put information
about their income in an application; that information
will be verified by data that comes from the IRS, but there is no
exchange of information from the IRS to the exchange; the information
goes out, it is verified, and it comes back.
Mr. GOSAR. But it is still from the exchange going to the IRS,
and that is where I am going.
Mr. COHEN. It is going to the data hub. Information is coming
from the IRS to the data hub and from the exchange to the data
hub, and there is a comparison and then there is an answer back.
But the tax information isn’t actually going to the exchange.

What a refreshingly blunt answer to the question of whether there will be any trouble with data exchange: No. Unfortunately, we now know there are massive problems with that. Why didn’t he give a more nuanced answer? Why didn’t he hedge? This is why I think he’s an innocent– a child put in charge of the chocolate factory. He didn’t need to be, but that’s how he played it. I guess he was distracted by other duties and trusted the technologists? Or maybe he dismissed the concerns of the technologists as mere excuses? I wonder.

Healthcare.gov and the Tyranny of the Innocents

The failure of Healthcare.gov is probably not because of sinister people. It’s probably because of the innocents who run the project. These well-intentioned people are truly as naive as little children. And they must be stopped.

They are, of course, normal intelligent adults. I’m sure they got good grades in school– if you believe in that sort of thing– and they can feed and clothe themselves. They certainly look normal, even stately and wise. It’s just that they are profoundly ignorant about technology projects while being completely oblivious to and complacent about that ignorance. That is the biggest proximal cause of this debacle. It’s called the Dunning-Kruger effect (which you can either look up or confidently assure yourself that you don’t need to know about): incompetence of a kind that makes you unable to assess your own lack of competence.

Who am I talking about? I’m talking to some extent about everyone above the first level of management on that project, but mostly I’m talking about anyone who was in the management chain for that project but who has never coded or tested before in their working lives. The non-technical people who created the conditions that the technical people had to work under.

I also blame the technical people in a different way. I’ll get to that, below.

How do I come to this conclusion? Well, take a look at the major possibilities:

Maybe it didn’t fail. Maybe this is normal for projects to have a few glitches? Oh my, no. Project failures are not often clear cut. But among failures, this one is cut as clearly as the Hope Diamond. This is not a near miss. This is the equivalent of sending Hans away to sell the family cow and he comes back with magic beans. It’s the equivalent of going to buy a car and coming back with a shopping cart that has a cardboard sign on which someone has written “CAR” in magic marker. It’s a swing and a miss when the batter was not even holding a bat. It’s so bad, I hope criminal charges are being considered. Make no mistake, the people who ran this project scammed the US government.

Did it fail because it’s too hard a project to do? It’s a difficult project, for sure. It may have been too hard to do under the circumstances prescribed. If so, then we should have heard that message a year ago. Loudly and publicly. We didn’t hear that? Why? Could it have been that the technical people kept their thoughts and feelings carefully shrouded? That’s not what’s being reported. It’s come out that technical people were complaining to management. Management must have quashed those complaints.

Did politics prevent the project from succeeding? No doubt that created a terrible environment in which to produce the system. So what? If it’s too hard, just laugh and say “hey this is ridiculous, we can’t commit to creating this system” UNLESS, of course, you are hoping to hide the problem forever, like a child who has wet the bed and dumps the sheets out the back window. I suppose it’s possible that Republican operatives secretly conspired to make the project fail. If so, I hope that comes out. Doesn’t matter, though. Management could still have seen it coming, unless the whole development team was in on the fix.

Were the technical people incompetent? Probably. It’s likely that many of the programmers were little better than novices, from what I can tell by looking at the bug reports coming through. It was a Children’s Crusade, I guess. But again, so what? The purpose of management, at each of the contracting agencies and above them, is to assess and assure the general competence and performance of the people working on the job. That comes first. I’m sure there were good people mixed in there, somewhere. I have harsh feelings for them, however. I would say to them: Why didn’t you go public? Why didn’t you resign? You like money that much? Your integrity matters that little to you?

Management created the conditions whereby this project was “delivered” in a non-working state. Not like the Dreamliner. The 787 had some serious glitches, and Boeing needs to shape that up. What I’m talking about is boarding an aircraft for a long trip only to be told by the captain “Well, folks it looks like we will be stuck here at the gate for a little while. Maintenance needs to install our wings and engines. I don’t know much about aircraft building, but I promise we will be flying by November 30th. Have some pretzels while you wait.”

Management must bear the prime responsibility for this. I’m not sure that Obama himself is to blame. Everyone under him though? Absolutely.

What About Testing?

Little testing happened on the site. The testing that happened seems to have confirmed what everyone knew. Now this article has come out, about what’s happening behind the scenes. I sure hope they have excellent Rapid Testers working on that, because there is no time for TDD or much of any unit testing and certainly no time to write bloated nonsensical “test case specs” that usually infect government efforts like so much botfly larvae.

Notice the bit at the end?

“It’s a lot of work but people are committed to it. I haven’t heard anyone say it’s not a doable job,” the source said of the November 30th deadline to fix the online portal to purchase insurance on the federal exchange.

Exactly. That’s exactly the problem, Mr. Source. This is what I mean by the tyranny of the innocents. If no one is telling you that the November 30th deadline is not doable, and you think that’s a good sign, then you are an innocent. If you are managing to that expectation then you are a tyrant. It’s probably not doable. I already know that this can’t possibly leave enough time for reasonable testing of the system. Even if it is doable, only a completely dysfunctional project has no one on it speaking openly about whether it is doable.

What Can Be Done?

Politics will ruin everything. I have no institutional solution for this kind of problem. “Best practices” won’t help. Oversight committees won’t help. I can only say that each of us can and should foster a culture of personal ethical behavior. I was on a government project, briefly, years ago. I concluded it was an outlandish waste of taxpayer money and I resigned. I wanted the money. But I resigned anyway. It wasn’t easy. I had car payments and house payments to make. Integrity can be hard. Integrity can be lonely. I don’t always live up to my highest ideals for my own behavior, and when that happens I feel shame. The shame I feel spurs me to be better. That’s all I’m hoping for, really. I hope the people who knew better on this project feel shame. I hope they listen to that shame and go on to be better people.

I do have advice for the innocents. I’ll speak directly to you, Kathy Sebelius, since you are the most public example of who I am talking about…

Hi Kathy,

You’re not a technology person. You shouldn’t have to be. But you need people working for you who are, because technology is opaque. It may surprise you to know that unlike building bridges and monuments, the status of software can be effectively hidden from anyone more than one level above (or sideways from) the programmer or tester who is actually working on that particular piece of it. It’s like managing a gold mine without being able to go down into the mine yourself.

This means you are in a weak position, as an executive. You can pound the table and threaten to fire people, sure. It won’t help. The way in which an executive can use direct power will only make a late software project even later. Every use of direct power weakens your influence. Use indirect power, instead. Imagine that you are taming wild birds. I used to do that as a kid in Vermont. It requires quietness and patience. The first part is to stand for an hour holding birdseed in your hand. Stand quietly and eventually they will land in your hand.

To have managed this project well, you needed to have created an environment where people could speak without fear. You needed to work with your direct reports to make sure they weren’t filtering out too much of the bad news. You needed to visit the project on a regular basis, and talk to the lowest level people. Then you needed to forgive their managers for not telling you all the bad news. It’s a maddeningly slow process. If you notice, the Pope is currently doing something very similar. Hey, I’m an atheist and yet I find myself listening to that guy. He’s a master of indirect leadership.

You did have the direct power to set expectations. I’m sure you realize you could have done a much better job of that, but perhaps you felt fear, yourself. As your employer (a taxpaying citizen), I bear a little of that responsibility. The country is getting the Healthcare.gov site that it deserves, in a sense.

If you are going to continue in public service, please do yourself and all of us a favor and take a class on software project management. Attend a few lectures. Get smart about what kinds of dodges and syndromes contractors use.

Don’t be an innocent, marching to the slaughter, while millions of dollars line the pockets of the people who run CGI and all those other parasite companies.

— Sincerely, James

My Political Agenda

I have $200,000 of unpaid medical bills due to the crazy jacked up prices and terrible insurance situation for individual citizens in the United States. I am definitely a supporter of the concept of health care reform, even the flawed Obamacare system, if that’s the best we can do for now.

I was pleased to see the failure of the Healthcare.gov website, at first. A little failure helps me make my arguments about how hard it is to do technology well; how getting it right means striving to better ourselves, and no formula or textual incantation will do that for us.

This is too much failure! I want it to stop now. Still, I’m an adult, a software project expert and not in any way an innocent. I know it’s not going to be resolved soon. No Virginia, there won’t be a Healthcare.gov website this Christmas.

Addendum:

From cnn.com:

Summers wrote a memo to the President in 2010 suggesting that HealthCare.gov was not something the government could handle and he needed to bring in experts.

While Summers would not provide details about internal discussions, he said Tuesday, “You need experts. You need to trust but you need to verify. You can’t go rushing the schedule when you get behind or you end up making more errors.”

Damn straight. If this is true then I’m sure glad someone around Obama had basic wisdom. I guess nobody listened to him.

A Satisfying News Story

Here is the difference between happiness and satisfaction:

Consider this quote from How Zynga went from social gaming powerhouse to has-been.

“During my five-month mark, it started turning sour when we were pushing a lot of code that was destroying the ecosystem—they were not fixing bugs,” he said. “At one time, I had 10,000 players trapped inside a quest. 10,000! The attitude was ‘Don’t worry about them.’ [Management] would rather grab new players, keep them for three months or so, get $5 to $10 from them, and those players would quit and leave.”

When I read this, I am not happy. It’s sad to see hundreds of jobs lost because– among other things– a company loses interest in producing a product that works well.

But I am satisfied when yet another example of a high flying wipeout shows that testing (and bug fixing) matters.

When Does a Test End?

The short answer is: you never know for sure that a test has ended.

Case in point. The license plate on my car is “tester.” It looks like this:

On December 20th, I received this notice in the mail:

As you see, it seems that the city of Everett, which is located between Orcas Island (where I live) and Seattle (where I occasionally visit) felt that I owed them for a parking violation. This is strange because I have never before parked in Everett, much less received a ticket there. A second reason this is strange is that the case number, apparently, is “111111111”.

At first I thought this was a hoax, but the phone number and address are real. The envelope was sent from Livonia, Michigan, and that turns out to be where Alliance One Receivables Management, Inc. is based. They collect money on behalf of many local governments, so that makes sense. It all looked legitimate, except that I’m not guilty, and the case number is weird.

Then it occurred to me that this may have been a TEST! Imagine a tester checking out the system. He might type “tester” for a license plate, not realizing (or not caring) that someone in Washington actually has that plate. He keys in a fake case number of “111111111” because that’s easy to type, and then he forgets to remove that test data from the database.

Praise the Humans

I called the county clerk’s office to ask about this. At first I was worried, because they used an automated phone service. But I quickly got through to a competent human female. What can humans do? Troubleshoot. She told me that there indeed was a record in their system that I owed them money, but that the case number did not refer to a real case. In fact, she said that the number was incorrectly formatted: all their case numbers start with a “10.”

“This can’t be right,” she said.

“Could it be test data? Are you just starting to use Alliance One?” I asked.

“We’ve been using Alliance One for years. Oh, but we’re just starting to use their electronic ticketing system.”

She told me I was probably right about it being a test, but that she would investigate and get back to me.

A few days later I received this notice:

So, there you have it. Someone ran a test on November 9th that did not conclude until December 23rd, when it was stopped via a court order! Thank you, Judge Timothy B. Odell.

I’m sure this will appear on an episode of Law and Order: Clerical Intent one of these days.

Just imagine if this hadn’t been a parking ticket program, but rather something that told the FBI to go and break down my door…

Morals of the Story

  1. Beware of testing on the production system.
  2. Always give the humans a way to correct the automation when it goes out of control. (Hear that, Skynet?)
  3. You never know when your test is over.
  4. If your name is “tester” or “test” or “testing”, eventually you will show up as test data in somebody’s project. Beware also if your name is “12345”, “asdf”, “qwerty”, “foobar”, or “999999999999999999999999.”
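Here is a minimal sketch of the kind of check that might have caught my phantom ticket before it was mailed. The only real facts in it are the ones from the story: the plate “tester,” the case number “111111111,” and the clerk’s remark that real case numbers start with “10.” The function name, the placeholder list, and the record layout are all my own invention for illustration, not anything Alliance One or the city actually uses.

```python
# Hypothetical placeholder values a tester might type; the list is illustrative.
PLACEHOLDER_VALUES = {"tester", "test", "testing", "12345", "asdf", "qwerty", "foobar"}


def looks_like_test_data(plate: str, case_number: str) -> list[str]:
    """Return the reasons a record looks like leftover test data (empty if none)."""
    reasons = []
    if plate.strip().lower() in PLACEHOLDER_VALUES:
        reasons.append("plate is a common placeholder value")
    # A single repeated character, such as "111111111" or "9999".
    if len(case_number) >= 4 and len(set(case_number)) == 1:
        reasons.append("case number is one repeated character")
    # Per the clerk in the story, real case numbers start with "10".
    if not case_number.startswith("10"):
        reasons.append('case number does not start with "10"')
    return reasons


if __name__ == "__main__":
    for reason in looks_like_test_data("TESTER", "111111111"):
        print("review before mailing:", reason)
```

Note that a check like this should route the record to a human for review rather than silently delete it. Somebody really does own the plate “tester,” and he can presumably still get a real parking ticket.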

CNN Believes Whatever Computers Say

(CNN) — Early evidence points to driver error as the reason a 2005 Prius sped into a stone wall on March 9, according to federal investigators.

“Information retrieved from the vehicle’s onboard computer systems indicated there was no application of the brakes and the throttle was fully open,” according to a statement from the National Highway Traffic Safety Administration.

The statement suggests the driver may have been stepping on the accelerator, instead of the brake, as she told police.

Note to CNN: Information retrieved from an onboard computer is not reliable in cases where the computer itself is a suspect in the crime. No one is claiming that ghosts are causing the cars to go out of control. And there is no evidence yet that a mechanical failure is the culprit, despite the NHTSA looking hard for that evidence. Alcohol is also not a factor in these specific cases (or else we wouldn’t even be talking about them). That leaves two big possibilities: sober experienced people are suddenly forgetting how to drive OR something is wrong with the software or electronics that controls the cars.

If the software that reads the control inputs has the right kind of fault in it, it may occasionally lose the ability to read and react to control inputs. It is easy for a computer programmer to imagine a situation where the part of the system that records the control settings is working fine, while the part that acts on them is failing. That would be consistent with the facts of this case. Moreover, the problem may be transient, leaving no evidence that it happened.
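To make that concrete, here is a toy sketch of a control loop in which the recording path and the acting path are separate. It is not modeled on any real vehicle; the class, the fault flag, and the pedal values are all hypothetical. The point is only that two runs can leave identical logs while commanding very different things.

```python
class ThrottleController:
    """Toy drive-by-wire loop; an illustration, not a real vehicle design."""

    def __init__(self) -> None:
        self.acting_path_faulty = False  # hypothetical transient fault
        self.log = []                    # what the recorder saw
        self.throttle_output = 0.0       # what the motor is actually told to do

    def step(self, accelerator: float, brake: float) -> None:
        # Recording path: store the pedal readings exactly as sensed.
        self.log.append((accelerator, brake))
        # Acting path: while the fault is present the update is skipped,
        # so the previous throttle command stays in force.
        if self.acting_path_faulty:
            return
        self.throttle_output = 0.0 if brake > 0 else accelerator


def drive(fault_after_first_step: bool):
    """Floor the accelerator, then release it and brake."""
    controller = ThrottleController()
    controller.step(accelerator=1.0, brake=0.0)
    controller.acting_path_faulty = fault_after_first_step
    controller.step(accelerator=0.0, brake=1.0)
    return controller.log, controller.throttle_output


if __name__ == "__main__":
    healthy_log, healthy_output = drive(fault_after_first_step=False)
    faulty_log, faulty_output = drive(fault_after_first_step=True)
    print("logs identical:", healthy_log == faulty_log)
    print("throttle commands:", healthy_output, "vs", faulty_output)
```

In this sketch the log faithfully shows the driver braking in both runs, yet in the faulty run the throttle command never drops. Reading the log afterward tells you what the recorder saw, not what the car did.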

We don’t know what happened. We may never know. But there is no reason to assume that the computer is infallible. Computers are designed by fallible people.

Toyota Story Analysis

I’m not against Toyota, really. I’m against manufacturers who try to weasel out of their responsibilities after they put people at risk with poor design. I do not know if Toyota is guilty of that, here.

I also don’t know what the real story is with this Sikes fellow who was in the car that he claims went out of control. Maybe he’s a gold digger out to exploit Toyota’s bad fortune.

Finally, I don’t know much about the design of a Prius, except that my understanding is that there are no physical cable linkages. It’s software driven. When you press the brakes you are essentially double clicking on the “brake” icon with your foot mouse, hoping that the operating system agrees to apply the real brakes. A Prius is basically a video game console connected to a car. You drive the console, not the car.

What I’m trying to do is use this situation to teach testing. So, I want to break down a news story I found on cnn.com and show you how a tester would think about it:

Toyota takes aim at California runaway Prius story
by Peter Valdes-Dapena, senior writer, March 15, 2010: 8:24 PM ET

NEW YORK (CNNMoney.com) — Toyota challenged a California driver’s story of an out-of-control Prius at a press conference Monday afternoon.

Toyota held a press conference about this. That is interesting. Of course that means Toyota’s marketing people are actively involved in this investigation, pacing and fretting for good news to send out. This does not create a conducive atmosphere for an investigation.

Public relations people want to rush out with good news, but keep bad news longer to study it and make very sure it’s really correct. This creates a sort of chromatic distortion of the truth in the near term. We must beware of that. You might say that it’s likely to be “green shifted” truth.

Company executives detailed preliminary findings of a joint investigation conducted by Toyota and the National Highway Traffic Safety Administration into the incident.

I like hearing that the NHTSA is involved in the investigation. Are they leading it, I wonder? You know there are several ongoing investigations into Toyota vehicles by the NHTSA (see their website) including one involving the momentary loss of braking on uneven surfaces (which my father has experienced several times on the dirt road going to his house.)

I feel there’s more credibility if the NHTSA is involved, but the press conference seems to have been a Toyota thing, not an NHTSA thing.

Prius owner Jim Sikes made national headlines last week with claims that his car’s accelerator got stuck as he sped up to pass a car while traveling on California’s I-8 highway outside of San Diego, and that he was unable to stop the car.

“As I was going, I was trying the brakes … and it just kept speeding up,” he said.

Reports from non-technical, non-expert users always must be taken with skepticism, even leaving aside the possibility that the guy is just telling lies. Perhaps he suffers from Munchausen syndrome. You never know.

But let’s say he’s not lying, just for a moment. The phrase “it just kept speeding up” suggests that the brakes were completely inoperative. I wouldn’t use that phrase, as a driver, if the brakes had engaged and were fighting the motor.

Someone needs to sit the guy down (I assume they’ve done this) and walk through the whole incident moment by moment. Several times. Get him to clarify this.

Inconsistencies alone should not worry us too much. It happened quickly, it was a traumatic event, and his personal account may not be reliable just for that reason. But we still need to wring every bit of information we can from his memory.

Sikes’ story is at odds with the findings of the investigation, according to Toyota and to a draft congressional memo obtained by CNN.

“While a final report is not yet complete, there are strong indications that the driver’s account of the event is inconsistent with the findings of the preliminary analysis,” Toyota said in a prepared statement.

Sikes said he called 9-1-1 for help as he was traveling in excess of 90 mph on a winding, hilly portion of the interstate. He said dispatchers tried to talk him through ways to stop the car, but nothing helped.

I’d like to hear that 911 call in its entirety.

Eventually, a California Highway Patrol officer was able to catch up to Sikes and used the patrol car’s public address system to instruct Sikes to apply the brakes and the emergency brake at the same time. That tactic worked, and he was able to stop the car.

I’d like to know if Sikes had tried the emergency brake alone, before this. Had he tried both at once before this?

I can understand if he did not try both at once, because a normal driver would think if they don’t work individually, why would they suddenly work together? But in a drive-by-wire system, everything is mediated by software, and software can get into strange states. It’s technically plausible that only with both brake controls activated could the software be bounced out of whatever strange trance it got into.

However, because driving a hybrid car like the Prius with both the gas pedal and the brakes simultaneously depressed would cause serious damage to the car’s electric motor and, possibly other systems, Toyota says the Prius is designed to prevent that from happening.

Of course it’s designed to prevent that. But here’s a testing lesson: designing for prevention is not the same as preventing, because your design may have a bug in it.

All Toyota can say is that it was their intent to design the system to prevent that, and to the best of their knowledge that is how the system works… except in this case it didn’t work– unless the guy is simply lying or insane.

If the brake is pressed at the same time as the gas pedal, power to the engine will be reduced just as if the gas pedal had been released, the automaker said.

Unless, of course, there’s a malfunction of the system, which is exactly the issue under consideration.

During driving tests on Sikes’ Prius and on an identical 2008 Prius, the system operated as expected, according to the report, preventing the car from pushing forward while braking.

“The system operating as expected” is not actually possible to determine, because they can’t see inside all the software and hardware to detect that every bit and electron is in the right place.

What they can say is that they detected no problems. Problems may be there, they just did not detect them.

If the visible, detectable problems we want to see are triggered by a transient event, such as a specific combination of foot presses, or perhaps there’s a two microsecond window of opportunity for two software events to happen simultaneously (such as have dogged the Mars rover missions), then of course driving it around a parking lot is probably not going to reproduce the problem.
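A back-of-the-envelope sketch shows why a short test drive is unlikely to hit such a window. Only the two microsecond figure comes from the paragraph above. The 10 millisecond control cycle, the assumption that the second event lands at a uniformly random moment in that cycle, and the function names are mine, chosen purely for illustration.

```python
import math


def hit_probability(window_s: float, period_s: float) -> float:
    """Chance that an event landing at a uniformly random moment in one
    period falls inside a window of the given width."""
    return min(1.0, window_s / period_s)


def trials_needed(p_per_trial: float, confidence: float) -> int:
    """Independent trials needed to see the event at least once with the
    given confidence: the smallest n with 1 - (1 - p)^n >= confidence.
    Requires 0 < p_per_trial < 1 and 0 < confidence < 1."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_per_trial))


if __name__ == "__main__":
    # Assumed numbers: 2 microsecond window inside a 10 millisecond cycle.
    p = hit_probability(window_s=2e-6, period_s=10e-3)
    n = trials_needed(p, confidence=0.95)
    print(f"chance per opportunity: {p:.4%}; opportunities for 95% odds: {n}")
```

Under those made-up assumptions it takes on the order of fifteen thousand opportunities to have a 95% chance of seeing the problem even once. If the trigger also needs a particular combination of pedal presses, each "opportunity" may itself be rare.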

It is also possible that part of the problem involves a piece of physical equipment that was lodged or worn in a certain way at a specific temperature, and that condition no longer exists on the car in question.

When we try to reproduce problems, we often have to guess at the causes, and we may guess wrong.

If I were Toyota, I would treat this like an epidemiology problem. You interview people and make a list of absolutely everything that was going on. Did they have a cell phone? What kind? Where was it in the car? Were they using the cup holder? What drink was in the cup? Hot or cold? Was the air conditioner on or off? What was the setting?

Then you put all the data into a database and mine it for patterns.
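As a rough sketch of what the first pass of that mining could look like, the snippet below counts how often each circumstance shows up across a set of incident reports. The report fields and values are invented for illustration; I have no idea what Toyota’s interview data actually contains.

```python
from collections import Counter


def common_factors(reports: list[dict], min_share: float = 0.8) -> list[tuple]:
    """List (factor, value, share) for every circumstance that appears in at
    least min_share of the reports. Factor names and values are assumed to
    be strings."""
    counts = Counter()
    for report in reports:
        counts.update(report.items())  # each (factor, value) pair counted once per report
    total = len(reports)
    return sorted(
        (factor, value, count / total)
        for (factor, value), count in counts.items()
        if count / total >= min_share
    )


if __name__ == "__main__":
    # Hypothetical interview data, one dict per incident.
    reports = [
        {"air_conditioner": "on", "cup_contents": "hot", "phone": "none"},
        {"air_conditioner": "on", "cup_contents": "cold", "phone": "none"},
        {"air_conditioner": "on", "cup_contents": "hot", "phone": "flip"},
    ]
    for factor, value, share in common_factors(reports, min_share=0.6):
        print(f"{factor}={value}: {share:.0%} of incidents")
```

A factor that shows up in most incidents is only a lead, not a cause. You would still need to compare against drivers who had no incident, since nearly everyone in San Diego may have the air conditioner on.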

Investigators are extremely meticulous when taking apart a car in a case like this, said Ed Higgins, a Michigan personal injury attorney who has been involved in automobile defect cases. They are aware their work will be subject to intense scrutiny, so they measure and document everything, he said.

That kind of care takes a lot of time. But it hasn’t been very long since the incident occurred. Have they also taken the software apart? Have they comprehensively reviewed the code? I seriously doubt that.

“I would think that any mechanical defect that would have allowed something to happen that otherwise could not have happened would have stood out like a sore thumb,” he said.

Unless it’s a transient interaction between a mechanical defect and an invisible state within the software.

The car also did not show damage consistent with the engine having been running at full throttle while the brakes were on, according to the report.

That suggests the brakes weren’t on, but not that Sikes wasn’t pushing on the brakes.

“Toyota engineers believe that it would be extremely difficult for the Prius to be driven at a continuous high speed with more than light brake-pedal pressure, and that the assertion that the vehicle could not be stopped with the brakes is fundamentally inconsistent with basic vehicle design and the investigation observations,” Toyota said in a statement.

Again, this all assumes normal circumstances and no transient failures. For the purposes of investigation, that belief is irrelevant.

It is already fundamentally inconsistent with the design of the product that ANY failure could occur. We’ve crossed that bridge, guys.

Remember, when flight 427 crashed, Boeing maintained for years that their rudder mechanism could not possibly have failed– until a new form of failure was discovered (“thermal shock”) and the specific failure reproduced in that very rudder assembly.

The car’s front brakes showed significant wear and overheating, Toyota said. That kind of wear and heat would be consistent with the brakes being lightly applied over a long period of time, executives said at the press event.

Data from on-board computers indicated that Sikes had applied the brakes, to some degree, at least 250 times during the 23 mile incident, Toyota executives said, and that the brakes worked normally each time.

Ooh, I love log files. I wonder what other patterns they can mine from that log file?

If the computers indicate that Sikes had applied the brakes, that shows they were getting some kind of signal from the brake mechanism, but not necessarily the correct signal. Therefore saying “the brakes worked normally each time” is completely unwarranted. Part of the system may have been working normally while another part was going haywire. There’s no way to tell after the fact, because “working normally” is not a detectable condition that gets logged in the computer.

Every time you experience a problem in your software, your software, on some level, thinks it is doing the right thing. Software doesn’t “know” it’s misbehaving. It just does what it is told.
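
Here is a toy illustration of that point in Python. The logger below is entirely hypothetical and reflects nothing about Toyota's actual software. It records a brake event whenever its sensor reading crosses a threshold, and it has no way to record whether that reading matched what the driver's foot was really doing.

```python
BRAKE_THRESHOLD = 0.1  # hypothetical sensor units

def log_brake_events(sensor_readings):
    """Log 'brake applied' whenever the sensor reading exceeds the threshold.

    The log reflects only what the sensor reported. Nothing in it says
    whether the reading was correct.
    """
    log = []
    for tick, reading in enumerate(sensor_readings):
        if reading > BRAKE_THRESHOLD:
            log.append((tick, "brake applied", reading))
    return log

# The driver stands on the pedal at every tick (true pressure 1.0), but a
# hypothetical faulty sensor reports only a light touch.
true_pedal_pressure = [1.0, 1.0, 1.0]
faulty_sensor_output = [0.2, 0.2, 0.2]

log = log_brake_events(faulty_sensor_output)
```

The resulting log shows three light brake applications, and from the software's point of view everything went fine. The true pedal pressure never reaches the logger, so the log cannot show the discrepancy.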

Edmunds has independently tested Prius cars similar to Sikes’ and confirmed that the engine would stay engaged if the brakes were only pressed lightly, but not hard enough to actually stop or slow the car, said Dan Edmunds, head of auto testing for the automotive Web site Edmunds.com.

He says “would”, but he should say “would, assuming that there is nothing wrong with the car that would cause it not to”.

“If you’re just riding the brakes, it will ride the brakes,” he said.

“These findings certainly raise new questions surrounding the veracity of the sequence of events that has been reported by Mr. Sikes,” said Kurt Bardella, spokesman for Rep. Darrell Issa, R-Calif., and ranking member of the committee.

Sikes’ attorney, John Gomez, denied that the report proves his client was wrong about what happened to his car.

“The notion that they weren’t able to replicate it in this particular case tells us nothing,” he said. “They haven’t been able to replicate a single one of these.”

That’s right. Also, learn the phrase “transient failure mode” and press that point. There are plenty of examples in space missions and airliners of such failures.

Sikes has no plans to sue Toyota, Gomez said.

Gomez is also representing the family of Mark Saylor, a California Highway Patrolman who was killed, along with members of his family in a Lexus sedan that accelerated out of control. A preliminary investigation has found that the accelerator pedal in that car probably became trapped on an all-weather floor mat that had been incorrectly installed in the vehicle.

Toyota has issued a recall for several models, including Sikes’ Prius, to address possible floor mat entrapment. Sikes’ floor mat was not interfering with the accelerator, investigators found, and there were no signs the pedal had become stuck in any way, according to the report.

The investigators findings “suggest that there should be further examination of Mr. Sikes account of the events of March 8,” Toyota said in its statement.

Toyota spokesman Mike Michels also took issue with media coverage of the Sikes incident. Journalists sensationalized an admittedly dramatic event, he said, but the public would have been better served had reporters waited for all the facts.

“We need to let investigations take their course,” he said.

Yes indeed. And this investigation has not done that. Be careful what you wish for, Mr. Michels.

Advice to Lawyers Suing Toyota

A press release by Toyota recently stated:

Toyota’s electronic systems have multiple fail-safe mechanisms to shut off or reduce engine power in the event of a system failure. Extensive testing of this system by Toyota has not found any sign of a malfunction that could lead to unintended acceleration.

Here are some notes for the lawyers suing Toyota. Here is what your testing experts should be telling you:

  • Whoever wrote this, even if he is being perfectly honest, is not in a position to know the status of the testing of Toyota’s acceleration, braking, or fault handling systems. The press release was certainly not written by the lead tester on the project. Toyota would be crazy to let the lead tester anywhere near a keyboard or a microphone.
  • Complete testing of complex hardware/software systems is not possible. But it is possible to do a thorough and responsible job of testing, in conjunction with hazard analysis, risk mitigation, and post-market surveillance. It is also quite expensive, difficult, and time consuming. So it is normal for management in large companies to put terrible pressure on the technical staff to cut corners. The more management levels between the testers and the CEO, the more likely this is to occur.
  • “Extensive testing” has no fixed meaning. To management, and to anyone not versed in testing, ALL testing LOOKS extensive. This is because testing bores the hell out of most people, and even a little of it seems like a lot. You need to find out exactly what the testing was. Look at the production-time testing but focus on the design-time testing. That’s where you’ll be most likely to find the trouble.
  • Even if testing is extensive in general, you need to find out the design history of the software and hardware, because the testing that was done may have been limited to older versions of the product. Inadequate retesting is a common problem in the industry.
  • If Toyota is found to have used an automated “regression suite” of tests, then you need to look for the problem of inadequate sampling. What happens is that the tests are only covering a tiny fraction of the operational space of the product (a fraction of the states it can be in), and then they just run those over and over. It looks like a lot of testing, but it’s really just the same test again and again. Excellent testing requires active inquiry at all times, not just recycling old actions.
  • If Toyota is found not to have used test automation at all, look for a different kind of sampling problem: limited human resources not being able to retest very extensively.
  • Most testers are not very ambitious and not well trained in testing. No university teaches a comprehensive testing curriculum. Testing is an intellectually demanding craft. In some respects it is an art. Examine the training and background of the testing staff.
  • Examine the culture of testing, too. If the corporate environment is one in which initiative is discouraged or all actions are expected to be explicitly justified (especially using metrics such as test case counts, pass/fail rates, cyclomatic complexity, or anything numerical), then testing will suffer. During discovery, subpoena the actual test reports and test documentation and evaluate that.
  • Any argument Toyota makes about extensiveness of testing that is based on numbers can be easily refuted. Numbers are a smoke-screen.
  • Examine the internal defect tracking systems and specifically look to see how intermittent bugs were handled. A lack of intermittent bug reports certainly would indicate something fishy going on.
  • Examine how the design team handled reports from the field of unintended acceleration. Were they systematically reviewed and researched?
  • Depositions of the testers will be critical (especially testers who left the company). It is typical in large organizations for testers to feel intimidated into silence on critical quality matters. It is typical for them to be cut off from the development team. You want to specifically look for the “normalization of risk” problem that was identified in both the Columbia and Challenger shuttle disasters.
  • If the depositions or documentation show that no one raised any concerns about the acceleration or braking systems, that is a potential smoking gun. What you expect in a healthy organization is a lot of concerns being raised and then dealt with forthrightly.
  • Find out what specific organizational mechanisms were used for “bug triage”, which is the process of examining problems reported and deciding what to do about them. If there was no triage process, that is either a lie or a gross form of negligence.
  • If Toyota claims to have used “proofs of correctness” in their development of the software controllers, that means nothing. First, obviously they would have to have correctly used proofs of correctness. But secondly, proofs of correctness are simply the modern Maginot Line of software safety: defects drive right around them. Imagine that the makers of the Titanic provided “proof” that water cannot penetrate steel plates, and therefore the Titanic cannot sink. Yes, steel isn’t porous, but so what? It’s the same with proofs of correctness. They rely on confusing a very specific kind of correctness with the general notion of “things done right.”
  • The anecdotal evidence surrounding unintended acceleration is that it does not only involve acceleration, but also a failure of braking. Furthermore, it’s a very rare occurrence, therefore it’s probably a combination of factors that work together to cause the problem. It’s not surprising at ALL that internal testing under controlled conditions would not reproduce the problem. Look at the history of the crash of USAir Flight 427, which for years went unsolved until the transient mechanism of thermal shock was discovered.
  • You need to get hold of their code and have it independently inspected. Look at the comments in the code, and examine any associated design documentation.
  • Look at how the engineering team was constituted. Were there dedicated full-time testers? Were they co-located with the development team or stuffed off in another location? How often did the testers and developers speak?
  • What were the change control and configuration management processes? How was the code and design modified over time? Were components of it outsourced? Is it possible that no one was responsible for testing all the systems as a whole?
  • What about testability? Was the system designed with testing in mind? Because, if it wasn’t, the expense and difficulty of comprehensive testing would have been much much higher. Ask if simulators, log files, or any other testability interfaces were used.
  • How did their testing process relate to applicable standards? Was the technical team aware of any such standards?
  • In medical device development, manufacturers are required to do “single-fault condition” testing, where specific individual faults are introduced into the product, and then the product is tested. Did Toyota do this?
  • What specific test techniques and tools did Toyota employ? Compare that to the corpus of commonly known techniques.
  • Toyota cars have “black box” logs that record crucial information. Find out what those logs contain, how to read them, and then subpoena the logs from all cars that may have experienced this problem. Compare with logs from similar unaffected cars.
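
The sampling problem in the regression-suite note above can be made concrete with a toy state space. This is only a sketch: the variables and values below are invented for illustration, and a real vehicle has vastly more states. It counts how many distinct combinations a small fixed suite touches, no matter how many times the suite is rerun.

```python
from itertools import product

# Hypothetical, drastically simplified operating variables.
state_space = {
    "throttle": ["idle", "partial", "full"],
    "brake": ["off", "light", "hard"],
    "cruise_control": ["off", "on"],
    "battery": ["low", "normal", "high"],
}

all_states = set(product(*state_space.values()))

# A fixed regression suite: the same three states, run nightly.
regression_suite = [
    ("idle", "off", "off", "normal"),
    ("partial", "light", "off", "normal"),
    ("full", "off", "on", "normal"),
]

def coverage(suite, runs):
    """Return (tests executed, distinct states covered) after `runs` repetitions."""
    executed = len(suite) * runs
    distinct = len(set(suite) & all_states)
    return executed, distinct

executed, distinct = coverage(regression_suite, runs=365)
print(f"{executed} test executions, {distinct} of {len(all_states)} states covered")
```

A year of nightly runs produces an impressive execution count, but the number of distinct states examined never moves. That execution count is the kind of number to be suspicious of when it is offered as evidence of extensive testing.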

The best thing would be to reproduce the problem in an unmodified Toyota vehicle, of course. In order to do that, you not only need an automotive engineer and an electrical engineer and a software engineer, you need someone who thinks like a tester.

The unfortunate fact of technological progress is that companies are gleefully plunging ahead with technologies that they can’t possibly understand or fully control. They hope they understand them, of course, but only a few people in the whole company are even competent to decide if that understanding is adequate for the task at hand. Look at the crash of Swissair Flight 111, for instance: a modern aircraft brought down by its onboard entertainment system, killing all aboard. The pilots had no idea it was even possible for an electrical fire to occur in the entertainment system. Nothing on their checklists warned them of it, and they had no way in the cockpit to disable it even if they’d had the notion to. This was a failure of design; a failure of imagination.

Toyota’s future depends on how they take seriously the possibility of novel, multivariate failure modes, and aggressively update their ideas of safe design and good testing. Sue them. Sue their pants off. This is how they will take these problems seriously. Let’s hope other companies learn from no-pants Toyota.

Tester Pilot

Richard drove up to the hangar just as I was checking the oil on the Husky, his prized baby float plane. Nuts. He was right on time. I was late. I’m supposed to have the plane ready to go when he arrives.

“Hey Dad, looks like a good day for flying. I’m just in the middle of the pre-flight.”

“Where are we going today?” he asked.

“I haven’t been up for a few months, so I figured just a sightseeing tour around the islands and then some pattern work at Friday Harbor.” I hate pattern work: landing and taking off while talking on the radio to the other pilots. That’s exactly why I need to do it, though. I must get over my nerves; must become a safe pilot. It’s a lesson from testing: focus on the risk.

“How much fuel do we need for that?”

“There’s about 18 or 20 gallons on board. That’s actually enough, but I figure it would be better to bring it to 35, just in case.”

“How much will you put in each tank, then?”

“7 gallons.”

“7 plus 7 plus 18 doesn’t add up to 35. Decide on the fuel you want and get that. If you’re going to fudge, fudge toward having more fuel. What are the four most useless things in the world?”

Oh I know this… “Um, altitude above you… runway behind you… gas in the gas truck, and… Um–”

“–and a tenth of a second ago,” he finished. “But you remembered the important one. Gas. We don’t want that terrible feeling when they close the airport for 30 minutes to clean something up on the runway, and we don’t have the fuel to divert.”

I could have quibbled with him. What we would actually do in that situation is land 10 minutes away at Friday Harbor airport, or heck, anywhere, because we’re a float plane in the midst of an archipelago. But that’s not the point. The point was the habit of precision; of being conscious and calculated about the risks I take, wherever reasonable. Again this relates to testing. When I’m testing, the habit of precision comes out in considering and controlling the states of my test platforms and test data, and of knowing what to look for in test results.

Dad called the flight center for a briefing. He already knew what they’d say, since he always checked the weather before he left home, but Richard Bach is an especially fastidious pilot. He’s not exactly “by the book.” It’s more that he prides himself on safety. Nothing surprises him without itself being surprised at how prepared he is for it. Yes he knew you were coming; here’s your cake.

Dad and me

The Tester’s Attitude in the Air

My father’s philosophy dovetails perfectly with the tester’s mindset. We expect things to go wrong. We know we will be surprised at times, but we anticipate the kinds of surprises that might occur. That means we perform checks that might seem unnecessary at times. And we keep our eyes open.

I was almost done with the walkaround when he got off the phone.

“Three knots at zero six zero over at Friday,” he announced.

I paused to visualize, then tried to sound authoritative. “That’s a crosswind favoring runway three four.”

“Yes. We have the same conditions here.”

Cool. I got it right. I’m supposed to pretend to be the pilot. Officially, Dad is the pilot-in-command, but I do everything he would do, while he supervises and is ready to take over in case there’s a problem. While I’m doing the preflight, he’s doing it too, but not telling me anything– unless I miss something important. Each time we fly, I’m determined to find a problem with the aircraft that he hasn’t noticed, first. I haven’t yet succeeded.

“Dad, what’s this rust colored streak coming out of this bolt?” Yay, I found something! “There’s one on each side of the elevator.”

“Just a little bit of rust.” He smiled and materialized a can of WD-40 and blasted the bolts with it. This airplane is pristine, so even a little blemish of rust really stands out.

“Were you flying recently?”

“Yeah, I went out last week and splashed around at Lake Whatcom.”

“That explains the streaks. Water spray on the tail. Did you pump out the floats afterward?”

“No, but I doubt there’s more than a pint of water in there.”

“Let’s see about that.” I retrieved the hand pump while he popped out the drain plugs. He was right again: I couldn’t suck out more than a cup of water from the floats, total, from all the compartments.

But there was something odd about the last one.

“This water is PINK, Dad!”

Now he was not smiling.

“Unless you landed at the rainbow lake on Unicorn Planet, there may be a hydraulic leak in there.”

He put his fingers in the residue and sniffed it like a bush guide on the trail of a white tiger. “Yeah, that’s what it is. Let’s pop the hatch and take a look.”

Testing With Open Expectations

This is an example of why good testing is inherently an open investigation. Yes, I had definite ideas of what I was testing for: leaky floats. Yes, I had a specific method in mind for performing that test. Had I not had a specific method, I would have developed one on the fly. That’s the testing discipline. My oracles were judging the amount of water I was able to pump out of the floats compared to other occasions, and I also tasted the water a couple of times to detect if it was salty. It shouldn’t have been, because it had been several flights since we had landed in salt water, but I checked just in case there was a previously undetected leak from before. If salt water gets in there, we could have a serious corrosion problem.

I had no conscious intent to check the color of the water. But in testing we take every anomaly seriously. We take data in and ask ourselves, does this make sense? In that way, we are like secret service agents, trained to deal with known threats and to have a good chance to discern fresh and unusual threats, too.

The question “Can I explain everything that I see?” is a good heuristic to keep in mind while testing.

But if I were to have automated my float pump tests, I would never have found this problem. Because unlike a human, a program can’t look at something and just try to explain it. It must be told exactly what to look for.
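
To make that concrete, here is a sketch in Python of what an automated version of my float check might look like. The function, the threshold, and the sample fields are all hypothetical. It checks exactly the two things I set out to check and nothing else.

```python
MAX_EXPECTED_ML = 500  # hypothetical threshold: roughly a pint

def check_float_sample(sample):
    """Return a list of problems found in water pumped from one compartment.

    Only the oracles I planned for are encoded here: volume and salt.
    """
    problems = []
    if sample["volume_ml"] > MAX_EXPECTED_ML:
        problems.append("too much water: possible leak")
    if sample["salty"]:
        problems.append("salt water: corrosion risk")
    return problems

# The pink water from the last compartment: small volume, not salty.
pink_sample = {"volume_ml": 60, "salty": False, "color": "pink"}

problems = check_float_sample(pink_sample)
print(problems)
```

The check reports no problems. The color is right there in the data, but nobody told the program that color could matter, so the hydraulic leak goes unreported.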

I got an email, today…

Geoff confirmed the hydraulic leak at the connection in the left float, and will be sealing it, probably tomorrow.  He’ll move the Husky to the big hangar to do the work. Nice, that you decided to pump the floats!

Dad

Nervous About Wolfram

Take a look at the screen shot, below. This is from my first five minutes of playing with Wolfram/Alpha. Do you see what’s wrong with it? I’ll tell you in a minute…

Wolfram/Alpha is the new search engine that isn’t so much a search engine as a find-interesting-ways-to-analyze-data-and-show-it-to-me engine. It’s a closed system, as far as I can tell. It does some cool things. But I don’t understand how they will keep up with the data quality problem.

This worries me because the output from Wolfram/Alpha looks authoritative. I want to be able to trust it. But look at this slightly disturbing problem. I searched for Francis Bacon, but instead of getting a page about the various Francis Bacons of history and having an opportunity to disambiguate, I got the output, below. As you see, it combines information from two different men: Francis Bacon, 1st Viscount St. Alban and Lord Chancellor of England under James I, and Francis Bacon, the painter. Furthermore, there appears to be no way to focus the search. Adding search terms that should distinguish between the two men appears to do nothing.

This tells me that there isn’t a lot of data in the system, yet, and that the data that is there may be mangled in ways that I may not notice unless I already know the thing I asked to learn about.

At least with Google and Wikipedia, it’s a relatively open system where I get a variety of results. So, beware, folks.

That said, I’m going back to playing with Wolfram/Alpha some more… Because it’s cool.

fb

Quality is Dead #2: The Quality Creation Myth

One of the things that makes it hard to talk about quality software is that we first must overcome the dominating myth about quality, which goes like this: The quality of a product is built into it by its development team. They create quality by following disciplined engineering practices to engineer the source code so that it will fulfill the requirements of the user.

This is a myth, not a lie. It’s a simplified story that helps us make sense of our experience. Myths like this can serve a useful purpose, but we must take care not to believe in them as if they were the great and hoary truth.

Here are some of the limitations of the myth:

  1. Quality is not a thing and it is not built. To think of it as a thing is to commit the “reification fallacy” that my colleague Michael Bolton loves to hate. Instead, quality is a relationship. Excellent quality is a wonderful sort of relationship. Instead of “building” quality, it’s more coherent to say we arrange for it. Of course you are thinking “what’s the difference between arrange and build? A carpenter could be said to arrange wood into the form of a cabinet. So what?” I like the word arrange because it shifts our attention to relationships and because arrangement suggests less permanence. This is important because in technology we are obliged to work with many elements that are subject to imprecision, ambiguity and drift.
  2. A “practice” is not the whole story of how things get done. To say that we accomplish things by following “practices” or “methods” is to use a figure of speech called a synecdoche– the substitution of a part for the whole. What we call practices are the public face of a lot of shadowy behavior that we don’t normally count as part of the way we work. For instance, joking around, or eating a salad at your desk, or choosing which email to read next, and which to ignore. A social researcher examining a project in progress would look carefully at who talks to whom, how they talk and what they talk about. How is status gained or lost? How do people decide what to do next? What are the dominant beliefs about how to behave in the office? How are documents created and marketed around the team? In what ways do people on the team exert or accept control?
  3. Source code is not the product. The product is the experience that the user receives. That experience comes from the source code in conjunction with numerous other components that are outside the control and sometimes even the knowledge of product developers. It also comes from documentation and support. And that experience plays out over time on what is probably a chaotic multi-tasking computing environment.
  4. “Requirements” are not the requirements, and the “users” are not the users. I don’t know what my requirements are for any of the software I have ever used. I mean, I do know some things. But for anything I think I know, I’m aware that someone else may suggest something that is different that might please me better. Or maybe they will show me how something I thought was important is actually harmful. I don’t know my own requirements for certain. Instead, I make good guesses. Everyone tries to do that. People learn, as they see and work with products, more about what they want. Furthermore, what they want actually changes with their experiences. People change. The users you think you are targeting may not be the users you get.
  5. Fulfillment is not forever and everywhere. The state of the world drifts. A requirement fulfilled today may no longer be fulfilled tomorrow, because of a new patch to the operating system, or because a new competing product has been released. Another reason we can’t count on a requirement being fulfilled is that can does not mean will. What I see working with one data set on one computer may not work with other data on another computer.

These factors make certain conversations about quality unhelpful. For instance, I’m impatient when someone claims that unit testing or review will guarantee a great product, because unit testing and review do not account for system level effects, or transient data occurring in the field, or long chains of connected transactions, or intermittent failure of third-party components. Unit testing and review focus on source code. But source code is not the product. So they can be useful, but they are still mere heuristic devices. They provide no guarantee.
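
A deliberately tiny sketch of the gap between a passing unit test and behavior in the field, in Python. The function and the field condition are invented for illustration.

```python
def average(values):
    """Return the arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

# The unit test the developer wrote: it passes.
assert average([2, 4, 6]) == 4

def try_field_data(values):
    """Run the same code on data from the field and report what happened."""
    try:
        return ("ok", average(values))
    except ZeroDivisionError:
        return ("crashed", None)

# Hypothetical field condition: a sensor that reported nothing this interval.
result = try_field_data([])
```

The unit test is true as far as it goes. It just says nothing about the data set nobody thought to try.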

Once in a while, I come across a yoho who thinks that a logical specification language like “Z” is the great solution. Because then your specification can be “proven correct.” The big problem with that, of course, is that correctness in this case simply means self-consistency. It does not mean that the specification corresponds to the needs of the customer, nor that it corresponds to the product that is ultimately built.

I’m taking an expansive view of products and projects and quality, because I believe my job is to help people get what they want. Some people, mainly those who go on and on about “disciplined engineering processes” and wish to quantify quality, take a narrower view of their job. I think that’s because their overriding wish is that any problems not be “their fault” but rather YOUR fault. As in, “Hey, I followed the formal spec. If you put the wrong things in the formal spec, that’s YOUR problem, stupid.”

My Take on the Quality Story

Let me offer a more nuanced version of the quality story– still a myth, yes– but one more useful to professionals:

A product is a dynamic arrangement, like a garden that is subject to the elements. A high quality product takes skillful tending and weeding over time. Just like real gardeners, we are not all powerful or all knowing as we grow our crop. We review the conditions and the status of our product as we go. We try to anticipate problems, and we react to solve the problems that occur. We try to understand what our art can and cannot do, and we manage the expectations of our customers accordingly. We know that our product is always subject to decay, and that the tastes of our customers vary. We also know that even the most perfect crop can be spoiled later by a bad chef. Quality, to a significant degree, is out of our hands.

After many years of seeing things work and fail (or work and THEN fail), I think of quality as ephemeral. It may be good enough, at times. It may be better than good enough. But it fades; it always fades, like something natural.

Or like sculpture by Andy Goldsworthy.  (Check out this video.)

This is true for all software, but the degree to which it is a problem will vary. Some systems have been built that work well over time. That is the result of excellent thinking and problem solving on the part of the development team. But I would argue it is also the result of favorable conditions in the surrounding environment. Those conditions are subject to change without notice.

James Tam: Customer Service that Works

After Adam White’s test of Rypple.com, I decided to try it myself. I soon ran into a fairly serious problem. I was able to try the service without registering, but when I tried to register, the system claimed I already was registered. Then when I tried to reset my password, it claimed I was not registered. Seemed like someone’s database tables were in a bunch.

(Oh well, quality is dead…)

But since I was so hard on WebGreeter about their creepy customer service technology, I thought I’d call Rypple’s toll free line and take my chances.

Aside: I hate calling tech support. I hate the stonewalling and being patronized by poorly trained anonymous dunderheads. Even if I were to lose most of my fingers in a data mining accident I could still count on one hand the good experiences I’ve had with tech support.

Anyway, I had a really good experience. After one ring, James Tam answered. That’s right, no voicemail menu! I had tapped the man Tam himself!

The first thing I said was “I want to report a bug on your system.” I was expecting a defensive or brusque reply. Instead he asked me to tell him about it, and I did.

Well here’s the thing. He couldn’t solve my problem right away. He tried. I believe he’s working on it right now. But one thing he said struck me: “This looks like a problem on our side.”

Let me savor those words again…

This looks like a problem on our side.

Somebody taking responsibility?? How often do you hear that from a website support guy? Assuming you ever get to talk to one?

Look what happened. On paper, I had a bad quality experience. Yet I feel good about Rypple. Go Rypple.

The IMVU Shuffle

Michael Bolton reported on our quick test of IMVU, whose development team brags about having no human-mediated test process before deploying their software to the field.

Some commenters have pointed out that the bugs we found in our twenty minute review weren’t serious– or couldn’t have been– because the IMVU developers feel successful in what they have produced, and apparently, there are satisfied users of the service.

Hearing that, I’m reminded of the Silver Bridge, which fell down suddenly, one day, after forty years of standing up. Up to that day, it must have seemed quite reasonable to claim that the bridge was a high quality bridge, because– look!– it’s still standing! But lack of information is not proof of excellence, it turns out. That’s why we test. Testing doesn’t provide all possible information, but it provides some. Good testing will provide lots of useful information.

I don’t know if the IMVU system is good enough. I do know that IMVU has no basis to claim that their “continuous integration” process, with all their “automated test cases” has anything to do with their success. By exactly the same “not dead yet” argument, they could justify not running any test cases at all. I can’t help but mention that the finance industry used the same logic to defend deregulation and a weak enforcement of the existing laws that allowed Ponzi schemes and credit default swaps to cripple the world economy. Oops, there go a few trillion dollars– hey maybe we should have been doing better oversight all these years!

It may be that no possible problem that could be revealed by competent testing would be considered a bad problem by IMVU. If that is the case, then the true reason they are successful is that they have chosen to offer a product that doesn’t matter to people who will accept anything they are offered. Of course, they could use ANY set of practices to do that.

Clearly, what they think they’ve done is establish a test process through automation that will probably discover any important problem that could happen before they release. That’s why Michael and I tested it, and we quickly verified what we expected to find: several problems that materially interfered with the claimed functionality of IMVU, and numerous glitches that suggested the presence of more serious problems nearby. Maybe its present users are willing to put up with it, or maybe they are willing to put up with it for now. But that’s not the point.

The point is that IMVU is not doing certain ordinary and obvious things that would reveal problems in their product and they promote that approach to doing business as if it’s an innovation instead of an evasion of responsibility.

The IMVU people can’t know whether there are, in fact, serious problems in their product because they have chosen not to discover them. That they promote this as a good practice (and claim that manual testing doesn’t scale, which is also bullshit) tells me that they don’t know what testing is for and they don’t know the difference between testing and a set of computerized behaviors called “test cases”.

They are setting themselves up to rediscover what many others have before them– why we test. Their own experiences will be the best teacher. I predict they will have some doozies.

Quality is Dead #1: The Hypothesis

Quality is dead in computing. Been dead a while, but like some tech’d up version of Weekend at Bernie’s, software purveyors are dressing up its corpse to make us believe computers can bring us joy and salvation.

You know it’s dead, too, don’t you? You long ago stopped expecting anything to just work on your desktop, right? Same here. But the rot has really set in. I feel as if my computer is crawling with maggots. And now it feels that way even when I buy a fresh new computer.

My impression is that up to about ten years ago most companies were still trying, in good faith, to put out a good product. But now many of them, especially the biggest ones, have completely given up. One sign of this is the outsourcing trend. Offshore companies, almost universally, are unwilling and unable to provide solid evidence of their expertise. But that doesn’t matter, because the managers offering them the work care for nothing but the hourly rate of the testers. The ability of the testers to test means nothing. In fact, bright inquisitive testers seem to be frowned upon as troublemakers.

This is my Quality is Dead hypothesis: a pleasing level of quality for end users has become too hard to achieve while demand for it has simultaneously evaporated and penalties for not achieving it are weak. The entropy caused by mindboggling change and innovation in computing has reached a point where it is extremely expensive to use traditional development and testing methods to create reasonably good products and get a reasonable return on investment. Meanwhile, user expectations of quality have been beaten out of them. When I say quality is dead, I don’t mean that it’s dying, or that it’s under threat. What I mean is that we have collectively– and rationally– ceased to expect that software normally works well, even under normal conditions. Furthermore, there is very little any one user can do about it.

(This explains how it is possible for Microsoft to release Vista with a straight face.)

I know of a major U.S. company that recently laid off a group of more than a dozen trained, talented, and committed testers, instead outsourcing that work to a company in India that obviously does not know how to test (judging from documents shown to me). The management of this well-known American company never talked to their testers or test managers about this (according to the test manager involved and the director above him, both of whom spoke with me). Top management can’t know what they are giving up or what they are getting. They simply want to spend less on testing. When testing becomes just a symbolic ritual, any method of testing will work, as long as it looks impressive to ignorant people and doesn’t cost too much. (Exception: sometimes charging a lot for a fake service is a way to make it seem impressive.)

Please don’t get me wrong. Saving money is not a bad thing. But there are ways to spend less on testing without eviscerating the quality of our work. There are smart ways to outsource, too. What I’m talking about is that this management team obviously didn’t care. They think they can get away with it. And they can: because quality is dead.

I’m also not saying that quality is dead because people in charge are bad people. Instead, what we have are systemic incentives that led us to this sorry state, much like the incentives that, in centuries past, created favorable conditions for cholera and plague to sweep across Europe, or the conditions that resulted in the Great Fire of London. It took great disasters to make them improve things.

Witness today how easily the financial managers of the world are evading their responsibility for bringing down the world economy. It’s a similar deal with computing. Weak laws pertaining to quality, coupled with mass fatalism that computers are always going to be buggy, and mass acceptance of ritualistic development and testing practices make the world an unsafe place for users.

If we use computers, or deal with people who do, we are required to adapt to failure and frustration. Our tools of “productivity” suck away our time and confidence. We huddle in little groups on the technological terrain, subject to the whims and mercies of the technically elite. This is true even for members of the technically elite– because being good in one technology does not mean you have much facility with the 5,000 other technologies out there. Each of us is a helpless user, in some respect.

Want an illustration? Just look at my desktop:

  • Software installation is mysterious and fragile. Can I look at any given product on my system and determine if it is properly installed and configured? No.
  • Old data and old bits of applications choke my system. I no longer know for sure what can be thrown away, or where it is. I seem to have three temp folders on my system. What is in them? Why is it there?
  • My task manager is littered with mysterious processes. Going through, googling each one, and cleaning them up is a whole project in and of itself.
  • I once used the Autoruns tool to police my startup. Under Vista, this has become a nightmare. Looking at the Autoruns output is a little like walking into that famous warehouse in Indiana Jones. Which of the buzillion processes are really needed at startup?
  • Mysterious pauses, flickers, and glitches are numerous and ephemeral. Investigating them saps too much time and energy.
  • I see a dozen or two “Is it okay to run this process?” dialog boxes each day, but I never really know if it’s okay. How could I know? I click YES and hope for the best.
  • I click “I Agree” to EULAs that I rarely read. What rights am I giving away? I have no idea. I’m not qualified to understand most of what’s in those contracts, except they generally disclaim responsibility for quality.
  • Peripherals with proprietary drivers and formats don’t play well with each other.
  • Upgrading to a new computer is now a task comparable to uprooting and moving to a new city.
  • I’m sick of becoming a power user of each new software package. I want to use my time in other ways, so I remain in a state of ongoing confusion.
  • I am at the mercy of confused computers and their servants who work for credit agencies, utility companies and the government.
  • I have to accept that my personal data will probably be stolen from one of the many companies I do business with online.
  • Proliferating online activity now results in far-flung and sometimes forgotten pockets of data about me, clinging like Spanish moss to the limbs of the Web.

Continuous, low-grade confusion and irritation, occasionally spiking to impotent rage, is the daily experience of the technically savvy knowledge worker. I shudder to think what it must be like for computerphobes.

Let me give you one of many examples of what I’m talking about.

I love my Tivo. I was a Tivo customer for three years. So why am I using the Dish Network and not Tivo? The Dish Network DVR sucks. I hate you Dish Network DVR developers! I HATE YOU! HAVEN’T YOU EVER SEEN A TIVO??? DO YOU NOT CARE ABOUT USABILITY AND RELIABILITY, OR ARE YOU TOTAL INCOMPETENT IDIOTS???

I want to use a Tivo, but I can’t use it with the Dish Network. I have to use their proprietary system. I don’t want to use the Dish Network either, but DirecTV was so difficult to deal with for customer service that I refuse to be their customer any more. The guy who installed my Dish Network DVR told me that it’s “much better than Tivo.” The next time I see him, I want to take him by the scruff of his neck and rub his nose on the screen of my Dish Network DVR as it fails once again to record what I told it to record. You know nothing of Tivos, you satellite installer guy! Do not ever criticize Tivo again!

Of all the technology I have knowingly used in the last ten years, I would say I’m most happy with the iPod, the Tivo, and the Neatworks receipt scanning system. My Blackberry has been pretty good, too. Most other things suck.

Quality is dead. What do we do about that? I have some ideas. More to come…

WebGreeter Fails Turing Test

Beware: if you visit WebGreeter.com, a disturbing thing will happen. You will be immediately accosted by what appears to be a chatbot, but is apparently a human doing a creepy impression of a chatbot. This cyborg thing will ask you for your contact information. If you give it to them, they may use it right away to call you on the phone.

Live Attractive Smiling Operator! Click Now!

Based on their website, and the high-pressure salesman who called me without an invitation, they believe in an aggressive approach to sales. The salesman even used the word “aggressive” several times in his pitch.

The first time I encountered this technology was on a website for expert witnesses. I had trouble using the site and clicked on the “Live Operator” icon. A chat window opened, but everything that they said to me appeared to have come from a script, while seeming to ignore the gist of my questions. I was kind of offended, since this cyborg thing insisted it was a live operator (each time using exactly the same perfectly worded sentence to convey that) while continuing to answer a different question than the one that I asked. At one point I challenged it to type something instead of pasting a canned response, to which I got an obviously canned response that it was not allowed to engage in personal conversation.

(What? Isn’t that the point of a live operator? The ability to engage on a personal level, human-to-human, empathically? Do they think I want a live operator so that I can be ignored in person?)

I concluded that it was not human. But when I went to the WebGreeter website to learn about this weird chatbot technology, I found out it really HAD been a human. That human failed the Turing Test, by successfully declining to use human communication skills with me.

Folks, this is what I’m complaining about with scripting, scripted testing, scripted behavior of any kind. My situation required an appropriately trained and reasonably motivated human to listen to my questions and help me solve my problem. Instead, I met someone’s idea of an efficient technology that creepily removed what’s good about humanity while being able, technically, to claim humanity. The human in that equation could hide behind rules, pretending to help me while rendering no assistance at all.

I don’t mind that they have scripts. I mind when they are followed instead of applied. This distinction is crucial. In applying, the human remains in charge. In following, the human actually lobotomizes himself in an effort to become animated furniture.

When we deal with people, we ought to be able to trust in certain fundamental human qualities. But just as this crazy WebGreeter company has thrown those out the window, so too have many “process” people torn them out of software projects. Or tried to. Instead they want us to perform rituals. Nice repeatable rituals.