Round Earth Test Strategy

The “test automation pyramid” is a popular idea, but I see serious problems with it. In this article I suggest an alternative way of thinking that preserves what is useful about the pyramid while minimizing those problems:

  1. Instead of a pyramid, model the situation as concentric spheres, because the “outer surface” of a complex system generally has “more area” to worry about;
  2. ground it by referencing a particular sphere called “Earth” which is familiar to all of us because we live on its friendly, hospitable surface;
  3. illustrate it with an upside-down pyramid shape, in order to suggest that our attention and concern are ultimately with the surface of the product, “where the people live,” and also to indicate opposition to the pyramid shape of the Test Automation Pyramid (which suggests that user experience deserves little attention);
  4. incorporate dynamic as well as static elements into the analogy (i.e. data, not just code);
  5. acknowledge that we probably can’t or won’t directly test the lowest levels of our technology (e.g. Chrome, Node.js, or the Android OS). In fact, we are often encouraged to trust those levels, since there is little we can do about them;
  6. use this geophysical analogy to explain more intuitively why a good tooling strategy can access and test the product on a subterranean level, though not necessarily at a level below that of the platforms we rely upon.

Good analogies afford deep reasoning.

The original pyramid (really a triangle) was a context-free geometric analogy. It was essentially saying: “Just as a triangle has more area in its lower part than its upper part, so you should make more automated tests on lower levels than higher levels.” This is not an argument; this is not reasoning. Nothing in the nature of a triangle tells us how it relates to technology problems. It’s simply a shape that matches an assertion that the authors wanted to make. It’s semiotics with weak semantics.

It is not wrong to use semantically arbitrary shapes to communicate, of course (the shapes of a “W” and an “M” are opposites, in a sense, and yet nobody cares that what they represent are not opposites). But at best, it’s a weak form of communication. A stronger form is to use shapes that afford useful reasoning about the subject at hand.

The Round Earth model tries to do that. By thinking of technology as concentric spheres, you understand that the volume of possibilities– the state space of the product– tends to increase dramatically with each layer. Of course, that is not necessarily the case, because a lot of complexity may be locked away from the higher levels by the lower levels. Nevertheless that is a real and present danger with each layer you heap upon your technology stack. An example of this risk in action is the recent discovery that HTML emails defeat the security of PGP email. Whoops. The more bells, whistles, and layers you have, the more likely some abstraction will be fatally leaky. (One example of a leaky abstraction is the concept of “solid ground,” which can both literally and figuratively leak when hot lava pours out of it. Software is built out of things that are more abstract and generally much more leaky than solid ground.)

When I tell people about the Round Earth model they often start speaking of caves, sinkholes, landslides, and making jokes about volcanoes and how their company must live over a “hot spot” on that Round Earth. These aren’t just jokes, they are evidence that the analogy is helpful, and relates to real issues in technology.

Note: If you want to consider what factors make for a good analogy, Michael Bolton wrote a nice essay about that (Note: he calls it metaphor, but I think he’s referring to analogies).

The Round Earth model shows testing problems at multiple levels.

The original pyramid has unit testing at the bottom. At the bottom of the Round Earth model is the application framework, operating environment, and development environment– in other words, the Platform-That-You-Don’t-Test. Maybe someone else tests it, maybe they don’t. But you don’t know and probably don’t even think about it. I once wrote Assembler code to make video games in 16,384 bytes of memory. I needed to manage every byte of memory. Those days are long gone. Now I write Perl code and I hardly think about memory. Magic elves do that work, for all I know.

Practically speaking, all development rests on a “bedrock” of assumptions. These assumptions are usually safe, but sometimes, just as hot lava or radon gas or toxified groundwater breaks through bedrock, we can also find that lower levels of technology undermine our designs. We must be aware of that general risk, but we probably won’t test our platforms outright.

At a higher level, we can test the units of code that we ourselves write. More specifically, developers can do that. While it’s possible for non-developers to do unit-level checks, it’s a much easier task for the devs themselves. But, realize that the developers are working “underground” as they test on a low level. Think of the users as living up at the top, in the light, whereas the developers are comparatively buried in the details of their work. They have trouble seeing the product from the user’s point of view. This is called “the curse of expertise:”

“Although it may be expected that experts’ superior knowledge and experience should lead them to be better predictors of novice task completion times compared with those with less expertise, the findings in this study suggest otherwise. The results reported here suggest that experts’ superior knowledge actually interferes with their ability to predict novice task performance times.”

[Hinds, P. J. (1999). The curse of expertise: The effects of expertise and debiasing methods on prediction of novice performance. Journal of Experimental Psychology: Applied, 5(2), 205–221. doi:10.1037/1076-898x.5.2.205]

While geophysics can be catastrophic, it can also be more tranquil than a stormy surface world. Unit level checking generally allows for complete control over inputs, and there usually aren’t many inputs to worry about. Stepping up to a higher level– interacting sub-systems– still means testing via a controlled API, or command-line, rather than a graphical interface designed for creatures with hands and eyes and hand-eye coordination. This is a level where tools shine. I think of my test tools as submarines gliding underneath the storm and foam, because I avoid using tools that work through a GUI.
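To make that concrete, here is the sort of below-the-surface check I mean, sketched in Python. The endpoint, payload, and expected result are all invented for illustration; the point is that the check drives the product through an API instead of through its GUI.

    # Sketch: a sub-system check that works below the GUI, via an HTTP API.
    # The endpoint, payload, and expected values are hypothetical.
    import requests

    BASE_URL = "http://localhost:8080/api"   # assumed local test server

    def check_discount_calculation():
        # Control the inputs precisely, which a GUI rarely allows.
        payload = {"item": "widget", "quantity": 10, "coupon": "SAVE10"}
        response = requests.post(f"{BASE_URL}/quote", json=payload, timeout=5)

        assert response.status_code == 200, response.status_code
        quote = response.json()
        # Observe the result directly as data, not as pixels on a screen.
        assert quote["discount_percent"] == 10, quote

    if __name__ == "__main__":
        check_discount_calculation()
        print("sub-surface check passed")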

The Round Earth model reminds us about data.

Data shows up in this model, metaphorically, as the flow of energy. Energy flows on the surface (sunlight, wind and water) and also under the surface (ground water, magma, earthquakes). Data is important. When we test, we must deal with data that exists in databases and on the other side of micro-services, somewhere out in the cloud. There is data built into the code, itself. So, data is not merely what users type in or how they click. I find that unit-level and sub-system-level testing often neglects the data dimension, so I feature it prominently in the Round Earth concept.
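As one small way of giving data first-class attention, here is a sketch (in Python, with an invented function and file format) of a check that pulls its inputs and expected results from a data file instead of hard-coding them. The test code stays still while the risky data grows.

    # Sketch: treating test data as a first-class concern.
    # The function under test, file name, and columns are invented.
    import csv

    def parse_dose(text):
        # Stand-in for a real unit under test.
        value, unit = text.split()
        return float(value), unit

    def run_data_driven_checks(path="dose_inputs.csv"):
        # Each row supplies an input and its expected result, so new and
        # risky data can be added without touching the test code.
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                value, unit = parse_dose(row["input"])
                assert value == float(row["expected_value"]), row
                assert unit == row["expected_unit"], row

    if __name__ == "__main__":
        run_data_driven_checks()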

The Round Earth model reminds us about testability.

Complex products can be designed with testing in mind. A testable product is, among other things, one that can be decomposed (taken apart and tested in pieces), and that is observable and controllable in its behaviors. This usually involves giving testers access to the deeper parts of the product via command-line interfaces (or some sort of API) and comprehensive logging.
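Here is one small, hypothetical example of that kind of access: a diagnostic command-line entry point that lets a tester (or a tool) observe and control internal state without going through the GUI. The commands and the state shown are invented; the pattern is what matters.

    # Sketch: a diagnostic command-line hook for testability.
    # The commands and internal state are invented for illustration.
    import argparse
    import json

    INTERNAL_STATE = {"mode": "normal", "cache_entries": 0, "last_error": None}

    def main():
        parser = argparse.ArgumentParser(prog="productctl")
        sub = parser.add_subparsers(dest="command", required=True)
        sub.add_parser("dump-state", help="print internal state as JSON")
        set_mode = sub.add_parser("set-mode", help="force an internal mode")
        set_mode.add_argument("mode", choices=["normal", "degraded", "test"])

        args = parser.parse_args()
        if args.command == "dump-state":
            print(json.dumps(INTERNAL_STATE, indent=2))    # observability
        elif args.command == "set-mode":
            INTERNAL_STATE["mode"] = args.mode             # controllability
            print(json.dumps(INTERNAL_STATE, indent=2))

    if __name__ == "__main__":
        main()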

Epigrams

  • Quality above requires quality below.
  • Quality below reduces dependence on expensive high-level testing.
  • Inexpensive low-level testing reduces dependence on expensive high-level testing.
  • Risk grows toward the user.

A Six-fold Example from Pradeep Soundararajan

Pradeep blogged this today.

I need to amplify it because it provides a nice example of at least six useful and important patterns all in one post. This is why I believe Pradeep is one of the leading Indian testers.

Practical advice: “Ask for testability”

His story is all about asking for testability and all the good things that can come from that. It’s rare to see a good example presented so vividly. I wanted more details, but the details he gave were enough to carry the point and fire the imagination.

Practical advice: “Try video test scripting”

I have never heard of using videos for scripted testing. Why didn’t I think of that?

Testing as a social process

Notice how many people Pradeep mentions in his post. Notice the conversations, the web of relationships. This aspect of testing is profoundly important, and it’s one at which I find Pradeep excels. It’s kind of like x-ray vision: the ability to see past the objects of the project to its true bones, which are how people think of, communicate with, and influence each other. Pradeep’s story is a little bit technical, but it’s mostly social, as I read it.

Experience report

Pradeep’s post is an example of an experience report. Not many of them around. It’s like sighting a rare orchid. He published it with the support of his client, otherwise we’d never have seen it. That’s why there can never be an accurate or profound history written about the craft of testing: almost everything is kept secret. The same dynamic helps preserve bad practice in testing, because that bad practice thrives in the darkness just as roaches do.

Sapient tester blogging

I have referred in the past to a phenomenon I call “sapient tester blogs.” These are introspective, self-critical, exploratory essays written by testers who see testing as a complex cognitive activity and seek to expand and develop their thinking. It’s particularly exciting to see that happening in India, which brings me to the final point…

Leadership in Indian testing

There’s not a lot of good leadership in Indian testing. Someday there will be. It’s beginning to happen. Pradeep’s post is an example of what that looks like.

There must be more than a hundred thousand testers in India. (I wonder if some agency keeps statistics on that?) I would expect to see at least a hundred great tester blogs from India, not six!

Logging: Exploratory Tester’s Friend

I’m on a new project lately, working with a team at QualiTest. We’re testing a class III medical device. This is an exciting project, because, for the first time that I am aware of, formalized exploratory testing will be used to do such a validation. We will not rely on masses of procedural test scripts. I’ve been called in on this project because I created the first published formalized ET process in 1999 (for Microsoft), and created, with my brother Jon, session-based test management, which is basically a general form of that Microsoft process.

The QualiTest team consists of senior testers hand-picked for this job, who have regulatory testing backgrounds and an enthusiasm to use their brains while they test. On top of testing well, we have to document our testing well, and trace our testing to requirements. Automatic logging is one of the tools that will help us do that.

I am amazed at how crazy nuts some people get over documentation– how they sweat and shiver if they don’t have a script to cling to– and yet they don’t spare a thought for logging. Logging is great for testers, programmers, and technical support. Logging is automatic documentation. Sing the praises of logging.

I’m talking about function-level logging built into the products we test.

If you test a web app, you already have this (the web server and application logs, plus the use of a proxy to log locally, if you want) or would have it with a little tweak here and there by the programmer. For desktop apps, the programmer has to build it in. Here’s why he should do that right away:

  1. Instead of following a script written weeks or months ago by some over-literal, function-besotted and data-blind intern, the tester can think, explore, play, and maintain the thread of inquiry without worrying that nobody will know, later on, what was tested.
  2. Instead of remembering what you tested, the product tells you how you tested it. Process the log with a simple Perl script, and you can potentially have an automatically generated test report (a sketch of the idea, in Python, appears just after this list).
  3. Instead of just wondering how you made that crazy bug happen, the developer can consult the log.
  4. Instead of asking the customer what he was doing moments before the crash, you can simply ask for the log.
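Here is the kind of log processing I mean in point 2, sketched in Python rather than Perl. It assumes a tab-delimited log roughly like the format specified later in this post; the file name and field positions are illustrative.

    # Sketch: turning a product log into a rough coverage summary.
    # Assumes a tab-delimited log: time stamp, event ID, description, details.
    from collections import Counter

    def summarize_log(path):
        events = Counter()
        with open(path, encoding="utf-8") as log:
            for line in log:
                fields = line.rstrip("\n").split("\t")
                if len(fields) >= 3:
                    events[fields[2]] += 1      # count by event description
        return events

    if __name__ == "__main__":
        for description, count in summarize_log("session.log").most_common():
            print(f"{count:6d}  {description}")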

If logging is built into the base classes of the product, very little coding is involved.
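For example (and this is only a sketch, with invented class and method names), a Python product could route every user-facing action through a logging decorator in a base class, so each new screen gets logging for free:

    # Sketch: logging wired into a base class via a decorator.
    # Class and method names are invented for illustration.
    import functools
    import logging

    logging.basicConfig(filename="product.log", level=logging.INFO,
                        format="%(asctime)s\t%(message)s")

    def logged_action(func):
        # Log every call to a user-facing action, with its arguments.
        @functools.wraps(func)
        def wrapper(self, *args, **kwargs):
            logging.info("ACTION\t%s\t%r\t%r", func.__name__, args, kwargs)
            return func(self, *args, **kwargs)
        return wrapper

    class Screen:                       # imagined base class for all screens
        @logged_action
        def press_button(self, name):
            pass                        # real handling would go here

    Screen().press_button("Start")      # appends one line to product.log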

This idea first occurred to me in 1993, after hearing from John Musa about how his telecom systems would “phone home” with data about how they were being used, but I couldn’t get a programmer to put logging into anything I tested until I was at SmartPatents in 1997. Since then I’ve helped several projects, including a couple of medical device projects, get going with it.

On this most recent project I was asked to create requirements to specify the logging. Here is the generic version of what I came up with:

1. Each significant action that the user takes (pressing buttons, touching screen objects, turning knobs, startup and shutdown, etc.) shall be logged. This provides critical information needed to demonstrate test coverage during validation, and improves our ability to meet and exceed regulatory requirements.

2. The results of any diagnostic self-tests or assert failures shall be logged.

3. Any function that causes a change to data, screen display, system configuration, modes or settings, that communicates with other equipment, or that produces an error or information message should be logged, regardless of user action.

4. Everything that could be interesting and useful for testing, support, and system debugging should be logged UNLESS the event occurs so frequently (many times a second) that it poses a performance or reliability risk.

5. Each log event shall include at least the following information:
– Time stamp: for instantaneous events, a time stamp with millisecond resolution. For events that extend over time, log the start and stop times as two separate events (e.g. “Event START”, “Event END”). Events that set a persistent mode or state can be logged as one event (“high security mode ON”), but the state of any such modes shall also be logged automatically at startup and shutdown, so that a complete record of that setting can be maintained over time.
– Event type ID: always unique to the event type; IDs are not re-used if an event is retired and a new event is created.
– Event type description: a short, unique, human-readable label.
– Event information: any data associated with the event that may be useful for customer service or for assessing test coverage; this data may be formatted in ways specific to that event type.

6. At startup and shutdown, the current settings, modes, and configuration shall be recorded to the log.

7. Any errors shall be recorded to the log, including the actual text of the error message.

8. Every type of loggable event shall be stored in one table in the source code or in a data structure accessible on the system itself, such as a header file, enum, array or resource file. This facilitates providing the validation and customer service teams with a complete list of all possible events.

9. The log format shall be in text form, structured and delimited consistently such that it can be parsed automatically by a third party tool. The data for each event should be on one line, or else be marked with standard start and end markers.

10. The log format should be structured and delimited such that it is reasonably human readable (such as tab delimited).

11. The level of detail included in the log file should be configurable in terms of preset levels: (1) error and service events only; (2) functional events, error events, and service events; (3) all events, including diagnostic information messages about internal states and parameters.

12. The log should behave as a ring buffer with a maximum of X events (where X is configurable within some limit that would not be exceeded in 7 days of heaviest anticipated use). If the size of the log exceeds available space, the oldest events shall be discarded first.

13. When the log is exported, it should have a header that identifies the software version (and serial number of the HW, if applicable) and current configuration.
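To show how little machinery these requirements imply, here is a sketch of a logger that satisfies a few of them: millisecond time stamps, stable event IDs, tab-delimited text, a ring buffer, and an export header. The event table and sizes are invented; a real device would tune all of this.

    # Sketch: a logger that follows a few of the requirements above
    # (millisecond time stamps, stable event IDs, tab-delimited text,
    # a ring buffer, and an export header). The event table is invented.
    from collections import deque
    from datetime import datetime

    EVENT_TYPES = {                     # one table of loggable events (req. 8)
        1001: "STARTUP",
        1002: "SHUTDOWN",
        2001: "BUTTON_PRESS",
        9001: "ERROR",
    }

    class RingLog:
        def __init__(self, max_events=100000):          # ring buffer (req. 12)
            self.events = deque(maxlen=max_events)

        def log(self, event_id, info=""):
            stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]
            line = "\t".join([stamp, str(event_id),
                              EVENT_TYPES[event_id], info])   # reqs. 5, 9, 10
            self.events.append(line)

        def export(self, path, header="SW 1.0.0 | config: default"):
            with open(path, "w", encoding="utf-8") as f:       # req. 13
                f.write(header + "\n")
                f.write("\n".join(self.events) + "\n")

    log = RingLog()
    log.log(1001, "settings: high security mode ON")
    log.log(2001, "button=Start screen=Main")
    log.export("device.log")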

Testability Through Audibility

I was working with a client today who complained that there were hidden errors buried in a log file produced by the product he was testing. So, I wrote him a tool that continuously monitors any text file, such as a server log (as long as it is accessible through the file system, as in the case of a test server running locally) and plays WAV files whenever certain string patterns appear in the stream.

With this little tool, a streaming verbose log can be rendered as a stream of clicks and whirrs, if you want, or you can just have it yell “ERROR!” when an error pops up in the log. All this in real time, without taking your eyes off the application. Using this, I found a bug in a browser-based app whereby perfectly ordinary-looking HTML displayed on the screen coincided with a Java null pointer exception in the log.

I released this bit of code with the GPL 2.0 license and you can find it here:

http://www.satisfice.com/tools/log-watch.zip
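If you just want the gist of the idea, here is a much-simplified sketch in Python. It is not the released log-watch tool; the pattern, polling interval, and alert (a terminal bell rather than a WAV file) are all placeholder choices.

    # Sketch: follow a growing text file, alert when a pattern appears.
    # (Not the released log-watch tool; a simplified illustration only.)
    import re
    import sys
    import time

    def alert(line):
        # The real tool plays WAV files; here we just ring the terminal
        # bell and echo the matching line.
        print("\a*** MATCH:", line.rstrip())

    def watch(path, pattern=r"ERROR|Exception"):
        regex = re.compile(pattern)
        with open(path, encoding="utf-8", errors="replace") as f:
            f.seek(0, 2)                 # start at the end of the file
            while True:
                line = f.readline()
                if not line:
                    time.sleep(0.5)      # wait for new log output
                    continue
                if regex.search(line):
                    alert(line)

    if __name__ == "__main__":
        watch(sys.argv[1] if len(sys.argv) > 1 else "server.log")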

By the way, this is an example of what I call agile test tooling. I paired with a tester. I heard a complaint. I offered a tool idea. The tester said “yes, please.” I delivered the tool the next day. As we were playing with it, I added a couple of features. I don’t believe you have to be a programmer to be a great tester, but it helps to have a programmer or two on the testing staff. It’s nice work for programmers like me, who get bored with long term production coding.