Round Earth Test Strategy

The “test automation pyramid” (for examples, see here, here, and here) is a popular idea, but I see serious problems with it. I suggest in this article an alternative way of thinking that preserves what’s useful about the pyramid, while minimizing those problems:

  1. Instead of a pyramid, model the situation as concentric spheres, because the “outer surface” of a complex system generally has “more area” to worry about;
  2. ground it by referencing a particular sphere called “Earth” which is familiar to all of us because we live on its friendly, hospitable surface;
  3. illustrate it with an upside-down pyramid shape in order to suggest that our attention and concern is ultimately with the surface of the product, “where the people live” and also to indicate opposition to the pyramid shape of the Test Automation Pyramid (which suggests that user experience deserves little attention);
  4. incorporate dynamic and well as static elements into the analogy (i.e. data, not just code);
  5. acknowledge that we probably can’t or won’t directly test the lowest levels of our technology (i.e. Chrome, or Node.js, or Android OS). In fact, we are often encouraged to trust it, since there is little we can do about it;
  6. use this geophysical analogy to explain more intuitively why a good tooling strategy can access and test the product on a subterranean level, though not necessarily at a level below that of the platforms we rely upon.

Good analogies afford deep reasoning.

The original pyramid (really a triangle) was a context-free geometric analogy. It was essentially saying: “Just as a triangle has more area in its lower part than its upper part, so you should make more automated tests on lower levels than higher levels.” This is not an argument; this is not reasoning. Nothing in the nature of a triangle tells us how it relates to technology problems. It’s simply a shape that matches an assertion that the authors wanted to make. It’s semiotics with weak semantics.

It is not wrong to use semantically arbitrary shapes to communicate, of course (the shapes of a “W” and an “M” are opposites, in a sense, and yet nobody cares that what they represent are not opposites). But at best, it’s a weak form of communication. A stronger form is to use shapes that afford useful reasoning about the subject at hand.

The Round Earth model tries to do that. By thinking of technology as concentric spheres, you understand that the volume of possibilities– the state space of the product– tends to increase dramatically with each layer. Of course, that is not necessarily the case, because a lot of complexity may be locked away from the higher levels by the lower levels. Nevertheless that is a real and present danger with each layer you heap upon your technology stack. An example of this risk in action is the recent discovery that HTML emails defeat the security of PGP email. Whoops. The more bells, whistles, and layers you have, the more likely some abstraction will be fatally leaky. (One example of a leaky abstraction is the concept of “solid ground,” which can both literally and figuratively leak when hot lava pours out of it. Software is built out of things that are more abstract and generally much more leaky than solid ground.)

When I tell people about the Round Earth model they often start speaking of caves, sinkholes, landslides, and making jokes about volcanoes and how their company must live over a “hot spot” on that Round Earth. These aren’t just jokes, they are evidence that the analogy is helpful, and relates to real issues in technology.

Note: If you want to consider what factors make for a good analogy, Michael Bolton wrote a nice essay about that (Note: he calls it metaphor, but I think he’s referring to analogies).

The Round Earth model shows testing problems at multiple levels.

The original pyramid has unit testing at the bottom. At the bottom of the Round Earth model is the application framework, operating environment, and development environment– in other words, the Platform-That-You-Don’t-Test. Maybe someone else tests it, maybe they don’t. But you don’t know and probably don’t even think about it. I once wrote Assembler code to make video games in 16,384 bytes of memory. I needed to manage every byte of memory. Those days are long gone. Now I write Perl code and I hardly think about memory. Magic elves do that work, for all I know.

Practically speaking, all development rests on a “bedrock” of assumptions. These assumptions are usually safe, but sometimes, just as hot lava or radon gas or toxified groundwater breaks through bedrock, we can also find that lower levels of technology undermine our designs. We must be aware of that general risk, but we probably won’t test our platforms outright.

At a higher level, we can test the units of code that we ourselves write. More specifically, developers can do that. While it’s possible for non-developers to do unit-level checks, it’s a much easier task for the devs themselves. But, realize that the developers are working “underground” as they test on a low level. Think of the users as living up at the top, in the light, whereas the developers are comparatively buried in the details of their work. They have trouble seeing the product from the user’s point of view. This is called “the curse of expertise:”

“Although it may be expected that experts’ superior knowledge and experience should lead them to be better predictors of novice task completion times compared with those with less expertise, the findings in this study suggest otherwise. The results reported here suggest that experts’ superior knowledge actually interferes with their ability to predict novice task performance times.”

[Hinds, P. J. (1999). The curse of expertise: The effects of expertise and debiasing methods on prediction of novice performance. Journal of Experimental Psychology: Applied, 5(2), 205–221. doi:10.1037/1076-898x.5.2.205]

While geophysics can be catastrophic, it can also be more tranquil than a stormy surface world. Unit level checking generally allows for complete control over inputs, and there usually aren’t many inputs to worry about. Stepping up to a higher level– interacting sub-systems– still means testing via a controlled API, or command-line, rather than a graphical interface designed for creatures with hands and eyes and hand-eye coordination. This is a level where tools shine. I think of my test tools as submarines gliding underneath the storm and foam, because I avoid using tools that work through a GUI.

The Round Earth model reminds us about data.

Data shows up in this model, metaphorically, as the flow of energy. Energy flows on the surface (sunlight, wind and water) and also under the surface (ground water, magma, earthquakes). Data is important. When we test, we must deal with data that exists in databases and on the other side of micro-services, somewhere out in the cloud. There is data built into the code, itself. So, data is not merely what users type in or how they click. I find that unit-level and sub-system-level testing often neglects the data dimension, so I feature it prominently in the Round Earth concept.

The Round Earth model reminds us about testability.

Complex products can be designed with testing in mind. A testable product is, among other things, one that can be decomposed (taken apart and tested in pieces), and that is observable and controllable in its behaviors. This usually involves giving testers access to the deeper parts of the product via command-line interfaces (or some sort of API) and comprehensive logging.

Epigrams

  • Quality above requires quality below.
  • Quality above reduces dependence on expensive high-level testing.
  • Inexpensive low-level testing reduces dependence on expensive high-level testing.
  • Risk grows toward the user.