The “test automation pyramid” (for examples, see here, here, and here) is a popular idea, but I see serious problems with it. In this article I suggest an alternative way of thinking that preserves what’s useful about the pyramid while minimizing those problems:
- Instead of a pyramid, model the situation as concentric spheres, because the “outer surface” of a complex system generally has “more area” to worry about;
- ground it by referencing a particular sphere called “Earth” which is familiar to all of us because we live on its friendly, hospitable surface;
- illustrate it with an upside-down pyramid shape in order to suggest that our attention and concern is ultimately with the surface of the product, “where the people live” and also to indicate opposition to the pyramid shape of the Test Automation Pyramid (which suggests that user experience deserves little attention);
- incorporate dynamic as well as static elements into the analogy (i.e. data, not just code);
- acknowledge that we probably can’t or won’t directly test the lowest levels of our technology (e.g. Chrome, or Node.js, or Android OS). In fact, we are often encouraged to trust it, since there is little we can do about it;
- use this geophysical analogy to explain more intuitively why a good tooling strategy can access and test the product on a subterranean level, though not necessarily at a level below that of the platforms we rely upon.
Good analogies afford deep reasoning.
The original pyramid (really a triangle) was a context-free geometric analogy. It was essentially saying: “Just as a triangle has more area in its lower part than its upper part, so you should make more automated tests on lower levels than higher levels.” This is not an argument; this is not reasoning. Nothing in the nature of a triangle tells us how it relates to technology problems. It’s simply a shape that matches an assertion that the authors wanted to make. It’s semiotics with weak semantics.
It is not wrong to use semantically arbitrary shapes to communicate, of course (the shapes of a “W” and an “M” are opposites, in a sense, and yet nobody cares that what they represent are not opposites). But at best, it’s a weak form of communication. A stronger form is to use shapes that afford useful reasoning about the subject at hand.
The Round Earth model tries to do that. By thinking of technology as concentric spheres, you understand that the volume of possibilities– the state space of the product– tends to increase dramatically with each layer. Of course, that is not necessarily the case, because a lot of complexity may be locked away from the higher levels by the lower levels. Nevertheless that is a real and present danger with each layer you heap upon your technology stack. An example of this risk in action is the recent discovery that HTML emails defeat the security of PGP email. Whoops. The more bells, whistles, and layers you have, the more likely some abstraction will be fatally leaky. (One example of a leaky abstraction is the concept of “solid ground,” which can both literally and figuratively leak when hot lava pours out of it. Software is built out of things that are more abstract and generally much more leaky than solid ground.)
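As a purely geometric illustration of that point, the sketch below computes the volume of equally thick concentric shells. The function names are invented for this example, and the numbers describe spheres only. Whether a real product's state space grows this way depends on how much complexity the lower layers hide, as noted above.

```python
import math


def shell_volume(inner_radius: float, outer_radius: float) -> float:
    """Volume of the spherical shell between two radii."""
    return 4.0 / 3.0 * math.pi * (outer_radius**3 - inner_radius**3)


def relative_shell_volumes(layers: int) -> list[int]:
    """Volume of each unit-thickness shell, as a multiple of the innermost one."""
    core = shell_volume(0, 1)
    return [round(shell_volume(k - 1, k) / core) for k in range(1, layers + 1)]


# Four layers of equal thickness, counted from the center outward.
print(relative_shell_volumes(4))  # [1, 7, 19, 37]
```

Each shell has the same thickness, yet the fourth one holds 37 times the volume of the core. That is the intuition the spheres are meant to carry: the layers nearest the surface have the most room for things to go wrong.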
When I tell people about the Round Earth model they often start speaking of caves, sinkholes, landslides, and making jokes about volcanoes and how their company must live over a “hot spot” on that Round Earth. These aren’t just jokes, they are evidence that the analogy is helpful, and relates to real issues in technology.
Note: If you want to consider what factors make for a good analogy, Michael Bolton wrote a nice essay about that (he calls it metaphor, but I think he’s referring to analogies).
The Round Earth model shows testing problems at multiple levels.
The original pyramid has unit testing at the bottom. At the bottom of the Round Earth model is the application framework, operating environment, and development environment– in other words, the Platform-That-You-Don’t-Test. Maybe someone else tests it, maybe they don’t. But you don’t know and probably don’t even think about it. I once wrote Assembler code to make video games in 16,384 bytes of memory. I needed to manage every byte of memory. Those days are long gone. Now I write Perl code and I hardly think about memory. Magic elves do that work, for all I know.
Practically speaking, all development rests on a “bedrock” of assumptions. These assumptions are usually safe, but sometimes, just as hot lava or radon gas or toxified groundwater breaks through bedrock, we can also find that lower levels of technology undermine our designs. We must be aware of that general risk, but we probably won’t test our platforms outright.
At a higher level, we can test the units of code that we ourselves write. More specifically, developers can do that. While it’s possible for non-developers to do unit-level checks, it’s a much easier task for the devs themselves. But, realize that the developers are working “underground” as they test on a low level. Think of the users as living up at the top, in the light, whereas the developers are comparatively buried in the details of their work. They have trouble seeing the product from the user’s point of view. This is called “the curse of expertise:”
“Although it may be expected that experts’ superior knowledge and experience should lead them to be better predictors of novice task completion times compared with those with less expertise, the findings in this study suggest otherwise. The results reported here suggest that experts’ superior knowledge actually interferes with their ability to predict novice task performance times.”
[Hinds, P. J. (1999). The curse of expertise: The effects of expertise and debiasing methods on prediction of novice performance. Journal of Experimental Psychology: Applied, 5(2), 205–221. doi:10.1037/1076-898x.5.2.205]
While geophysics can be catastrophic, it can also be more tranquil than a stormy surface world. Unit level checking generally allows for complete control over inputs, and there usually aren’t many inputs to worry about. Stepping up to a higher level– interacting sub-systems– still means testing via a controlled API, or command-line, rather than a graphical interface designed for creatures with hands and eyes and hand-eye coordination. This is a level where tools shine. I think of my test tools as submarines gliding underneath the storm and foam, because I avoid using tools that work through a GUI.
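Here is a minimal Python sketch of what such a "submarine" check might look like. The command-line program is a stand-in invented for this example (a one-line script that sums numbers from standard input); a real product would expose its own interface. The point is only that the check supplies every input itself and never touches a GUI.

```python
import subprocess
import sys

# Stand-in for a product's command-line interface (hypothetical): it reads
# integers from stdin, one per line, and prints their sum.
FAKE_CLI = "import sys; print(sum(int(line) for line in sys.stdin if line.strip()))"


def run_cli(stdin_text: str) -> subprocess.CompletedProcess:
    """Drive the stand-in CLI with fully controlled input, below the GUI."""
    return subprocess.run(
        [sys.executable, "-c", FAKE_CLI],
        input=stdin_text,
        capture_output=True,
        text=True,
        timeout=10,
    )


def check_sum_of_inputs() -> bool:
    """One automated check: known input in, expected output and exit code out."""
    result = run_cli("1\n2\n3\n")
    return result.returncode == 0 and result.stdout.strip() == "6"
```

Calling `check_sum_of_inputs()` runs the program in a child process and compares its output and exit code against expectations. Nothing here depends on window focus, screen resolution, or timing of mouse events, which is why this level tends to be calmer for tools than the surface.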
The Round Earth model reminds us about data.
Data shows up in this model, metaphorically, as the flow of energy. Energy flows on the surface (sunlight, wind and water) and also under the surface (ground water, magma, earthquakes). Data is important. When we test, we must deal with data that exists in databases and on the other side of micro-services, somewhere out in the cloud. There is data built into the code, itself. So, data is not merely what users type in or how they click. I find that unit-level and sub-system-level testing often neglects the data dimension, so I feature it prominently in the Round Earth concept.
The Round Earth model reminds us about testability.
Complex products can be designed with testing in mind. A testable product is, among other things, one that can be decomposed (taken apart and tested in pieces), and that is observable and controllable in its behaviors. This usually involves giving testers access to the deeper parts of the product via command-line interfaces (or some sort of API) and comprehensive logging.
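As a rough sketch of those two properties, the Python below shows a small component whose input source is injected (controllable) and whose decisions are logged (observable). `Thermostat`, `ListHandler`, and `demo` are hypothetical names made up for this illustration; only the standard `logging` module is real.

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted log messages in memory so a check can inspect them."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


class Thermostat:
    """Hypothetical component: the sensor is injected (controllable) and
    every decision is logged (observable)."""

    def __init__(self, read_temperature, logger):
        self._read_temperature = read_temperature
        self._logger = logger

    def heater_should_run(self, target: float) -> bool:
        current = self._read_temperature()
        decision = current < target
        self._logger.info(
            "current=%.1f target=%.1f heater=%s", current, target, decision
        )
        return decision


def demo():
    """Exercise the component with a fake sensor and capture what it logged."""
    logger = logging.getLogger("thermostat_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = ListHandler()
    logger.addHandler(handler)
    try:
        thermostat = Thermostat(read_temperature=lambda: 18.0, logger=logger)
        decision = thermostat.heater_should_run(target=21.0)
    finally:
        logger.removeHandler(handler)
    return decision, handler.messages
```

Because the temperature source is passed in, a tester can put the component into any state without real hardware, and because each decision is logged, the tester can see why it behaved as it did. A product without either hook can usually be tested only from the surface.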
Epigrams
- Quality above requires quality below.
- Quality above reduces dependence on expensive high-level testing.
- Inexpensive low-level testing reduces dependence on expensive high-level testing.
- Risk grows toward the user.
Jan Svoboda says
Good analogies are quite powerful tools for making a complex idea immediately more accessible.
I used an asymmetric warfare analogy to put across a point about the place of test automation in test strategy (after being asked to automate everything).
Imagine you are to defend your newly established country against a guerrilla opponent (the bugs). High-level automated tests are like fortifications. You can keep a rather limited area safe continuously with much less manpower, but it is costly, you cannot fortify the whole country (finances, but also difficult terrain), and you have to think about maintenance. But you can identify important locations like cities, industry and crossroads that people and important resources pass through…
It makes you think about how to prioritise security in different areas (cities and industry/power plants as opposed to mountains and jungle), manage your manpower (defensive regression testing on critical areas versus search-and-destroy missions that cover wide areas but do not ensure a continuous presence in the area) and free it up by well-placed fortifications and prepared positions. Also how to train it and equip it.
Another that comes to mind when thinking about complexity underneath the surface is an organism/patient analogy. It is hard to pinpoint the cause of symptoms unless you can get deeper (biological/chemical analysis of bodily fluids, monitoring of bio signals like EEG, EKG), and even those are aggregates of many phenomena. Data flow and its importance to the function of vital parts are very clear in this analogy.
[James’ Reply: These are nice ones!]
Ahmed Fathi Moustafa says
Really very good analysis and a wonderful way to connect software testing and the Round Earth; when we connect the moral with the physical, the idea becomes clear and simple.
Aaron Evans says
Actually, the pyramid is a pyramid, it’s not semiotics or just a cute diagram. The point is that the base needs to be stronger than the top. This is true in testing, architecture, and rhetoric.
While your round earth model acknowledges a practical reality, namely that we can’t know everything about all the layers beneath our application code (you can only count so many turtles before yak shaving day is over), the testing pyramid relies upon proven physical facts that were clear to the ancients long, long before Isaac Newton or Neil Armstrong were born and are still applicable today.
[James’ Reply: Your analysis of my work is not quite up to the standard I require in order to reply to you. But keep trying!]
Kevin says
The “test pyramid” was conceived due to the below assumption. Note the exception and lack of hard rule about the layers.
“The pyramid is based on the assumption that broad-stack tests are expensive, slow, and brittle compared to more focused tests, such as unit tests. While this is usually true, there are exceptions. If my high level tests are fast, reliable, and cheap to modify – then lower-level tests aren’t needed.”
https://martinfowler.com/bliki/TestPyramid.html
[James’ Reply: Yes. My model is also based on premises. One of my premises is that the pyramid is routinely misunderstood and misapplied. By reversing the geometry I think we can help that situation.]
Nilan-jan says
@Kevin – it is critical when referencing the original pyramid to realize that it has nothing to do with testing. ‘Testing’ is used very loosely. That hasn’t stopped everyone from referring to it as the ‘test’ pyramid.
In all the places where he refers to ‘test’ it really doesn’t mean much.
Eddie says
You have taken what we have for so long seen as very helpful, turned it on its head and made it even more helpful. Your contributions to the field of testing…Legendary
Guillermo Chussir says
Interesting analogy!
“test pyramid” is an already quite established concept and very often asked during job interviews. But I think this analogy is better.
Ken says
I feel the round earth model is useful in quickly describing a general product to be tested and a high-level approach to that automation. I particularly like the inclusion of data into the model. I don’t discount the “traditional” test pyramid however as I feel it conveys something different. While the round earth model focuses on the product and the “test problem”, the test pyramid focuses on the distribution of automated tests/checks. The pyramid definitely lacks context but for those who understand that context, I think it serves a purpose. This neatly serves a complementary purpose so thank you for posting!
petr kus says
I think that’s a really good analogy! But the testing pyramid and your earth analogy describe different things. The earth analogy describes the product model and the testing pyramid describes the ideal distribution of the different types of tests. Both are good to use! 🙂 The same analogy (the product) is the patient. And I really like the fortification analogy to test automation! Thank you!
[James’ Reply: Round Earth is a well-formed analogy that is the basis for making the same exact point that the Pyramid model wants to make. As I explained, the Pyramid model is a bad analogy– a pyramid shape is NOT an ideal distribution of different types of tests, in fact, it’s incoherent. I can’t tell, and YOU can’t tell, and nobody can tell what “distribution” even means. Is it talking about the distribution of your time? Is it talking about the distribution of filenames in your folder structure? The number of bytes in your source files? The reason I don’t know is that the creator of the Pyramid never thought it through. He was telling you about a feeling and it’s a mystery whether your feeling really matches his feeling.
I’m offended that stupid models (by which I mean models whose originators willfully refuse to think through) are presented as good professional practice. You should be offended, too.]
Sebastian Stautz says
I became aware of another analogy I can attach to this:
Thinking of zoom levels, e.g. of a city watched from space, as the level of detail for the check automation.
E.g. a “smoke test” is something more zoomed out. You do not see many details but have a good overview in general.
To see more details you need to zoom in. If you want to cover the same space as in the “smoke test” but with more details, you need to scroll around, which means more effort.
You can zoom in on the surface as well as on every layer beneath it (surely with other tools).
And you can think of automated checks as saved and compared pictures.
Once a deviation is found, you have to figure out whether the product is wrong or the screenshot needs to be updated because the city changed in an acceptable way.
When was the last check that all “expected” pictures represent a state we wish the city to have? How well can we determine this upfront at all?
Humans expect the city to change at some places but no one changed the comparison pictures.
Who updates all the outdated “expected” pictures which are showing deviations?
How much should we trust the automated picture comparison at all?
Is saving and comparing pictures the only way we can observe changes and find problems in the development of the city? I think not. Do not get obsessed with it.
Exploring a changing city by telescope is more than just making “expected” pictures and comparing them with observed ones.
[James’ Reply: Interesting idea.]
TJ says
I really like this analogy. You also see this principle in similar aspects.
https://www.keesfloor.nl/weerkunde/3atmosfeer/3atmosfeer.htm
And I think that if you are really starting to see the holistic view of software development then you see that in software testing it is the ‘round earth model’ and in say project management (for the sake of example) it is the ‘atmosphere model’ and TOGETHER they form the total overview.