This post is co-authored with Michael Bolton. We have spent hours arguing about nearly every sentence. We also thank Iain McCowatt for his rapid review and comments.
Testing and tool use are two things that have characterized humanity from its beginnings. (Not the only two things, of course, but certainly two of the several characterizing things.) But while testing is cerebral and largely intangible, tool use is out in the open. Tools encroach into every process they touch and tools change those processes. Hence, for at least a hundred or a thousand centuries the more philosophical among our kind have wondered “Did I do that or did the tool do that? Am I a warrior or just a spear-throwing platform? Am I a farmer or a plow pusher?” As Marshall McLuhan said, “We shape our tools, and thereafter our tools shape us.”
This evolution can be an insidious process that challenges how we label ourselves and things around us. We may witness how industrialization changes cabinet craftsmen into cabinet factories, and that may tempt us to speak of the changing role of the cabinet maker, but the cabinet factory worker is certainly not a mutated cabinet craftsman. The cabinet craftsmen are still out there– fewer of them, true– nowhere near a factory, turning out expensive and well-made cabinets. The skilled cabineteer (I’m almost motivated enough to Google whether there is a special word for cabinet expert) is still in demand, to solve problems IKEA can’t solve. This situation exists in the fields of science and medicine, too. It exists everywhere: what are the implications of the evolution of tools on skilled human work? Anyone who seeks excellence in his craft must struggle with the appropriate role of tools.
Therefore, let’s not be surprised that testing, today, is a process that involves tools in many ways, and that this challenges the idea of a tester.
This has always been a problem– I’ve been working with and arguing over this since 1987, and the literature of it goes back at least to 1961– but something new has happened: large-scale mobile and distributed computing. Yes, this is new. I see this as the greatest challenge to testing as we know it since the advent of micro-computers. Why exactly is it a challenge? Because in addition to the complexity of products and platforms, which has been growing steadily for decades, there now exists a vast marketplace for software products that are expected to be distributed and updated instantly.
We want to test a product very quickly. How do we do that? It’s tempting to say “Let’s make tools do it!” This puts enormous pressure on skilled software testers and those who craft tools for testers to use. Meanwhile, people who aren’t skilled software testers have visions of the industrialization of testing similar to those early cabinet factories. Yes, there have always been these pressures, to some degree. Now the drumbeat for “continuous deployment” has opened another front in that war.
We believe that skilled cognitive work is not factory work. That’s why it’s more important than ever to understand what testing is and how tools can support it.
Checking vs. Testing
For this reason, in the Rapid Software Testing methodology, we distinguish between aspects of the testing process that machines can do versus those that only skilled humans can do. We have done this linguistically by adapting the ordinary English word “checking” to refer to what tools can do. This is exactly parallel with the long established convention of distinguishing between “programming” and “compiling.” Programming is what human programmers do. Compiling is what a particular tool does for the programmer, even though what a compiler does might appear to be, technically, exactly what programmers do. Come to think of it, no one speaks of automated programming or manual programming. There is programming, and there is lots of other stuff done by tools. Once a tool is created to do that stuff, it is never called programming again.
Now that Michael and I have had over three years’ experience working with this distinction, we have sharpened our language even further, with updated definitions and a new distinction between human checking and machine checking.
First let’s look at testing and checking. Here are our proposed new definitions, which soon will replace the ones we’ve used for years (subject to review and comment by colleagues):
Testing is the process of evaluating a product by learning about it through experiencing, exploring, and experimenting, which includes to some degree: questioning, study, modeling, observation, inference, etc.
(A test is an instance of testing.)
Checking is the process of making evaluations by applying algorithmic decision rules to specific observations of a product.
(A check is an instance of checking.)
- “evaluating” means making a value judgment; is it good? is it bad? pass? fail? how good? how bad? Anything like that.
- “evaluations” as a noun refers to the product of the evaluation, which in the context of checking is going to be an artifact of some kind; a string of bits.
- “learning” is the process of developing one’s mind. Only humans can learn in the fullest sense of the term as we are using it here, because we are referring to tacit as well as explicit knowledge.
- “exploration” implies that testing is inherently exploratory. All testing is exploratory to some degree, but may also be structured by scripted elements.
- “experimentation” implies interaction with a subject and observation of it as it is operating, but we are also referring to “thought experiments” that involve purely hypothetical interaction. By referring to experimentation, we are not denying or rejecting other kinds of learning; we are merely trying to express that experimentation is a practice that characterizes testing. It also implies that testing is congruent with science.
- the list of words in the testing definition is not exhaustive of everything that might be involved in testing, but represents the mental processes we think are most vital and characteristic.
- “algorithmic” means that it can be expressed explicitly in a way that a tool could perform.
- “observations” is intended to encompass the entire process of observing, and not just the outcome.
- “specific observations” means that the observation process results in a string of bits (otherwise, the algorithmic decision rules could not operate on them).
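To make the checking definition concrete, here is a minimal sketch in Python, not a definitive implementation. It assumes a hypothetical product function, `product_greeting`, and hypothetical names `Evaluation` and `run_check`; none of these come from the definitions above. The observation is captured as bytes, an explicit rule is applied to it, and the evaluation comes out as an artifact.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Evaluation:
    """The artifact a check produces (hypothetical structure)."""
    check_name: str
    observed: bytes
    passed: bool


def run_check(name: str,
              observe: Callable[[], bytes],
              rule: Callable[[bytes], bool]) -> Evaluation:
    # The observation process ends in a specific string of bits...
    observed = observe()
    # ...so that an explicit, algorithmic decision rule can operate on it.
    return Evaluation(name, observed, rule(observed))


def product_greeting() -> bytes:
    # Hypothetical stand-in for observing a real product.
    return "Hello, world".encode("utf-8")


result = run_check(
    "greeting starts with Hello",
    product_greeting,
    lambda observed: observed.startswith(b"Hello"),
)
print(result)
```

Deciding which observation to collect, which rule matters, and what a failing evaluation would mean all happen outside this code, which is the part these definitions call testing.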
There are certain implications of these definitions:
- Testing encompasses checking (if checking exists at all), whereas checking cannot encompass testing.
- Testing can exist without checking. A test can exist without a check. But checking is a very popular and important part of ordinary testing, even very informal testing.
- Checking is a process that can, in principle, be performed by a tool instead of a human, whereas testing can only be supported by tools. Nevertheless, tools can be used for much more than checking.
- We are not saying that a check MUST be automated. But the defining feature of a check is that it can be COMPLETELY automated, whereas testing is intrinsically a human activity.
- Testing is an open-ended investigation– think “Sherlock Holmes”– whereas checking is short for “fact checking” and focuses on specific facts and rules related to those facts.
- Checking is not the same as confirming. Checks are often used in a confirmatory way (most typically during regression testing), but we can also imagine them used for disconfirmation or for speculative exploration (e.g. a set of automatically generated checks that randomly stomp through a vast space, looking for anything different).
- One common problem in our industry is that checking is confused with testing. Our purpose here is to reduce that confusion.
- A check is describable; a test might not be (that’s because, unlike a check, a test involves tacit knowledge).
- An assertion, in the Computer Science sense, is a kind of check. But not all checks are assertions, and even in the case of assertions, there may be code before the assertion which is part of the check, but not part of the assertion.
- These definitions are not moral judgments. We’re not saying that checking is an inherently bad thing to do. On the contrary, checking may be very important to do. We are asserting that for checking to be considered good, it must happen in the context of a competent testing process. Checking is a tactic of testing.
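Two of the implications above can be sketched together in Python: that a check may include code before its assertions, and that checks can be generated automatically to wander through a large input space. This is an illustrative sketch under stated assumptions. `product_sort` is a hypothetical stand-in for some product behaviour, and the function names, ranges, and seed are arbitrary choices rather than anything prescribed here.

```python
import random
from collections import Counter


def product_sort(items):
    """Hypothetical stand-in for the product behaviour being checked."""
    return sorted(items)


def check_sort(items):
    # This line is part of the check but not part of any assertion:
    # it collects the specific observation the rules will be applied to.
    observed = product_sort(list(items))
    # The decision rules, written as assertions.
    assert all(a <= b for a, b in zip(observed, observed[1:])), "not ordered"
    assert Counter(observed) == Counter(items), "elements changed"
    return True


def random_checks(count, seed=0):
    """Generate checks over random inputs and collect any that fail."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    failures = []
    for _ in range(count):
        items = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 20))]
        try:
            check_sort(items)
        except AssertionError as problem:
            failures.append((items, str(problem)))
    return failures


failures = random_checks(200)
print(len(failures), "failing checks")
```

An empty `failures` list says only that these rules held for these inputs. Whether the rules were worth applying, and what to look at next, remains a question for the tester.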
If you follow our work, you know that we have made a big deal about sapience. A sapient process is one that requires an appropriately skilled human to perform. However, in several years of practicing with that label, we have found that it is nearly impossible to avoid giving the impression that a non-sapient process (i.e. one that does not require a human but could involve a very talented and skilled human nonetheless) is a stupid process for stupid people. That’s because the word sapience sounds like intelligence. Some of our colleagues have taken strong exception to our discussion of non-sapient processes based on that misunderstanding. We therefore feel it’s time to offer this particular term of art its gold watch and wish it well in its retirement.
Human Checking vs. Machine Checking
Although sapience is problematic as a label, we still need to distinguish between what humans can do and what tools can do. Hence, in addition to the basic distinction between checking and testing, we also distinguish between human checking and machine checking. This may seem a bit confusing at first, because checking is, by definition, something that can be done by machines. You could be forgiven for thinking that human checking is just the same as machine checking. But it isn’t. It can’t be.
In human checking, humans are attempting to follow an explicit algorithmic process. In the case of tools, however, the tools aren’t just following that process, they embody it. Humans cannot embody such an algorithm. Here’s a thought experiment to prove it: tell any human to follow a set of instructions. Get that person to agree. Now watch what happens if you make it impossible for that person ever to complete the instructions. Human beings will not just sit there until they die of thirst or exposure; they will stop themselves and change or exit the process. And that’s when you know for sure that this human– all along– was embodying more than just the process that he or she agreed to follow and tried to follow. There’s no getting around this if we are talking about people with ordinary, or even minimal cognitive capability. Whatever procedure humans appear to be following, they are always doing something else, too. Humans are constantly interpreting and adjusting their actions in ways that tools cannot. This is inevitable.
Humans can perform motivated actions; tools can only exhibit programmed behaviour (see Harry Collins and Martin Kusch’s brilliant book The Shape of Actions, for a full explanation of why this is so). The bottom line is: you can define a check easily enough, but a human will perform at least a little more during that check– and also less in some ways– than a tool programmed to execute the same algorithm.
Please understand, a robust role for tools in testing must be embraced. As we work toward a future of skilled, powerful, and efficient testing, this requires a careful attention to both the human side and the mechanical side of the testing equation. Tools can help us in many ways far beyond the automation of checks. But in this, they necessarily play a supporting role to skilled humans; and the unskilled use of tools may have terrible consequences.
You might also wonder why we don’t just call human checking “testing.” Well, we do. Bear in mind that all this is happening within the sphere of testing. Human checking is part of testing. However, we think when humans are explicitly trying to restrict their thinking to the confines of a check– even though they will fail to do that completely– it’s now a specific and restricted tactic of testing and not the whole activity of testing. It deserves a label of its own within testing.
With all of this in mind, and with the goal of clearing confusion, sharpening our perception, and promoting collaboration, recall our definition of checking:
Checking is the process of making evaluations by applying algorithmic decision rules to specific observations of a product.
From that, we have identified three kinds of checking:
Human checking is an attempted checking process wherein humans collect the observations and apply the rules without the mediation of tools.
Machine checking is a checking process wherein tools collect the observations and apply the rules without the mediation of humans.
Human/machine checking is an attempted checking process wherein both humans and tools interact to collect the observations and apply the rules.
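As a rough illustration of where the boundaries fall, the sketch below applies one decision rule in two arrangements. It is only a sketch: `tool_observe`, the page title, and the function names are hypothetical. Human checking, where a person both observes and applies the rule unaided, has no code counterpart at all.

```python
def rule(observed_title: str) -> bool:
    # The algorithmic decision rule, identical in both arrangements.
    return observed_title == "Checkout"


def tool_observe() -> str:
    # Hypothetical stand-in for a tool reading a page title from a product.
    return "Checkout"


def machine_check() -> bool:
    # Machine checking: the tool collects the observation and applies the rule.
    return rule(tool_observe())


def human_machine_check(human_reported_title: str) -> bool:
    # Human/machine checking: a person collects the observation and a tool
    # applies the rule. Whatever else the person noticed is lost here.
    return rule(human_reported_title.strip())


print(machine_check())
print(human_machine_check(" Checkout "))
print(human_machine_check("Cart"))
```

The string passed to `human_machine_check` is all the code can see of what the person did. Whatever interpreting and adjusting the person did while observing stays outside the check.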
In order to explain this thoroughly, we will need to talk about specific examples. Look for those in an upcoming post.
Meanwhile, we invite you to comment on this.
Update 2013-04-10: As a result of intense discussions at the SWET5 peer conference, I have updated the diagram of checking and testing. Notice that testing is now sitting outside the box; since it describes the whole thing, a description of testing is inside of it. Human checking is characterized by a cloud, because its boundary with non-checking aspects of testing is not always clearly discernible. Machine checking is characterized by a precise dashed line, because although its boundary is clear, it is an optional activity. Technically, human checking is also optional, but it would be a strange test process indeed that didn’t include at least some human checking. I thank the attendees of SWET5 for helping me with this: Rikard Edgren, Martin Jansson, Henrik Andersson, Michael Albrecht, Simon Morley, and Micke Ulander.
Update, 2019-12-16: We have added “experiencing” to our definition of testing. We do this to emphasize the role of direct, interactive, human experience with the product, the system of which it is a part, and the context that affects it. Testers can and will use tools (including automated checks) in support of testing, but experience of the product is essential to evaluation and learning.
I like the fact that we are differentiating the capabilities of humans and tools.
I agree with you on how humans get easily distracted from a set of steps to be followed for checking a functionality. Humans can’t do that on a daily basis.
To correlate this with what we do regularly in our projects:
1. Unit tests and integration tests are machine checking, where you validate most of your functionality and business rules.
2. Crazy/monkey testing and handling edge cases in the UI are human testing?
[James’ Reply: I would say “unit tests” are not tests. Checking is not the same as testing. I don’t know what crazy/monkey testing is.]
Ingo Philipp says
Hi James, in your article you mentioned that “testing encompasses checking”.
[James’ Reply: Yes.]
In addition, your diagram shows that checking is inside testing. Now, you said that “checking is not the same as testing”. To me, that’s a little bit confusing.
[James’ Reply: Are you confused when I tell you that wheels are not cars? Leaves are not trees? Parts are not wholes? I bet you understand these examples just as well as I do. People keep thinking that output checking IS testing. They keep using the word testing for checking and in so doing devalue all the elements that make testing powerful. Then they pursue an impoverished process that is called testing (but really is mostly checking), thereby confusing any who look upon that and our craft about what it is we should be doing.]
This is like saying that the set of real numbers is inside the set of complex numbers, but real numbers are not the same as complex numbers. Now, all real numbers are complex numbers, but not all complex numbers are real numbers.
[James’ Reply: You are talking about a subset. I am not talking about a subset. Checking is not a kind of testing. Checking is a part of testing that exists only in the service of testing.
I’m talking about parts and wholes. Legs are not subsets of people, they are parts of people. Legs don’t have a functional existence independent of people.]
So, I would say that checking is testing, but not all testing is checking. In other words: Just as real numbers are a special form of complex numbers, checking is a special form of testing. To me, that’s a different statement than saying that “checking is not the same as testing”. Does that even make sense? 😉 Cheers, Ingo.
[James’ Reply: I’m afraid that does not make sense. Checking is not testing. If you are checking, and only checking, then you are not testing. You CANNOT be testing, because testing, as I explain in this post, is a larger, deeper, social process that guides and situates checking. If you are doing mere checking then you are merely operating a machine or merely simulating a machine.]
Ingo Philipp says
Understood. Thanks, Ingo.
Excellent! Thanks you two for clarifying this so nicely! I am a HUGE fan of Checking vs Testing. I will reference this post a LOT.
I really like this development, and I’ll be clear as to why. I’ve long felt that automation of some form will always be important, and it won’t just ‘go away’.
[James’ Reply: I’ve never heard of anyone saying tools would go away… except GUI-based checks which often fall apart under their own weight.]
I’ve also felt, that while it may provide value, until we have the ability to have competent ‘AI’ to emulate human cognition, that computer checking, would remain as any other program has been for as long as I can remember.
[James’ Reply: Until? Look, when software can test itself, you won’t need to worry about testing at all, or development for that matter. Programs will write themselves before testing is automated.]
That being, I remember as a young child, watching some program or movie and someone was explaining that Computers only do what humans tell them to do. This is true, but what often gets missed, I think, is that humans do not always give the computer everything in these checks because there are boundaries in what can be checked by a machine, or trying to script that much would add significant delay in completion of these checks.
So we probably find, especially with the prevalence of agile thinking, that machine checking takes on a form that looks almost as ‘minimal viable’ as the code, which is also up to a ‘minimal viable’ standard. I think this is why we need human beings involved in Testing as an exploratory activity in addition to the machine-only checking.
David Greenlees says
Interesting and detailed post. Thanks to you and MB, as always.
Perhaps you’ll explain this more in the future posts you mention but I’m stuck a little on this…
“Human checking is a checking process wherein humans collect the observations and apply the rules without the mediation of tools.”
…combined with this…
“And that’s when you know for sure that this human– all along– was embodying more than just the process he agreed to follow and tried to follow. There’s no getting around this if we are talking about people with ordinary, or even minimal cognitive capability. Whatever procedure humans appear to be following, they are always doing something else, too. Humans are constantly interpreting and adjusting their actions in ways that tools cannot. This is inevitable.”
As I understand it Human Checking is actually Testing…
“Testing is the process of evaluating a product by learning about it through experimentation, which includes to some degree: questioning, study, modeling, observation and inference.”
…or perhaps it’s not, as it doesn’t meet every part of this definition? A human following a particular process will always be interpreting, observing, inferring(?), and most likely learning… even if there is no tangible output from it.
Or am I just way off? There is a very good chance of that!
[James’ Reply: This is a good question. I have added this text to the post: “You might also wonder why we don’t just call human checking “testing.” Well, we do. Bear in mind that all this is happening within the sphere of testing. Human checking is part of testing. However, we think when a human is explicitly trying to restrict his thinking to the confines of a check– even though he will fail to do that completely– it’s now a specific and restricted tactic of testing and not the whole activity of testing. It deserves a label of its own within testing.”]
srinivas kadiyala says
Thanks for the post on Checking vs Testing.
It was easily understandable – difference of checking and testing.
As programmers use compilers and editors such as Visual Studio, is programming then called “Automation programming”?
Likewise, we use tools to test (QTP/Selenium), and we call it “Automation Testing”.
I would like to know: What tools are there for machine checking?
[James’ Reply: Any programming language is a tool for machine checking.]
Louise Capper says
It may also be an interesting addition to your cabinet making analogy to consider that even in IKEA there are those skilled persons that design the cabinets to be made in the factories, and they work with skilled ‘cabineteers’ who develop the design into functional prototypes prior to the design moving into factory production. These skilled craftsmen could be considered as the humans that design the algorithms that will later be used as part of human and/or machine checking; in fact, they would probably be deciding what type of checking is appropriate.
I certainly agree with your premise that there should be no moral judgements on the importance of checking activities. These checking activities are considered just as important in my domain where software is safety critical (e.g. defence, aerospace, …). Having said that the humans involved in the process certainly enjoy testing activities ahead of human checking, and they value the contribution of machine checking to their daily job satisfaction state.
[James’ Reply: Good point. Thank you.]
Kees Blokland says
I appreciate this post. It clarifies a lot.
John Stevenson says
Thank you for this great post and helping to clarify the original post by Michael Bolton. Can you inform me if my own definitions now still apply?
In very simple terms I see checking as confirming what you think you already know about the system and testing as asking questions in which you do not know the answer.
[James’ Reply: Your definitions are not consistent with ours, as far as I understand them. We are not saying that a check must confirm anything. It just applies a rule to an observation. This could be motivated by a confirmatory strategy, or something else.
Furthermore, strictly speaking you only perform a check when you are worried that you may not know the answer. Otherwise there is no point at all.
As a loose intuitive “serving suggestion” for checks versus tests and not a definition, however, I have no problem with what you said.]
Adam Knight says
I like this article and I think that the distinction between testing and checking is an important one. I have been using it within my organisation for some time now (we use the term ‘assessment’ for the sapient testing activity).
I have, however, a concern over your definition of checking.
You say ‘we distinguish between aspects of the testing process that machines can do versus those that only skilled humans can do. We have done this linguistically by adapting the ordinary English word “checking” to refer to what tools can do.’
For me the use of tools and automation infrastructures is about more than checking. A critical role for tools is to gather information that is inefficient or impractical for a human to gather. We might then decide to apply checking against a subset of this information. One principle that I adopt in automation which I think is important to bear in mind when applying checks is to gather information, further to that required to furnish the check, to allow the appropriate re-assessment by a human tester should the check flag up an unexpected result. I worry that referring to tool use as ‘checking’ may place too much emphasis on the check at the expense of thinking about the other information that we might gather. Tools can be far more powerful if we approach them from the perspective, not of abstracting information behind checks, but of looking at what information they can tell us. In many cases I use tools without applying checks at all, but rather to gather the information that I need on the behaviour of a system during a recursive or parallelised operation which I could not achieve without the tool. Rather than referring to “checking” as “what tools can do”, I would suggest that checking is a subset of what tools can do, just as both are a subset of the greater activity of testing.
Thanks again for a thought-provoking post.
[James’ Reply: Hi Adam. We did not restrict the role of tools to checking. We said “Please understand, a robust role for tools in testing must be embraced. As we work toward a future of skilled, powerful, and efficient testing, this requires a careful attention to both the human side and the mechanical side of the testing equation. Tools can help us in many ways far beyond the automation of checks. But in this, they necessarily play a supporting role to skilled humans; and the unskilled use of tools may have terrible consequences.”
Furthermore, even in checking, a human and tool can cooperate to perform a check. I just added an extra note in the implications about that, though. I hope it helps.]
Good to see some clarity spoken about this issue. I don’t know if it would have been easier to reclaim the word “sapience” by explaining its difference from “intelligence” but I see why they might be strongly linked in peoples’ minds.
Combining the definition of checking and human checking you have here would it be fair to say that it evaluates to this?:
“Human Checking is the process of making evaluations wherein humans apply algorithmic decision rules to specific observations of a product they have collected, all without the mediation of tools”
Because if so you’ve made the point I want to make yourself. Humans can’t apply algorithmic decision rules. There are tacit processes, interpretations and modelling that humans cannot stop performing even if they wanted to; so your definition of algorithmic (“can be expressed explicitly in a way that a computer could perform”) isn’t possible in theory; so the definition of human checking isn’t actually applicable to something a human is capable of doing. Could you expand on this point, if possible, please?
[James’ Reply: Yes, we say this in the post. A check performed by a human is, in the strictest terms, NOT a check. But in slightly broader terms, it is a human trying to perform a check. Perhaps we should add the word “attempt” in there somewhere. I’ve added it, but Michael may object. We’ll see.]
Would it not be correct to say that, while a “unit test” is “checking”, the process of designing a unit test is a sapient one?
[James’ reply: Yes. Although I usually remember to add the qualifier “good” as in, “a good unit check.” We can certainly create bad unit checks by writing a program that automatically generates them, thereby non-sapiently creating them.]
Stephen Blower says
I find that even though this topic on the surface appears to be quite simple it does prove difficult to explain to other testers the importance of understanding the difference. Often we assume that we understand the meaning of a word (definition) when attributed with context. However it is not until you take a step back and really look in to the details of what you believe the meaning of a word is that you start to realise that your preconceived ideas need to be adjusted.
Why is it difficult to explain and emphasise this difference? In my situation I was in the camp of “yeah of course I know the difference” and why wouldn’t I think that, as these two words are not particularly hard to understand. Now put this in the context of testing software, and without redefining your understanding of what testing vs. checking means you’ll still not understand that there is a significant difference.
In my case, the difficulty of explaining the difference (with clarity) between Testing and Checking comes down to my believing I fully understood the difference, when in reality I agreed with it, of course, but hadn’t fully understood the need for a distinction.
Having been quizzed recently on what my understanding of a commonly used word is, it has shifted my thought process from thinking I know to questioning what I think I know.
[James’ Reply: We need to come up with a post to describe worked examples. That will help.]
I agree, designing “a good unit check” requires careful thought and is not something that can be generated by a program. I would say that a “unit check” is an output of unit testing, where unit testing includes, but is not limited to, designing “good unit checks”. It follows that, outside of the process of unit testing, running “unit checks” is not to be considered testing. Does my understanding differ from yours?
[James’ Reply: Yes, this fits. Nice summary.
And as an example of checking outside the process of testing, imagine a set of checks constructed for the purpose of fooling people into thinking you had tested, but with no real intent to discover problems. This is more common than many believe.]
Markus Gärtner says
I understand machine checking as the thing that started to become “continuous deployment”, whereas “continuous integration” with lots of machine checks informs a human what to do next: green – investigate gaps in the checking strategy, red – investigate what is, obviously to the machine, broken.
I love the insights about human checking where a human will do some other things than the pure script in your previous terminology, and will miss some things from the scripts. I also love the drawing which details what I had in mind with the discussion around “automation vs. exploration – we got it wrong” (don’t want to put up the shameless plug here).
Thanks for sharing the inspiration.
Sigge Birgisson says
Thank you James and Michael for this clarifying post. I must say that it was time for it.
Exactly two years ago I had been rolling the subject in my head for a very long time and discussing it with peers, mostly developers, and wrote this post.
I would say that through the good discussions in the comments it quite well aligns with your post today. Yours having more structure to enable definitions to become more clear of course.
What I wanted back then was to emphasise the need for testing by programmers before even considering their development task at hand to be done: testing their code for worthiness of being shown to another person, as an addition to the checking they created.
A thing that I care very much for is the adaptation of ideas across cultural boundaries. Those are the times when ideas get their wings. That is what I was trying back then, getting the ideas going with programmers, but still there were too many unclear variables and it was too easy to shake the ideas off with a shrug. In this case the sapience word was one of those things, as you already pointed out, but also that the concept seemed reasonable, but still does not seem to add any value.
[James’ Reply: Well speak for yourself. It adds value for me. But I don’t see how you would use the sapience concept for this. What would be the point? Sapience is about respecting the difference between people and machines. But that respect doesn’t magically create tests for you!]
So for the continuous journey of testing and checking, how would you see this idea get its wings out of testing and throughout software development? The refining blog post is a great start, but how do you see that this can unfold even further? Is this potentially one way towards better management of software projects? Or is this just another one of those keep-in-mind sort of ideas towards better development practices?
[James’ Reply: We hope that it helps people stop thinking that they’ve solved the testing problem just because they made some checks. I also hope that it leads to less waste of testing resources on the endless pursuit of automated checks.]
I would really like to see how this concept can do good within software as a whole.
Thanks for finally ‘retiring’ the use of ‘Sapient’. I’ve always found that word to be a bit confusing for people and have preferred to use the term ‘Cognitive Thought’ (or Cognition) to basically state “Use your brain!” in regards to this line of work. Too many times we hear about the “monkey on the keyboard” view of Testing, which it is not. Testing is about ‘experimentation’ to prove or disprove a hypothesis. In our case, for software, it can be boiled down to: does this do what I expect or not, and if not, then why? This implies some type or level of a priori knowledge. But if we do not have that knowledge then testing can be, as you have stated before, a way to ‘learn’ about the behaviour of the subject being examined (the software or system).
And at times this learning and/or experimentation will involve the use of tools to AID in the process. And that is the key, tools are aids in our work. But the issue I’ve seen over the years is that people believe the tool is the solution, not a mechanism to get to the solution. As we all know sometimes the tool doesn’t help.
[James’ Reply: “cognitive thought” in no way means what we were trying to say with sapient. So, you have inadvertently demonstrated the problem. You aren’t complaining about the term or how we used it. You are complaining about something we actually never tried to do!
Anyway, I like all the rest of what you said. So thanks!]
I just sent this e-mail around to my group. “Mark” is our manager, I’m the senior engineer.
There is an extremely valuable blog post by two software testing thinkers (James Bach and Michael Bolton) here:
You know how I wave my hands and run off at the mouth about the philosophy of testing? An important part of what I’m trying to communicate is encapsulated in this post and the simple terminological change that they propose.
I would like for this to happen:
– Everyone in our group reads this post.
– We add “Discuss testing vs. checking” to our QA group meeting agenda until we’re all on the same page (or we discover that we agree to disagree).
– If we agree on these principles and this “Testing vs. checking” terminology:
o We publish this blog post (or a summary of it) to the company with the comment that we are now going to be following this terminology.
o We start using this terminology ALL THE TIME and we actively and consistently expect others to do the same.
o We cherish the fact that we’ve had some good discussions and we leave it at that.
I would, of course, prefer that the “If” there evaluates to “true”!
Ten or twenty years from now, people will be saying “you know, there was that time that people differentiated ‘testing’ from ‘checking’, and it revolutionized how people think about QA. Wasn’t that great?” …and we have the chance to lead or follow.
By the way, Mark just read this over my shoulder, and he basically agrees, and says we should prepare to talk about this at next week’s (i.e. not tomorrow’s) QA group meeting. Please read and think about this blog post between now and then. I will try to remember to send out a reminder e-mail, say next Monday.
J. Michael Hammond
Principal QA Engineer
Attivio – Active Intelligence
Stephen Hill says
I really like this distinction between human, machine and human/machine checking as a way of avoiding talk about ‘sapience’. The split, to me, is intuitive because, as I try to explain in a (pending moderation) reply to Rahul over at http://www.testingperspective.com/?p=3143, our human wisdom gets in the way of us being able to perform a raw check of an algorithm in the way a machine can. I think this split helps us get away from the idea that a ‘check’ has to be automated as well which is good – though I freely accept that many checks are good candidates for automation.
I do think tools can only check. This is due to the way an assertion is made. A computer uses things like =, !=, >, <,… to evaluate. As long as a computer cannot make an observation not directly related to the actual evaluation, it is only ever checking. Yes, tools can aid testing but they will still only be checking (not talking about things like data creation or similar here) and yes, there are checks only tools can actually viably do. But the fact remains that this will always be checking.
There are analog computers and neural networks, where this premise might be proven wrong but these I doubt are in common use by testers or anyone for that matter.
Another thing I observed when checking myself: if I manually follow a test script that contains specific checks, it actually forces me at some point to just do an assertion. It is usually the last thing I do. Before that I do lots of other (non-scripted) things. So checking can be used to focus back on something that might be important, but it also shows how bad I am at actually doing checking.
So, as indicated above maybe humans are actually bad at or plain can't do checking (reliably) and that's why we seek out tools for those tasks.
[James’ Reply: If you look at the definition of testing and checking, it is clear that tools CAN do things other than checking. Lots of things. An infinite number of things. A check is a specific kind of thing. But for example: if you use a tool to construct test data, that’s not a check.]
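To make the distinction in that reply concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: `build_test_data`, `fake_shipping_cost` and `check_never_negative` are hypothetical names, and the "product" is a stand-in function. The first function is tool use that evaluates nothing; only the last one applies a decision rule to an observation of the product.

```python
import random


def build_test_data(count, seed=None):
    """Tool use that is not a check: construct data, evaluate nothing."""
    rng = random.Random(seed)
    return [{"id": i, "quantity": rng.randint(0, 1000)} for i in range(count)]


def fake_shipping_cost(quantity):
    """Hypothetical stand-in for the product being examined."""
    return 5 + quantity // 100


def check_never_negative(product, record):
    """A check: a decision rule applied to one observation of the product."""
    return product(record["quantity"]) >= 0


# The data builder supports the work; the check is a separate, smaller thing.
data = build_test_data(5, seed=42)
verdicts = [check_never_negative(fake_shipping_cost, r) for r in data]
```

Deciding which data to build, and what the verdicts mean, stays with the person doing the testing.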
Rema Ravi Subramanian says
Congratulations to authors for presenting the testing/checking aspect thoroughly.
The human factor mainly comes into the picture (in my opinion) in the order in which test cases are executed. I have observed human testers to be extremely fast and efficient in determining “that” particular order of executing tests which will pick up the bugs within a short time. Hence humans are more efficient in tests around a functionality change. However, machines are a real blessing for the regression type of checking.
In order to use machines for more functionality checking, we definitely need improvements in automation tools.
Andrew Prentice says
In programming there’s a widespread and longstanding concept of assertions: http://en.wikipedia.org/wiki/Assertion_(computing)
A common, if not the most common, application of this today is for test assertions: http://en.wikipedia.org/wiki/Test_assertion
How do checks & checking, as defined here, differ from assertions & asserting as commonly understood in software development, i.e. what difference do you see between asserting something and checking something?
Where do assertions fit into the testing process presented here?
[James’ Reply: Look at the definitions in the post and tell me what you think.]
Andrew Prentice says
They strike me as synonymous.
[James’ Reply: Andrew, you can do better than that. They are obviously not synonymous. However, an assertion– properly used according to your first citation– is a kind of machine check. So you have that part right. Unit checks probably use assertions in them, too.
They are not synonymous because a check can be more than an assertion. Human checks are obviously much more, unless you are directly equating a human with a Turing Machine. But even machine checks include the process of gathering the observations, not just the application of the decision rule. Furthermore you might have decision rules that are more complicated than simple assertions. In short, there might be a lot of code associated with a machine check, and no programmer would gesture at my 3,000 lines of Perl which does statistical analysis of thousands of data points and say “that’s an assertion” or even “that’s a whole pile of assertions.” For one thing, there are no actual assertion statements anywhere in it that relate to the product I’m checking. It would fit the definition of human/machine checking, however.
Thank you for raising the question, though. I have added a bit of text to the implications section to talk about assertions.]
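A rough Python sketch of the difference described in that reply, under stated assumptions: the function names, the `CheckResult` type, the stand-in product and the threshold are all hypothetical, and this is not the Perl program James mentions. It only illustrates that a machine check can include gathering the observations as well as applying the decision rule, and that its result can be more than one bit.

```python
import statistics
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Artifact produced by the check: more than a single pass/fail bit."""
    passed: bool
    mean: float
    worst: float
    sample_size: int


def gather_observations(product, inputs):
    """Observation step: interact with the product and record what it did."""
    return [product(x) for x in inputs]


def decision_rule(observations, limit):
    """Algorithmic decision rule applied to the specific observations."""
    worst = max(observations)
    return CheckResult(
        passed=worst <= limit,
        mean=statistics.mean(observations),
        worst=worst,
        sample_size=len(observations),
    )


def machine_check(product, inputs, limit):
    """The whole check: gather the observations, then apply the rule."""
    return decision_rule(gather_observations(product, inputs), limit)


def fake_product(x):
    """Hypothetical stand-in for the product under check."""
    return x * 2


# A bare assertion covers a single evaluation of a single value.
assert fake_product(2) == 4

# The machine check wraps observation and evaluation together.
result = machine_check(fake_product, inputs=[1, 2, 3, 4], limit=10)
```

Note that `machine_check` contains no `assert` statement about the product at all, yet it still fits the definition of checking.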
Simon Morley says
I was reading (testing) this article and came across “The bottom line is: you can define a check easily enough” and realised I hadn’t seen “check” defined or previously used. I then went back and “checked” that I hadn’t missed the definition…
From the checking definition I can assume:
Check: An output of an algorithmic decision rule.
Is that correct?
[James’ Reply: Sorry I meant to include that and forgot. I will add it. A check is “an instance of checking.” Nice and simple. If you need something more specific for some reason, then I would say “a check is the smallest block of activity that exemplifies all aspects of checking.” By this definition, a check must result in an artifact of some kind; a string of bits.]
Then… on the “checking” definition I started thinking about “algorithmic” – this is typically interpreted as a finite number of steps, but doesn’t have to be – there could be randomized algorithms, where the behaviour/result might not be deterministic, or Las Vegas algorithms, which will produce a result but are not necessarily time bounded. All of this might be valid for a check, but I was trying to construct a simpler form/definition without algorithmic – either by replacing algorithmic with deterministic or dropping algorithmic. Thoughts?
[James’ Reply: By algorithmic we mean that it can be represented as an algorithm– in other words that it can be performed by a computing device. We could have said “computable.” Why would you want to drop it, man? You can have an algorithmic process that is not deterministic or that involves random variables.]
On to implications.. As testing encompasses checking I think it’s also important to highlight that “good” or “excellent” checking implies “testing” -> ie the decision to use a check or to do some checking and / or the analysis of those checks and the checking is back in the testing domain (to determine how/what to do next). Making a check or doing some checking without caring about the result would be wasteful – although I’m sure it happens…
[James’ Reply: Checking is a tactic of testing, yes.]
Without this I see a risk that checks/checking in some way form their own “activity” -> someone thinking that we don’t have time for testing, so we’ll do some checking instead (which might be motivated and justified) -> then this becomes a standard/common practice… So, I want the checking justified (as much as the testing) – based on the value of the information we hope to learn from them. Thoughts?
[James’ Reply: You can definitely do checking inside of a messed up and attenuated testing process. That can be okay if the product happens to be great, or if the customers are forgiving.]
I like what I read so far, but I need to do some more thinking / analysis of the above too.
damian synadinos says
I’ve just read this post and completely agree with the idea (metaphors, concepts, terminology, definitions, etc.). Thus far, I’ve nothing to add. Well done, and thank you!
So……now what? That is, “What do I do with all this information?”
[James’ Reply: Use it as a rhetorical scalpel. Talk about having a checking strategy within your testing strategy, and keep those things distinct. Don’t let talk about checking drown out talk about testing. Be ready to give examples of testing without checking.
In short, use this information to protect testing.]
I figure that I need to “sell” the idea, and then “apply” the idea.
I assume that, in order to sell it, I have to find various “carrots”. I need to find the motivating factor(s) and benefit(s) for each person to accept this new idea. Do you have any suggestions?
[James’ Reply: The principal carrot is that we can protect the business by avoiding obsession with things that don’t provide much business value.]
If I can sell the idea and get everyone speaking the same language, then I have to apply the idea. I need to find practical ways that this idea can be implemented. Any ideas?
[James’ Reply: What do you think it means to “implement” this idea?]
I realize that perhaps these questions aren’t specific to this idea. Perhaps these questions have more general answers. Perhaps what I’m really asking is, “How do I sell and apply *any* new idea (…such as human/machine checking vs. testing)?”
[James’ Reply: Selling an effective approach to testing must begin with you understanding and practicing testing well. Knowing what can and can’t be done by machines is part of that.]
A future post promises specific examples to help explain this idea thoroughly. I look forward to that post! However, if there IS any specific guidance around what to do with this info, I’d love to see it included, as well.
Mark Rushton says
I’ll be re-reading this a lot in the future. Thanks for what you do to improve the craft.
Michael Leung says
It is indeed an interesting topic. What I would like to share may be a bit off topic because I am totally fine with your definitions about testing and checking but how and when to apply them in software projects matters.
The definition of testing reminded me of some bitter experiences in previous projects. Users quickly discussed and signed the specification in a highly multi-tasking manner (i.e. not focused). But when users were asked to accept the system we had built, they would seriously put down all their jobs on hand and start “testing”, while we only expected them to do “checking” against the signed-off specification. After usually prolonged evaluating, learning, experimenting, questioning, studying, modeling, observing and inferring, they decided that they now understood what they really needed and requested us to build something quite different. Project delay, scope creep, finger-pointing… Why don’t the cabineteers suffer from “testing”?
Skillful cabineteers are everywhere in Hong Kong because they can magically fit all the furniture into a house of less than 400 square feet (usually the size of a room by your standards). Here is a typical sample found on a public blog: http://xaa.xanga.com/9c9f7a6670134249809544/o198225488.jpg. Note that the cabineteers won’t just let us confirm the drawing and then start crafting. They put in all the measurements in inches with color codes. Because of the size of our houses, a quarter of an inch is enough to invalidate the whole design. We therefore do the “testing” up-front: evaluating, learning, experimenting, questioning, studying, modeling, observing, inferring, and obtaining each and every measurement. We agree with the cabineteers that these are the “algorithmic decision rules” when “checking” the cabinet upon delivery.
Basically, this is Acceptance Test Driven Development (ATDD) or Behavior Driven Development (BDD) we have been practicing in recent years – First Testing, then development and finally (automated) checking.
Traditional specification writing is not a good collaboration tool but “testing”, as you have defined it, is. “Testing” magically bridges the communication gap between users and developers. The result is close to zero misunderstanding of what to build and all we need is “checking” and not another round of “testing” upon delivery. (We still have UX testing but it is only the presentation stuff)
Although the above is not totally relevant, I hope that I have looked at testing and checking from a different angle.
[James’ Reply: I’m concerned at the extremely mechanistic view of testing that you seem to have. Not just as you express it here, but on your blog. Your post about the “new V-model” is a love letter to the cause of standardizing without first learning; defining artifacts before exploring the implications of the product. Your references are to Factory School advocates.
Only in a very simple kind of project would I use checking as a principal tactic for testing.
What is so difficult about the concept of developing skills? How come that plays no part in your narrative? The first thing I would do if I had to work under the regime you describe is to break all the rules and reinvent the process.]
Andrew Prentice says
James, you misunderstand. It’s your stated definition of checking that strikes me as synonymous with asserting, not what you actually think checking is.
[James’ Reply: If that’s what you wanted to complain about, then it would have saved some time to have said so up front. Now that you have said it, I can reply to that concern. Here’s my reply: That’s not a bug; it’s a feature.]
Obviously, checking, if it “includes the process of gathering the observations, not just the application of the decision rule”, is not synonymous with asserting. However, “the process of gathering the observations” is absent from your definition of checking (“the process of making evaluations by applying algorithmic decision rules to specific observations of a product.”) i.e. your definition implies gathering observations can occur prior to and separate from checking.
[James’ Reply: The art of writing a definition includes choices about what to lock down tight and what to leave loose. We build plasticity into our definitions in order to help them be applied powerfully and without much effort. If we made it explicit that the entire process of observation is included in checking, then people would say that an assertion is NOT a check, but only part of a check. We would also have a definition that is more complicated, harder to remember or quote, and harder to apply.
It’s fair for you to say that an assertion is a check. That fits a reasonable interpretation of the words we used. However, we feel it is also a reasonable interpretation to say that “specific observations” refers not merely to the results of the observation, but also to the observation process. In other words, one reasonable expansion of “specific observations” is “a specific string of bits that results from an algorithmic process of interacting with and observing the product.”
We don’t care if someone chooses to interpret this more narrowly. By doing so, he is either assigning the observation process to testing in general, or to something else, I guess. But what he can’t say is that it’s wrong for OTHER people to interpret it more broadly and– we think– more powerfully.]
My point is your definition needs sharpening, else it will be misunderstood, and the last thing the discussion around automation needs is even more misunderstanding.
[James’ Reply: We will monitor the issue and adjust our definition further if we feel that there’s any widespread problem with this. Meanwhile, anyone is free to define these words in their own way. We’re not a law-making body. We are merely stating the protocol used in the Rapid Software Testing methodology. And we are sharing our ideas in case other people care what we think… and some people do.]
PS You should post your code on BitBucket or GitHub so people can see what you’re talking about.
[James’ Reply: What I’m talking about is pretty simple to explain, and only the concept matters anyway. Also, I’m not at liberty to release it under creative commons license.]
John Ruberto says
Interesting post. I wonder if in the follow up post, you might also add a little more context on why you felt it important to draw this distinction so clearly. During my first read, the post sounded like the proverbial cabinet maker complaining about factory made cabinets. Though, on second pass & your responses to the comments, it sounds more like you see an over reliance on “checking” in industry. What are you seeing in practice and how did that influence the need for these definitions?
[James’ Reply: the blog at Developsense.com covers that in more depth. See:
This isn’t even all of them! But it’s the main body.]
Jesper L. Ottosen says
Thank you both for the refinement of checking.
Trackback & updated blog post on “testing AND checking”
damian synadinos says
[What do you think it means to “implement” this idea?]
Perhaps getting “everyone speaking the same language” *is* the point. If you can “talk” right, then it can help you “do” right, too.
damian synadinos says
Edited to add to my previous reply:
I said: Perhaps getting “everyone speaking the same language” *is* the point. Reading (part of) your response to Dale Emery (on Michael Bolton’s blog), I see that I’m pretty much right:
Me: “Now what…do I do with this information?”
You: “install linguistic guardrails to help prevent people from casually driving off a certain semantic cliff” that results in “software that has been negligently tested”, and ultimately “improve the crafts of software testing”.
Petteri Lyytinen says
Further comments later on but, just as a quick note:
As pointed out by a friend, the term “automatic programming” is used in the context of artificial intelligence and refers exactly to what the name suggests: code that writes new code.
That might not be a big issue here but I thought it’s still worth mentioning, for your consideration.
[James’ Reply: Yes, and “artificial intelligence” is ANOTHER thing that does not exist.]
Simon Morley says
On algorithmic – I’m very happy that it is used if its meaning is not restricted to finite or deterministic steps. That was the aspect I wanted an answer to.
I’ve used non-deterministic checks in the past – for instance splitting up logfiles on time or size (whichever threshold triggers first) and parsing for certain keywords.
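A check of the kind described above might look roughly like the following Python sketch. It is only an illustration under assumptions: the function names, thresholds, keywords and sample log entries are all invented here and are not Simon's actual tooling. A chunk is closed on a size or time threshold, whichever triggers first, and each chunk is then scanned for keywords. On a live log the chunk boundaries depend on when the lines happen to arrive, so the set of chunks checked can differ from run to run even though every step is algorithmic.

```python
from datetime import datetime, timedelta


def split_log(entries, max_lines, max_span):
    """Split (timestamp, text) entries into chunks.

    A chunk is closed when it already holds max_lines entries, or when the
    next entry is more than max_span later than the chunk's first entry,
    whichever threshold triggers first.
    """
    chunks, current = [], []
    for stamp, text in entries:
        too_big = len(current) >= max_lines
        too_old = bool(current) and stamp - current[0][0] > max_span
        if too_big or too_old:
            chunks.append(current)
            current = []
        current.append((stamp, text))
    if current:
        chunks.append(current)
    return chunks


def check_chunk(chunk, keywords):
    """Decision rule: a chunk fails if any line contains a keyword."""
    hits = [text for _, text in chunk if any(k in text for k in keywords)]
    return {"passed": not hits, "hits": hits}


# Invented sample data standing in for a real logfile.
start = datetime(2013, 4, 1, 12, 0, 0)
entries = [
    (start, "service started"),
    (start + timedelta(seconds=10), "ERROR disk full"),
    (start + timedelta(seconds=20), "retrying"),
    (start + timedelta(minutes=5), "service stopped"),
]
chunks = split_log(entries, max_lines=2, max_span=timedelta(minutes=1))
results = [check_chunk(c, keywords=("ERROR", "FATAL")) for c in chunks]
```

With this sample data the first chunk closes on the size threshold and the second on the time threshold.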
John Ruberto says
Thank you for the pointers on your (and Michael’s) motivations, and for the time to write these thought provoking posts.
Gerard Numan says
Highly appreciated blog! It poses basic questions and clearly comes from a heartfelt risk in current developments: that of losing test intelligence in the face of technical abilities. The challenge now is: how to maintain, rediscover or implant testing intelligence?
My intuition on this (“testing intelligence”) would be that it is a mental faculty of judgement: the ability to interpret and distill quality criteria (1) and apply these to objects under test (2). These 2 judgemental abilities are easily taken for granted: we simply rely on the documents we have (judgement 1) and rely on the test techniques or tools to perform actions on the test objects whereby we make simple comparisons between expected and real results (judgement 2).
Advanced tools and their current degree of usability have a darkening effect in that they blur the need for critical judgements as a part of testing. As tools do more and more work for us, we forget to do the most important parts ourselves.
I think it is desirable that authorities, like the authors of this blog, elaborate these topics and come up with more suggestions on how to recognize “testing forgettenness in the face of mere checking” and how to define testing intelligence and how to develop it.
[James’ Reply: The development of testing skill and judgment is our principal professional focus. You could say that nearly every blog post we make tries to get at that subject.]
Henrik Emilsson says
I really like it when ideas mature and grow into even better things. 🙂
The Testing vs. Checking series was in more than one way revolutionary, since it put words to some things many people felt were wrong.
I believe I have read and thought through the concepts a lot, and at the time I wrote a blog post about the Testing vs. Checking Paradox (http://thetesteye.com/blog/2010/03/the-testing-vs-checking-paradox/).
I eventually closed that post by saying I was wrong in my assumption, but now that I see this post I am not so sure anymore.
Maybe I was touching the concept of Human Checking?
The more you know about something, the more you only need to check against your assumptions. I.e. this is nothing that can be done by a machine, since a machine cannot know what I know – and they cannot know what I assume in a specific context.
But when I test, I do a mix of learning and checking (personal) assumptions (in a given context).
What do you think about adding the type of oracles into the definitions?
Machine Checking requires one specified oracle; Human Checking requires one or more oracles, with an intelligent judgment about which oracle is good enough?
[James’ Reply: I don’t think I accept your premise that the more you know something, the more you only need to check. If you “know,” why do you even need to check?
If you don’t know, and you want to know, then you need to find out. That requires testing. If you have a stable and trusted model of the world and the product, then it may be that you are comfortable with a checking strategy to serve as your test strategy. In other words, your test strategy may be dominated by the tactic of checking.
Machines don’t check against assumptions. They don’t know what assumptions are. Checks are designed by humans who make many assumptions, however. I don’t see the point of counting oracles, either. Whether you want to call it one oracle or many is subjective.
During human checking, the human attempts to follow the specified check procedure. However the human will bring additional oracles to bear. That is technically outside the scope of the check, but informally is part of it. Thus, there is no need to make that explicit in the definition.]
Michael Leung says
I am really delighted to have received your reply. To make our discussion fruitful maybe I should describe more about the context in which I give my narrative.
First, I take development skill for granted because it is a must anyway. In short, we follow Uncle Bob’s S.O.L.I.D design principles and Eric Evans’ DDD. For many years we believe we are “doing the thing right” but are not “doing the right thing”. Very often, we and users have different interpretations of the same specification. To us, doing the thing right is relatively easier than doing the right thing.
[James’ Reply: Skill cannot be taken for granted. Skill is developed through struggle and study. Furthermore, you have not said anything about testing skill. And the people you cite are not testers.]
Secondly, yes, my projects may have fallen into your simple category. My projects in recent years were mostly 10 to 12 iterations of 4-weeks each, involving two teams of 6 people each. For the current one to be deployed in June this year, users expressed in a review that it was the first time they “grow” specifications with “testing” as the collaboration tool. In describing how to test a feature, they evaluate, learn, experiment, question, study, model, observe & inference. Management is especially impressed by the “Specification PLUS test cases” approach, as these are their living documentation.
[James’ Reply: “Test cases” are not testing. Learning is not captured in test cases. Experiments are not even captured by them. Questioning is definitely not… etc.]
Michael Feathers once said that legacy code is code without tests. Test cases inside the specifications are readily executable (both manually & automatically). You know for sure if it is better or worse after checking in a change.
[James’ Reply: Michael Feathers is not a tester. He doesn’t study testing. What he says about coding is worth listening to, but what he says about testing is at best dubious.
Testing never results in knowing anything important “for sure.”]
Of course, the iterative approach also contributes to users’ satisfaction as they “grow” the final product together with us. By playing around with completed features, they understand more and more about what they actually want and provide more realistic requirements ahead. But to realize the benefit of iterative approach, automated “checking” against test cases inside the feature specifications is mandatory.
[James’ Reply: Mandatory? Obviously that is not true. I have a lot of experience with iterative development in commercial projects, and I do relatively little automated checking. Automated checking can be helpful or it can be a huge useless time sink. You have to be careful with it.]
Requiring users to redo “testing” against completed features is definitely out of the question.
[James’ Reply: I don’t require users to do any testing, ever. Why is that even an issue?]
You know, no matter what development methods we are practicing, there are always some blind spots that can only be pointed out by independent outsiders and the most effective way is to find reputable persons with opposite opinions and ask them questions. So, by your definition of “testing”, do you mean that we have to build a prototype first? We cannot perform “testing” against a feature specification on paper?
[James’ Reply: If you read my definition you see that it can be applied to an idea, just as it is possible to test a physical product. However, the nature of that super-early testing is quite different and in some ways it is obviously limited. There are things you will only discover when you build the real product.]
If that is the case, are there any ways to reduce the “waste” of throwing away the prototype or making many changes to the original design and avoid introducing bugs when making these changes?
[James’ Reply: Why do you choose to call the learning process a waste? It’s just learning, man.]
I understand that custom-built software is much more complicated than custom-built cabinets, but am I too naïve to apply the same principle? We won’t expect the cabineteers to build us a prototype for “testing” before we confirm the final design. We would imagine how the cabinet, the drawing on paper, fits into my 400 square feet house and give measurements (test cases) down to a quarter of an inch. For example, we would model the cabinet door, which, when widely opened, should leave at least half an inch before touching the sofa.
[James’ Reply: Cabinets are many orders of magnitude less complex than software. Come on, man. The two situations cannot be equated.]
Eric Jacobson says
Sigh…just when I was beginning to finally feel comfortable speaking precisely about “testing” and “checking”, you two had to refine it! Fair enough. I know it’s all for the best. Two questions:
What was the problem with Michael’s original “check” definition (an observation, linked to a decision rule, resulting in a bit)? I loved that definition. People always nodded their heads in understanding when I repeated it.
[James’ Reply: Actually, I wrote that original definition (except for the bit part, which Michael added). That definition is a subset of the new one. Whatever cases fit that definition will reasonably fit the new one. With the new one, we wanted to emphasize that the decision rule is algorithmic (i.e. computable; amenable to automation) and that the observation is specific (i.e. expressible as a string, rather than some tacit impression). We wanted to allow that the evaluation might result in more than a bit as a result. We also wanted to distinguish between human and machine checking, so we generalized the definition a little bit in order to afford sub-classing.]
Can you explain the distinction between “Automated Testing” and “Automated Checking” ? I have been attempting to replace the former with the latter but now I’m thinking they can coexist.
[James’ Reply: There is no such thing as automated testing. Testing cannot be automated. Testing can be supported with tools, of course, just as scientists use tools to help their work. Automated checking is a synonym for “machine checking” or possibly “human/machine checking.”]
BTW – I love the cabinet maker metaphor. My brother is a professional cabinet maker. I don’t have the name you were looking for to describe an expert cabinet maker. However, their work is referred to as “museum quality” furniture. I’m doubtful my brother’s skill could ever be automated. He can build an entire cabinet to follow the slight curve of a board that came from a bent tree.
[James’ Reply: Cool!]
Mario G says
My reply got too long, so I wrote it on my blog:
[James’ Reply: Wow, your reply is a beautiful example of the Dunning-Kruger effect, assimilation bias, and the incommensurability of opposing paradigms. It’s a Factory School idea that’s been put in a bunker and allowed to inbreed for 40 years, untrammeled by the developments in our craft.
You are like a newly discovered comet of wrong.
The next time someone tells me that what I’m saying is just obvious, I can point them to your blog and they will beg my pardon.]
Michael Leung says
Seems like nothing has changed since Mr. Bret Pettichord put forward the Four Schools of Software Testing 10 years ago, except that it may be five now if Test-Driven Design stands on its own. We don’t exchange ideas, but labels.
[James’ Reply: Plenty has changed! My school has continued to learn and grow, while the rest of you guys… do whatever it is you do other than studying the craft of testing software. It’s not my fault that you are off on that tangent. And if you think I’m off on a tangent, well, that’s what it means to have different and incommensurable paradigms.
You seem to think that completely different ontologies of practice should be able to routinely exchange ideas. I wonder why you think so? Maybe pick up a book on sociology, once in a while, man.]
Thanks for this elaboration on checking vs. testing. I’ve been trying to use these points to make improvements at work since attending a Michael Bolton workshop in Portland a couple of years ago. This additional info will aid in my arguments that we are far too focused on checking with a criminally small amount of time left for testing.
I must also thank you for introducing me to the Dunning-Kruger effect. It’s a concept with which I’m all too familiar (and, I must admit, have suffered from a few times) but didn’t know it had a name.
And an interesting observation, Mario G managed to get through a lengthy post and respond to comments without once using James’ name except when he pasted the response from here. He called out Michael Bolton and Cem Kaner by name, yet James is relegated to ‘et al.’ Fascinating.
[James’ Reply: I am He Who Must Not Be Named!
Mario committed a rookie transgression by posting a comment, here. He either knows or should know that his way of thinking is something I oppose to the degree of heaping Voltairian ridicule upon it wherever I encounter it. I think his ideas are sociologically ignorant and scientifically illiterate mumbo jumbo. The most sympathetic thing I can say is that it is a completely different paradigm and he has a right to express his stupid opinions just the same as I have the right to express mine.
But why he would expect me to show him any sort of respect when he wanders over here to pollute my blog is beyond me. It’s as if Dick Cheney were to write a letter to the editor of Mother Jones magazine…
I’ll argue with anyone who seems to be seriously trying to understand. But I suspect that learning is not his thing.]
Michael Leung says
My curiosity and some posts below make me think that we might perhaps exchange some ideas.
“For the field to advance, we have to be willing to learn from people we disagree with–to accept the possibility that in some ways, they might be right.”
[James’ Reply: The problem is not that you disagree with me. The problem is that you come from a different paradigm. Paradigms cannot be mixed and matched using any simple form of translation. They are incommensurable. In a truly cross-paradigm situation, the communication cannot succeed using any formal means. It requires informal methods of exchange (e.g. a series of shared experiences).
I can disagree with people inside my own paradigm. With people outside, “disagree” is too weak a word. A better way to say it is that I don’t comprehend why you seem to think the way you do. What you say makes little sense. It’s a mystery to me. How do people like you live and reproduce? I don’t know. I suspect that nothing you will say will help.
I have plenty of people around me with whom I can have productive disagreement. I don’t need to make myself crazy trying to convince you to leave behind nearly all of your assumptions and beliefs.]
“The distinction between manual and automated is a false dichotomy. We are talking about a matter of degree (how much automation, how much manual), not a distinction of principle.”
[James’ Reply: We *are* actually talking about a distinction of principle, and not one of degree. The sentence you cite is a reference to something else, which is the idea of tool-supported testing (not checking).]
I think this post helps the following:
– good testers understand further subtleties of checks
– testers/developers who had bona fide uses of checks no longer look bad.
– overall clarity on checks
When I had first read Michael’s post/s on checking/testing, it wasn’t so much a new term, it helped me understand better what is “not testing” or what is “bad testing”. Along with that I liked the definitions/conditions for testing such as “a test must be an honest attempt to find a defect” and “a test is a question which we ask of the software”. To me, these clearly delineated a test/not-a-test.
In this post, I would have liked to restate the overall mission of testing. In the diagram, we should show that bad checks are not part of testing. With the current post one can leave with the impression that bad checks are part of testing. (Also worth keeping in mind that bad checks are rampant – much more than good tests – my opinion).
I think there are some prerequisites for using checks as described here:
1. One needs to understand the overall mission of testing
2. One needs to explicitly define the meaning of testing
When teams focus *completely* on automation, it might seem like their checks are part of testing. However, if there is no overall understanding of testing, I am not sure if there is value in checks.
Many groups do not have an explicit definition of testing. I am not sure if you would endorse their checks and tests. In this case, I think testing is hit and miss and one can create ‘bad checks’.
[James’ Reply: Good points. Obviously I advocate good testing, not bad testing. One popular kind of bad testing is that which is obsessed with checking to the exclusion of the rest of testing. Even good checking that is overdone amounts to bad testing. But good checking, as you say, requires a good testing process (including an understanding of the mission) in order to pull off.]
Rickard Persson says
Love this blog post and its subject!
After reading it, a particular scene pops up in my mind:
You’ve all seen the smart little monkey using a stick, probing the anthill to get to the eggs. The stick is a tool and the monkey gains a two-fold advantage from using it: avoiding the fearsome counter-attacking ants and extending his range to reach the eggs.
Like a carpenter using a saw…
Like a tester uses automated testing tools…
In my view it’s always a human (or a monkey for that matter ;)) that holds the other end of the stick and decides how to use it for best result. I think it’s sad that the software industry generally believes that knowledge about the tool itself is more valuable than the human wisdom of when and how to use it.
Automation testing consumes a lot of energy, time and money to develop and maintain, and often the tools grow into being more important than the reason for using them in the first place. To me that’s a very backward way to solve a problem…
If these purified definitions of checking and testing help us to straighten out that misconception, I’m for it all the way!
I’m thrilled to see what comes next!
Darren D says
Great article and feedback discussion.
In this comment you made:
“Furthermore, strictly speaking you only perform a check when you are worried that you may not know the answer. Otherwise there is no point at all.”
I am not sure what you mean by ‘not know the answer’. Is this referring to not knowing what the expected result should be (would that be called exploratory testing?) or knowing the expected result but not being sure what the program under test will return?
[James’ Reply: I mean the latter. You must have some algorithmic method of evaluating the result in order for it to be a check. But if you are 100% confident in the outcome of a check (e.g. if you are completely sure that the right numbers will appear on the report, say), then there is no point in running the check (at least from your point of view). It’s no longer a check, it’s just a demonstration done perhaps for the entertainment of others or for ritualistic/magical purposes, such as to appease management or to perpetrate a fraud.]
Darren D says
I am just thinking about how to define and convey instructions to manual testers using this terminology to help solidify this knowledge in my mind. Perhaps one could refer to a manual test script as a ‘Check Script’ or ‘Check Procedure/Case’ depending on how it is written. Maybe if written only at a logical/test condition level or vague level (not clear steps but clear test objective) it could then be considered more than a ‘human checking script’, maybe it can be called an ‘Exploratory Script’. I would think a ‘Test Charter’ would also be more than human checking.
[James’ Reply: I would call a strict set of checking instructions a “check procedure” unless I felt that “test procedure” was necessary in order to prevent management disorientation.
I would call it a test charter or test procedure if it were written in a form that afforded significant exploratory testing.]
I have at times given out a test requirement that I don’t fully understand, writing the objective in a manual test script so that I can trace it to the requirement, and assigned it to a tester to explore that requirement in the application and write a test script that makes the important checks repeatable. In other words, I asked them to do exploratory testing and produce a ‘Check Script’.
Ugh. Yours is the mentality of a modern-day Sulla or, to put it in terms you’ll understand, Mohamed Morsi.
I used to like reading your blog, but the fact that you can’t handle when people tell you that you’re not “great” is sickening. Signing off and never looking back.
[James’ Reply: This is surreal. Imagine this: someone you have never heard of takes the trouble to introduce himself, claims to have once liked you and now doesn’t like you, then compares you FAVORABLY to two famous and successful world leaders (I would have preferred Churchill, but I can’t have everything). Wouldn’t that make you feel, actually, pretty great?
Let me tell you, normal people don’t get this treatment!
Dear boy, if you want to deflate my ego you should go about it in a different way.]
Michael Phillips says
Nowadays any application of consequence exists in a wider system that includes the company’s internal network but also the external; the internet has made the external network much more complex than ever before. A complex system involves many different actors or agents interacting, and it’s in the nature of complex systems that they exhibit emergent behaviour. ‘Checking’ usually only involves observing the behaviour of an individual actor in a complex system; while that is important, it doesn’t give us much insight into the behaviour of a wider system or the influence of the individual actor on the emergence of that system’s behaviour. We can probably never get complete information on this, and while we may be contractually only responsible for the behaviour of the individual actor, we have a moral responsibility to gain insights into the behaviour of the wider system as a whole. Data has real consequences for people, and it exists in a complex emergent system; if we only check then we are neglecting that moral responsibility. I think it’s only by experimenting, investigating and evaluating that we gain insight into an individual actor’s effects on a system as a whole, and that, for me, is why checking, whether it’s automated or manual or both, can never be acceptable as a complete approach to testing unless we’re testing an application that’s completely irrelevant and lives in its own controlled world with no interactions at all, in which case I would ask why anyone’s building it.
Two books, ‘The Edge of Chaos’ by Mitchell Waldrop and ‘The Quark and the Jaguar’ by Murray Gell-Mann, influenced my thinking on this matter and I think anyone who’s satisfied to just ‘check’ should read them and really try to understand the importance of what these two scientists are saying. Sooner rather than later.
Mario G says
Well, you teach courses, isn’t that what you do?
[James’ Reply: Yes.]
And you teach them to people who think differently or, at least, don’t think like you (otherwise, what need would they have for you to teach them anything). Correct?
[James’ Reply: Yes.]
Thus, for the right amount of money, you teach people even if they have different paradigms.
[James’ Reply: Yes. I value my time. So there are things that don’t get priority unless someone pays me.
However, you may be confusing teaching with “making the student learn.” I can’t make you learn or even listen to me. If you harken from a different paradigm you are likely to sit there thinking “this guy is stark raving mad” or perhaps simply “he doesn’t understand testing.” I’m sure there are some people who do that, having been sent to my class.
My class works a little differently than most, though. I challenge you to solve problems that are pretty much unsolvable unless you re-organize your brain to recognize the value of what I’m trying to teach. So there’s at least a hope that my methods of communicating tacit knowledge will connect with someone like you.
I believe I have virtually no chance of doing that via text. Words cannot bridge a paradigmatic schism.]
So money would have opened your mind to my comments.
[James’ Reply: If you pay me, I could be willing to closely and formally analyze your comments and present my analysis to you. Whether that is what you mean by “opening my mind” is for you to say.
Certainly, if you pay me, or even if you were to show up at an event without paying me, I would find it interesting to interact with you for a while to confirm or refute my assessment of our respective conditions. It has happened before that I flipped the bit on someone over text, and then came to respect that same man greatly once I met him. (That happened with Rikard Edgren.)
I’m curious (somewhat) to see how you would react to the Pattern exercise, or the Calendar exercise, or any of several other challenges I make to students who then must exhibit real testing insight or else crumple to the floor in a fetal ball.]
Or are you saying that you only teach to people who already agree with you?
[James’ Reply: I teach people who are already human adults, and some children. I teach people who are already in the room. I teach people who already speak English. More than that I cannot say.]
Mario G says
[James’ Reply: Mario, I’m curious why you keep trying to say things to me, without first taking the effort to understand what I wrote in this post. In your latest comment you seem to think I changed my message, but in fact I haven’t. I did improve the diagram, but that was simply to better illustrate the post. If you think that changes the message, then perhaps the trouble is that you were only looking at the diagram and not reading the post, all along.
Anyway, you still don’t seem to understand the distinction we are making, and you continue to ask questions which are actually answered in the very post you are asking questions about. Apart from literally pasting the post into the comment area, here, I don’t know how I can help you with your stubborn case of assimilation bias and inattentive blindness.]
“Testing is the process of evaluating a product by learning about it through experimentation, which includes to some degree: questioning, study, modeling, observation and inference”
James, I’ve read the above description a few times, but I fail to comprehend a particular part of this description. Maybe it’s ignorance on my part, but can you explain why or how testing includes only “to some degree:” questioning, study, modelling, observation and inference?
[James’ Reply: It means two things. A) That not ALL questioning is testing, not ALL study is testing, etc., but just that which directly relates to the goal of testing. B) That the list of activities is not complete.]
Are you implying that there may be (or are) more ways of evaluating a product through experimentation other than questioning, study, modeling etc.?
[James’ Reply: Yes.]
Or are you saying that testing consists of experimentation through part questioning, part study, part modelling, part observation and part inference? (An analogy of my question is how a dish (test) is prepared… 2 cups of water (questioning), 2 teaspoons of salt (study) etc.).
[James’ Reply: Yes, that, too.]
Hope my questions make sense. By the way, nice Captcha words! 🙂
Michael Bolton says
In addition to what James has to say above, have a look at Exploratory Skills and Dynamics (http://www.developsense.com/resources/et-dynamics3.pdf); and try a search on his site and mine for “what testers find”. You’ll see lots of other things (and many more on our sites) that could be included under “learning about the product through experimentation”.
Thanks Michael, I’ve been doing exactly that after I read James’ response. Your blogs opened my mind to many things I didn’t see before (but wanted to).
Igor Filin says
I believe in human’s high cognitive capacity. This is something a person can build inside the brain. There’s a very interesting book on that called “The Talent Code” by Dan Coyle.
Richard Forjoe says
[James’ Reply: Thank you for these interesting comments. My replies are below.]
Great article, very thought-provoking. The comparison between machine checking and human checking I thought was really good. Thank you. See my comments below. NB: Apologies in advance if some of my comments have already been made, as I haven’t read through all the comments made by others.
“This situation exists in the fields of science and medicine, too. It exists everywhere: what are the implications to skilled human work of the evolution of tools? ”
RF: I think this should be ‘what are the implications of the evolution of tools on skilled human work? ‘
[James’ Reply: That is indeed another way to say it. I guess your way is less confusing, so I will change mine to yours.]
“We want to test a product very quickly. …”
RF: Do you mean testing quickly in terms of running more tests in a shorter period? Testing a product quickly is what those who think about cost first would say.
[James’ Reply: “Running more tests in a shorter period” doesn’t really mean anything, because there is no general meaning to the numbering of tests. I mean we need to find those important bugs as soon as reasonable. We need to work with urgency.
I don’t know what you mean about people who “think about cost first.” I think we all want to find bugs quickly, instead of slowly.]
“…We believe that skilled cognitive work is not factory work. That’s why it’s more important than ever to understand what testing is and how tools can support it.”
RF: I think this is a brilliant distinction to make. A warrior learns his craft and can apply it using any tool. A spear thrower learns his tool and attempts to apply it to any situation, limiting his craft to one tool. As testers we shouldn’t be letting our tools shape us but rather we should learn our craft. People seem to be learning the tools and calling themselves craftsmen, which is the problem; this might be driven by the assumption or fact that products are becoming too complex for the craft.
[James’ Reply: Are you familiar with the Book of Five Rings?]
“Testing is the process of evaluating a product by learning about it through experimentation, which includes to some degree: questioning, study, modeling, observation and inference.
(A test is an instance of testing.)
Checking is the process of making evaluations by applying algorithmic decision rules to specific observations of a product.
(A check is an instance of checking.)”
RF: I am assuming the distinction is purely down to testing being continuous experimentation, whereas checking refers to stopping after evaluating an observation, i.e. an expected result?
[James’ Reply: It has nothing to do with continuity. The difference is mainly that checking is algorithmic and testing is not. Testing involves creative interpretation and analysis, and checking does not.]
I think a check is an instance of testing but we should identify in testing when it’s a check or a test. I think this will encounter the same pushback that exploratory testing constantly seems to get.
[James’ Reply: Sorry, but a check is not an instance of testing, any more than tires are an instance of a car. A check is part of testing.]
In organisations they seem more keen on checks than tests, and it seems a lot of testers instead of pushing to add more value are satisfied with just doing what customers want and ignoring tests. Checks come across as confidence builders, which do not focus on finding the important bugs. Testing ends up being just a confidence activity which is just a fraction of what testing is.
[James’ Reply: I don’t think it’s fair to say that organizations prefer checking to testing, because such organizations don’t know what they are doing. They don’t know enough to have a preference.
In other words, any organization that seems to have that preference can be moved to a different preference by a skilled tester.]
I agree with the concept but my only disagreement is that Checks are instances of testing as they are being done to evaluate a product by learning about it through observations. Even if you only learn about an observation/expected outcome.
[James’ Reply: You don’t actually disagree. What you are doing here is not understanding. If you don’t understand something, whether or not you agree is irrelevant.
By saying checks are instances of testing you are committing exactly the error that we are trying to stamp out. You are confusing testing with checking. A check is part of testing, not an instance of it. Testing means experimenting. Checking is not experimenting (although it’s an important part of experimenting).]
A question I raised to Michael regarding checking vs testing comment on twitter: Is executing theories in the form of test ideas/cases, checking or testing?
[James’ Reply: There is no such thing as “executing theories” as far as I know. A theory cannot be executed because a theory is not a procedure.]
My initial answer: This is based on my thinking that we experiment with theories, ask questions, observe etc during the test design process hence test ideas that get generated are still tests.
[James’ Reply: That sounds like testing.]
My new answer: Based on the above definition, if no further test ideas are generated after evaluating the theory then it’s a check, but if the theory is evaluated and based on the learning/outcome further test ideas are generated then it’s a test as long as the cycle of learning continues. Hence test ideas are still checks when we don’t go past the initial observation identified during evaluation of the test idea.
[James’ Reply: I don’t get this at all. You’ll have to explain it to me over a beer.]
“•“learning” is the process of developing one’s mind. Only humans can learn in the fullest sense of the term as we are using it here, because we are referring to tacit as well as explicit knowledge.”
RF: Should this be Human learning as by the definition of learning, Machines are also capable.
[James’ Reply: I don’t feel the need to emphasize that “machine learning” is not learning in the human sense.]
“•“experimentation” implies interaction with a subject and observation of it as it is operating, but we are also referring to “thought experiments” that involve purely hypothetical interaction. By referring to experimentation, we are not denying or rejecting other kinds of learning; we are merely trying to express that experimentation is a practice that characterizes testing. It also implies that testing is congruent with science.”
RF: During experimentation we observe the product’s behavior in response to interactions, its environment etc…it’s not just the product we are observing.
[James’ Reply: Yes.]
“Human Checking vs. Machine Checking…”
RF: I agree, my initial thoughts being that humans find it difficult to stick to rules whereas machines thrive on applying rules, processes etc. Machines can perform a streamlined version of human activities, with boundaries/constraints intentionally/unintentionally set. What machines can do is human activities stripped down into logic and programmed by a human to be executed by a machine.
My only disagreement is that Machine checking is an outcome/output of Human checking, hence should be within Human checking?
[James’ Reply: That would miss the entire point of making this distinction. I don’t say “human checking” to denote that it is done by humans. I say it to distinguish it from how machines work.]
Edward Granger says
I checked, there is no special word for cabinet maker.
Hannes Lindblom says
Following up a discussion on Twitter: how do the ideas of testing vs checking relate to the concept of scripted testing vs exploratory testing? I feel that there are similarities between the concepts. If we for instance imagine “pure” scripted testing, will that only contain checks and no testing?
“Scripted testing” is about what controls testing. “Checking” is about the subset of testing that can be scripted (whether or not it is scripted.)
Scripted testing means testing that in some *aspect* is scripted. Scripted means controlled from the outside of the conscious test execution process, rather than being completely controlled consciously by the tester. There are many ways in which bits and pieces of testing might be scripted. If *everything* is scripted, however, I would not call that testing. Testing is a thoughtful process of investigation and consideration. We can’t script that.
What I once called “pure scripted testing” I would now call checking. Although it’s part of testing, I don’t think we should call it testing (the same way we don’t call a leaf a “tree”). My language has evolved.
So, checking CAN be scripted but it isn’t necessarily so. All machine checking is scripted. All human checking is an attempt by a human to emulate machine checking, so it is often scripted, too. Checking is not necessarily scripted, however, because checking may be pursued in the moment, under the control of the tester. What makes checking “checking” is not that it is scripted in some way or another, but rather that it is *completely* scriptable.
Rupert Burton says
I’m wondering if the intention of the user’s actions (or process) is important when determining whether we are testing or just checking?
[James’ Reply: Yes, it is important. In The Shape of Actions, a book that is the primary basis of our distinction between checking and testing, intention is discussed at length. You can’t do testing in the full sense of the word without the intent to test.]
Clearly tools do not have any intent, they just follow a script. People may also follow a script without thought or intent. But a good tester with a desire to learn will be testing even when following a script. A tester may also utilise a tool to carry out checks on their behalf; the tool may be just checking but the tester will be experimenting or exploring through the tool.
[James’ Reply: If so, then by definition the tester is not ONLY following the script, because that allows only the “intent to follow” but not the intent to test. Meanwhile, a human is never ONLY following a script. If a fire alarm goes off, he will stop the check. If the system hangs, he will stop the check (and there is rarely a sure way to tell that a system has actually hung, so that is going to be outside the script, too.)
Scripted testing and exploratory testing exist on a continuum. Even very scripted testing has some exploratory elements to it. This is one reason we distinguish between machine checking, which can be absolutely and purely algorithmic, and human checking, which will never quite be so.]
Without intention in the definition:
We could attempt to make discoveries through experimentation but not actually learn anything that helps us to evaluate a product.
Is this not to be considered testing because we failed to learn anything to help in an evaluation? – I’d say it is still testing.
[James’ Reply: It may be poor testing– it would have to be if you learned nothing or if all that you learned was false– but it’s testing.]
In contrast we may observe an anomaly in the results of an automated suite that gives us salient information even if we had no intention of learning.
[James’ Reply: How could that be? If you were running an automated set of checks, then you had intent to do so, right? That intent is probably to test the product. Or are you saying that someone delivering a pizza who happened to glance at a screen might “notice an anomaly that gives us salient information?” If so, then such a person would probably ignore it, but in any case I would not call that testing.
There may be testing-like actions that people engage in for some purpose other than testing. But why worry about that? My concern is for testers and testing and people who do intend to learn the status of their products.]
During the run there was just a machine performing checks. When it finishes and we examine the output, does this retrospectively become testing because we can make a value-based judgement based on our observation? Seems to me the checks are still just checks; the testing doesn’t begin until after human interaction.
[James’ Reply: It’s not that it retroactively becomes testing (although there might be such a thing in the case of a delayed oracle, whereby you see a specification today, and suddenly realize that a behavior you saw a week ago was actually a bug, thus rendering what had been a tour of the product into what retroactively becomes a test), it’s that the checking process is embedded ALL ALONG in a testing process. You can’t design the checks without engaging in testing (possibly poor testing), you can’t decide to use the checks without engaging in testing, and you can’t interpret the results of the checks without engaging in testing.
What we are calling checking is ONLY found embedded in testing. It’s possible that what someone else might fairly call checking could be outside of testing, but in Rapid Testing methodology our sense of checking is parasitic on our sense of testing.]
Or, is there perhaps a grey area between where checking stops and testing begins?
[James’ Reply: Look at our diagram in the post. Checking is inside of testing. Also, notice the cloud shape and the dashed lines? Those are intended to suggest fuzziness of different kinds.]
I would like to further illustrate with the following example:
I was recently working on a large and complicated but poorly documented application undergoing transformation to a new platform. Knowing that testing resources were stretched and being charged with creating an automated regression suite we had to be creative in our activities. We used SQL queries to introduce some variability into the data for our automated scripts. This would allow us to perform the necessary regression checks whilst leaving open the opportunity to discover things about the product.
[James’ Reply: Excellent idea.]
For example a script to test might fail because:
a) of a regression error,
b) our (testing) understanding of the business requirements was not correct [so we needed to update the SQL query and/or test – but we learnt something],
c) the script executed a scenario that had not been tested before (or possibly even documented) and the code had not been developed according to the business need, though not technically a regression error (again we learnt something).
[James’ Reply: Also, D) a tool/platform failure, E) a bug in your checking code, or F) the state of the checking system or system-under-test was disturbed by some external agent.]
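The data-variation idea described in this exchange could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the commenter's actual setup: the `orders` table, its columns, the helper names, and the in-memory SQLite database are all hypothetical. The one deliberate design choice is to pick rows with a recorded seed, so that a failing run can be repeated while someone investigates which of the causes listed above applies.

```python
import random
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    # Hypothetical stand-in for the application's database.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, qty INTEGER, "
        "unit_price REAL, total REAL)"
    )
    rows = [(i, i, 2.5, i * 2.5) for i in range(1, 21)]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def pick_orders(conn: sqlite3.Connection, sample_size: int, seed: int) -> list:
    # Vary the data from run to run, but reproducibly: the same seed
    # always selects the same rows.
    ids = [row[0] for row in conn.execute("SELECT id FROM orders ORDER BY id")]
    chosen = random.Random(seed).sample(ids, sample_size)
    placeholders = ",".join("?" for _ in chosen)
    query = (
        "SELECT id, qty, unit_price, total FROM orders "
        f"WHERE id IN ({placeholders})"
    )
    return conn.execute(query, chosen).fetchall()


def check_order_total(order: tuple) -> bool:
    # Algorithmic decision rule: the stored total equals qty * unit_price.
    _order_id, qty, unit_price, total = order
    return abs(qty * unit_price - total) < 1e-9


conn = build_demo_db()
seed = 2013  # log the seed so a surprising run can be replayed
failures = [o[0] for o in pick_orders(conn, 5, seed) if not check_order_total(o)]
print(f"seed={seed} failures={failures}")
```

A non-empty `failures` list only says that the rule did not hold for some rows. Deciding whether that points to a regression, a misunderstood requirement, an untested scenario, or a problem in the checking code itself is not something the script can do.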
Although the primary objective of this activity was regression checking, we were also learning about the system when the checks failed. I’m not sure I’d call this experimentation although there was the intention of learning through observation, so I’m not sure if it meets your definition of “testing” or not. Maybe this is an example of what you’d call “speculative exploration”, though I wouldn’t call the inputs exactly random.
[James’ Reply: By any scientific notion of “experiment” I know, you are running an experiment. You are testing. Your primary objective is testing. You are wishing to know the status of the product. It is motivated by this research question “what is the status of my product after these changes?” You are using a machine-checking process to fulfill most of the fact gathering and initial evaluation of the results. Now, it may be poor testing, I don’t know. That depends on the quality of your design and management of the test-that-includes-the-checks.
The way to make that not testing (and also not checking) would be to perform that process as a ceremony: press the button, wait until it stops, perhaps email the results to a third party who ignores it. Do this with the intent of “not getting yelled at.” Now it’s not even checking, because no fact gathering that might theoretically have happened has made any difference to anyone.]
When the tests failed we learnt something, but if the tests all passed we didn’t (at least for that run).
[James’ Reply: Not so! You learn that the checks did not detect failure. This you did not know before you ran them. If you DID know, then that wouldn’t be checking because you would already have gathered all the facts and the so-called “checks” would not be gathering anything new. You learned that no smoke came out of the machines. You learned that they didn’t hang. You learned how long it took them to complete, and if that were to be much different in time than you were used to, you would have investigated that.]
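The point in the reply above, that a fully passing run still yields facts such as how long it took, can be illustrated with a small sketch. The `run_suite` function, the `baseline_seconds` figure, and the 50% `tolerance` are assumptions made up for this example, not part of any tool mentioned in the thread.

```python
import time


def run_suite(checks, baseline_seconds, tolerance=0.5):
    """Run named checks and report outcomes plus facts a passing run still provides.

    checks: dict mapping a check name to a zero-argument callable returning a truthy
            value on pass.
    baseline_seconds: how long the suite usually takes (an assumed, known figure).
    tolerance: fraction of the baseline by which the duration may differ before it
               is flagged as unusual.
    """
    started = time.perf_counter()
    outcomes = {name: bool(check()) for name, check in checks.items()}
    elapsed = time.perf_counter() - started
    deviation = abs(elapsed - baseline_seconds) / baseline_seconds
    return {
        "all_passed": all(outcomes.values()),
        "outcomes": outcomes,
        "elapsed_seconds": elapsed,
        "duration_unusual": deviation > tolerance,
    }
```

A report with `all_passed` true and `duration_unusual` also true is the kind of result that would prompt the investigation described in the reply, even though no check failed.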
We could’ve built the tests with hard coded data just to check for regression issues – I’d call this checking.
[James’ Reply: Both your randomized thing and the hard-coded thing are checks under the definition of checking in the post you are commenting on.]
However, we built the tests with the intention to learn, expecting to find issues and knowing that we would have to investigate and evaluate. We may have only been utilising automated checks (most of the time) but I’d still call this testing.
[James’ Reply: The entire process is testing. The scriptable fact-gathering PART of it is checking.]
I’d welcome your feedback.
[James’ Reply: My feedback is that you are obviously a man of philosophical chops. I hope you continue to comment on my work. It will help me be sharper.]
Florian Jekat says
I read through some older posts from Michael Bolton and you regarding the topic ‘testing and checking’. But there is a point in this post that makes me think.
“Testing is the process of evaluating a product by learning about it through experimentation, which includes to some degree: questioning, study, modeling, observation and inference.” and an implication
“Testing is an open-ended investigation ….” and
“evaluating means making a value judgment”.
Is making a value judgement of a product really a testing task? I have a rule in mind that says “Never be the gate keeper.” How should testers go about saying that a product is good or bad? In addition, making a value judgement and investigating a product in an open-ended way seem to be counterparts. Didn’t you say testing provides information for someone who has the authority to make an informed decision or a value judgement about a product?
Are testers mutated to stakeholders?
[James’ Reply: This is a great question, Florian.
The value judgments we make as testers are not personal judgments, but rather judging with respect to the values of our stakeholders. When I see a bug, I am always asking myself whether my clients would consider that a threat to the value of the product, whether or not I personally might think so.
The judging that stakeholders do can be yoked directly to decisions about the product, whereas the judgments testers make relate only to the direction of their work and what (and how) they report.
If someone asks me my opinion of whether the product should ship, I am happy to give it, as long as they accept that as information pursuant to their decisions, and do not treat it AS the decision.]
Florian Jekat says
There is another thought that came to my mind while I was thinking about your post. What is the essence of testing? Let’s make a short thought experiment:
I experiment a little bit with a tablet exposed in a store just for fun after work. I won’t evaluate it for buying. I act only just for fun, but I am learning something about this product. Is this testing?
[James’ Reply: That is not yet testing, but it may retroactively become testing. If you did this for a while and then decided you did want to test the tablet, then what you had just done would certainly be considered part of that testing process. So, it’s testing-like. It’s “proto-testing.”
The reason why I would not call it testing is that you claim to be making no attempt to evaluate the product.]
You wrote a reply on one comment here that says “You can’t do testing … without the intent to test”. I repeat your definition of testing here, too: “Testing is the process of evaluating a product by learning about it….”.
I act consciously and in an act of volition in my thought experiment above, but I only want to play and learn(!) a little bit with the tablet. Is it testing?
[James’ Reply: I see no good reason to call that testing by my definition that you cited.]
Sure, there is no obvious intent for testing, but how can I differentiate between an intent for testing and other intents?
[James’ Reply: You are the only one who can! Tell us what your intent is. If you don’t know your intent, then who else does?]
That leads to my starting question “What is the essence of testing?” in your understanding?
[James’ Reply: The essence of testing is to shine light so that others do not have to work in darkness. This is not merely the fun of waving a torch at night, but shining that light with purpose; as a service.]
Matt Griscom says
James and MB,
Thank you for this clear discussion of the terms. I’ve been doing it wrong, and this is the perfect time to fix my work…
In developing the MetaAutomation pattern language, I’ve been keeping in mind the vast difference in design and artifact between a manual test (executed by a human) and an “automated test” (executed by a test harness etc.) and how much I’ve struggled with widespread ignorance that there’s any difference there at all. I’ve even written a bunch about it for my book, but my treatment of that topic didn’t quite feel right.
You’ve solved the problem for me: there is no such thing as an “automated test,” or at least not a fully “automated test.” These are called “checks.” If somebody talks about running automated tests, it’s just because they haven’t gotten the news yet.
[James’ Reply: It’s like calling a whale a “fish.” It’s fine to do that if you aren’t a practicing biologist, but if your work involves research about sea life, you do need to keep the details straight. And it’s fine to say that a television is a sort of “babysitter” unless you are serious about daycare or parenting, in which case you do need to know that a TV cannot perform the role of a care-giver. Our terminology becomes refined to the degree that we need deeper or more reliable solutions.]
I need to (ahem) make some checks with you on how you guys use some words:
• An assertion is an operation that doesn’t interact with the system under test (SUT) but does have a Boolean result
[James’ Reply: I would simply say that an assertion is condition logging. When a certain condition is true, log that it is true. How the process of such logging interacts or does not interact with the SUT doesn’t seem to me to affect its status as an assertion. In fact, the very act of logging does interact with the SUT (affecting timing, disk space, network bandwidth, or something like that), although usually not in any material way. Assertions are, I suppose, the simplest kind of check.]
• A check (as defined in your post) is zero or more operations on the SUT plus zero or one assertion, all fully automated
[James’ Reply: A check does not have to be automated at all. The key idea is that the check CAN be automated. That’s what makes it special and interesting. I think you would have to have at least one assertion, though, or else I don’t see how it could fit the definition of a check, since no evaluation would be happening.]
Does this make sense?
Thanks to Michael Bolton for referring me to this blog post, and thanks in advance to James Bach for his feedback especially because I know he won’t pull his punches.
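The two definitions discussed above can be sketched in a few lines, following the replies rather than the original bullets: an assertion is treated as condition logging, and a check is zero or more operations plus at least one assertion. The names `log_assertion` and `run_check` are hypothetical, chosen only for this illustration.

```python
def log_assertion(log, label, condition):
    """An assertion as condition logging: record whether the condition held."""
    result = bool(condition)
    log.append((label, result))
    return result


def run_check(operations, assertions, log):
    """A check: zero or more operations on the product plus at least one assertion.

    operations: zero-argument callables that interact with the product.
    assertions: (label, predicate) pairs; each predicate receives the list of
                observed results from the operations.
    """
    if not assertions:
        # Without an assertion no evaluation happens, so this would not be a check.
        raise ValueError("a check needs at least one assertion")
    observed = [operation() for operation in operations]
    results = [
        log_assertion(log, label, predicate(observed))
        for label, predicate in assertions
    ]
    return all(results)
```

Nothing in this sketch requires a machine to run it; a person could carry out the same steps by hand. It only shows that the steps are specifiable enough that they could be automated.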
A good explanation of your concept. The distinction is useful. Unfortunately, it doesn’t align with how the words are used out there in the real world.
[James’ Reply: Oh, I think it does. I don’t encounter much resistance.]
You might encounter a lot less resistance if you came up with a new word for what you refer to as “Testing but not Machine Checking”. Maybe stick with “Exploring”.
[James’ Reply: Exploring would obviously not be the right word for that. Also, there’s no need for a word for that, just as I have no need for a word for a car-with-everything-except-tires.]
BTW, if my memory of how Venn diagrams work serves me well, your diagram contradicts your words.
[James’ Reply: This is obviously not a Venn diagram. But it is related to one.]
Well, actually, I’m not even sure how to interpret the diagram since Testing is outside the oval.
[James’ Reply: I agree that you don’t know how to interpret the diagram. The word “Testing” is the title of the diagram, not part of it. I can understand how you might be confused, but it is not unusual for people to put titles on their diagrams that do not participate in the semantics of the diagram itself.]
Is it just an oval that is so big that we cannot see any part of it?
[James’ Reply: No.]
If so, everything inside the large oval is part-of or a kind-of Testing.
[James’ Reply: Okay, since you want to be precise let’s use words properly: it’s an ellipse, not an oval. I’m sure you already know the difference, but it’s fun and useful for me to point out that you, yourself, in criticizing my perceived inexactitude, have also found that in human communication, words are heuristics. How precisely we use them varies with the situation. I like to be quite precise, but I am rarely as precise in natural language/communication as I am when I write code. You sensed that you could say “oval” without sounding like a buffoon or confusing me– and you were right.
In this case, my intent is to be very precise. So, I’m taking your criticism seriously.
My intent in this diagram is to show an ellipse that represents testing, and that everything inside the ellipse is a part of testing. This comports with what you have supposed. So, yay!]
And Machine Checking is a kind of “Learning by experimenting. …” Which doesn’t sound right.
[James’ Reply: Not “a kind of.” Instead, “a part of” which is what you just suggested as a possibility (and you were right). Machine checking is an optional part of testing. I tried to imply “optional” partly by making the line for that ellipse dashed.]
So maybe “Learning by experimenting” needs its own oval which doesn’t intersect with Machine Checking and maybe overlaps Human Checking a little bit.
[James’ Reply: If I had been trying to suggest that machine checking exists outside of testing, then I would have done that. But my claim is that it does not exist outside of testing.]
And Testing should be inside the large oval to indicate that all these kinds of checking and experimenting are different kinds of testing.
[James’ Reply: Unless “Testing” is the title of the diagram, which is the case, here. Titles are conventionally placed outside of the diagram, although exceptions to that rule are not uncommon.]
But wait, aren’t you saying that Machine Checking is not Testing?
[James’ Reply: Yes. Just as tires are not the car. Tires are part of a car. Understand how that works?]
Ack! That implies the oval for Testing should intersect Human Checking but not Machine Checking.
[James’ Reply: Not if you interpret this diagram correctly.]
The bottom line is: Since you are advocating for precision in language, it should be accompanied by precision in diagramming. 😉
[James’ Reply: Of course, I agree. However, I also need my readers to be reasonably careful (and charitable) in their interpretations of my diagrams. Right under the diagram you are referring to are words (which are not part of the diagram, but you correctly understood that). The words read as follows:
‘You might also wonder why we don’t just call human checking “testing.” Well, we do. Bear in mind that all this is happening within the sphere of testing. Human checking is part of testing.’
So, what I’m saying is, you might find it easier to understand the diagram if you read the accompanying post more carefully.]
I’m having a hard time coming to a conclusion on the following thought I had, could you please offer your insight – thanks in advance.
How do you decide which checks are worth automating and which checks are best left to be done manually? Automation seems like an obsession now and it feels to me that the pervading feeling is that if it can be automated, then automate it. That doesn’t sit right with me, since there’s an overhead in creating and maintaining the check.
[James’ Reply: The answer to that relies on many specific factors such as: what tools I have, what skills I have with tools, what product risk is associated with the check, what else I have to do, how much the situation is changing, how complex the oracle is, how testable the product is, what my team wants me to do, what I want to do, etc.
I must resist doing things merely because I feel like it, or merely because I haven’t bothered to think about alternative things I could do. I am here to find important bugs quickly.]
Gyula Csom says
Though a bit late to comment… it is a wonderful problem you are dealing with in your article!
Perhaps exploring the following subject might be of interest to you: well-structured vs. ill-structured problems. Herbert A. Simon seems to be one of the first people who examined the problem of ill-structured problems. Although his focus was AI, the two subjects seem to be very close to each other – i.e. well-structured-vs-ill-structured-problems and testing-vs-checking. Also, the topic seems to have a great literature…
Herbert A. Simon: The structure of ill structured problems
s/Am I a warrior or a just spear throwing /Am I a warrior or just a spear throwing
Greets from New Zealand 🙂
[James’ Reply: Thanks!]