Behavior-Driven Development vs. Testing

The difference between Behavior-Driven Development and testing:

This is a BDD scenario (from Dan North, a man I respect and admire):

+Scenario 1: Account is in credit+
Given the account is in credit
And the card is valid
And the dispenser contains cash
When the customer requests cash
Then ensure the account is debited
And ensure cash is dispensed
And ensure the card is returned

This is that BDD scenario turned into testing:

+Scenario 1: Account is in credit+
Given the account is in credit
And the card is valid
And the dispenser contains cash
When the customer requests cash
Then check that the account is debited
And check that cash is dispensed
And check that the card is returned
And check that nothing happens that shouldn’t happen and everything else happens that should happen for all variations of this scenario and all possible states of the ATM and all possible states of the customer’s account and all possible states of the rest of the database and all possible states of the system as a whole, and anything happening in the cloud that should not matter but might matter.

Do I need to spell it out for you more explicitly? This check is impossible to perform. To get close to it, though, we need human testers. Their sapience turns this impossible check into plausible testing. Testing is a quest within a vast, complex, changing space. We seek bugs. It is not the process of demonstrating that the product CAN work, but exploring whether it WILL.

I think Dan understands this. I sometimes worry about other people who promote tools like Cucumber or jBehave.

I’m not opposed to such tools (although I continue to suspect that Cucumber is an elaborate ploy to spend a lot of time on things that don’t matter at all) but in the face of them we must keep a clear head about what testing is.
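To make the contrast concrete, here is a minimal sketch of the scenario above as an automated check. The `Atm` class, its fields, and the planted audit-log bug are all hypothetical, invented only for illustration; they do not come from Dan North's article or from any real tool. The check examines the three stated outcomes and nothing else, so it passes even though the toy product misbehaves.

```python
from dataclasses import dataclass, field


@dataclass
class Atm:
    """Hypothetical toy ATM, invented purely for illustration."""
    balance: int
    card_valid: bool = True
    cash_in_dispenser: int = 1000
    dispensed: int = 0
    card_returned: bool = False
    audit_log: list = field(default_factory=list)

    def request_cash(self, amount: int) -> None:
        allowed = (
            self.card_valid
            and 0 < amount <= self.balance
            and amount <= self.cash_in_dispenser
        )
        if allowed:
            self.balance -= amount
            self.cash_in_dispenser -= amount
            self.dispensed += amount
            # Planted bug (hypothetical): the withdrawal is logged twice.
            # Nothing in the scenario looks at the audit log.
            self.audit_log.append(("withdraw", amount))
            self.audit_log.append(("withdraw", amount))
        self.card_returned = True


def check_account_in_credit() -> bool:
    # Given the account is in credit, the card is valid,
    # and the dispenser contains cash
    atm = Atm(balance=100)
    # When the customer requests cash
    atm.request_cash(20)
    # Then check the three stated outcomes, and only those
    return atm.balance == 80 and atm.dispensed == 20 and atm.card_returned


print(check_account_in_credit())
```

The check reports success, while the duplicated audit entry goes unnoticed. That gap is what the "everything else" clause stands for.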

54 thoughts on “Behavior-Driven Development vs. Testing”

  1. TDD, BDD, ATDD etc etc are NOT about testing in the traditional sense of the word. They do not replace the need for human testers (in particular) or a QA process (in general).

    They are techniques that facilitate communication – developer to developer; developer to stakeholder; developer to tester.

    They can also help ensure that the software is nicely modular, that the software can be safely refactored over its (hopefully) long life, and that some classes of defects never get to the next stage in the lifecycle.

    I know that James (who I respect and admire) understands this, but I worry that some developers might take this blog-post as a criticism of TDD/BDD (or unit testing as a whole), rather than a reminder of the importance of exploratory testing and the role of human testers.

    [James’ Reply: Why would you or anyone think I’m criticizing unit testing, unless by that you are referring to something that isn’t testing at all?]

  2. I agree too James.

    Have you read The Cucumber Book?[1] I would like your feedback as to whether we’ve reflected this view clearly enough, because I think you’ve made a very important point that many people who embrace tools like Cucumber do unfortunately miss.

    [1] http://pragprog.com/book/hwcuc/the-cucumber-book

    [James’ Reply: I have not read it. I’ve seen several demos of Cucumber. I’m reacting to what people said about it in the demos, and what I think I see in it. I’ll check out the book.]

  3. The link to Dan North’s “Introducing BDD” is broken. It is missing the colon between “http” and “//”.

    [James’ Reply: Fixed. Thanks.]

  4. The people with whom I’ve worked that promote BDD or ATDD using tools like Cucumber or Fit all generally agree it’s a design tool and NOT a quality tool. Converting a requirement or a user story into a testable binary (or a “check”) forces all involved to analyze what’s being requested. Since computers won’t recognize fuzzy language, forcing us to use computers puts a check on such things. When it works well, a Cucumber test is a very informative metric letting the developer know when they’ve coded “just enough” to stop.

    It is not, however, a tool for verifying quality or even the fitness of the requirement under test, any more than a unit test would be a replacement for an integration or performance test.

    I do, however, understand your frustration. I find myself repeating this maxim a lot recently. It seems as though anything with the word “test” or “quality” somewhere in it will be immediately misunderstood.

  5. What is the difference between ‘ensure’ and ‘check that’?

    If the BDD definition had been written like this:
    +Scenario 1: Account is in credit+
    Given the account is in credit
    And the card is valid
    And the dispenser contains cash
    When the customer requests cash
    Then check that the account is debited
    And check that cash is dispensed
    And check that the card is returned

    would you have changed the way you wrote your testing version with respect to these lines?

    [James’ Reply: The main thing I did in my version was add an “everything else” clause to illustrate that these BDD statements are not confirmations of correct behavior, but rather partially correct behavior combined with possibly incorrect behavior that goes unnoticed. The clause also illustrates that BDD coverage is limited to the particular states of the many aspects of the system that happen to be true at the moment of execution– and that those states are both myriad and unspecified.

    The second thing I did is roll back the arrogance of the word “ensure.” We don’t ensure anything in testing. To ensure is to “make sure of” or “compel.” We don’t do that. Developers do that by writing code. What a process like BDD can do is check something, not ensure.]

  6. Cucumber introduces too much overhead/complexity to be an efficient design tool. If that is its main purpose, which I agree it is, then it is a big FAIL.

    I appreciate the goal of Cucumber but there is simply too much plumbing overhead. I’ll spend my time talking with the customer and writing real BDD driven code.

    [James’ Reply: That’s my biggest WTF feeling about Cucumber: The amazing amount of effort people will go to in order to accomplish a simple objective, when that objective can be accomplished with much less effort by other means.]

  7. We use Cucumber in our team and we are very well aware that such automation relies on the supposition that the future shall resemble the past/documented – it helps verify what you have coded, prevents documented problems and has to be incrementally improved.

    We never rule out the need for QA and we have QA people.

    [James’ Reply: What I don’t get is why you invest so much work to accomplish so little.]

  8. (I’m double-posting)

    But I understand what you (and this blog) are saying; people in my team (even myself, earlier) thought humans could really be replaced.

  9. James,

    I think you’ve just demonstrated what I consider to be the main benefit of writing Cucumber (from a team point of view rather than a design point of view) i.e. it gives the team the chance to discuss and critique the language of requirements before they are implemented. Just as you point out the invalid use of the word ‘ensure’, discussions around Cucumber tests often involve pointing out the invalid use of the words being used to describe how our system should behave. Over time this facilitates a much more exact use of language, which has benefits for everybody involved.

    As a tester, I find it very beneficial to get the chance to play Socrates before work gets under way. It also gets my brain thinking about how to test the requirements before I actually get round to doing it.

    [James’ Reply: And I can do all of that, rather easily and cheaply, without Cucumber.]

    • How?

      [James’ Reply: Are you serious? You want me to explain how an ordinary human conversation works? Do you think before Cucumber existed people didn’t know how to talk to each other about technology? What is wrong with you? Get help.]

      • (!(That was helpful, James))

        Perhaps, he was wondering how best to capture that “human conversation” and turn it into an automated test?

        [James’ Reply: You can’t “capture” human conversations. All you can do is record what people say. You can’t turn them into automated tests. A test is a performance, involving perceptions, adjustments, judgements, etc. It lives in the moment.

        That would be like capturing what a programmer does and turning that into “automated programming” that then goes on and writes new programs so that we don’t need programmers anymore. No one does that. It would be a dumb idea.]

        After all you claimed you can do the same thing that Cucumber does with much less overhead. I am all about reducing overhead, especially with regard to testing, so please enlighten us.

        [James’ Reply: We do this by talking to each other. You already are enlightened. You know how to talk. And we do this by interacting with the product over time and testing it, often in ways mediated by tools (including tools that perform extensive output checks, at times). What I do when I interact with the product cannot be “captured” or bottled and duplicated. That’s not in the nature of intellectual work. Whenever someone claims to have done that, what they have really done is captured some trivial aspect and ignored the meat of the matter.]

  10. In regards to your last comment as to the misplaced time in BDD/ATDD tools, my .02 is that the issue is that for most ‘fixtures’ one or more equally relevant user/customer facing demonstrative features exist that accomplish the same goal.

    For example, one of the normal ways we show a user we ‘ensure[d] the account is debited’ is to provide a receipt. This example is a bit more complex due to the hardware integration, but assuming we can capture this (say by comparing the print stream) it supports both an automated development time ‘verification’ (not testing/checking) of specification examples, /and/ it also serves as the run-time feature itself.

    IMO, Cucumber is an internal proxy for testing outcomes, for the purpose of showing the development team their system is good, when instead we could be creating our best demonstration externally, as an actual artifact of the system, to show the user/customer that the system is good.

    The BDD/ATDD/Specification By Example ‘process’ is very good, but the wrong artifacts are being created. Rather than a parallel development-time-only artifact, we should just create the actual artifact that supports both development and run-time production.

    This doesn’t change the central thesis of your post, though it is more obvious that it’s a demonstrative feature and not a ‘test’ per se.

  11. While I agree that for finding new defects human testers are irreplaceable (I make the ‘quest’ argument myself), we ourselves do not test “all possible states” but use techniques such as boundary value analysis that allow us the maximum coverage for minimum time / effort.

    [James’ Reply: Of course we don’t test all possible states! But that wasn’t my point. The quest of testing is not embodied in the things we check but in the unfolding dynamic of our thinking and learning which leads (for people with good testing skills) to a MUCH better sampling of the phase space of the product. BDD checks don’t unfold at all. They are cute little static confections.]

    Those techniques translate very well into automated tests.

    [James’ Reply: No they don’t. Not in any way. In fact, no good test has ever been automated. Good testing can only be done by humans. For the same reason that cruise control is not “automated driving,” a program that performs a check is not automated testing.

    It’s this typical misunderstanding of the nature of manual testing and automated checking that led my colleagues and me to change our language so that we now avoid the use of the term “test automation.”]

    However, automated testing is actually invaluable for a few things that human testers can’t do (not actually finding ‘new’ defects), the most prominent of those is speed.

    [James’ Reply: I use tools in the course of my testing, both to enable finding bugs (I don’t use the term “defect” if I can help it) that I couldn’t find otherwise, as well as to perform checks. I agree that it is wonderfully valuable when done well. It is typically, however, a shocking waste of time– embraced by people who generally don’t enjoy testing in the first place, and so have little idea what they are giving up.]

    With automated testing you can ensure that the most important and / or risky aspects of your app are covered (you can even use our own techniques such as boundary value analysis) and working, which allows you to change fast and deploy quickly. Without tools like RSpec or Cucumber you cannot hope to deploy once or more a day.

    [James’ Reply: I don’t think your tools are doing what you think they are doing.]
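For readers unfamiliar with the boundary value analysis mentioned in this exchange, here is a minimal sketch of it written as automated checks. The withdrawal rule (`can_withdraw`), the balance of 100, and the expected-results table are hypothetical assumptions chosen only for illustration. The sketch samples six fixed points around the edges of the valid range and compares each result with a hand-written expectation.

```python
def boundary_values(low: int, high: int) -> list:
    """Return the classic boundary sample for the closed range [low, high]."""
    return sorted({low - 1, low, low + 1, high - 1, high, high + 1})


def can_withdraw(amount: int, balance: int) -> bool:
    """Hypothetical rule under check: amounts from 1 up to the balance are allowed."""
    return 1 <= amount <= balance


def failed_checks(balance: int, expected: dict) -> list:
    """Return the boundary amounts whose result differs from the expectation."""
    return [
        amount
        for amount in boundary_values(1, balance)
        if can_withdraw(amount, balance) != expected[amount]
    ]


# Hand-written expectations for a balance of 100 (assumed for this sketch).
EXPECTED_FOR_100 = {0: False, 1: True, 2: True, 99: True, 100: True, 101: False}

print(boundary_values(1, 100))               # [0, 1, 2, 99, 100, 101]
print(failed_checks(100, EXPECTED_FOR_100))  # []
```

The sampled points are fixed when the code is written. The sketch shows only the mechanical part of the technique; deciding which ranges and boundaries matter is left to whoever writes the expectation table.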

  12. Hi James,

    Could you please clarify the difference between “Check” and “Ensure”… It seems that you have a different understanding of these words than I do…

    [James’ Reply: I like to use the Oxford English Dictionary as my authoritative reference, to the extent I accept authority over my use of English…

    There are several definitions of ensure. The first one seems closest to the notion of checking:

    1. To make (a person) mentally sure; to convince, render confident.

    This is not outrageously wrong, though it strikes me as over-confident. What troubles me are four other definitions that go way beyond checking and refer to causing something to be the case:

    6. To secure, make safe (against, from risks).
    7. Comm. To insure v. (a person’s life, property, etc.). Obs.
    8. To make certain the occurrence or arrival of (an event), or the attainment of (a result); = assure v. 5.
    9. To make (a thing) sure to or for a person; to secure.
    ]

  13. While I agree with the thrust of your argument, Curtis, I encourage you to be careful.

    “Converting a requirement or a user story into a testable binary (or a “check”) forces all involved to analyze what’s being requested.”

    Consider the assumptions in that statement:

    1) The requirement is simple enough to be expressed in terms of a binary outcome (which is the sole outcome of a check). I can’t think of very many requirements that yield to that kind of simplification. Whatever is going on, something else is going on; whatever happens, something (or nothing) will happen in response.
    2) The action of converting an idea from one form to another can force someone to do something. At best, the process might suggest something to some person (but not to others).
    3) “All involved” includes some person(s) other than the one who is performing the conversion.
    4) The analysis is thoughtful and thorough.

    —Michael B.

  14. We are actively implementing BDD at our place, and we use SpecFlow for this purpose. By the way, I’m the tester in our company. I do a lot of manual testing (and did a reasonable amount of automation in the past) but am also involved in the BDD approach. This technique is helping us to establish collaboration between testers, developers and product owners. This is working really well for us. We have written hundreds of scenarios so far and run them frequently. We all work together to come up with the test scenarios, then developers convert them into the Gherkin language (Given, When, Then) format. This raised my interest in BDD. I did a lot of research on “Do we need programming skills to implement BDD” and found zero results. This suggests that no tester has asked this question (I might be wrong). Although I’m the tester (I did very basic coding 8 years ago and I’m from an IT background), I’m really interested in learning C# with .NET (at least the basics) and then implementing these scenarios myself, which would add a lot of value to the company. Thank you James for giving me the opportunity to share my thoughts.

  15. Reading this reminded me how language shapes perception, and of an article by Don Gray: http://www.donaldegray.com/debugging-system-boundaries-the-satir-interaction-model/ where he makes the point that, “the words we use create meaning in other people – And we don’t get to choose what that meaning is, they do.”

    In my FDA-regulated world there are consequences for miscommunicating meaning. If we all agree that the word “ensure” is an overstatement in this context – then choose and use a more appropriate word.

    Also, for me: the Check:Test:Ensure model is like the Intake:Meaning:Significance model in a Satir Interaction model way. When I identify people/organizations mistaking “intake” for “significant” that is an interesting anti-pattern to explore in the organization.

  16. Although I agree with you, James, that automated specs don’t replace the human mind when it comes to testing, I still don’t see why you think Cucumber is a huge overhead.
    I played with Cucumber a bit in the past, and found that it’s not much harder than writing an automated spec in any other tool.

    [James’ Reply: The concept of an automated spec is a dumb idea in itself. The source code of the product is ALREADY AN AUTOMATED SPEC. What you’re doing with your so-called automated spec is creating a new layer of technology that is a pretense of communication while subtly discouraging it. Users and domain experts aren’t programmers– you are tricking them into thinking they are not programming while in fact they are programming. This trick gives them a false view of both the product and the automated check of the product. I would much rather walk my clients through the code, or else give them a fully realized English specification, than to write a boneheaded set of hundreds of simplistic examples and expect them to understand them, or give them a hideously oversimplified construct in which to express their ideas for the specification. We already have a construct: it’s called English. We also have diagrams. We also have conversation. And we have prototyping. Those things work together to solve the problem.

    As for overhead. Obviously you have to write all the fixtures to tie the language into the tool. That’s pure work that I never have to do, because I don’t fool with tools like that.]

    Or, do you mean that writing executable specifications is an overhead by itself, regardless of the tool used? Then, what can we do instead?

    [James’ Reply: Come on man, use your head– WE TALK TO EACH OTHER. We write, we draw, etc. I’ve been doing that for 28 years. Stop playing dumb!]

    As a member of a team that looks for opportunities to automate user story verification and enhance the dialog between stakeholders, I’m very interested in your opinion.

    [James’ Reply: How about instead you stop hating talking to each other, develop your social skills, and engage people?]

  17. I like BDD and what it does. It tries to find a simple common denominator for producing software that fulfils wanted behaviour. To describe that behavior a language construct called Gherkin (and not Cucumber! Cucumber is the Ruby interpreter for Gherkin) is used.

    I see the strength not in the test (more on that later) or the automation bit. What I see is that the system designer/BA is forced into a simple model to describe the application. A language that can be easily understood by all involved and reduces defects/bugs at a point where they have the most severe consequences: the inception phase.

    [James’ Reply: I don’t feel that we are served by simplistic language. I feel better served by using the full spectrum of English. Also diagrams. Where are the diagrams in Gherkin? I don’t see them!]

    Since BDD is a development methodology we do it some discredit using the testing scale to critique it. We have to see BDD as an extended form of Unit Test that testers can “play” with too. I’d expect the test that does the checking above to be scripted by the developer and not a tester. The tester would be underutilised doing that. As James points out his time is much better used testing things that require a brain and cannot be checked or automated.

    I am an absolute fan of automated smoke tests that test the absolute basic functionality (as would be the case above). James would probably disagree because that is like carpet bombing the same land over and over again. He is of course right from his context but these automated checks are needed too.

    [James’ Reply: They don’t test anything, Oliver. They check. The checks can be useful. One thing I warn against is paying a high price for simple checks. I also warn against using checks to the exclusion of tests– it’s an abdication of tester responsibility.]

    With frequent release cycles, hot-dog developers and careless infrastructure cowboys they manage to break something fundamental eventually. If I can prove/disprove that in 10 minutes in an automated fashion that’s great. Is that testing? Certainly not. Is that checking? Probably. Is it regression? Definitely not (using PM’s and test tools sales guy’s “definition” of regression here). Does it fulfill a useful function? Hell yeah. We save months of manual effort a year by doing it.

    [James’ Reply: I can create and use such automation inexpensively. When I look at Gherkin/Cucumber I see needless confection. Please DON’T confuse simple automated checks with overwrought fixtures tied into faux English protocols for specification.]

    The nice thing with BDD is that this smoke test can be traced back to (mostly positive case) behavior we want to see. That will keep most PM’s, customers and auditors at bay. It also shows clearly how a piece of software was intended. If it is automated that would include a live demo!

    [James’ Reply: Yeah, I can do the same thing with simple Perl code.]

    But I also think James is spot-on with saying this is not testing. Of course it isn’t, but let’s not condemn the method, as I think it has merit in a lot of other ways that bring quality into development without being about testing at all. So we need to complement BDD with the things we as testers already know and do.

    Oliver

    P.S. A tester using Cucumber is called a developer 😉

  18. I’ve done a lot of reading on BDD this year and I’ve run into the same problem with people thinking that doing BDD is the same as testing. Some things I like to bear in mind when I’m thinking/talking about BDD are…

    – You must be sure what problem you’re trying to solve if you implement this stuff. BDD isn’t a solution to a testing problem so don’t expect it to fix any of those kinds of issues. It might help alleviate them (for a time) but it won’t fix them completely.

    – It’s really important what you call the different practices involved. For example, thinking/communicating about the feature files as “Rejection checks” rather than “Acceptance tests” helps immensely (think about it! Thanks to Michael Bolton for that).

    – Thinking of feature files as requirements rather than tests also helps. Well it helped me at least.

    – You get nothing for free. There is a cost for everything (money, time, sanity, etc). Lots of people think the checks (NOT tests remember!) write themselves and that the results interpret themselves for some weird reason! So try and figure out whether the ROI is worth it for your needs. You have to create CODE to do this stuff which needs creating, maintaining and (shock horror) testing so don’t be seduced into thinking “any old tester” can do it. They can’t.

    Vern

  19. James, wouldn’t BDD steps be helpful for coders to understand the scope of requirements better? I agree that BDD alone might not be efficient in testing.

    [James’ Reply: Better than what? Better than having a conversation and doing a prototype? I doubt that very much. I think it’s a tool being pushed by fan boys.]

  20. There’s an effort underway among developers in our shop to implement Cucumber tests as part of the build to “help us release faster.” I’ve not yet received a satisfactory answer to how this is so. Cucumber is the proposed solution, but the appeal is actually to some underlying principle (and not to Cucumber as a tool specifically. Presumably other tools would work in principle).

    At any rate, does anyone know of an agile-esque principle that asserts that this kind of BDD testing helps developers get the release out *faster*? QA has been invited to participate in writing these tests. Since “speed of release” is not a quality concern in and of itself (and can of course even run contrary to QA objectives), I’m reluctant to commit much in the way of QA time/resources to writing such tests.

    [James’ Reply: The idea that it helps you release faster is a fantasy based on the supposed value of regression check automation. If you think regression checks will help you, you can of course put them in place and it’s not necessarily expensive to do that (although if you automate through the GUI then you will discover that it IS expensive).

    BDD, as such, is just not needed. But again, if it’s cheap to do, you might want to play with it. The problem I have is that I don’t think it IS cheap to do in many cases. I’m deeply familiar with the problems of writing fixtures to attach high level “executable specs” to the actual product. It can be a whole lot of plumbing to write. And the temptation will be to write a lot less plumbing and to end up with highly simplified checks. The simpler they are, the less value they have over a human tester who can “just do it.”]
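To give a sense of the "plumbing" under discussion, here is a minimal homemade sketch of step definitions that tie scenario sentences to code. It is not the API of Cucumber, SpecFlow, or any other real tool. The `step` decorator, the `run_scenario` runner, and the step wording are all hypothetical and written only for illustration.

```python
import re

STEPS = []  # registry of (compiled pattern, function) pairs
KEYWORDS = ("Given ", "When ", "Then ", "And ")


def step(pattern):
    """Register a function as the handler for sentences matching `pattern`."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register


@step(r"the account balance is (\d+)")
def set_balance(ctx, amount):
    ctx["balance"] = int(amount)


@step(r"the customer requests (\d+)")
def request_cash(ctx, amount):
    ctx["balance"] -= int(amount)
    ctx["dispensed"] = int(amount)


@step(r"check that the account balance is (\d+)")
def check_balance(ctx, amount):
    assert ctx["balance"] == int(amount), ctx


def run_scenario(text):
    """Run each line of a scenario through the first matching step definition."""
    ctx = {}
    for line in text.strip().splitlines():
        line = line.strip()
        for keyword in KEYWORDS:
            if line.startswith(keyword):
                line = line[len(keyword):]
                break
        for pattern, func in STEPS:
            match = pattern.fullmatch(line)
            if match:
                func(ctx, *match.groups())
                break
        else:
            raise LookupError(f"no step definition for: {line!r}")
    return ctx


SCENARIO = """
Given the account balance is 100
When the customer requests 20
Then check that the account balance is 80
"""

print(run_scenario(SCENARIO))
```

Every sentence a scenario author may write needs a matching pattern and handler, and each handler must be wired to the real product rather than to a dictionary as it is here. That wiring is the work the reply calls plumbing.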

  21. @PM Hut: It might be helpful to see the description of a check that James and I developed a couple of years back:

    http://www.developsense.com/blog/2009/09/transpection-and-three-elements-of/

    I’ve given a great deal of thought and keystrokes to checking. Two key posts are here http://www.developsense.com/blog/2009/08/testing-vs-checking/ and here http://www.developsense.com/blog/2009/11/merely-checking-or-merely-testing/.

    —Michael B.

  22. Hi James,

    To me, these practices address a couple of questions that I encounter in my work, on product development teams:

    1) Did we break anything that mattered enough to us before that we chose to implement an automated check? (and then, “how should we respond to that?”)
    2) Is this product worth testing? (essentially, is our prototype developed to the degree that we want to truly evaluate it?)

    [James’ Reply: Other practices address them better.]

    Did you EVER at ANY point in your testing career experience a scenario in which you thought “hey, this executable specification thing really seems to be helping this team move along”?

    [James’ Reply: No.]

    I find it to be a valuable practice in my work because I use it as an investment strategy. I use ATDD and BDD to help guide my team to building a minimum viable product. Then once a certain amount of specs are achieved, I will switch my mode to critiquing the product, interactively.

    I also wonder if, unknowingly, this whole BDD-ATDD thing is really a team prototyping technique. And it just so happens that once the specs are passing, some teams believe the product (at that point, a functioning prototype) is “good enough” to ship.

    [James’ Reply: You seem to be on the verge of comparing your favored practice to alternative practices. But I’m not seeing that, yet.]

    I do value manual interaction with the product to test it. However, I also am interested in the concept of zero quality control, and try to explore both poles of behavior (prevent some failures, expose some failures). I’m not sure about the balance between the two. But I do keep my programming and testing chops exercised so I can temper my work based on assignment.

    [James’ Reply: Zero quality control? What does that mean?]

    Not to candy coat, but I do sincerely appreciate the healthy antagonism you bring by questioning these practices that some of us unknowingly ritualize to the point of uselessness.

    :patrick

  23. [James’ Reply: Other practices address them better.]
    I’m surprised I didn’t get a list of articles or suggestions with that comment. Though I am inspired to research, for myself, where other practices may excel.

    [James’ Reply: Patrick, what I’ve been trying to say about this BDD crap is that it’s overwrought and underpowered. It’s an excuse for dumping a lot of energy into a small gain. I don’t need to give you articles or suggestions, other than, maybe, that I suggest you notice that before there was “BDD” there were people who talked together and wrote things down here and there. You may not believe it, but there really is a history of practice that predates yours (and mine). I’ve never been on a project that was successful using BDD-like practice (I worked on an automation system to do something like that at Borland, for about a year, before giving up on it because it was too much work for too little return), but I have been on projects that worked just fine without it.]

    [James’ Reply: You seem to be on the verge of comparing your favored practice to alternative practices. But I’m not seeing that, yet.]
    What is the “favored practice” you mention? I didn’t mention a favored practice anywhere in my comment. I merely mentioned that I use this technique to help guide my teams. Maybe it’s the way I worded it that made it appear as though ATDD-BDD is “THE ONLY” way I approach the problem?

    [James’ Reply: When you mention the value of a practice, without discussing its cost or alternative practices, it does leave me with the impression that you favor it, yes.]

    [James’ Reply: Zero quality control? What does that mean?]
    Yeah, I deserve an electric shock for using buzzwords like that. Based on what I’ve read, “zero quality control” means focus on prevention of defects at the point of origin. Stop production when a problem is found within the product or delivery process and fix it there. Don’t focus on late-phase critique or assessment. I’m guessing you’d already heard of this concept, knowing how well-read you seem to be. This is not something I APPLY on my work in binary fashion. I seek out techniques to help teams build products that are WORTHY of testing, deployment, further effort, further investment. Applying a mindset of “prevent defects at point of origin” seems to be helpful so far, but it certainly is not something I apply without questioning it intermittently.

    [James’ Reply: Of course it’s wonderful that people want to solve problems in better ways. But calling that “zero quality control?” That makes it sound like you want to abolish fire departments because you would rather that everyone just be really really careful around fire. Don’t call it zero quality control (or zero defects, which is its more common handle). Instead call it self-improvement. And by the way, no amount of self-improvement, in a complex world, eliminates the need for looking at what you’ve done and appraising it critically.]

  24. BDD vs. Testing is a very interesting topic.
    However, I propose a broader view : Specification by Example vs. Testing.
    It seems to me that adopting a ‘documentation-centric’ model, as Specification by Example (see Gojko Adzic’s book Specification by Example, ISBN 9781617290084) and not the ‘system-behavior-specification-centric’ model as BDD or the ‘acceptance-testing-centric’ model as ATDD, would be beneficial for testing, because it brings into the context the idea of writing specifications using examples + automating some acceptance and rejection checks using executable specifications + obtaining a living documentation.
    That means specs+docs+tests, under a single format, and a single document to be modified in case of a change request.
    This can be done using Cucumber and Gherkin, and does not exclude exploratory testing.
    What’s your opinion about this new view on the topic?

    [James’ Reply: I haven’t read that book, so I can’t say much about it. What I do know is that examples do not specify anything. An “example” is not even an example! The “exampleness” of a phenomenon depends entirely on how you interpret that phenomenon. Your mind is not a camera, and mental models are not merely images of the pictures that your eye took.

    Examples in combination with mental models (that is to say, previous understanding) or sufficient explanation (which also relies on previous understandings) can be wonderful. I think it’s a great idea to use examples. Use them liberally. But don’t say that examples *are* the spec. That’s not even true in the case where the examples are the ONLY use you will EVER make of the software.

    Consider steganography. A steganographic message is an example of a message, obviously. But without an explanation of that, no one who views that “example” will be aware that a message is even there. In other words, you can demonstrate a feature, your user can nod and say “yeah, I like that” and later on it turns out the user was looking at and talking about some completely different aspect of the example than you had intended to convey. So, examples are in no way specification. But examples may illustrate and support specification.]

  25. In reading your comments to people, James, you often come across as very condescending even while you extol the benefits of effective communication. A few of your “better” ones:

    * [James’ Reply: Come on man, use your head– WE TALK TO EACH OTHER. We write, we draw, etc. I’ve been doing that for 28 years. Stop playing dumb!]

    * [James’ Reply: How about instead you stop hating talking to each other, develop your social skills, and engage people?]

    * [James’ Reply: I don’t need to give you articles or suggestions, other than, maybe, that I suggest you notice that before there was “BDD” there were people who talked together and wrote things down here and there. You may not believe it, but there really is a history of practice that predates yours (and mine).]

    You also come across as very dogmatic. You seem to dismiss other people’s experiences in many cases. (Again, I’m going by how you phrased your responses to comments.) I realize you have strong opinions and that’s great but your style of presenting them seems better suited for someone who wants to discourage communication. This post was my first and only viewpoint into your thinking and presentation style and while I probably agree with many of your points, I find myself really turned off by the way you communicate. I realize you probably don’t care about that one way or the other but I just wanted to give one person’s take on what I’ve seen in what is otherwise a very interesting post.

    [James’ Reply: Tatiana, I don’t know who you are. But you know who I am. Why is that? It started a long time ago, when I first decided that it’s important– at least for me– to be a defender and promoter of insight and excellence. It’s so important that I will risk annoying people and hurting their feelings. It’s like tough love. That’s what I dish out.

    I have been rewarded for being the way I am. I’m probably the single most famous tester in the world. Many thousands of people have seen my testing videos. You can Google me to verify this to your satisfaction.

    I appreciate that there are people who are warm and friendly in the world. We need people like that. I’m married to one. And I can be like that, too, occasionally. It’s just not my style.

    One of the reasons I’m tough is that I’m 45 years old. I’ve been in the industry for 29 years. I’ve done a lot of special research that no other testers except those in my circle seem to do. And basically I know what the fuck I’m doing. People who dismiss the things I say are only hurting themselves. That’s their choice.

    Having said all that… I think you have picked a few examples where I am responding to fools or trying to wake folks up. If you read other comments I’ve responded to, you’ll also find examples where I use a different tone. I am large. I contain multitudes.]

  26. “Tatiana, I don’t know who you are. But you know who I am. Why is that?”

    Actually, James, I had no idea who you were. I stumbled upon your blog by accident, while reading other blogs. I NOW know who you are.

    I appreciate the insight into yourself. It’s been very enlightening, to say the least. As you continue “to wake fools up”, I assume you realize that someone who disagrees with you is not necessarily a fool nor are they necessarily ready to “dismiss the things [you] say.” Perhaps they are just learning and they do not have twenty-nine years of experience in the field yet. Sometimes we have to learn through the “slow path” and not just take something that an “authority” — even the “single most famous tester in the world” — has to say.

    Forgive those of us who just aren’t quite up to your level yet. Hopefully we can all get to where you are and then we’ll all “know what the fuck [we’re] doing.”

    [James’ Reply: This has nothing to do with agreement or disagreement, or authority. I don’t teach from a position of authority. I told you those things I told you because you seemed to be of the opinion that my style is some sort of unfortunate aberration. It isn’t. It’s mostly intentional. I’m established in this craft.]

  27. I just came across this post in a google search. So bear with me, I haven’t read your entire site, just this one post.

    The problem you’re doing here, is you aren’t making a fair comparison. you have two scenarios, the BDD scenario you have locked down to ONE test. The human tester version you throw in a ton of tests.

    [James’ Reply: How is that not a fair comparison? It’s a genuine depiction of how much more human testers do.]

    If you want to make it fair you do this – you have planning break an epic story (of a working ATM machine) into small pieces of testing. you can’t have one test scenario cover everything. you simply cover the scope of the ATM with hundreds of test scenarios. Yes Hundreds.

    [James’ Reply: So, I guess you missed the point of this article, which is that no amount of Cucumbering equals testing. You are suggesting a huge amount of work in order to compare fairly to an ordinary human tester? That’s like saying it’s not fair to compare a program that prints canned text statements to a human speaker, because to compare fairly you’d have to write a much larger program that included every single possible utterance a human might make. Get real, man.

    I’m saying that Cucumber-type checks do not approach the power of testing. Can you dispute that without suggesting that I work day and night on goofy little “test cases?”]

    I prefer cucumber, but where I work we use GEB (same idea, but written in Groovy) – we have nearly 600 test cases that are automated and cover the complexity of our site. JUST like you would if you were making a test plan with manual testers.

    [James’ Reply: Sorry, no. Who told you that? You must have learned testing from an idiot. Competent human testers don’t create Cucumber-like test cases and they certainly don’t follow them.]

    In reading your previous comments, I see you feel BDD is worthless/crap, and that automation I guess you feel is impractical and that people since the stone age have done testing outside of that paradigm so why change it? At least that’s what I get from your replies.

    [James’ Reply: BDD is not necessarily a complete waste of time. I can imagine examples of it that might be worthwhile. But the examples I’ve seen are pretty lame, and all of them are promoted by people who seem not to know how to test.

    Well, like you say, you aren’t familiar with my work. So, you don’t yet know that I am constantly innovating. I regularly attack the dominant paradigm (which we call the Factory School) for clinging to 50 year-old ideas that didn’t work even 50 years ago. It seems to me that the paradigm you are suggesting is actually the old paradigm, just with more tooling. This is because you love tools, don’t you? You just really LOOOOOVE to program, right? Or it’s possible that you don’t and you are coming at this from the angle of being in awe of what magical powers programs have.

    Personally, I am a programmer. Early in my career I got very deep into automated test execution. I conceived and designed a Cucumber-like tool in 1993, including a DSL, but abandoned that approach because there was way too much maintenance involved. After that, I refocused on learning how to test. I feel good about what happened next, and I’ve been exhorting people to learn testing ever since.]

    What happens when you go to a place like Discovery Channel and they tell you “ok we are releasing now, we need the entire site regressed in 1 hour.” 1 hour? they’re time boxing QA? why? how the hell can the entire site be manually regressed in 1 hour? by human manual testing, it’s not possible.

    [James’ Reply: It’s also not possible by your method. The difference is I’m not pretending to be able to do it, and you are. As a tester, it is my responsibility to be clear and responsible with what I claim to do.

    I’m happy to use tools to help me check software. But what the tool is doing is only checking– NOT TESTING. In my community we distinguish between those activities in order to avoid the kind of confusion you are experiencing right now.

    Even though I design tooling to support human testing, I am careful to keep an eye out for the amount of work that takes and how that might distract me from testing. I would like you to do that, too. But first you’re going to have to learn what testing actually is. Apparently, you think testing is the same as checking.]

    Is their expectation wrong? maybe. Maybe they shouldn’t time box QA, but they have their reasons – more work to push down in a day and want releases throughout the week with less time on covering a full regression. so some automation has to occur to make continuous delivery a reality. is CI/CD not better than manual testing?

    [James’ Reply: Yes, it’s not better. To understand why, again, you will have to first learn how to test.]

    I’ve worked at Yahoo, where we had literally hundreds (over 200 qa people at the time I worked there) manually testing. I’ve also worked at a shop where I was one of 3 QA people – churning out more code than my teams at Yahoo. so how the heck do we do that with considerably less people? Automate the regression, put the manual testing into new feature validation. Does it work? yeah, it’s working.

    [James’ Reply: I don’t know how you are evaluating that. But I’ve heard people say “it works” a lot of times, and then later find that they were wrong.]

    You are tossing around false data. You state in a reply, “If you think regression checks will help you, you can of course put them in place and it’s not necessarily expensive to do that (although if you automate through the GUI then you will discover that it IS expensive)”

    I actually lived through this… 90% of QA where I’m at, is automation of the GUI/front end. We HAVE to regress. you toss that out there like it’s a choice. in what world is regression a choice? 90% of bugs come from code committed to a trunk that inadvertently breaks something that previously worked. you MUST regress.

    [James’ Reply: I actually lived through it, too, kid. I’ve been doing this a long long time. You sound young and idealistic. I am covered in fucking scars and tattoos, dude. Your data, if it IS data, is specific to your situation, and not generally applicable. But, sure, I can imagine a context where regression bugs are common, especially if change control is a mess and leadership is lacking.

    I am not telling you to not use tools. I’m telling you that you’ve apparently chosen a terribly inefficient way to use them. If you are automating through a GUI, then you have to deal with a lot of maintenance that has nothing to do with bugs, right? If not, then you are doing it differently than everyone else, and you should be talking about that.

    People who invest a lot in automated checks hate to talk about how much it really costs. I understand why. I’ve been there and done that, too. Which is why I am trying to warn others about the quagmire that you embrace.]

    So we release our core web application code, every two weeks. Manual regression takes 4 days. 4 DAYS. That’s about 6 testers covering our 20 areas of regression in 5 browsers plus ipad. Not to mention mobile regression of our mobile application that might be impacted by each release of our core code.

    The entire suite of 600 tests running through the UI takes about 8 hours. It’s slow, yes. But it’s 8 hours, as opposed to 4-5 days.

    [James’ Reply: Your tools are not doing the same thing as the people. Not even close. Unless your people are idiots, of course.]

    That’s my personal example… my data. yours may vary, but having worked where I’ve worked, this has typically been the case. 1 week of regression testing after each code freeze and between the release. We haven’t even gotten to human burn out. Testing the same thing in 5 browsers + ipad + mobile web.

    [James’ Reply: People who love testing, and know how to do it, don’t burn out.]

    Automation doesn’t save the world, but it certainly helps and BDD helps shape automation to a strong point.

    [James’ Reply: So, how did you get into this business? Were you a CS grad?]

  28. [Comment Redacted…]

    [James’ Reply: Listen, Kid, you seem to be having trouble comprehending my writing. But the problem may be all the preconceptions you are coming here with. You’re not a trained tester and from what you say you also aren’t a trained programmer. (For the record, there is no such thing as a “QA exploratory tester.” We just call ourselves testers.)

    You’ve left a 1441 word comment, which is too long for this forum, but shows you have passion for this subject. I doubt it will come to anything, but if you actually want to have a conversation, call me and I promise I will address every point you are making.

    My phone number is 360-376-2931. You can also contact me on Skype, which would work better for me. My user ID is satisfice.]

  29. Thanks,

    Great example. A big problem with such things as “test driven development” is that it’s called just that, “TEST driven”. But it seems more like an executable specification. Sometimes it makes you wonder if TDD actually makes you drop all testing (and quality?). Well, apart from a free (and simple) regression test.

    A

  30. Hi James:

    Like the last several contributors, I googled to this blog post and have not read every entry.

    I have been thinking about “tests as requirements”, “specification by example”, BDD, and “behavior rules”.

    Two observations:

    1. A test spec is a derived requirement at a fine level of granularity.

    [James’ Reply: I don’t use that terminology. In my system a test is not a requirement; it cannot be a requirement. A test specification is also not a requirement, derived or otherwise. In my system a test is a performance, not an artifact.]

    Imagine the array of conditions that must be satisfied for mortgage approval. A test case that causes one of these conditions to be FALSE along with a result of “Ineligible” is a derived requirement and an instance of the underlying behavior rule. How many of these tests (finely-grained requirements) does a developer need in order to “understand the requirements”?

    [James’ Reply: If you want to talk about requirements, then do that. But that is not the same thing as testing.]

    Who will create the many boundary-value tests i.e. just ineligible and just eligible? I can’t imagine adequate development and testing without the underlying rule being documented and reviewed by the customer.

    [James’ Reply: Whether or not you can imagine it is not the issue. My post is about what testing is, not about documenting or discussing or clarifying requirements. Of course the process of testing, including test design, involves a lot of work with the meaning of requirements. Please don’t limit testing to that.]

    2. I understand requirements at multiple levels of granularity (fine, medium, coarse). Consider a coarse requirement to “determine vacation days”, perhaps specified as a user story. I propose the detailed (medium-grained) requirements be specified as behavior rules (rather than tests). For example, consider the following behavior rule table:

    Age = 12 18 37 45 52 60
    Service = 25 40
    Assign days? 27 22 24 24 27 27

    [James’ Reply: I’m not sure what the table means. The semantics of the table are not explained.]

    A set of rules can be complete (i.e. cover all conditions) and consistent (i.e. cover each condition only once), but still be incorrect (i.e. invalid) because of incorrect decision boundaries or incorrect functions.

    [James’ Reply: It can be incorrect because the wrong values are in it, too.]

    This table (and a data type glossary) can be used to generate a set of boundary value tests, such as the following:

    Case ID   Age   Service (yrs.)   Vacation (days?)
    1.0       17                     27
    2.1       18    6                22
    2.2       44    24               22
    3.1       37    25               24
    3.2       44    25               24
    4.1       45    1                24
    4.2       59    39               24
    5.1       52    40               27
    5.2       59    40               27
    6.1       60                     27

    (James’ Note: I cannot seem to make WordPress format this table sensibly. It’s not Dave’s fault. It’s stupid WordPress.)

    These generated tests (finely-grained derived requirements) enable a customer review that may detect incorrect tests and demonstrate the need to fix the rule set, before any tests are executed.

    [James’ Reply: Those aren’t tests. And by the way, I don’t understand the table. If I don’t understand it, I doubt that many other people are going to have the patience to work through it, either.

    You say “enable.” I don’t think that’s an appropriate word. People are already enabled to discuss and review their ideas. They can talk, make pictures, write things down. Whatever. I guess what you are trying to say is that a tabular presentation may facilitate that review and discussion. Well sure. This has almost nothing at all to do with my post though!]

    This second review is necessary because a review of the (abstract) rules may fail to catch a defective rule, but an incorrect (concrete) test makes it easier to see the bug.

    [James’ Reply: As you know, Dave, no human action is necessary in the world of the abstract. Things become necessary only in specific contexts. It may be your preferred practice. I agree that practices like this are pretty interesting to me, too. But it’s not a test. You haven’t created a test with your table. You’ve reformatted a document.]

    These generated tests may not be the only ones needed (i.e. may not be sufficient), but I consider them necessary.

    Thoughts?

    [James’ Reply: My main thought is that it does terrible violence to the importance, depth, and breadth of testing to call a reformatted requirement a “test.”

    My secondary thought is that tables are helpful.]
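
    The vacation-day table in comment 30 is hard to read as extracted, so here is a minimal Python sketch of one possible reading of it. The thresholds in `vacation_days` are inferred from the ten listed cases and are not stated by the commenter; the function names and the treatment of a blank service value as zero are likewise assumptions. The sketch encodes the cases as data and compares each one against the inferred rule.

```python
def vacation_days(age, service_years=0):
    """Hypothetical rule, inferred from the ten cases listed in comment 30."""
    if age < 18 or age >= 60:
        return 27
    if age < 45:
        # Ages 18 through 44: long service earns two extra days.
        return 24 if service_years >= 25 else 22
    # Ages 45 through 59.
    return 27 if service_years >= 40 else 24


# (case id, age, service years, expected days); None marks a value
# that the comment leaves blank.
CASES = [
    ("1.0", 17, None, 27),
    ("2.1", 18, 6, 22),
    ("2.2", 44, 24, 22),
    ("3.1", 37, 25, 24),
    ("3.2", 44, 25, 24),
    ("4.1", 45, 1, 24),
    ("4.2", 59, 39, 24),
    ("5.1", 52, 40, 27),
    ("5.2", 59, 40, 27),
    ("6.1", 60, None, 27),
]


def run_checks():
    """Return a list of (case id, expected, actual) for every mismatch."""
    failures = []
    for case_id, age, service, expected in CASES:
        actual = vacation_days(age, service or 0)
        if actual != expected:
            failures.append((case_id, expected, actual))
    return failures


if __name__ == "__main__":
    print(run_checks())
```

    Under these assumptions `run_checks()` returns an empty list. That only shows the inferred rule agrees with the listed cases; as noted in the exchange above, a rule set can be complete and consistent and still contain the wrong values.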

  31. John here James.
    Coder for 30 years (assembler, c, cobol, java, .net mainly). Good few years in testing teams too.
    new to this blog.

    [James’ Reply: I’m a coder for 30 years, too. Started (professionally) with Assembler. Perl these days.]

    I’ve had the happy accident to repeat my career 2.5 times over. That is: solve the same business problems in different technologies and methodologies over time. The last few years I’ve been using BDD with CI… and life became a lot easier and enjoyable

    * Specification and verification in the same versioned doc …nice.
    * Exploratory tests added to the collection of behaviours…
    * Testers freed from the mundane to really give it a good workout
    * Business, dev and testers using a common language…

    Really what’s not to like?

    [James’ Reply: What’s not to like is the persistent confusion about what BDD is and is not; what it can be and can’t be, and how that relates to good testing. What’s not to like is when people confuse their naive intuitions and anecdotal memory for universal laws of nature. We need to fight that tendency together– help each other with it. And for that I need your help. Please be skeptical, instead of adoring and complacent.]

    Reading comments like…
    James says.. “I haven’t read that book but what I know is…”;
    James says “this BDD crap is that it’s overwrought and underpowered”;
    James says, ” I have not read it.. but”;
    James says “The concept of an automated spec is a dumb idea in itself”;
    James says; “and I can do all of that cheaply without cucumber”.

    To a new reader you seem a bit closed and defensive…

    [James’ Reply: You’re not only a new reader– you’re not a tester. You are not in my community, are you? Of course a physicist sounds defensive to a mystic when defending physics, but why should he care what the mystic thinks?

    I’m not saying you are a mystic. I AM saying that you may be reading a blog that comes from, and is aimed at, people who have a different education, background, skills and sensibility than yours. That limits how seriously I am going to take your opinion.

    What you need to do first is establish your credibility. This can be done in several ways. One way is to show that you have read the post that you are commenting on. Your comment is nearly unrelated to the point that I was making in my post, so you have not yet demonstrated that first bit.

    I am saying that BDD is pathetically simplistic compared to testing. Do you have a comment on that?]

    Have you read Matt Wynne’s book yet?

    I’d certainly value your opinion more if I had the idea that you had read the books and given it a good go. Me and my buddies have, and all I can say is that it’s allowed us to save money and enhance our systems without any major catastrophe befalling us.

    [James’ Reply: I don’t care if you value my opinion. I’m not seeking to be respected in the community of people who don’t study testing or care about testing. I’m focused on how to understand the risks associated with technology. BDD is someone’s fever dream about that. I think you should read Introduction to General Systems Thinking and also Tacit and Explicit Knowledge.]

    I’m really pleased I didn’t read this blog before we started, otherwise we would have been put off and never achieved those benefits. Which were achieved through trial and error and thinking….

    [James’ Reply: What does that have to do with testing? How do you know any of that works?]

    Thanks Dan North, Matt Wynne et al.

    John

  32. Hi James,
    Great post and discussion, although you do come off as somewhat brusque in your replies, which almost prevented me from seeing the value of what you were saying.

    [James’ Reply: My brusqueness is not arbitrary. It is part of the communication. It is an implicit message that I expect readers to apply themselves to understanding this material, rather than treating it as an inkblot into which to imagine any ugly or silly thing they happen to be obsessed with.]

    I was going to have a long ramble, but instead might I ask if you would be so kind as to provide a starting point for research into automated tools to assist in *requirements testing* and the most brief overview of what you consider “tests” role in this? My apologies, I am not a test engineer but I am genuinely interested and you seem to imply that there are older, better tools than BDD which address this.

    [James’ Reply: The only tools I think I have ever used to assist with requirements testing are Excel and Word. I use them in the obvious ways. For instance, I break out sentences and sometimes the clauses of requirements into cells of a spreadsheet and then tag them in various ways. I look for omissions, contradictions, ambiguities, and testability. I also use the commenting feature of Word, of course. I often rewrite requirements as part of the testing process.

    Testers have a profoundly important role to play with requirements, not just in review, but in the performance of live experiments. I have written here.]
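
    The spreadsheet technique described in the reply above can be approximated in a few lines of Python. This is a sketch under assumptions, not James’ actual workflow: the tag column names are taken from the four things he says he looks for, the sentence splitter is deliberately naive, and the sample requirement text is invented for illustration. It produces CSV that a spreadsheet program can open, with the tag cells left empty for a human reviewer to fill in.

```python
import csv
import io
import re

# Column names are assumptions based on the four concerns named in the reply.
TAG_COLUMNS = ["omission", "contradiction", "ambiguity", "testability"]

# Invented sample text, used only to show the output shape.
SAMPLE = (
    "The ATM shall dispense cash when the account is in credit. "
    "The card shall be returned after every transaction."
)


def split_sentences(text):
    """Naive split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def requirements_to_csv(text):
    """Return CSV text with one requirement sentence per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["id", "sentence"] + TAG_COLUMNS)
    for number, sentence in enumerate(split_sentences(text), start=1):
        # Tag cells stay empty; filling them in is the reviewer's job.
        writer.writerow([number, sentence] + [""] * len(TAG_COLUMNS))
    return buffer.getvalue()


if __name__ == "__main__":
    print(requirements_to_csv(SAMPLE))
```

    The script only prepares the sheet. Spotting the omissions, contradictions, and ambiguities is the part that stays with the person reading it.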

    To condense and address what I wanted to say in regards to this post:

    BDD is not a tool that belongs in your domain; it belongs in the domain of the project manager, the business analyst, the requirements engineer and the developer. The fact that in some projects this gets thrown at “test” is a failure of process itself. How often do you get given the legal contract for a project and asked to “test” that all clauses have been met? (although with 30 years experience it wouldn’t surprise me to hear “once or twice”).

    [James’ Reply: At Borland the testers were obliged to add review of the EULA to our test strategy after an unfortunate incident where the lawyers changed it in a way that infuriated our customers. Reviewing things is certainly an important part of a tester’s work. I agree it’s not common to “test” contracts, so I take your point.

    More to the point, BDD does cross into our domain, because our domain is the running of experiments to assess the truth of the product’s behavior. BDD certainly has value other than for testing purposes, but it is an expensive and brittle process compared to skilled testing. What I would argue is that there is a far greater value doing what BDD is trying to do via strong testing, and less automation.

    I get the distinct impression, when I have seen BDD demoed, that the real point is to seize upon any excuse to make a computer do “cool” things, rather than to do the “boring” thing like have a fucking conversation about what the fuck you want the product to do (see, that was a bit of brusqueness that reflects my impatience with the starry-eyed unjustified value some people place on BDD, which drains resources that could be used for other things, such as finding or preventing fucking bugs before it’s too late).]

  33. Hi James,

    I am just wondering if anyone has done some measurement on the performance (such as in terms of cost and defect removal) of the whole development and testing process when we are using BDD versus not using BDD. Also, would the result favor BDD if we are doing a lot of regression?
    I agree with the link posted here on “testing” versus “checking” by Michael Bolton. I merely want to find out whether any study has been done to measure the actual impact of adding this practice to the whole development process versus not having it. Thanks.

    Regards,
    Rommy

  34. Hi James,

    It is very encouraging to read this blog post. I couldn’t agree more with your views!

    I often find resistance from my leaders when I try to make a case for not adopting automation for the sake of automation.

    The way I look at BDD is that it is a thought process that a tester can adopt to come up with test cases or scenarios that will ensure that the application is doing what it is supposed to do without having to automate the testing process.

    [James’ Reply: A thought process? I don’t understand. What thought process are you talking about? Testing is a thought process. Are you just talking about testing?]

    Because there are scenarios beyond BDD that need checking and validating and you need good intellectual human beings who have a good understanding of testing.

    [James’ Reply: Doesn’t testing already do everything that BDD supposedly does?]

    I am going to make my team go through your blog and help me endorse the importance of interpersonal communication in testing and also use of excel and word. 🙂

  35. I’m afraid I see a lot of generalized comments in this thread.

    [James’ Reply: Of course. All comments discussing a method without discussing a particular case are generalized.]

    Testing and using different tools and/or ways of working is clearly context dependent in how much value it produces. So how can we make statements on whether Cucumber adds value (in comparison to other alternatives) or not without knowing the details of the context where it is applied.

    [James’ Reply: We can do that because we do know a lot about technology, projects, and testing. Our statements will not have perfect reliability, but perhaps have sufficient reliability for the general purpose of supporting my claim that testing is generally worth doing, Cucumbering is definitely not testing, Cucumbering is easily confused with testing, Cucumbering generally requires the development and maintenance of a bunch of glue code, there are inexpensive alternatives to Cucumbering, and therefore the general hoopla surrounding Cucumber-like tools is a symptom of a tool fetish rather than excitement about a good solution to a genuine problem.

    More to the point, though: The floor is open for anyone to contradict me with some sort of argument, even a generalized one. And if I think it is too generalized, I’ll say so. This is how normal technical discourse works.]

    Our world is not black or white and I believe Cucumber and BDD can add value in certain contexts, as I perceive it does in my context. That does not stop me from performing exploratory tests, driving end user acceptance tests campaigns, writing unit tests, reviewing code and discussing requirements. BDD implemented with Cucumber is yet another tool and as with all tools I’m responsible for using my brain to evaluate the value it can add in my context.

    [James’ Reply: Why bother saying that? You offer nothing to the conversation by doing so. Your argument is “I like it.” Implied in your argument, I suppose, is an implicit claim that you are not stupid. But I would like to see some evidence of that, please.]

  36. I think this is over-simplification. What matters is that the scenario is easily understandable for business, and a clearer point of agreement/sign off. It is the code behind the innocent gherkin lines that will determine the quality of the test, not the gherkin language itself. Gherkin is not a hard and fast set of rules – the point is to build a relevant and reusable DSL.

    [James’ Reply: It’s such a simple post, and yet so many people, such as yourself, apparently aren’t understanding it. I don’t know what I should do to make it any clearer except maybe to repeat the point louder?

    It’s not an oversimplification, it’s a demonstration of fact: BDD is not testing. Testing is important, and the ridiculously simple-minded scripts in Gherkin don’t come close to it.

    The code you are referring to is often a terrible waste of time even to attempt to write, as my team discovered in 19-fucking-92 when we pioneered DSL at Borland. We abandoned that effort (although a nearby team kept going with it and patented it).]
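
    For readers who have not seen it, the “code behind the gherkin lines” discussed above is a set of step definitions that map each scenario line to a function. The following is a minimal, dependency-free Python sketch of that idea; it is not Cucumber’s API. The `step` decorator, the `Atm` class, and the amounts (a balance of 100, a withdrawal of 20) are all hypothetical, and the scenario text is the one quoted in the post.

```python
import re

STEPS = []  # (compiled pattern, function) pairs: the "glue code"


def step(pattern):
    """Register a function as the handler for lines matching pattern."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register


class Atm:
    """Hypothetical stand-in for the product under check."""
    def __init__(self):
        self.balance = 0
        self.card_valid = False
        self.cash = 0
        self.dispensed = 0
        self.card_returned = False

    def request_cash(self, amount):
        if self.card_valid and self.balance >= amount and self.cash >= amount:
            self.balance -= amount
            self.cash -= amount
            self.dispensed = amount
        self.card_returned = True


@step(r"the account is in credit")
def account_in_credit(atm):
    atm.balance = 100

@step(r"the card is valid")
def card_is_valid(atm):
    atm.card_valid = True

@step(r"the dispenser contains cash")
def dispenser_contains_cash(atm):
    atm.cash = 500

@step(r"the customer requests cash")
def customer_requests_cash(atm):
    atm.request_cash(20)

@step(r"ensure the account is debited")
def account_is_debited(atm):
    assert atm.balance == 80

@step(r"ensure cash is dispensed")
def cash_is_dispensed(atm):
    assert atm.dispensed == 20

@step(r"ensure the card is returned")
def card_is_returned(atm):
    assert atm.card_returned


SCENARIO = """\
Given the account is in credit
And the card is valid
And the dispenser contains cash
When the customer requests cash
Then ensure the account is debited
And ensure cash is dispensed
And ensure the card is returned
"""


def run(scenario):
    """Run each scenario line through the first matching step definition."""
    atm = Atm()
    for line in scenario.splitlines():
        if not line.strip():
            continue
        # Drop the leading keyword (Given/When/Then/And).
        _keyword, _, text = line.strip().partition(" ")
        for pattern, func in STEPS:
            if pattern.fullmatch(text):
                func(atm)
                break
        else:
            raise LookupError(f"no step definition for: {text!r}")
    return atm
```

    Calling `run(SCENARIO)` executes the seven lines in order and raises `LookupError` for any line with no matching step. Each “ensure” step here is a single hard-coded assertion about a single hard-coded state, and every new phrasing in a scenario needs another function like these to be written and maintained.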

  37. Bdd, we are going back to f…. 80’s again????? What a crap of semantic my god.

    [James’ Reply: What is it with you? Have you been drinking? How do you operate a keyboard in that state? Do I have to come to your home and read you my posts word-by-word, in a friendly voice and speaking slow, so that you won’t hallucinate or get a seizure? Do you know anything about testing in the 80’s? Or the 90’s? Or ever? Have you ever thought about what words like “testing” mean? Do you know that words have meanings? English is not your first language, so are you finding it difficult to work through the semantics of my sentences?

    I’m not even entirely sure whether you are supporting or complaining about my post, but if you have something to say, kid, how about making your case in some coherent way. Do you want to try again?]

  38. Hi James,

    I try to understand your position and I respect it. I would like to share my experience using TDD at work. I used BDD on one application with medium-complexity business logic. We did not use a human certification process (not *testing* in the full sense of the word, sorry, but my company does not really have a test team), although someone on the development team had tester skills. The result is that, to date, the application has worked well, without a single problem ticket since it was released to production more than a year ago, and it is the only application here that used BDD in its development. Maybe, thanks to automation tools for testers, developers and others, people can do more, and that includes merging skills or roles when building an app.

    I think that the *test* should be included in each line of code, business rule, or functional feature. Tools like Cucumber or RSpec, for example, give hope for building the next platform for building software faster and more accurately, a new generation of testing.

  39. Hi James,

    We go way back to your days at Borland. It’s been awhile!

    [James’ Reply: Wow! Bob! I miss you! The testing world would be better if you were a more public part of it.]

    I just happened upon your conversation and have enjoyed the counterpoints. I just left a company that quickly removed QA (fortunately, I saw the writing on the wall and left beforehand) and replaced it with Cucumber/BDD. Before I left there were discussions on the feasibility of being able to test the product 100% via Cucumber automation and I was vehement on the futility of such an approach. It’s impossible to test 100% of anything. The time wasted in coming up with simple scenarios and automating them digs into the time it takes to really understand how to break the application.

    [James’ Reply: When we were at Borland, my team (C++ debugger test team) experimented with Cucumber-like technology. It was an extension of what my team first created at Apple (a keyword-driven testing language compiler we called Tesla). We abandoned it because it was too complicated for too little return on investment. The Pascal team adopted the idea and actually patented it… not that the patent did them any good. Like certain drug use, the siren lure of over-complicated tools to automatically find bugs is seemingly immune to advice from old guys who’ve been there and done that.]

    My best bugs have never come out of automation, they always come out of much contemplation and an emerging understanding of how things fit together. It’s when you realize there is something missing or wrongly put together.

    That said, I do like some level of automation. Why? Because, let’s face it, computers do repetitive things faster than humans do. So the mundane activities can be put into those scenarios. So “smoke tests” are valuable. They say, “if that is broken we reject the build.” But, that’s about it.

    [James’ Reply: I am festooned with tools. No one rejects tool use! What we must guard against is obsessiveness.]

    BDD represents an evolution in product development. But it is not “testing or QA.” Instead it is a way for product managers (stakeholders), development and even QA/Test to come together to question what would satisfy the stakeholder and users of the application. It also brings together these groups at the inception of the product, something that, many times, never happens. It questions features and their requirements. So I like that part of it but it’s not exhaustive intellectually. There are many other ways to make all this happen.

    [James’ Reply: I think you and I both know that this communication can, does, and has happened, many times, without the need to communicate in a stilted Gherkin language and without the need to write thousands of lines of glue code.]

    A big downer for BDD and automation, in particular, is the amount of time and effort it takes to create and maintain the “checks.” Also, a downer is that the checks tend to be fragile both in conception and in implementation. How many times have you been in a situation where there was a change in a field or button on the page without any warning? Or where a change was made by some underlying functionality that slows down your checks and everything is failing?

    [James’ Reply: This is one reason we abandoned the approach at Borland.]

    So automation is not testing. It is just checking and checking, usually, at a very thin level. Although I have seen negative scenarios in automation (and in Cucumber automation), it is VERY haphazard and mostly simple and tends to take the “happy path.” So we should call it what it is, “expensive happy path scenario checking.” And BDD does nothing to address these issues.

    [James’ Reply: Yep. Thanks for writing.]

  40. [Comment Redacted]

    Listen, man. If you are going to comment on my blog, please read the post that you are commenting on, first. Then you can respond to it. The comment you wrote was a long statement about what you think is great about BDD. It has nothing to do with my post, which makes the point that BDD is NOT TESTING.

    So, publish your BDD shit somewhere else. I’m not interested.

  41. Of course, I have not read not only this article and all the comments but as well other articles and i have not seen some speeches of you and recommended it to colleagues and i have not met you and of course i don’t own and love the context-driven approach book…

    [James’ Reply: Your “nots” may have gotten away from you there. I’m not sure what you are saying. Maybe you are saying you read the post. But I could detect little in your comment that was responsive to the point I was making, nor the reasoning on which I based that point.]

    However i have tried to summarize all the arguments and try to post my point of view about it and explain why i think it’s worth considering BDD as TESTING among other methods.

    [James’ Reply: BDD is a wasteful, narrow, impoverished approach to testing. If it is testing at all, that is only by accident. It could only have been conceived by people hostile to or woefully ignorant of testing.]

    I thought that you might be interested in constructive discussion. I didn’t know you were only looking for people that agree to your points and nod with their heads. This was a misunderstanding. I apologize. After all i could have known better as you are a buccaneer 🙂

    [James’ Reply: I am a buccaneer, so I don’t put up with bullshit. Agree with me or disagree, but don’t bullshit me or my readers. If you look at the comments, you see that I publish many comments that disagree with my point of view. The only time I don’t publish a comment is if, in my judgment, it has no value. Your comment was a glowing endorsement of BDD that ignored the points I made. You can publish that shit on your own blog, not mine.]

  42. What do you think about what appears to me to be an uptake in BDD over the last few years, as an attempt to aid the process of developing quality software? I’ve only been working in the industry for a couple of years and haven’t experienced testing outside of a BDD context.

    [James’ Reply: Generally speaking I think it’s a technocratic obsession promoted by people who like to write code instead of talking to each other.]

    To me BDD seems like an unnecessary overhead when it comes to testing (though I have seen it as a positive influence on communication between different facets of a product team, ie the analysts and the devs and the testers and so on). It adds an additional layer of complexity to writing automated checks, and I don’t feel that it necessarily adds much confidence in the SUT. If anything it’s a communication and documentation methodology, but even in that box it can’t live all alone.

    [James’ Reply: I agree.]

    The job I perform is considered to be ‘Test Automation’, I’m very skeptical of its value, probably something that has been compounded since reading your blog, at least in the way the company I work for commonly utilises it. I find automation to be unreliable when it’s done through a UI, especially those that are constantly changing, though it is reliable in other contexts. The overhead of constantly maintaining suites of regression tests seems counterproductive. There are points in the last year where I’ve spent weeks at a time trying to keep an automation suite up to date so that it is working with the latest version.

    [James’ Reply: Yes, that is not just a common experience, that is a universal experience. GUI level automation is famously expensive and fragile. I generally avoid it. It usually is a dumb idea.]

    Since I’ve been posted on a project as the sole ‘Automator’ I’ve changed my approach to test automation. I think going forward I’ll use much of the ‘test automation’ process to do the heavy lifting for the manual testers where possible, and provide them with data at the end to analyse (screenshots, logs, etc).

    [James’ Reply: You can’t automate testing, anyway. You can support the test process (which is always a human process) with tools.]

    One example is where the team were rewriting a legacy application to bring it up to modern standards. The fundamental spec for the app was that it must produce exactly the same output as the legacy app. This app writes millions of entries to a database each year, the people who originally commissioned and developed the application are long gone.

    The team wanted to go about testing by writing BDD acceptance criteria, but how do you confidently do that when your endgame is to create an application that completely replicates the manipulations of another application? What I ended up doing was scrapping the BDD approach and writing a piece of software that would pump decades’ worth of data through both old and new applications, comparing both sets of values, logging the discrepancies, and then writing scripts that would parse those logs to summarise the issues found. The tester could then use the logs and the scripts to inform their understanding of the bugs found.

    [James’ Reply: That sounds good on its face. But again, doing that through the user interface might be challenging. Hopefully you can do this underneath the GUI level.]
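
    The old-versus-new comparison described in this comment can be sketched roughly as follows. This is a minimal illustration in Python, not the commenter's actual tool: `legacy_transform`, `rewritten_transform`, and the record fields are hypothetical stand-ins, and the rewrite deliberately contains a discrepancy so the comparison has something to report. It works below the GUI level, on plain function calls.

```python
from collections import Counter

def legacy_transform(record):
    """Hypothetical stand-in for the legacy application's calculation."""
    return {"id": record["id"], "total": round(record["amount"] * 1.2, 2)}

def rewritten_transform(record):
    """Hypothetical stand-in for the rewrite, with a deliberate discrepancy."""
    total = round(record["amount"] * 1.2, 2)
    if record["amount"] < 0:
        total = 0.0  # differs from the legacy behaviour for negative amounts
    return {"id": record["id"], "total": total}

def compare_outputs(records, old, new):
    """Run every record through both implementations and log each differing field."""
    discrepancies = []
    for record in records:
        expected = old(record)
        actual = new(record)
        for field in sorted(set(expected) | set(actual)):
            if expected.get(field) != actual.get(field):
                discrepancies.append({
                    "record": record,
                    "field": field,
                    "old": expected.get(field),
                    "new": actual.get(field),
                })
    return discrepancies

def summarise(discrepancies):
    """Count discrepancies per field, as a starting point for a human tester."""
    return Counter(d["field"] for d in discrepancies)

records = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": -5.0},
    {"id": 3, "amount": 0.0},
]
found = compare_outputs(records, legacy_transform, rewritten_transform)
summary = summarise(found)
for d in found:
    print(d)
print(summary)
```

    The output is only a log of differences. Deciding which differences matter, and why they occur, remains the tester's job.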

    However it’s not always as clear to me as it was in this case on how to decide if something is worth automating, especially when it comes to UI’s. (Do you have any guidelines on this?)

    [James’ Reply: I run through lots of different ways to test in my mind, then choose the one that seems to have the best chance of finding important problems quickly while not spending much time or effort. I also keep an eye to the future and invest to some degree in tools and infrastructure that will lower our testing cost over time.]

    Proponents of BDD say that it allows you to think clearly about the problem and provides a common language for all stakeholders.

    [James’ Reply: That’s bullshit. It doesn’t “allow” you to “think clearly.” You already could think clearly. You already can talk things through. They think it makes the conversation better– I doubt that. But even if it does, it does so at a potentially outrageous cost.]

    I think that while this may perhaps possibly maybe be true in some cases, there is also a risk that it boxes your thought process and limits your approach to coming up with solutions.

    [James’ Reply: Exactly. The kinds of things you do with BDD are the kinds of things that BDD does easily. Other things you ignore. For instance, BDD tools are not conducive to long and looping processes. They are biased towards simplistic checks.]

    Well I took quite a tangent there, but there we are.

  43. I want to share an experience. Unfortunately this is what happened to the Cucumber tests I wrote. The QA team looked at Jenkins to see how many tests were green; if there was a failure, they would file a task ticket for me to fix the test. And the product owner, well, s/he never looked into the Cucumber test specification.
    I have yet to understand which communication gap was bridged by using Cucumber in this fashion.

  44. James, I understand when you say that you could achieve the design benefits of using Cucumber in a cheaper manner, simply by having team conversations and so on. But even though Cucumber takes somewhat more work to accomplish a similar goal, you get the benefit of being able to re-run it anytime. Some examples: setting up a daily test suite run gives you a higher level of confidence that the system continues to work as it’s supposed to, and before releasing a production version of your product you can simply run your test suite and make sure it behaves as expected in different environments. I am just mentioning a few benefits I consider important when deciding whether to spend the time.

    [James’ Reply: It’s not a test suite. It’s not testing. It’s a pile of automated fact checks that can be done more efficiently– assuming you really need such things at all– by using bespoke, targeted, data-driven scripts.

    Using Cucumber is like wearing a corset: a restriction of our freedom of movement and expression in support of a fashion trend not related to health or practical living. But some people find them to be a turn-on, I guess.]
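
    As a rough illustration of the "bespoke, targeted, data-driven scripts" mentioned in this reply, here is a minimal Python sketch. It is an assumption about what such a script might look like, not anything taken from the reply itself: `withdraw` is a hypothetical function under check, loosely modelled on the ATM scenario in the post, and the table of cases is invented for the example.

```python
def withdraw(balance, amount):
    """Hypothetical function under check: returns (new_balance, cash_dispensed)."""
    if amount <= 0 or amount > balance:
        return balance, 0
    return balance - amount, amount

# Each row: (balance, amount, expected_balance, expected_dispensed).
# Adding a check means adding a row, not writing new glue code.
CASES = [
    (100, 20, 80, 20),    # ordinary withdrawal
    (100, 100, 0, 100),   # withdraw the whole balance
    (100, 120, 100, 0),   # insufficient funds
    (100, 0, 100, 0),     # zero amount
    (100, -5, 100, 0),    # negative amount
]

def run_checks(cases):
    """Run every row and collect the ones whose output differs from the expectation."""
    failures = []
    for balance, amount, expected_balance, expected_dispensed in cases:
        actual = withdraw(balance, amount)
        if actual != (expected_balance, expected_dispensed):
            failures.append((balance, amount, actual))
    return failures

failures = run_checks(CASES)
print(f"{len(CASES)} checks run, {len(failures)} failed")
```

    Like any set of output checks, this confirms a handful of facts and nothing more.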

  45. Absolutely agreeing with James, that no tool can replace the human tester.
    Having worked with Cucumber for over a year, these are the advantages I have found so far.
    1) The communication between stakeholders, management, developers and QA is better.
    2) It’s easier to involve QA people to “QA” the spec and the user stories and find missing or ambiguous points in them, before a single line of code is even written by a developer.
    3) It’s a great framework tool for UI automation. Used with Calabash, I am able to run 70% of regression suite with every build. And we can have 1-2 a day.
    So I only have to do smoke, and basic functional testing with every build. And it also frees up a lot of time for more exploratory testing.

    [James’ Reply: This doesn’t address my concerns and questions, though. I’ve had a lot of interaction and experience, much of it bitter, with this sort of thing. (My team created a DSL and supporting framework to run scripts in 1992. We abandoned it.) So when I hear this, I naturally wonder what your standards of testing are, and why it makes communication better to write stilted gherkin statements instead of more flexible and deep natural language statements, and why you are not telling me about the development and maintenance costs of the glue code that makes Cucumber actually do anything, and why you aren’t also concerned about the chilling effect that the Cucumber structure has on the freedom we normally have to script what we like, how we like, when we like.]

    • Everything is relative of course and for someone starting in automation like me, Cucumber+Calabash pairing looked like an easy way to jump into UI automation. Just learn some Ruby and that’s it. I just didn’t have experience with too many other tools out there. I am sure there are better alternatives for someone more experienced in coding.
      To be exact, for some super easy scenarios like verifying username/password fields, etc., if you do it every day, with every build, a tester can get bored (we are all human) and start assuming and skipping tests, or not noticing something, whereas well-written automation code will do this job every time, 300-4000 tests, even if there are 3 builds a day.

      As to management/developer communication, it’s personal experience in the company where I work. iOS developers were really happy with user stories written in that format: strict and straightforward, fewer questions, more clarity on app flow. It is easier for QA to jump in at early stages; before that, reading specs from project management with all the language they use to describe things, and understanding attached diagrams and screenshots and putting it all together, took more time. That’s it.

      [James’ Reply: I see you are new to this.]

  46. What I like about BDD is that the “limited English” with specific verb/noun constructs provides some level of description of the desired behaviour. This verbiage is validated when running the “checks”. I really appreciate your differentiation between test and check. These “checks” provide the normal case exercise which then enables a manual tester the bandwidth and goal to perform destructive and “everything else testing you describe”. Now I know this can be done without BDD but in the absence of a free text “check” compiler I see value in BDD.

    • Reading through other responses, I see you shared that you had created a Cucumber-like tool but gave up on it in favor of focusing on how to test.
      Please share some of the insight you obtained. Thanks.

      [James’ Reply: Yes, twice. First, in 1988, the team I led (the Special Projects and Methods team within the Development Systems QA department at Apple Computer) created a “testing language” compiler called TESLA. We invented the language ourselves (it was a lot like TCL). We used this to create what is now called keyword-driven automation. Basically we could specify output check procedures as a series of lines in a spreadsheet whereby the first column of each line was a method word and the other columns were inputs to that method. This allowed us to write a lot of glue scripts that were separate from the “intent” part of the check.

      This was complicated but not so bad– because we were testing command-line tools. Even so, I could do the same thing, today, rather easily using Perl.

      Then I moved to Borland and encountered GUI testing for the first time in a big way. So we created what we called “model-based automation” in 1993, which involved creating a layer on top of the GUI automation tool that understood the semantics of the product under test. We could then write high level output checks (what most people call test scripts) that would be relatively immune to changes in the specifics of the GUI. We could automatically test with mouse-only or keyboard-only configurations without changing our scripts, too. Today what we did would be called “creating a domain-specific language” for our product.

      At the point where we had burned six months and written 50,000 lines of C++ code to get this going, and yet were still some months from doing anything serious with the framework, we abandoned the project. My counterpart test manager in the Pascal team then took the idea and ran with it, eventually getting a patent (US5475843). I still think it’s a bad idea. And the reason I pursued it was mostly that I was not yet comfortable in my understanding of how skilled testing works. People who don’t really get testing tend to want to turn it into a programming problem, instead.]
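
      The keyword-driven idea described in this reply (each line of a spreadsheet starts with a method word, and the remaining columns are inputs to that method) can be sketched in a few lines of Python. This is only an illustration of the general technique, not a reconstruction of TESLA: the keywords `set`, `add`, and `expect` and the sheet contents are hypothetical.

```python
import csv
import io

# A tiny "spreadsheet": the first column is the keyword, the rest are its inputs.
SHEET = """\
set,x,2
add,x,3
expect,x,5
expect,x,6
"""

def run_sheet(text):
    """Interpret each row by dispatching on its first column."""
    state = {}
    results = []

    def kw_set(name, value):
        state[name] = int(value)

    def kw_add(name, value):
        state[name] += int(value)

    def kw_expect(name, value):
        # Record whether the named value matches the expectation.
        results.append((name, state[name] == int(value)))

    # The "glue": maps each method word to the code that implements it.
    keywords = {"set": kw_set, "add": kw_add, "expect": kw_expect}

    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        keyword, *args = row
        keywords[keyword](*args)
    return results

results = run_sheet(SHEET)
print(results)
```

      The split is visible even at this scale: the sheet carries the intent of the check, while the keyword functions are glue that has to be written and maintained separately.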

      • “People who don’t really get testing tend to want to turn it into a programming problem, instead”

        This I think summarises the problem completely.

  47. Hi James,

    I am a Tester and I am completely new to BDD. How is it related to testing?

    [James’ Reply: It is related to testing in that both BDD and testing serve a purpose of discovering problems in the product, and both can serve to help clarify the intent and design of the product.]

    When would it come into picture?

    [James’ Reply: I am not aware of a context where BDD is a good idea. But it may come into the picture because certain technocrats think that it’s cool, and have an ideological commitment to waterfall-type development– wherein we try to think of everything we are going to do before we do it, instead of producing something and studying it retrospectively.]

    Is it something like writing the test cases at a very high level and then doing exploratory testing of the application?

    [James’ Reply: BDD is not about writing test cases (which is already a practice fraught with trouble) it’s about automating a series of output checks before the product is implemented.]

    Please let me know .

    Thanks,

  48. Hi James,

    I am a test manager. I found this blog entry after being asked to implement BDD/Cucumber in my testing team. The request has come because other departments (development, management) have strong views on us being what they refer to as ‘cutting edge’ with our technologies. (Somebody at a conference heard a buzzword somewhere, probably.)

    I have politely humoured the request and explored the options. After looking at BDD from a tester’s perspective I became concerned about the granularity BDD provides for testing, which prompted me to Google for discussions on the subject. I feel wholly uncomfortable about changing the current format we already use for defining specifications, for discussion between devs, BAs and testers, and for testers to perform system analysis and write their tests. The current format works.

    Your answers have made enjoyable reading and I wanted to thank you for protecting the sanctity of testing and what it is to be a tester.
