Stephen Freeman

Responding to Brian Marick

Brian’s been paying us the compliment of taking our book seriously and working through our extended example, translating it to Ruby.

His point of contention is that he’s doubtful about the value of our end-to-end tests. To be more precise, he’s doubtful about the value of our automated end-to-end tests, a view shared by J. B. Rainsberger, Arlo Belshee, and Jim Shore. That’s a pretty serious group. I think the answer, as always, is “it depends”.

There are real advantages to writing automated end-to-end tests. As Nat pointed out in an extended message to the mailing list for the book,

Most significantly to me, however, is the difference between “testing” end-to-end or through the GUI and “test-driving”. A lot of people who are evangelical about TDD for coding do not use end-to-end tests for driving design at the system scale. I have found that writing tests gives useful design feedback, no matter what the scale.

For example, during Arlo and Jim’s session, I was struck by how many of the “failure stories” described situations where the acceptance tests were actually doing their job: revealing problems (such as deployment difficulties) that needed to be fixed.

Automating an end-to-end test helps me think more carefully about what exactly I care about in the next feature. Automating tests for many features encourages me to work out a language to describe them, which clarifies how I describe the system and makes new features easier to test.
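That vocabulary-building is easier to see in code. Here is a minimal Python sketch of the idea: an application “driver” that grows a domain vocabulary, so each new end-to-end test reads as a sentence about a feature. All the names here are illustrative, not taken from the book’s example.

```python
# A sketch of an expressive end-to-end test "driver". In a real system this
# object would drive a deployed application through its outermost interface;
# here it just records events so the example is self-contained.

class ApplicationDriver:
    """Talks to the system under test in the language of its features."""

    def __init__(self):
        self.events = []  # stand-in for observing a running system

    def user_logs_in(self, name):
        self.events.append(("login", name))

    def user_places_order(self, name, item):
        self.events.append(("order", name, item))

    def shows_order_confirmation_for(self, name, item):
        # an assertion phrased in the domain language of the feature
        assert ("order", name, item) in self.events, (
            f"expected a confirmed order of {item} for {name}")

def test_returning_customer_can_order():
    app = ApplicationDriver()
    app.user_logs_in("alice")
    app.user_places_order("alice", "book")
    app.shows_order_confirmation_for("alice", "book")

test_returning_customer_can_order()
```

The point is not the implementation but the shape: once the driver exists, the next feature’s test is written in the same vocabulary, which is where the design feedback comes from.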

And then there’s scale. Pretty much anything will work for a small system (although Alan Shalloway has a story about how even a quick demonstrator project can get out of hand). For larger systems, things get complicated, people come and go, and the team isn’t quite as confident as it needs to be about where things are connected. Perhaps these are symptoms of weaknesses in the team culture, but it seems wasteful to me to take the design experience we gained while writing the features and not encode it somewhere.

Of course this comes at a price. Effective end-to-end tests take skill, experience, and (most important) commitment. Not every system I’ve seen has been programmed by people who are as rigorous as Nat about making the test code expressive or allowing testing to drive the design. Worse, a large collection of badly written end-to-end tests (a pattern I’ve seen a few times) is a huge drag on development. Is that price worth paying? It (ahem) depends, and part of the skill is in finding the right places to test.

So, let me turn Brian’s final question around. What would it take to make automated end-to-end tests less scary?


  1. […] This post was mentioned on Twitter by SteveF , Chris Matts. Chris Matts said: RT @sf105: Finally responded to @marick on automated end-to-end tests […]

  2. James Carr says:

    A reduction in fragility and time consumption. 🙂

    Several teams I’ve been on at a past employer had automated end-to-end tests written in FitNesse. Although they were expressively written and did exercise the system end to end, they often took forever to run and quite often touched too many different systems.

    Of course the problems can be fixed; it’s just been my experience that too much focus on end-to-end tests as “the holy grail” can be disastrous… I like to keep these kinds of tests as few as possible, focusing more on the single system or individual components and connecting them together.

  4. Lisa Crispin says:

    I’ve certainly been influenced by Jim, Arlo, Brian and others to be careful when automating end-to-end tests. However, I’ve worked on four teams in the past ten years where we got excellent ROI from automated end-to-end smoke tests. The keys are:
    1. Whole team solves the automation problems
    2. Whole team experiments with and chooses appropriate tools
    3. Whole team continuously revisits and refactors the tests, improving the design

    We’ve certainly gone too far one way or the other: too many automated tests that got expensive to maintain, and holes where we decided not to automate edge cases and got bitten by a regression escaping to production. Overall, though, our “pyramid” approach to testing – pushing tests down as far as they will go, while staying confident that serious regressions are detected immediately – has worked.

  5. Steve, please let me clarify a part of my position. I doubt the value of end-to-end tests specifically for finding basic correctness problems and because I don’t feel they put enough positive pressure on my design. As a result, I use end-to-end tests mainly to find spooky-action kinds of problems as well as to check how our system adheres to various constraints, like execution time, reliability when scaling, and so on.

    I find teams write too many end-to-end tests, attempting to make them exhaustive. I find this problem so often that advising teams to stop writing end-to-end tests has become near universal, at least in my little universe. This means that I often forget to make that assumption clear. I need to do better so that people understand me the way I intend.

    That said, I would have written fewer end-to-end tests than you did in test-driving your example. Probably.

  6. I think it takes effort to find the right balance for “outer” tests, usually through trial and error. One failure mode I’ve seen is writing a whole load of outer tests and never revisiting them (except to turn them off). And I think teams forget to let difficulties in testing guide them towards better system design decisions.

    As for the book, the trouble with the example is that, long as it is, it’s still too short to be realistic. As the system scales up, we might well back off /some/ of the end-to-end testing in favour of lower level testing, considering our risks carefully as we go (at least, that’s what we say in public :).

    Interestingly, there’s always been a counter-opinion which is that teams should only write functional tests, since the unit tests get in the way (see Cédric Beust, Andrew Binstock, and Geoff and Emily Bache, for example). I don’t like that either, since I think it misses the internal design feedback.

    Plenty of scope for discussion, then.

  7. Arlo Belshee says:

    Returning to this many years later, I continue to revert to type. As in, I’m a developer so I agree with the tester.

    I really like the “Tell, Don’t Ask” style. I find the outside-in approach a good way to avoid primitive obsession and discover those missing whole values. And I think that automating too much of this is a common source of failure in teams I work with.
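    To make the contrast concrete, here is a minimal Python sketch of both ideas Arlo names: a “whole value” replacing a bare primitive, and an object that is told what to do rather than asked for its state. Every name here is illustrative.

```python
# "Tell, Don't Ask" plus a whole value instead of a raw number.
from dataclasses import dataclass

@dataclass(frozen=True)
class Money:  # whole value: amount and currency travel together
    amount: int  # pence, to avoid float rounding
    currency: str

    def plus(self, other):
        assert self.currency == other.currency, "mixed currencies"
        return Money(self.amount + other.amount, self.currency)

class Account:
    def __init__(self, balance):
        self._balance = balance  # private: callers tell, they don't ask

    def deposit(self, money):
        # the account is told what to do; there is no getter for
        # callers to pull the balance out and compute with it
        self._balance = self._balance.plus(money)

    def report_to(self, listener):
        # outside-in style: results flow out through a collaborator
        listener.balance_is(self._balance)

class StatementListener:
    def __init__(self):
        self.balance = None

    def balance_is(self, balance):
        self.balance = balance

account = Account(Money(1000, "GBP"))
account.deposit(Money(250, "GBP"))
listener = StatementListener()
account.report_to(listener)
assert listener.balance == Money(1250, "GBP")
```

    The missing whole value (`Money`) is exactly the kind of type that outside-in test-driving tends to surface, because the test wants to talk about amounts-with-currencies, not bare integers.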

    My big kick of last year was to really think about the learn vs. prove distinction. Basically, there are two reasons to test: either I want to prove / validate that X does Y in circumstance Z, or I want to see what happens when I poke X in way Y. Validation is the realm of automated “tests” run by devs; open-ended discovery / learning is the realm of manual exploration by, well, everyone.

    Many teams fail by trying to use “proof” techniques when they really should be learning. Fitness to purpose is my canonical example of this. If I am trying to describe what the full system does, then I don’t usually care about that. I care that the system does “whatever the user really wants it to do so that she can get her job done.”

    At acceptance / integration level, I care about 3 things:
    * No surprises.
    * No frustration. (don’t make me do more than I think I should)
    * Accomplish my goal. (which is larger than your stupid software)

    Sorry devs: this cannot be automated. It changes all the time. In fact, any solution which gives me these will change my definition of these such that it no longer delivers on them. (Fortunately, that change in requirements will instantly impact my competitors as well, and they don’t know they’ve changed.)

    So what I really need at the system level is an oracle for human surprise, frustration, and joy of accomplishment. There is only one such oracle: a human. The whole point is to learn about people. We need people to do that.

    Correctness, while interesting, can be fully proven at a local scope. I argue (based on my own code experience) that one can design systems without emergent behavior. No spooky action at a distance. Accomplishing that requires continuous design. It requires refactoring. But with that, then TDA or pure FP are very easy ways to drive extreme cohesion. With extreme cohesion comes easy testing. A reinforcing cycle in the human system results.
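    The “proven at a local scope” claim is easiest to see with a pure function: no shared state means no spooky action at a distance, so a handful of local tests pin its behaviour down completely. A tiny Python sketch, with invented names:

```python
# A pure function: output depends only on its inputs, so correctness
# can be established entirely at local scope, with no system context.

def apply_discount(total_pence, loyalty_years):
    """10% off for customers of five or more years, in integer pence."""
    if loyalty_years >= 5:
        return total_pence * 90 // 100
    return total_pence

assert apply_discount(1000, 0) == 1000
assert apply_discount(1000, 5) == 900
```

    With extreme cohesion of this kind, the unit tests are trivial to write, which is the reinforcing cycle Arlo describes.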
