Stephen Freeman

Programming, it's really about language

Yesterday, during the XpDay Sampler track at QCon, Keith Braithwaite presented the latest version of his talk on measuring the characteristics of Test-Driven code. Very briefly, many natural phenomena follow a power-law distribution (read the slides for more explanation); in spoken language this is usually known as Zipf’s Law. Keith found that, where the code has comprehensive unit tests, counting the number of methods in a code base at each level of cyclomatic complexity produces such a power-law distribution, and in practice all the conforming examples were written Test-First; trust a physicist to notice this. This matters because low-complexity methods contain far fewer mistakes.
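A minimal sketch of the kind of check involved, assuming you already have one cyclomatic-complexity figure per method from some metrics tool. The function name `fit_power_law` and the synthetic counts are hypothetical; this is not Keith's data or necessarily his method. Fitting a straight line on a log-log plot is a quick, crude check rather than a rigorous test for a power law.

```python
import math


def fit_power_law(counts):
    """Fit count ~ C * complexity**(-alpha) by least squares in log-log space.

    counts maps a cyclomatic complexity level (>= 1) to the number of methods
    at that level. Levels with no methods are skipped because log(0) is undefined.
    Returns (alpha, r_squared); r_squared near 1 means the points lie close to
    a straight line on a log-log plot.
    """
    points = [(math.log(c), math.log(n)) for c, n in sorted(counts.items()) if n > 0]
    k = len(points)
    mean_x = sum(x for x, _ in points) / k
    mean_y = sum(y for _, y in points) / k
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    return -slope, 1 - ss_res / ss_tot


# Synthetic, made-up counts generated from an exact power law with alpha = 2,
# so the fit should recover roughly that exponent. Not from any real code base.
synthetic = {c: round(1000 * c ** -2.0) for c in range(1, 11)}
alpha, r2 = fit_power_law(synthetic)
print(f"alpha={alpha:.2f}  r^2={r2:.3f}")
```

Feeding in real per-level method counts instead of `synthetic` gives a rough exponent and a measure of how straight the log-log plot is.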

Keith used jMock as his example of code at the “good” end of the scale (thanks Keith) and, as he was showing some examples of its implementation, it struck me that a great many of those small, low-complexity methods were syntactic sugar: they were there to attach a meaningful name to a little piece of code. We put a great deal of emphasis in our coding style on readability, on teasing out concepts and expressing them directly in code, and on minimizing the accidental noise from the language; we don’t always succeed, but that’s what we’re trying to do.
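To illustrate the idea of small methods as names, here is a minimal sketch in Python rather than jMock's Java. The `Order` type and every function name are hypothetical, invented for this example. The two versions compute the same rule; the second splits it into one-line methods whose only job is to give a fragment of code a name.

```python
from dataclasses import dataclass


@dataclass
class Order:
    """Hypothetical domain object, invented for this illustration."""
    total: float
    paid: bool
    shipped: bool


# Before: the business rule is one compound expression the reader must decode.
def can_cancel_inline(order: Order) -> bool:
    return not order.shipped and (order.paid or order.total == 0)


# After: each small method exists only to attach a name to a piece of code.
def is_free(order: Order) -> bool:
    return order.total == 0


def is_settled(order: Order) -> bool:
    return order.paid or is_free(order)


def still_in_warehouse(order: Order) -> bool:
    return not order.shipped


def can_cancel(order: Order) -> bool:
    # Reads as a sentence built from the named concepts above.
    return still_in_warehouse(order) and is_settled(order)
```

The behaviour is unchanged; what changes is that `can_cancel` now reads in the vocabulary of the domain, and a code base written this way accumulates many tiny methods.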

Is this why our code conforms to Zipf’s Law, because we’re trying to think in terms of language and expression, rather than in terms of procedures? Hmmmm.


The other issue with Keith’s discovery is that it doesn’t yet say anything about causality. The first conclusion one might come to is that Test-Driving code leads to power-law structure, but I’ve seen TDD code that definitely does not have that characteristic. An alternative explanation might be that the sort of people who write that sort of code were amongst the first to be drawn to TDD, and that TDD reinforces the trend if you’re already mostly there. I’m not sure what an appropriate experiment would be; perhaps mining some old code that the TDDers wrote before they learned the practice? There are just too many variables.

4 Comments

  1. Causality is the hard bit. It’s notoriously easy to find a power-law looking distribution where there is actually something else (log-normal, for instance). The only way to be sure is to figure out the underlying dynamics, then crank some Monte Carlo simulations. Tricky stuff to do with rich human behaviors like TDD…

  2. Anonymous says:

    TDD at QCon…

    Keith Braithwaite’s latest version of his talk about measuring the characteristics of TDD code….

  3. David Harvey says:

    So … you start with TDD, and end with a DSL…

  4. Maybe. We might have overdone the DSL thing, but even with a more “conventional” API, I think the same principles apply.

    As you know, the conventional Smalltalk style is to have lots of little methods so that everything is deferred. It’s typically Californian: there is no “there” there.
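The first comment's warning, that a power-law-looking plot can hide something else such as a log-normal, is easy to demonstrate. A minimal sketch, assuming a log-normal with hypothetical parameters mu = 0 and sigma = 3, evaluated at log-spaced points from 1 to 100: a straight-line fit on the log-log scale comes out with r² above 0.99, even though the log of a log-normal density is a parabola in log(x), not a line. This does not attempt the Monte Carlo modelling of the underlying dynamics that the comment recommends, and the function names are hypothetical.

```python
import math


def lognormal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a log-normal distribution, which is not a power law."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))


def loglog_line_fit(xs, ys):
    """Ordinary least squares of log(y) on log(x); returns (slope, r_squared)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(lx, ly))
    ss_tot = sum((b - my) ** 2 for b in ly)
    return slope, 1 - ss_res / ss_tot


# Log-spaced sample points spanning two decades, 1 to 100.
xs = [10 ** (k / 10) for k in range(21)]
ys = [lognormal_pdf(x, mu=0.0, sigma=3.0) for x in xs]
slope, r2 = loglog_line_fit(xs, ys)
print(f"slope={slope:.2f}  r^2={r2:.4f}")
```

A high r² here only says the points are nearly collinear over this range; distinguishing the two shapes needs a wider range, a proper likelihood comparison, or, as the comment says, a model of how the data came about.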
