Stephen Freeman

March 14th, 2008:

Programming, it's really about language

Yesterday, during the XpDay Sampler track at QCon, Keith Braithwaite presented the latest version of his talk on measuring the characteristics of Test-Driven code. Very briefly: many natural phenomena follow a power-law distribution (read the slides for more explanation); in spoken language this is usually known as Zipf’s Law. Keith found that, where a code base has comprehensive unit tests, counting the number of methods at each level of cyclomatic complexity gives just such a power-law distribution, and in practice all the conforming examples were written Test-First. Trust a physicist to notice this. It matters because low-complexity methods contain many fewer mistakes.
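To make the measurement concrete, here is a minimal sketch of one way to check a code base for this shape, assuming you already have a cyclomatic-complexity value for every method from whatever metrics tool you use. If the number of methods N at complexity c follows a power law, N(c) ≈ C · c^(-α), then log N is a straight line in log c, so a least-squares line through the log-log histogram gives a rough estimate of α. The function names are hypothetical, and this is not necessarily how Keith did his analysis; a log-log fit is a quick sanity check, and maximum-likelihood estimators are generally preferred for serious power-law fitting.

```python
import math
from collections import Counter


def complexity_histogram(complexities):
    """Count how many methods have each cyclomatic complexity value."""
    return dict(sorted(Counter(complexities).items()))


def fit_power_law(histogram):
    """Least-squares fit of log(count) = log(C) - alpha * log(complexity).

    Returns (alpha, C). A straight line on log-log axes is the signature
    of a power law; alpha is the magnitude of the (negative) slope.
    """
    if any(k < 1 for k in histogram):
        raise ValueError("cyclomatic complexity is always at least 1")
    if len(histogram) < 2:
        raise ValueError("need at least two distinct complexity values")
    xs = [math.log(k) for k in histogram]
    ys = [math.log(v) for v in histogram.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


# Invented sample data: exactly 1600 / c**2 methods at each complexity c.
sample = [1] * 1600 + [2] * 400 + [4] * 100 + [8] * 25
alpha, C = fit_power_law(complexity_histogram(sample))
print(round(alpha, 3), round(C, 1))  # prints: 2.0 1600.0
```

Because the invented data follow 1600/c² exactly, the fit recovers α ≈ 2 and C ≈ 1600; real code bases will scatter around the line, and the interesting question is whether they scatter around a line at all.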

Keith used jMock as his example of code at the “good” end of the scale (thanks, Keith). As he was showing some examples of its implementation, it struck me that a great many of those small, low-complexity methods were syntactic sugar: they were there to attach a meaningful name to a little piece of code. We put a great deal of emphasis in our coding style on readability, on teasing out concepts and expressing them directly in code, and on minimizing the accidental noise from the language; we don’t always succeed, but that’s what we’re trying to do.
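As an illustration of the kind of method meant here, the following is a hypothetical Python sketch, not code from jMock (which is written in Java): a few one-line methods whose only job is to name a small expression, so that the method that uses them reads almost like a sentence. The `Invoice` class and its methods are invented for the example. Each helper has a cyclomatic complexity of 1, and `is_overdue` rises to 2 only if your tool counts the `and` as a decision point, as some do.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Invoice:
    """Hypothetical domain object, invented for this illustration."""
    due: date
    paid: bool

    def is_outstanding(self) -> bool:
        # Names the concept "money is still owed"; no branches.
        return not self.paid

    def is_past_due(self, today: date) -> bool:
        # Names the concept "the due date has passed"; no branches.
        return today > self.due

    def is_overdue(self, today: date) -> bool:
        # Composes the named pieces, so the rule reads as prose.
        return self.is_outstanding() and self.is_past_due(today)


invoice = Invoice(due=date(2008, 3, 1), paid=False)
print(invoice.is_overdue(date(2008, 3, 14)))  # True
```

Inlining the helpers into `is_overdue` would give the same behaviour in fewer methods; keeping them adds several complexity-1 methods to the count.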

Is this why our code conforms to Zipf’s Law, because we’re trying to think in terms of language and expression, rather than in terms of procedures? Hmmmm.

The other open question about Keith’s discovery is causality, which it doesn’t yet address. The first conclusion one might come to is that Test-Driving code leads to power-law structure, but I’ve seen TDD code that definitely does not have that characteristic. An alternative explanation might be that the sort of people who write that sort of code were amongst the first to be drawn to TDD, and that TDD perhaps reinforces the trend if you’re already mostly there. I’m not sure what an appropriate experiment would be; perhaps mining some old code that the TDDers wrote before they learned the practice? There are just too many variables.