Friday, October 23, 2009

Factors Driving Naming

As I lay awake in bed after about 4 hours of sleep, it suddenly dawned on me that we have not given enough thought to why and when we need good names. My blanket position is strong enough: code that is hard or slow to understand is hard or slow to change reliably. When we use better (not just longer) names, we find that our code becomes easier for others to understand.

We can vet our naming system by pair programming: if we and our partners cannot devise a way to make the code more readable, then it is probably readable enough for now. By using TDD, we can create executable specifications that explain why the code takes an odd turn or includes a surprising detail. Further, when a complex problem has a simple and generic solution, the tests show whether or not we've covered all the bases. Between clear, obvious tests and clear, obvious code, we can largely eliminate the need for comments, frequent vertical line breaks, and flowerboxes. Code plus tests can be clear enough.
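To make the TDD point concrete, here is a minimal Python sketch using the standard unittest module. The function parse_quantity and its rule that a blank field means zero items are hypothetical, invented only for illustration. The point is that the test names record why the surprising branch exists, so the code needs no flowerbox to explain it.

```python
import unittest


def parse_quantity(text: str) -> int:
    """Hypothetical parser for an order-quantity field."""
    stripped = text.strip()
    if not stripped:
        # The odd turn: an empty field means "none ordered", not an error.
        return 0
    # int() raises ValueError for non-numeric text such as "seven".
    return int(stripped)


class ParseQuantitySpec(unittest.TestCase):
    # Each test name states one rule, so together they read as a specification.
    def test_blank_field_means_zero_items_rather_than_an_error(self):
        self.assertEqual(parse_quantity("   "), 0)

    def test_surrounding_whitespace_is_ignored(self):
        self.assertEqual(parse_quantity(" 7 "), 7)

    def test_non_numeric_text_is_rejected(self):
        with self.assertRaises(ValueError):
            parse_quantity("seven")
```

Saved as a module, the spec can be run with `python -m unittest` followed by the module name. A reader who wonders why blank input returns 0 finds the answer in the first test's name.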

There seem to be three factors that drive the need for clarity in naming.
  • Distance from declaration. We approximate this with "scope". An iteration variable in a list comprehension is created and used within a single expression, so it need not carry the context that a class name from a distant package or namespace, or (God forbid!) a global boolean variable, would need. Things used far from their point of declaration need context in their names; things used near their declaration or initialization need much less.

  • Number of names in play. The more names that are "in play" in a given routine, the more we need clear distinctions between the objects they name. In a method like Math.Min(int x, int y) there is not much we need to know about x and y: they are just two ints, and we want to know which is lesser. But if a function manipulates 11 variables in 4 lines, we start to have a problem with variable density. People with particularly strong math skills feel this less because they learned early to work with extremely economical notation, but that knack is more a rite of passage than a readability ideal to be propagated. In a name-crowded space it is simply harder to tell one thing from another, and people may type 'r' when they mean 'k'. As my colleague Vadim points out, what matters most here is that names are easy to tell apart visually, and overly long names may hurt that more than they help.

  • Infrequency of use drives us toward longer names. The less a name is used in any given context, the more it must describe its own purpose. Conversely, a name that is used repeatedly in many contexts becomes familiar to developers; a long name then merely makes it tedious to read and easier to confuse with similar long names.
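As a rough sketch of how the three factors above can pull names in different directions, here is a small Python example. Every name in it (MAX_PAYMENT_GATEWAY_RETRIES, should_retry_payment, squares_of_evens, smaller, Order, order_total) is hypothetical and exists only for illustration. `smaller` stands in for the Math.Min example.

```python
from dataclasses import dataclass

# Infrequency of use and distance from declaration: a hypothetical setting
# read in only one or two far-off places, so its name spells out its purpose.
MAX_PAYMENT_GATEWAY_RETRIES = 3


def should_retry_payment(attempts_so_far: int) -> bool:
    return attempts_so_far < MAX_PAYMENT_GATEWAY_RETRIES


# Distance from declaration: n is created and consumed inside one
# expression, so a one-letter name loses nothing.
def squares_of_evens(numbers: list[int]) -> list[int]:
    return [n * n for n in numbers if n % 2 == 0]


# Few names in play: with two interchangeable ints, x and y are as clear
# as any longer names would be.
def smaller(x: int, y: int) -> int:
    return x if x <= y else y


@dataclass
class Order:
    subtotal: float
    discount: float
    tax_rate: float
    shipping: float


# Many names in play: several quantities are live at once, so each gets a
# short but visually distinct name instead of terse letters like s, d, r, h.
def order_total(order: Order) -> float:
    discounted = order.subtotal - order.discount
    tax = discounted * order.tax_rate
    return discounted + tax + order.shipping
```

For example, `order_total(Order(subtotal=100.0, discount=10.0, tax_rate=0.1, shipping=5.0))` returns 104.0. The function names themselves stay fairly descriptive because callers may use them far from where they are defined.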

I have long approximated these issues with a rule (from James Grenning, I think) that the length of a name should be in direct proportion to its scope. My colleague Vadim has challenged that simple rule, but I was not really ready to think past it until a recent amicable disagreement with Bob Martin about naming, followed by a serendipitous period of activity-free nocturnal wakefulness.

Now I think of naming as a set of trade-offs among these forces. As always, I am interested in counterpoint and comment.