Monday, June 29, 2009

Code Perturbation and Extensive Branch Mods

Balance these facts:
  1. Refactoring is a good thing, but is also perturbation.
  2. Gratuitous perturbation is a bad thing.
Refactoring is good because confusing, stupid, repetitive, complicated code is the devil. We can't even pretend it is not the devil, because we catch bad code sneaking away with a slice of our souls from time to time. Bad code is bad. Refactoring makes bad code better. Some perturbation is a very good thing.

Gratuitous perturbation of the code base is a bad thing. If I make a million changes in a million places, then diffing (and therefore merging) is going to be a slice of hell. This is a different devil, but still a devil.

The problem with perturbation is that it makes it hard to maintain branches. Branched development is a good thing sometimes, providing some isolation for a very short period of time, and some ability to compose and recompose a release. Branching becomes odious, however, when the code in a branch differs greatly from the code in a shared codeline (often trunk). Merging becomes difficult, manual, and fraught with error.

In a branch, you want to perturb less, and confine your refactoring to new areas of code. That way your merges will work. Alternatively, you want to refactor the trunk first and then merge it to your branch, so at least they're both similar code bases.

Gratuitous perturbation would be reformatting the source base, renaming globals, and the like. In the branch, you don't really want to do that. In the shared trunk, you might want to do that. It depends on how the people in branches will handle it. By the way, you probably DO want to do big things like reformatting all the code and eliminating all globals. You just have to do it when not much else is going on, or else you want to coordinate with people working in branches.

Given some routine:
public int doSomething() {
    // 15 lines of old code
    // 12 lines of new code goes here
    // 30 lines of old code
    // 2 lines of new code goes here
}

61 is a horribly unhealthy line count. Clean functions are under a dozen lines, and closer to 6. I'll bet you those 14 lines of new code don't line up perfectly under the name and intent of the function, nor do they have the same level of abstraction (except in this made-up case, where they "do something").

You can put those new lines inline, as shown, but that is a mess. You might not be able to sleep at night or ingest foods afterward, but you are physically capable. If you can add code inline without becoming ill, then you probably should retune your sense of smell.

You can refactor. I'm betting doSomething will break down into a hierarchy of smaller functions that have meaningful names. But if you refactor this in a branch, then the shared codeline merges will be a big pain. If you want your merges to be reasonably easy, you'll have to either do the refactoring in both code lines or do it in the shared line and merge it down to the branch. It's double work now, but it avoids harder work later, and who is to say it will happen only twice? It's an insurance premium. It could be wasted, but it could pay off big time.
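To make "a hierarchy of smaller functions" concrete, here is a minimal sketch of what the refactored routine might look like. Everything in it is an assumption for illustration: the post never says what doSomething actually does, so the class RefactoredReport, its int[] parameter, and the helpers discardNegatives, sumOf, and roundToTens are invented stand-ins.

```java
import java.util.Arrays;

// Hypothetical sketch: all names and the behavior are made up for illustration.
class RefactoredReport {

    // The top-level function now reads as a short list of named steps.
    public int doSomething(int[] readings) {
        int[] valid = discardNegatives(readings);
        int sum = sumOf(valid);
        return roundToTens(sum);
    }

    // Each helper stays at a single level of abstraction.
    private int[] discardNegatives(int[] readings) {
        return Arrays.stream(readings).filter(r -> r >= 0).toArray();
    }

    private int sumOf(int[] values) {
        return Arrays.stream(values).sum();
    }

    // Rounds a non-negative value to the nearest multiple of ten.
    private int roundToTens(int value) {
        return ((value + 5) / 10) * 10;
    }
}
```

Notice that nearly every line of the old body moves or changes in a rewrite like this, which is exactly why it is painful to merge when it happens in only one codeline.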

The other option is to add only two lines of code to the existing, ugly function. Those two lines would be calls to new functions, which contain all the rest of the new stuff. This causes minimal perturbation. The diff shows two lines inserted in the function, and two new functions being added beneath. Further merges will be pretty easy to deal with. Once the branch moves to trunk, then refactoring in trunk might be more reasonable.
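A rough sketch of that low-perturbation option, with the usual caveat that the names are invented: LegacyService, applyNewRule, recordOutcome, and the arithmetic on total are placeholders standing in for the real old and new code.

```java
// Illustrative sketch only: the class, the helper names, and the arithmetic are made up.
class LegacyService {
    private int total = 0;
    private int auditCount = 0;

    public int doSomething() {
        // ... 15 lines of old code, left exactly as they were ...
        total += 10;

        applyNewRule();      // inserted line 1: the 12 new lines live in this function

        // ... 30 lines of old code, left exactly as they were ...
        total *= 2;

        recordOutcome();     // inserted line 2: the 2 new lines live in this function
        return total;
    }

    // New functions added beneath the old one, so the diff is pure insertion.
    private void applyNewRule() {
        total -= 3;          // stand-in for the 12 lines of new logic
    }

    private void recordOutcome() {
        auditCount++;        // stand-in for the 2 lines of new logic
    }

    public int getAuditCount() {
        return auditCount;
    }
}
```

The old lines are never touched, so a later merge from the shared codeline only has to place two inserted calls and two appended functions.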

Here then is the moral of our tale, and the motto to live by:
Don't be caught with extensive changes in a branch.