Wednesday, July 8, 2009

The U Controversy

It is a tempest in a teakettle, to be sure, but among the points of friction and discovery today was one about the use of the letter 'u' as a variable name. The case in point was a nice, small function, not much larger or more complex than this stupid example:

public void Whatever() {
    User u = new User();
    if (u.hasSomeAttribute()) {
        u.setSomeValue();
    }
}

The focal point is the letter U as a variable name. Because of my involvement in a particular book project, I was summoned into the conversation and carried half of it for some time.

I would rather see "user" than "u" because of my rules about pronounceable, grep-able names that don't require any mental mapping. That is my preference. Therefore, I can clearly NOT choose the variable name "u."

I also have publicly stood by James Grenning's assertion that the length of a name should have some correlation with its scope. A name that serves as a loop counter is inappropriate as a parameter name, function name, property name, class name, or module name. In that case, the variable clearly should not have been named "user."

Now, the team has chosen fully-spelled English word names, so I can clearly not choose to name the variable "u"; however, there is no additional content or usage exposed by using the longer name, so I can clearly not choose to name the variable "user."

The name 'u' does not actually violate the "pronounceable names" rule since it can be pronounced "ewww", so I can clearly not choose to call it "user". Yet if I pronounce it as spelled, it is "uh" which makes me sound like an idiot so I can clearly choose not to call it "u."

The programmer in question is a bright, well-educated, professional man with a strong mathematical and scientific background, well accustomed to effortlessly dealing with single-letter variables in larger and more complex contexts, so he may (without harm) clearly choose not to call the variable "user."

The past misuse of short and cryptic variable names has left a very foul taste in the mouths of our programmers, who have spent many hours trying to figure out what xtmp and z and t and c stand for when they are stumbled upon in the bowels of some deep function, so to distance ourselves from the perpetrators of these bad names we can clearly choose not to call the variable "u."

Ultimately, the conversation needed to fall upon two relevant points. One is that the name 'u' is perfectly acceptable in the limited context in which it appears, and the other is that the team has chosen to use fully-spelled English names.

That set of facts would leave us with a single, clear path. The variable should be named 'user' in compliance with the desires of the team, with the option of revisiting the rules to allow shorter names in limited context at the next retrospective. But along the way, the programmer should also be given some validation that the name is really not problematic in this context in any other way.
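For what it's worth, here is a rough sketch of where that leaves the code. The User class is a hypothetical stand-in with the same two methods as the example above, and the function returns the user only so the result can be checked; neither detail comes from the real code base.

```java
// Hypothetical stand-in for the User class in the example above.
class User {
    private boolean someValueSet = false;

    // Assumed to be true for the purposes of this sketch.
    boolean hasSomeAttribute() { return true; }

    void setSomeValue() { someValueSet = true; }

    boolean isSomeValueSet() { return someValueSet; }
}

class NamingSketch {
    // Same tiny function, following the team's rule: fully-spelled
    // English names, even when the scope is only a few lines long.
    static User whatever() {
        User user = new User();
        if (user.hasSomeAttribute()) {
            user.setSomeValue();
        }
        return user;
    }
}
```

Nothing about the behavior changes. The only difference is the name, which is rather the point of the whole controversy.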

Programming, in its current agile form, is as much a social discipline as a technical discipline. While there is no reason to surrender one's mind and taste at the door, it is reasonable that we recognize the will of the team and try to work within its boundaries.

I am not condoning an oppressive environment or coercive control through peer pressure, but rather suggesting that we join a team on its terms. When I have come to a team on my own terms, I have not been as valuable a member as I intended to be. I have learned that it makes good business sense to choose one's battles instead of fighting them all, and if possible to win our battles through technical merit instead of force of will. A reasonable amount of sensitivity goes a long way.

Monday, June 29, 2009

Code Perturbation and Extensive Branch Mods

Balance these facts:
  1. Refactoring is a good thing, but is also perturbation.
  2. Gratuitous perturbation is a bad thing.
Refactoring is good because confusing, stupid, repetitive, complicated code is the devil. We can't even pretend it is not the devil, because we catch bad code sneaking away with a slice of our souls from time to time. Bad code is bad. Refactoring makes bad code better. Some perturbation is a very good thing.

Gratuitous perturbation of the code base is a bad thing. If I make a million changes to a million places, then diffing (and therefore merging) is going to be a slice of hell. This is a different devil, but still a devil.

The problem with perturbation is that it makes it hard to maintain branches. Branched development is a good thing sometimes, providing some isolation for a very short period of time, and some ability to compose and recompose a release. Branching becomes odious, however, when the code in a branch differs greatly from the code in a shared codeline (often trunk). Merging becomes difficult, manual, and fraught with error.

In a branch, you want to perturb less, and refactor new areas of code. Your merges will work. Alternatively, you want to refactor the trunk first and then merge it to your branch, so at least they're both similar code bases.

Gratuitous perturbation would be reformatting the source base, renaming globals, and the like. In the branch, you don't really want to do that. In the shared trunk, you might want to do that. It depends on how the people in branches will handle it. By the way, you probably DO want to do big things like reformatting all the code and eliminating all globals. You just have to do it when not much else is going on, or else you want to coordinate with people working in branches.

Given some routine:
public int doSomething() {
    // 15 lines of old code
    // 12 lines of new code goes here
    // 30 lines of old code
    // 2 lines of new code goes here
}

61 is a horribly unhealthy line count. Clean functions are under a dozen lines, and closer to 6. I'll bet you those 14 lines of new code don't line up perfectly under the name and intent of the function, nor do they have the same level of abstraction (except in this made-up case, where they "do something").

You can put those new lines inline, as shown, but that is a mess. You might not be able to sleep at night or ingest foods afterward, but you are physically capable. If you can add code inline without becoming ill, then you probably should retune your sense of smell.

You can refactor. I'm betting doSomething will break down into a hierarchy of smaller functions that have meaningful names. But if you refactor this in a branch, then the shared codeline merges will be a big pain. If you want your merges to be reasonably easy, you'll have to either do the refactoring in both codelines or do it in the shared line and merge it down to the branch. It's double-work now, but it avoids harder work later, and who is to say it will happen only twice? It's an insurance premium. It could be wasted, but it could pay off big time.

The other option is to add only two lines of code to the existing, ugly function. Those two lines would be calls to functions, which contain all the rest of the new stuff. This causes minimal perturbation. The diff shows two lines inserted in the function, and two new functions being added beneath. Further merges will be pretty easy to deal with. Once the branch moves to trunk, then refactoring in trunk might be more reasonable.
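A minimal sketch of that second option might look like the following. Everything in it is made up for illustration: the class name, the log list that stands in for the real old and new code, the helper names addNewBehavior and finishNewBehavior, and the return value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class; the log entries stand in for real lines of code.
class MinimalPerturbationSketch {
    final List<String> log = new ArrayList<>();

    // The existing, ugly function. Its old lines stay exactly where they were.
    int doSomething() {
        log.add("old code, first part");   // stands in for 15 lines of old code
        addNewBehavior();                  // inserted line 1 of 2
        log.add("old code, second part");  // stands in for 30 lines of old code
        finishNewBehavior();               // inserted line 2 of 2
        return log.size();                 // assumed return value, for the sketch only
    }

    // New functions added beneath. All of the new code lives here.
    private void addNewBehavior() {
        log.add("new code, first part");   // stands in for 12 lines of new code
    }

    private void finishNewBehavior() {
        log.add("new code, second part");  // stands in for 2 lines of new code
    }
}
```

The old function is no prettier than it was, but a diff against the shared codeline shows two inserted lines and two added functions, and nothing else has moved.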

Here then is the moral of our tale, and the motto to live by:
Don't be caught with extensive changes in a branch.

Thursday, June 25, 2009

Anti-IF Campaign


I have joined the Anti-IF Campaign

Friday, June 19, 2009

SVN fail most sighted

The svn fail I see most is like this:


<<<< working
# some line
====
>>>>> other


Now, how is it a conflict that I added a line of code and trunk didn't? I will freely admit that the diffing stuff is very smart and not very easy. I want to cut a lot of slack, and I'm happy that this is easy to resolve, but I really have to wonder how that is not an update rather than a conflict.

Preferences On Code Style

Please help me read your code. I know you don't owe me anything, and you can run your code even if it doesn't pass the Agile Otter Sniff Test. I appreciate all of that. But I think that you and I can both do a better job if we're just up-front about things.

I find little speed bumps in most code, and they break my fragile concentration. Maybe writing on index cards has made me parsimonious, but now I believe that less is more. I can read your code better if there is less of it, and it's more obvious.
  1. A function should not have many effects on the code. Don't code things into the same function just because they happen at nearly the same time.
  2. You do not have to shoehorn your new code into an existing class. Clear a space for it.
  3. Extract classes when it makes sense to do so.
  4. Use less horizontal space. Long lines and lines with long blank leaders cause my eyes to cross and make me scroll my windows to see if there's something I want to read. This is more important if I'm using an IDE, because I'll have tiled views to the left and right.
  5. Use less vertical space. Don't double-space everything. Don't add meaningless blank lines. All you're doing is making me scroll more. Don't put a space between a comment and the line of code it is explaining. This is more important in an IDE because toolbars and other window tiles take up the top and bottom. Sometimes I'm stuck in a 60x12 space trying to read 120 x 240 functions.
  6. Your functions do not need a blank line after the opening brace and before the closing brace. Get value for your vertical whitespace, as if it were costly.
  7. Stop flowerboxing all your comments. I used to like that, but now the signal-to-noise ratio makes me nuts. A one-line comment should be just one line long.
  8. Do not make needless comments. If the code says what it does, the comment doesn't have to. Face it, non-programmers are NOT going to read your code. Needless or redundant comments are annoying and distract me from the code. I delete them without asking for permission, so expect to lose them.
  9. Consider removing the big banners telling me that the default constructor I'm looking at is a default constructor. I'm simple-minded and particular, not stupid as a rock.
  10. Pay attention to naming. When namespace names and variable names and filenames don't stand apart crisply, I forget which is which. I'm simple-minded, so make the distinctions clear and meaningful. In particular, don't use two names which vary by one phoneme.
  11. The same class names in two different namespaces is confusing. You can't prevent it all the time, but you can try to make sure that when it happens it is meaningful.
  12. Don't be afraid to extract methods, introduce variables, etc. to make the code more obvious. Obvious counts.
  13. Don't make me have to remember what functions were called prior to this function call. Relying on other calls to initialize fields to certain values will just tick me off. I can only hold a little context in my head at once. When I'm looking to make a change, I don't even want to hold all of your class' context in my head. I have other things in mind.
  14. You commented out several paragraphs of code, and now I have to skip over all that crap to read the live code. If you need to keep it around, use version control.
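Item 13 is the one that is easiest to show in code. This is only a sketch, and both report classes are invented for the purpose, but it shows the difference between an object that needs a prior call and one that is ready as soon as it is built.

```java
// Hypothetical class: the reader has to remember that init() comes first.
class FragileReport {
    private String title;                 // null until init() is called

    void init() { title = "Untitled"; }

    // Throws NullPointerException if the caller forgot init().
    int titleLength() { return title.length(); }
}

// Hypothetical class: nothing to remember, the constructor does the setup.
class ObviousReport {
    private final String title;

    ObviousReport(String title) { this.title = title; }

    int titleLength() { return title.length(); }
}
```

With the second form I don't have to hold the call order in my head while I'm reading some other function.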
Those things help me. How can I make my code more pleasant for you?

Not Fitting In An Iteration

I was in denial for quite a long time. I thought that there were really no tasks that couldn't be broken down and implemented in phases. I'm in a change now that is trying my ideals.

Of course, this is a cross-cutting concern that deals with a big "ility." In particular, it deals with scalability, but I don't want to provide a bunch of detail that will distract us from the point.

The code is legacy in both the MFeathers meaning "without unit tests" and in the sense of "handed down from one generation to another, unsuspecting one." The new generation has done some excellent work getting huge tracts of land cleared and fenced with TDD and AT and what-have-you. Really, quite the transformation. The original designers had a philosophy and working style that did not survive the transformation (we think for the better) so there are architectural/design decisions being unmade on a regular basis.

In particular, there is this giant jellyfish of a design decision that's gotten in the way. It has long, long, long tentacles that extend far into the depths of layers of code, across the type system in funny ways, and into the realm of architectural concerns. When it's fixed, it will make the system better in many ways, and will clear the way to a whole host of other improvements. In short, it may be the coolest subproject in the whole company.

The jellyfish represents the munging of two or three separate concerns in one mechanism. It is a facility that was so amazingly handy that developers used it whenever they could. Remember that one man's fuzzy boundaries are another man's flexible solution. Now the concerns have to be split and the mechanism changed.

We've managed to dredge up one stinging tentacle after the other, but there are still several more. In the course of doing so, we've had to make a branch (a short-term fork, really) and we spend a pretty significant amount of time merging code from the trunk.

I was commenting to a pair partner (Hi, Nick) the other day that we should have worked out a way to get this thing out in iteration-sized buckets. As soon as I said it I realized that we would have, had we known that the finished result was going to look like it does now/so-far.

This is not the first jellyfish I've met while swimming in legacy waters. In another company, Ed worked on a problem for an entire year and yet there were unexpected avenues in data access that still complicated the process. Not because Ed wasn't thorough and smart, but things can get out of hand politically and technically. Politics complicated the technical work, and there was little fun to go around.

I'm trying to recover and determine how we could have made these changes in smaller steps, staying in a nice, green, running trunk with the rest of the team. I just can't see how we could have done it without knowing the many things we learned through refactoring and exploring and periodic cul-de-sacs in the code. It was bigger than any of our heads.

So what is the point?
  • Is the uncertainty the problem, and could we have killed it first?
  • An opportunity for links & advice from my small, but wise, readership.
  • The merge I'm waiting on is sucking all my CPU and enthusiasm, and I had to do something.

svn and patches don't mix

Patch files are no fun. Look at this little bit from the SVN Red Bean Book:
In this particular example, there really isn't much difference. But svn merge has special abilities that surpass the patch program. The file format used by patch is quite limited; it's able to tweak file contents only. There's no way to represent changes to trees, such as the addition, removal, or renaming of files and directories. Nor can the patch program notice changes to properties. If Sally's change had, say, added a new directory, the output of svn diff wouldn't have mentioned it at all. svn diff outputs only the limited patch format, so there are some ideas it simply can't express.

I only mention this because I've had a recent hiney bite in the combination of patch and svn.