Tuesday, September 29, 2009

Tragedy of the Doubtful Solution

The Background

I (re)tell a story frequently about a place I worked in the 90s. There was a piece of code with an absurd cyclomatic complexity score, running literally hundreds of lines in length, and being called from myriad places in the code base.

The code was written to check to see if two ranges were overlapping. Being written in a poor 4GL, it took four parameters representing the starting and stopping dates of two time ranges. Such a simple task for the code to be so horrible and lengthy. In C it would look rather like this:
 bool ranges_are_overlapping(int a, int b, int c, int d){
 }

It quickly became clear to the most casual reader that the ranges were a..b and c..d. Well, sort of. The code was defensive and was written with the understanding that the range-defining arguments could be somewhat unordered:
 if ( (a<b) && (c<d) ) {
    // blah blah
 }
 else if ( (a==b) && (c<d) ){
   // blah blah
 }
 else if ( (a>b) && (c<d) ){
    // blah blah
 }
 else if ( (a<b) && (c==d) ) {
    // blah blah
 }
 else if ( (a==b) && (c==d) ){
   // blah blah
 }
 else if ( (a>b) && (c==d) ){
    // blah blah
 }
 else if ( (a<b) && (c>d) ) {
    // blah blah
 }
 else if ( (a==b) && (c>d) ){
   // blah blah
 }
 else if ( (a>b) && (c>d) ){
    // blah blah
 }

Now, of course, even once we square away the range markers, there are a whole host of possibilities. A could be less than, equal to, or greater than C, and B could likewise be greater than, less than, or equal to D.
     if ( (a<c) && (b<d)) {
        // Lets call this "inner block"
        if ( b > c ) {
        }
        else if ( b == c ) {
        }
        else if ( b < c) {
        }
        // And on we go ....
     }
     else if ( (a<c) && (b==d)) {
       // cut/paste/edit "inner block" from above
     }
     else if ( (a<c) && (b>d)) {
       // cut/paste/edit "inner block" from above
     }
     else if ( (a==c) && (b<d)) {
       // cut/paste/edit "inner block" from above
     }
     else if ( (a==c) && (b==d)) {
       // cut/paste/edit "inner block" from above
     }
     else if ( (a==c) && (b>d)) {
       // cut/paste/edit "inner block" from above
     }
     else if ( (a>c) && (b<d)) {
       // cut/paste/edit "inner block" from above
     }
     else if ( (a>c) && (b==d)) {
       // cut/paste/edit "inner block" from above
     }
     else if ( (a>c) && (b>d)) {
       // cut/paste/edit "inner block" from above
     }

Of course, each permutation of the range start and end relationship would require a full repetition of the comparisons of the range starts and ends complete with cut-n-paste-n-edit inner blocks. We do not attempt to recreate the entire mess here.

I bet the programmer responsible for this routine wrote more lines of code that day than anyone else on the team. What he lacked in efficiency, he made up in diligence.

The code was correct in its results, but it was huge and slow and tedious to desk-check. It had no tests, automated or otherwise.

I replaced it with the rather ordinary and obvious solution:

 int left_start = min(a,b);
 int left_end = max(a,b);

 int right_start = min(c,d);
 int right_end = max(c,d);

 bool distinct = ( left_end < right_start) || (right_end < left_start);
 return !distinct;

Straighten out the begin/end so you don't need a Cartesian explosion of if statements, and then look at the ranges. When either range ends before the other starts, there is no overlap. Otherwise, you have overlap.

It is hardly a miracle of profound logic or deep mathematical insight. Nor of excessive typing and careful editing.

Young Tim actually had to work that out on paper. It worked great, was much smaller and simpler, and ran much faster than the ugly monstrosity that was there before.

The reason I'm writing this is not to express that I'm revolted by bad code or to brag that I replaced it with good code. It is not to ridicule diligent-yet-misguided junior programmers. The point of this blog is what happened next.

What Happened Next

I was called into the boss' office. Someone noticed my code improvement! He showed me the old code and the new, both printed out in neat stacks on his desk. I grinned and said "Yes, it's much faster now."

He didn't smile back. He frowned.

He told me that it was clear from the original code that the author had thought through all of the possible scenarios and had accounted for them. Mine, on the other hand, showed no such diligence. I clearly had not put any real effort into my work. My answer was too small. Even though they couldn't make it fail in testing (yet), he knew that I must have left something out. As a result of disbelief in my abilities, he was rolling back my change.

I was upset that a small working algorithm was about to be tossed out and a messy steaming pile of code put back in. I was more upset with the accusation that I did not think through the cases, and that (despite all evidence to the contrary!) I had written an insufficient piece of code.

I tried to defend my solution, even pulling out my scrap paper with all the overlap scenarios on it, but it was too late. The decision was made. I and my solution were obviously inferior. Chastised, I returned to work at my desk.

I began to lose interest in the company where I had previously intended to spend my entire career. In time I left and have had a good time of it.

The Payoff

Why blog about it in 2009? Because my TDD associates have blogged and tweeted about how, in TDD, the code becomes more generic as the tests become more specific.

If the Tim of 1994 had known about TDD, he would have built up a large and specific base of tests covering all the cases represented by the old code. His tests would have demonstrated that he'd thought through all the permutations, and the simpler solution might have been fielded. Tim would have been saved a moment of humiliation, and that poor application would have gotten a boost in performance.

TDD would have made the code better, and it would have improved the experience I had there. It would have given me visible evidence of my thinking. With a body of unit tests, there is proof that we've thought things through. An oblique, small solution cannot provide that on its own.

Young programmers: consider this advice. The more elegant solutions you devise will need a body of proof if you are to survive clue-challenged technical managers. If you don't do TDD for the sake of the code, do it for yourself.

5 comments:

  1. The tests would have fattened the printout too.

    ReplyDelete
  2. "What, you wasted time writing these tests instead of desk-checking and analyzing the code? We don't even have a place to put them. And it's not anything our testers can understand."

    ReplyDelete
  3. I need to read more about the "TDD" testing process.

    Do you have any good references?
    (Yes that is a rhetorical question, but a serious one.)

    ReplyDelete
  4. I don't like it when TDD is mixed with the general effects of automated tests.

    ReplyDelete
  5. When you are done test-driving, the tests are just automated tests. I don't think we need to be apologetic about that. Automated tests also are valuable.

    But the point is that I didn't test drive back then. If I had then I would have finished my algorithm and had all the tests in hand on that very day, and it would have taken me no longer than the time I spent drawing and working it out.

    Not only would I have had the algorithm and some useful automated tests, I would have had proof that I'd thought this thing through.

    In that case, what is not to like?

    ReplyDelete