Monday, September 29, 2014

Programming Is Mostly Thinking

Pretend you have a really great programming day. 
You only have to attend a few meetings, have only a few off-topic conversations, don't get distracted or interrupted much, don't have to do a bunch of status or time reporting, and you put in a good six hours of serious programming [note: this RARELY happens in an 8-10 hour day]. 

I want to review your work in the morning, so I print out a diff of your day's work before going home. 

Sadly, overnight the version control system crashes and they have to recover from the previous day's backup. You have lost an entire day's work. 

If I give you the diff, how long will it take you to type the changes back into the code base and recover your six-hours' work?

Programming is 11/12ths Thinking

I've been touting this figure for some time now, and people keep asking me where the study is that produced such an odd number. 

Well, it's not pulled out of thin air and it's not the result of a thorough scientific study. 

I have done informal polls now for a few years, though I've not kept good records. My goal was not to become the scientist who cracks the statistical/mathematical code for programming activities. I was looking for a reasonable answer to a reasonable question.

However, this answer surprised me. I really expected a larger typing component.

Software Factories

I have seen the slogans on stickers and social media for a long time that say "typing is not the bottleneck" (though every once in a while the inability of some programmers to type is a bottleneck).

I am keenly aware that most management still subscribes to the idea that motion is work. They are fairly convinced that a lack of motion is a lack of work. That makes sense in a lawn care service, a factory assembly line, or a warehouse operation.

Nearly all of the visible work done in producing physical goods is motion. People roll steel, stamp, press, mill, pick and place, bolt/screw/rivet, and so on. 

Modern factories produce goods with Computer Numerical Control machines, which produce perfect copies of an original model that may not even exist in real life. These machines work from abstract models -- just data, really -- and perform perfect motion. Humans tend the machines, rather than working the wood by hand.

I have some great guitars that were produced at affordable costs because of the degree of automation brought by such machines. 

Great boutique guitars are produced entirely by hand at higher cost and I don't put down that effort either. The world has room for both.

Software developers have perfected the factory. It flawlessly produces bit-perfect copies. You just click the "copy" or "download" button. It's so cheap that the purchasers happily cover the costs of the factory. Those who are cautious will double-check the checksums that come with the download, but most people don't bother. The machines are reliable and efficient and quick and cheap. 

Once the initial model (really, just data) exists, then the marginal cost of all the bit-perfect copies is essentially zero. Yes, this is just copying and not creating, but that's what factories do. Custom shops might produce unique items (like guitars) but factories create copies of originals.

The software factory tends to give you a progress bar, so you can visualize the motion of bits, but in many ways you can say that the product doesn't really exist. It's a pattern of tiny charged v. uncharged areas of metal on a plate (well, probably) and you don't even pay for the plate or the magnet or the laser when you create the copy. It's already there.

Software is an intellectual good.

The Design Shop

In my years of working with Uncle Bob Martin, I heard him continually tell customers and students that software development is not a fabrication operation, but a design operation. Once the initial design is done, all the duplication is done by machines at nearly zero cost.

So what programmers and testers and POs and Scrum Masters and software management are all doing (if they're doing it right) is designing the data model that will later be used by the factory to create copies for use by customers, patrons, and other people in the community the software is intended to serve. 

Yet the mechanistic, Industrial-Age idea of software development as a factory persists, and developers dutifully try to make it look like they're doing physical labor, to the detriment of the process. 

All intellectual activities are hard to observe and monitor. An idea that is 80% complete has no physical manifestation. It's an idea, and it's not done yet. Sometimes we have experiments or proof-of-concept code or notes, but they don't give an accurate "% complete" number as does physical work.

A chair being manufactured looks about 50% done at the 50% mark.  When it's done, it looks done.

A design for a chair may not exist on paper until it is more than 70% complete. And we don't know that it's really 70% done, because it's not finished being designed yet. 

The Answer: Really?

I have asked this question at conventions, client companies, to my peers, to colleagues, and to strangers I have met for the first time when I find out they are programmers.

The answer I receive most often is "about a half hour."

I could use the 8-hour day, ignoring meetings and interruptions and status reports, but that feels like padding the answer. I stick to the six hours doing things that programmers identify as programming work.

There are twelve half-hours in six hours. One half-hour to retype all the changes made in six hours of hard programming work. 

What in the world can that mean? How can it be so little? 

The Meaning Behind the Answer

Right now I suspect a bunch of managers are going to go yell at their programmers for putting in a half-hour's work in an 8-hour day, but that would be a horrible misunderstanding of what was actually happening.

What is really happening? 
  • Programmers were typing on-and-off all day. That 30 minutes is to recreate the net result of all the work they did, un-did, and re-did through the day. It is not all the work they did.
  • Programmers are avoiding defects as best they can. In order to do that, they have to be continuously evaluating the code as they write it, hypothesizing the kinds of defects or security vulnerabilities they might be introducing. After all, they receive their harshest criticism for introducing defects into the shared code base. 
  • Programming is a kind of lossy compression. The code only says what the program must do when it is running. Why it chose one particular way over others, how it influences the rest of the system, what errors were introduced and removed, and what pitfalls it avoids are not (generally) present in the text of the program.
  • Most of the work is not in making the change, but in deciding how to make the change. Deciding requires us to understand the code that already exists. This is especially time-consuming where code is messy or the design is not very obvious in the source code. 

Six hours of intellectual work (reading, researching, deciding, confirming, validating, verifying) translates to about 30 minutes worth of net change to a code base. 

Or at least it feels like it does. No company has been willing to delete a whole day's work to prove or disprove this experiment yet.  

Programmers will gladly explain that the work they did was reading, learning, understanding, sometimes guessing, researching, debugging, testing, compiling, running, hypothesizing and disproving their ideas of what the code should look like. In short, they were thinking and deciding.
Most of what goes on is intellectual work. 

I have examined a lot of the change logs (diffs). It has consistently looked like 30+/-10 minutes of change on a good day (at least to me). 

I'm confident enough to tout this number as effectively true.

Often people who do more typing or more cut/paste are doing less thinking and understanding, which results in more errors and more burden on other programmers to understand and correct their code. 

If programming is 1/12th motion and 11/12ths thinking, then we shouldn't push people to be typing 11/12ths of the time. We should instead provide the materials, environment, and processes necessary to ensure that the thinking we do is of high quality. 

Doing otherwise is optimizing the system for the wrong effect entirely.

What if we changed our tactics, and intentionally built systems for thinking together about software and making decisions easier to make? I think that productivity lies in this direction.

Otherwise, we could replace our programmers with typists and go at least 20x faster than we do now.

Tuesday, September 9, 2014

Dave Coplin Reimagines The Office

Understand your office situation better w/RSA Animate & David Coplin

It raised many interesting points. I still see value in being able to pair and mob, and would like to have heard more talk about that, but I think his idea about being in control of how you work is important.


Friday, September 5, 2014

Christopher Avery and The Responsibility Process (vid)

Here Christopher Avery shows us a mental model that will help us to become more responsible.

I found it concise and helpful. I hope you will, too.

Tuesday, September 2, 2014

Getting Through To Each Other

Communication is a very human process.

A quick model

Every being has its own mental model of a domain.

Connected to it is a hearing/understanding apparatus. When you tell me the sky is beautiful, my mental model suggests it is a nice shade of blue with some light, interesting clouds. But it could be that we don't share a model, and you meant really intense lightning and fast-moving thunderheads. 

Provided that there are not too many great disconnects, though, what you tell me may provide information that I can add to my mental model. 

It's relatively easy to learn things that add more information and causal connections to an existing model.

So recognize that there is a difference between what I hear and what I understand. It is sometimes said that "memory is the residue of thought," so my memory of our conversation may not be a memory of the sounds and words used, but of my interpretation of the sentences as they occurred. 

Also connected is the meaning/saying side of things. From within my model there is a thought I want to share with you. To communicate it I have to pick the words that I use to say it. Each of those words is a citizen of my mental model (not necessarily yours). 

So just going from what I think, to what I mean, to what I say, to what you hear, to what you understand and remember is an amazing act of human thinking, openness, and imagination. 

If our models are not too disconnected, then you can apprehend a reasonably close interpretation of the same thought I'm expressing, and you can add it to your model.

In fact, the text and drawing above just did this. You now understand what I'm talking about.

What if we're disconnected?

What if I tell you things that contradict your mental model?  It is uncomfortable. It unsettles our ability to communicate and share. Do we need to start all over? Did we never understand each other?

What are the options? 

  • Perhaps I'm being ironic. 
  • Maybe I'm wrong. 
  • Maybe I'm lying to you. 
  • Maybe I have different meanings for the words I'm using. 
  • Maybe there is something wrong with at least one of our mental models. 

It's uncertain, complicated, and fairly deep. We feel frustration and confusion. Some people would break off the conversation. This is why a mentor of mine once told me to "never tell anyone something that they cannot hear." He advised bridging the gap first instead -- and taking the time to do that well.

We get cognitive dissonance if we assume neither of us is ill-intentioned. It's easier to write the other off as a liar or kook or moron. But those are the easy, value-less ways out of the bind. Better that we try to understand the other person's mental model.

Most people are mis-trained to shy away from the upset and frustration that could help them understand another person's model. They likewise avoid the opportunities to improve or correct their own mental models. 

Add to that the old Left Brain Interpreter, whose job it is to interpret our memories of ourselves in the most flattering and heroic terms possible. It tells us we're right and that it's wrong of people to make us feel confused or upset.

I suspect this is one of the reasons that programmers tend to plateau and quit learning after their fifth year of experience; they have enough of a model built up, and believe strongly enough in it, that they are able to easily reject anything that doesn't clearly agree with what they already know.

Conversations need to focus compassionately on the differences we have in our terminology or mental model. 

It's hard to be human, and it's not something that we can do alone.

What I'm learning from this is as follows:

  1. Every skull is a cultural boundary.
  2. Frustration exists to help me; I should learn from it instead of avoiding it.
  3. Using words in a straight-forward way is a kindness.
  4. Using pictures will get us past many of our terminology issues.
  5. Recognizing a mismatch in terminology v. model is hard, and important.
  6. Avery's Responsibility Process is a helpful mental model to help me adjust my terminology and mental model.
  7. We all live far beneath the endowments given to us mentally, and walk past dozens of lessons every day that could enrich our lives.

If you find the model flawed, or incomprehensible, or not particularly useful, please join in conversation with me here. I'm happy to explain, rephrase, or even rebuild the whole model if it helps us communicate more clearly and freely. After all, you have a lot of information I could use. 

Tuesday, August 26, 2014

Why Your Code Has So Much Duplication

Here is a quite nice example code block for C#'s Dataflow blocks, specifically showing usage of the WriteOnceBlock:

ActionBlock<int> writeToConsole1 = new ActionBlock<int>( integer => Console.WriteLine( "Console 1: " + integer ) );
// true if the source should unlink from the target after successfully propagating a single message;
// otherwise, false to remain connected even after a single message has been propagated
bool unlinkAfterOne = false;
WriteOnceBlock<int> writeOnceBlock1 = new WriteOnceBlock<int>( integer => integer );
writeOnceBlock1.LinkTo( writeToConsole1, unlinkAfterOne );
// prints 12 via Console 1:
// Console 1: 12
writeOnceBlock1.Post( 12 );
// create 4 additional Targets
ActionBlock<int> writeToConsole2 = new ActionBlock<int>( integer => Console.WriteLine( "Console 2: " + integer ) );
ActionBlock<int> writeToConsole3 = new ActionBlock<int>( integer => Console.WriteLine( "Console 3: " + integer ) );
ActionBlock<int> writeToConsole4 = new ActionBlock<int>( integer => Console.WriteLine( "Console 4: " + integer ) );
ActionBlock<int> writeToConsole5 = new ActionBlock<int>( integer => Console.WriteLine( "Console 5: " + integer ) );
// link those Targets to the WriteOnceBlock, which already holds a value
writeOnceBlock1.LinkTo( writeToConsole2, unlinkAfterOne );
writeOnceBlock1.LinkTo( writeToConsole3, unlinkAfterOne );
writeOnceBlock1.LinkTo( writeToConsole4, unlinkAfterOne );
writeOnceBlock1.LinkTo( writeToConsole5, unlinkAfterOne );

This is example code, and so there was no reason to make it particularly clean and beautiful and obvious via method and variable extraction (though look at how the author aliased the non-obvious false value as the unlinkAfterOne variable). This code illustrates how a lot of real world code is written.

I think the example code is just fine. I wouldn't change it. Examples have different requirements and goals than real code. They're meant to take up little space on a page and to be complete enough that you can follow them without investing much, and so that the function can be illustrated. I'm not picking on this author. He did a good job.

It is when real-world code follows this pattern that we need to rethink organization. Check out the signal to noise ratio in the code. Some lines have very little unique content, mostly duplicating the line above or below them.

For example code, it's really great. For REAL code, however, it's got a real problem.

What if we rearranged the code to be by subject rather than by step? It would look like this:

const bool unlinkAfterOne = false;

ActionBlock<int> writeToConsole1 = new ActionBlock<int>( integer => Console.WriteLine( "Console 1: " + integer ) );
WriteOnceBlock<int> writeOnceBlock = new WriteOnceBlock<int>( integer => integer );
writeOnceBlock.LinkTo( writeToConsole1, unlinkAfterOne );
writeOnceBlock.Post( 12 );

// create and link additional Target 2
ActionBlock<int> writeToConsole2 = new ActionBlock<int>( integer => Console.WriteLine( "Console 2: " + integer ) );
writeOnceBlock.LinkTo( writeToConsole2, unlinkAfterOne );

// create and link additional Target 3
ActionBlock<int> writeToConsole3 = new ActionBlock<int>( integer => Console.WriteLine( "Console 3: " + integer ) );
writeOnceBlock.LinkTo( writeToConsole3, unlinkAfterOne );

// create and link additional Target 4
ActionBlock<int> writeToConsole4 = new ActionBlock<int>( integer => Console.WriteLine( "Console 4: " + integer ) );
writeOnceBlock.LinkTo( writeToConsole4, unlinkAfterOne );

// create and link additional Target 5
ActionBlock<int> writeToConsole5 = new ActionBlock<int>( integer => Console.WriteLine( "Console 5: " + integer ) );
writeOnceBlock.LinkTo( writeToConsole5, unlinkAfterOne );

This is doing the same thing, but in a slightly different order.

Organizing by subject lets us see that entire swaths of code are duplicated, where organizing by step made it look like duplication was slight and very local -- one line at a time had a small change that would complicate duplication removal.

Now we can see that the code for Target 2 differs from Target 3 in only two ways: the variable name and the text. The difference in the two paragraphs is one digit, repeated three times. There is a lot of code space eaten up for such a tiny bit of unique content.

Some will note that in the new example, we don't create all the variables at the top of the block, and that bothers them because it's harder to find where variables are declared.  Bear with me, because that problem will start to disappear in a moment.

Now can you see the signal:noise issue?

We can simplify this by extracting one function. This makes us extract the unlinkAfterOne to a class field, but that's not a big deal (I think).

public ActionBlock<int> WriteToConsole(string id, WriteOnceBlock<int> parent) 
{
    ActionBlock<int> block = new ActionBlock<int>( integer => Console.WriteLine( "Console " + id + ": " + integer ) );
    parent.LinkTo( block, unlinkAfterOne );
    return block;
}

Now the naming is weak, but let's ignore that for a second...

ActionBlock<int> writeToConsole1 = new ActionBlock<int>( integer => Console.WriteLine( "Console 1: " + integer ) );

WriteOnceBlock<int> writeOnceBlock = new WriteOnceBlock<int>( integer => integer );
writeOnceBlock.LinkTo( writeToConsole1, unlinkAfterOne );
writeOnceBlock.Post( 12 );
var writeToConsole2 = WriteToConsole("2", writeOnceBlock);
var writeToConsole3 = WriteToConsole("3", writeOnceBlock);
var writeToConsole4 = WriteToConsole("4", writeOnceBlock);
var writeToConsole5 = WriteToConsole("5", writeOnceBlock);

The code is smaller already, and has a better signal:noise ratio. But if we look we see that console 1 is really the same kind of code as the others listed below. It's the same kind of thing, though it wasn't inline with the duplication we saw earlier. Let's fix that.

var writeOnceBlock = new WriteOnceBlock<int>( integer => integer );
var writeToConsole1 = WriteToConsole("1", writeOnceBlock);
writeOnceBlock.Post( 12 );
var writeToConsole2 = WriteToConsole("2",writeOnceBlock);
var writeToConsole3 = WriteToConsole("3",writeOnceBlock);
var writeToConsole4 = WriteToConsole("4",writeOnceBlock);
var writeToConsole5 = WriteToConsole("5",writeOnceBlock);

Now, we have used var and an extracted method to reduce the duplication, increasing the signal:noise ratio considerably.  Locally, we have a lot less code to deal with.

Overall, we need fewer comments now and we have fewer "fiddly bits." The code has fewer ways to be broken. There is still a lot of duplication in names. We might create a builder. But right now, it's so simple that we don't need comments. All we need is to see the helper function once and we grok this code.
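For the curious, a builder might look something like the sketch below. The `ConsoleTargetBuilder` name and its fluent `AddConsole` method are my own inventions, not part of any library; the TPL Dataflow types come from the System.Threading.Tasks.Dataflow package. I use the plain `LinkTo(target)` overload, which leaves the link connected -- the same behavior as `unlinkAfterOne = false` above.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks.Dataflow;

// Hypothetical builder that collapses the "create a console target and link it"
// duplication into one chainable call.
public class ConsoleTargetBuilder
{
    private readonly ISourceBlock<int> _source;
    private readonly List<ActionBlock<int>> _targets = new List<ActionBlock<int>>();

    public ConsoleTargetBuilder(ISourceBlock<int> source)
    {
        _source = source;
    }

    // Creates a console-writing target, links it to the source, and returns
    // the builder so calls can be chained.
    public ConsoleTargetBuilder AddConsole(string id)
    {
        var block = new ActionBlock<int>(i => Console.WriteLine("Console " + id + ": " + i));
        _source.LinkTo(block); // plain overload: link stays connected
        _targets.Add(block);
        return this;
    }

    public IReadOnlyList<ActionBlock<int>> Targets { get { return _targets; } }
}
```

With that in place, the setup reads as a single chained expression: `new ConsoleTargetBuilder(writeOnceBlock).AddConsole("1").AddConsole("2")` and so on, which removes the numbered-variable duplication entirely.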

Provided I need all of these consoles for some purpose other than demonstration, we could compress this further this way:

var consoles = new List<ActionBlock<int>>();

var writeOnceBlock = new WriteOnceBlock<int>( integer => integer );
writeOnceBlock.Post( 12 ); 

foreach(var id in new[]{"1","2","3","4","5"}) {
    consoles.Add( WriteToConsole(id, writeOnceBlock) );
}
This code can be extended more simply than before, and is down to about 5 statements (depending on how you squint). Now that it's so compressed, we could inline WriteToConsole and not lose readability, but I actually find it more confusing when the loop body is complicated by creating the node and linking it inside the loop.

The real question, though, is not merely code size. If you were to modify the first version to add a 6th console, how would you do it? By copy-paste duplication? What if you were to modify the latest code -- would you use duplication?

The problem with having duplication is that it increases the urge to create duplication. Code that is small and tight might sometimes require some rework, but it is more likely to be managed without copy-paste.
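To make that concrete, here is a minimal, self-contained sketch of the loop version extended to a sixth console. The `BuildConsoles` helper and `Program` scaffolding are mine, added so the fragment compiles on its own; it requires the System.Threading.Tasks.Dataflow package. The point is in the array literal: the sixth target is one more element, not a pasted-and-edited paragraph.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks.Dataflow;

public class Program
{
    // Create a console-writing target and link it to the source block.
    public static ActionBlock<int> WriteToConsole(string id, WriteOnceBlock<int> parent)
    {
        var block = new ActionBlock<int>(i => Console.WriteLine("Console " + id + ": " + i));
        parent.LinkTo(block); // plain overload: link stays connected
        return block;
    }

    // Build one console target per id; extension cost is one array element.
    public static List<ActionBlock<int>> BuildConsoles(WriteOnceBlock<int> source, string[] ids)
    {
        var consoles = new List<ActionBlock<int>>();
        foreach (var id in ids)
            consoles.Add(WriteToConsole(id, source));
        return consoles;
    }

    public static void Main()
    {
        var writeOnceBlock = new WriteOnceBlock<int>(i => i);
        // Adding the sixth console: the only change from the version above.
        var consoles = BuildConsoles(writeOnceBlock, new[] { "1", "2", "3", "4", "5", "6" });
        writeOnceBlock.Post(12);
    }
}
```

Compare that one-token change with the step-by-step version, where a sixth console means copying a declaration line and a LinkTo line and editing the digit in three places.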

The reason code has so much duplication is that duplication breeds duplication. Clean code has advantages because the easiest way to maintain it is not the worst possible way to maintain it.

You may not agree that this is better.  If so, join me in the comments.