Thursday, July 5, 2018

The Scatter-Gather Method of Software Development

There is a certain theory of agile development which is built on factory thinking, using division and management of labor to accomplish large projects.

The Plan


You have a big job to do?

Okay.
Scatter:

  • Divide the project into pieces
  • Resource the project (acquire the right number of "teams")
  • Assign pieces of the project to "teams" 
  • In the "teams", a lead character will divide the pieces into smaller and smaller jobs
  • The small jobs will be assigned to individuals who work for the lead

Gather:

  • Individuals complete their small jobs
  • The small jobs are integrated/combined/summed by the lead
  • When small jobs are done, the lead delivers the finished work of the "team"
  • "Teams" who completed the work are released
  • The project office then combines the parts
  • The integrated, completed project is delivered

Please note that there is an inverse pattern here: "Teams" are gathered during the scatter part of the process, and scattered during the gather part. But let's not focus on that right away.

Why do you put quotes on "team"?

A team is a group of people aligned on common goals, who work together to achieve those goals. Such a team works together every day, and each member works for the best team outcome. That is a team without quotes.

There is another use of the word, though. In many organizations, all the workers are individual contributors who are encouraged to get their own work done. 

They might occasionally offer advice or be a "second pair of eyes" (cf Rubber Ducking) for another person, but the only real connection between them is that they have the same boss. 

"Teams" (quotes included here) are an org chart conceit. 

I find that scatter-gather style work requires exquisite planning and estimation, and for one person to be tending to another person's assigned work (remember it's assigned to individuals) puts the individual's (aggressive) schedule at risk. 

Each person absolutely must complete their own piece of the work on time, else the whole schedule cascades into failure.

Quite often, the manager/lead is posted to act as a "blocking troop" to punish any person who falls behind. 

Often leads are chosen for their ability to operate as an individual contributor. 

Sometimes leads are chosen for their exquisite planning skill. 

Occasionally, a lead becomes a lead because they have a penchant for working as blocking troops. 

In every case, it is a token of respect to be "raised" to the level of a team lead. Different organizations and sub-organizations just respect and promote different qualities.

Integration

In theory, when all the individuals finish their parts, then those can be integrated and tested, and the parts passed up the tree for integration and testing at a higher level, and then finally the organization will resolve all the integration problems and mismatches and release the software.

Ideally, there will be no integration problems because of the upper-level project team's exquisitely detailed design and planning. People are often promoted to this level because of past success and because of their talent and passion for planning. The goal is to have everything worked out up-front so that integration problems do not happen.

But they do happen. They almost always happen. They're happening right now.

There are always some misunderstandings, some kind of design drift, some platform issue that had to be worked around and of course some of the small jobs took longer than anyone expected. Some late parts had to be left out (temporarily) or revised during development (thus damaging the validity of the grand plan).

How do scatter/gather organizations resolve those problems? Generally speaking, the people at the project level don't have the technical skills to resolve these problems.

Luckily, even though they tend to run as project organizations (a convenient but prevalent organizational conceit) "teams" tend to be composed of full-time employees working long-term on projects that tend to be related to a single code base. Therefore, the members from the original "team" may still be present, and may still be on the same "team."

Integration problems are addressed, then, by temporarily borrowing-back the members from the original "team" who can diagnose and resolve issues.  Once the problems are resolved and the system is sufficiently tested, then the original "project" (really a product increment) can be "released" to customers.

Remember that integration problems are treated as exceptions.

That integration problems always happen in scatter-gather is only because the planning and detail were not exquisite enough to avoid the failure, and scatter-gather is reliant on predictability and exquisite scheduling. Without sufficient up-front planning, the project will not come in on time with full scope completed.

And of course a lot of projects don't come in on time with full scope (which suggests that a more agile approach is appropriate), but the scatter-gather system is so reasonable on the face of it that many people think it the only system that could possibly work.  Sometimes people say "at scale" or "at enterprise scale." The goal of "scale" in this sense is to produce a product using several hundred or thousand programmers simultaneously and keeping them all busy.

Some organizations have made inevitable late integration problems part of their standard process by adding hardening sprints. "If you can't fix it," we are told, "feature it."

Busy Enough 

One of the problems with working at this scale is that programmers are not cheap. Even the cheap ones are not cheap. There is a constant "burn rate." An hour of time costs the same whether the teams are productive or idle. And it's a VERY LARGE NUMBER.

If you are paying, and it's a VERY LARGE NUMBER per hour, then it's important that all the people ("resources") are kept productive at all times. This isn't snark, or hyperbole, or foolishness. It's a VERY LARGE NUMBER and you need to be sure that you're getting value for the cost of that hour.

So when teams are released from one project, it's keen to have them starting another ASAP. This is a tricky bit of "scheduling tetris", but many managers above the project manager level have significant skill at scheduling. Sometimes, to make the scheduling work, they have to shuffle teams in or out of a project where the team "members" have no familiarity with the code base or domain, but we know that given time a developer (QA included) can develop skills in any code base or domain. It's a matter of good communication and management, which the "leads" should be able to handle or they shouldn't be leads.

When I first began to manage teams, it was in a scatter/gather organization. I was told that the first rule of management is to always make sure all the team members have enough work to do.

I was given the VERY LARGE NUMBER which is the hourly burn rate, and reminded that an idle team costs the company many many digits of loss per minute.

Oddly, I have no idea how much value or income the team's product produced per hour, day, month or year or what the ratio of cost:value was for the project. I was instructed to focus on the costs and get the maximum quantity of work possible to justify the burn rate.

I've heard this First Law repeated many times. It is a corporate mania that the profitability of the organization will plummet at a rate of $VERYLARGE/HOUR if employees are not busy enough. It could destroy the company (and certainly would, if the burn rate exceeded the rate of income for even a relatively short period of time; days, weeks, months. It's inevitable).

This First Law is one of the problems with the hardening sprint. The staff in a hardening sprint end up waiting for something to be broken. Idle resources are expensive, and people must be kept busy, so hardening sprint staff tend to be part of some other scatter/gather project and likely on the critical path.

In other words, the hardening sprint is usually another organizational conceit.

Some Flaws 

Since everyone must be busy enough, by the time an integration problem is visible (at the end of the "gather" process) most of the people who worked on the code have moved to other projects in order to satisfy scheduling Tetris.

Also, as a result of "schedule Tetris" the project code is written partly by people who didn't know the code base, or the domain, or the development stack. Because they needed to be productive and to stay busy, they did the best they could at the time. Naturally, there are some defects, integration faults, inconsistency, and schedule slips.

Errors are the natural result of putting unfamiliar hands in complicated code.

The people who can fix the problems are the people familiar with the system as a whole, and those familiar with the code that was written. Of course, they're not on the "project" anymore, and may not even be familiar with the code they wrote anymore. They've moved on.

To get the problem fixed, some of them will need to be pulled out of their new, exquisitely planned and scheduled project. They will be brought back for some time (period unknown) to the suffering complete-yet-non-working-so-unreleased project.

This puts the new project at risk, which runs the risk of cascading schedule failure.

Often the risk to the project schedule is such that more teams are brought to bear (also often unskilled in the development technology, unfamiliar with the code, and new to the domain) to try to make up for lost time. These teams are likely to introduce similar failures in the new project, adding to the integration risk.

And the increased integration risk pulls more resources into the hardening sprint, putting another project at risk of cascading schedule failure, and a combination of scope reduction and staff increase is brought to bear again.

Either a point of equilibrium is reached, projects are canceled to relieve the pressure, or some other form of hilarity ensues.

Faith

Projects are late, schedules have cascading failures, "team" members are unhappy because they can't do their best work rushing around in the unfamiliar code, everyone is frustrated that "teams" can't avoid making errors and "team" members don't help each other to learn to work better.

Managers are frustrated with descoping work to try to get projects back on schedule, with the frustration of trying to keep up on the status of each "team" member, with never knowing if the system will be delivered on time, with the nagging sense that the team is missing something or the plan is wrong in some way that will cascade into failures.

But still. Still. Even in the face of years of struggle and repeated near-catastrophes and occasional failures, they are convinced that the system is entirely workable and the problem is either:
  1. they couldn't plan and design well enough under the circumstances
  2. circumstances arose which shouldn't have, and couldn't have been predicted
  3. the plan wasn't followed well enough this time and next time should be better
  4. our people just aren't good enough
Scatter/gather seems to be the only system that could conceivably work (at this scale). 

It's a tremendous act of faith, and the organization doubles-down on faith in the face of repeated failures. Scatter/gather must be made to work. We just have to do it better.

Alternatives



Of course, much of the world has developed mature practices which use continuous integration to surface problems early, collaboration to resolve problems quickly, and using evolutionary design so that the product is runnable on day 1 and every other day.

A lot of people don't even use projects to build product increments. They add features one-at-a-time or one-"epic"-at-a-time. They build release candidates every day throughout the project. Some of them actually go live with their product increments several (to hundreds) of times a day.

I know it sounds like heresy against the One True Way, but it's still so.

I know it doesn't sound realistic. How in the world could you do this in a scatter/gather organization? Without screwing up the schedule Tetris?

And to be fair, you couldn't.

You'd have to give up the faith. You would have to give up your core identity -- you wouldn't be a scatter/gather-based scheduling-Tetris organization.

You might even have to admit that the old system that you spent (VERY_LARGE_NUMBER/hour X for N years) on is not very workable.

There are often stiff penalties for being the first one to speak heresy against the Current Corporate Way. Worst than if you'd violated the First Law. Worse than if you failed to deliver. You could actually lose your position in the hierarchy, which is the Primary Topic in a large organization.

Still, a lot of companies work alternative ways and have for decades.

Keeping the Faith

But if you have great collective faith in scatter/gather as the only possible answer, alternatives sound ludicrous and couldn't possibly work in the real world.  Anyone who thinks that they could build a working system at scale without a scatter/gather plan must have lost touch with reality -- lost the faith -- possibly their whole way of working is a peyote-inspired fantasy.

Denial is the first step in avoiding responsibility, and humanity has found a lot of success in denying facts for decades in order to maintain their conceits about how the world works. I can recommend this strategy. It keeps you from having to change right now, and if you discredit heretics then you might not have to hear from them again.

I recommend it if and only if your goal is to continue, undisturbed, in your current beliefs.

And you have supporters in the industry, so that's nice.

The true believer in scatter/gather projects, schedule Tetris, and other organizational conceits can find people in the world who are happy to sell you agile frameworks entirely compatible with your way of working.

They will also gladly rent you consultants to put the processes into place and certify you to be practitioners of the True New Way of scaling.

They'll even give you the prestigious "Agile" title to adorn your press releases and recruiting posters.

But maybe, just maybe,  if you've suffered enough for your faith and find yourself filled with doubts, you might consider some of the alternatives.