Draft: The Faros Whiplash and The Systems View

The Faros report on AI-assisted development has been rattling around in my head for a few days.

The story it tells is a strange one:

  • Teams are producing more code. 
  • More tasks are being completed. 
  • More pull requests are being created. 

And yet:

  • waiting times are up, 
  • review times are up, 
  • lead times are up, 
  • incidents are up, and
  • bugs are up.

The picture shows a development process that's getting busier without getting faster.

My first reaction was the same as everyone else's. Maybe the AI-generated code just isn't very good. The report certainly contains evidence that quality is suffering. But the more I looked at the numbers, the more I focused on waiting times.

I've spent enough years looking at value stream maps to have a habit of looking for queues.

A surprising amount of "developer behaviour" turns out to be queue behaviour in disguise.

  • A review problem turns out to be a queue.
  • An approval problem turns out to be a queue.
  • A testing problem turns out to be a queue.
  • A dependency problem often turns out to be several queues standing on each other's shoulders.

The Faros report shows that production increases dramatically, but review latency increases even more dramatically. More work enters the system, but much more seems to be waiting in it. People are touching more tasks, more pull requests, and switching context more often.

That doesn't look like a coding problem.

It looks like a system that has become better at starting work than finishing it.

I've seen something similar before in a non-agentic space: a feature gets split into several tickets. The database work goes one way. The service work goes another way. Somebody takes the UI. Somebody else takes the tests. Everybody is busy. If you walk around asking people how things are going, the answers are encouraging. Progress is being made everywhere. Each person's tickets are progressing...

...and yet the feature doesn't seem to finish.

There's always one more thing waiting for another thing.

  • The API is ready, but the UI isn't.
  • The UI is ready, but testing isn't.
  • Testing is done, but deployment isn't.
  • Deployment is done, but something failed and has come back.

Most of the feature's life was spent not being worked on at all. It was spent waiting for other pieces of itself.

Years ago I started thinking about that as a scatter-gather problem. We talk a lot about the scatter because that's the part where people expect to become productive. We don't spend nearly as much time talking about the gathering, which is where all the dependencies, assumptions, misunderstandings, and timing problems finally converge.

The moment we split the work, we also created an obligation to put it back together, whether we realise that at the time or not.
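Here is a minimal sketch of that gather obligation. The piece names, finish days, and integration time are all invented for illustration, not taken from any report; the only claim is the shape of the arithmetic: the feature ships when the slowest piece lands, plus whatever it takes to put the pieces back together.

```python
# Hypothetical feature split into four tickets. The numbers are invented
# for illustration; they are not taken from the Faros report.
piece_done_day = {"database": 3, "service": 5, "ui": 4, "tests": 9}
integration_days = 2  # assumed time to put the pieces back together


def feature_done_day(done_days, gather_days):
    """The gather cannot start until the slowest piece has landed."""
    return max(done_days.values()) + gather_days


def waiting_days(done_days, gather_days):
    """Days between each piece finishing and the feature shipping."""
    shipped = feature_done_day(done_days, gather_days)
    return {name: shipped - day for name, day in done_days.items()}


average_ticket_day = sum(piece_done_day.values()) / len(piece_done_day)
shipped_day = feature_done_day(piece_done_day, integration_days)

print(f"average ticket finished on day {average_ticket_day}")
print(f"feature shipped on day {shipped_day}")
print(waiting_days(piece_done_day, integration_days))
```

With these made-up numbers the average ticket is done a little after day 5, which sounds encouraging, while the feature itself ships on day 11. The database work sits for 8 days waiting for the rest of itself.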

AI makes it easier to start work. In some cases, it makes it dramatically easier. A developer can explore more ideas, create more code, open more pull requests, and move more quickly from a blank screen to something that looks finished.

The phrase "looks finished" may be doing a lot of work there. Work that looks finished runs up some metrics: cycle time drops, the ticket closure rate rises, and the per-person "velocity" (for the gullible who track such things) goes up.

But software has a peculiar property: a thing can be finished locally and unfinished globally at the same time. Tickets are not features, after all.

  • Some code exists, but its feature doesn't.
  • Some branches exist, but their delivery doesn't.
  • The ticket is done, but no value is evidenced.

In the Faros numbers, more code is being written, but the downstream parts of the system seem to be struggling. The system appears to be accumulating inventory faster than it can absorb it.
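A toy model makes the accumulation concrete. This is a sketch under loud assumptions: a two-stage pipeline (open a pull request, then review it), a fixed daily review capacity, and rates I made up. The function name and parameters are hypothetical, and nothing here is fitted to the Faros data.

```python
def simulate(days, prs_opened_per_day, review_capacity_per_day):
    """Toy two-stage pipeline: PRs are opened, then reviewed.

    Returns the total number reviewed and the size of the review
    queue at the end of each day. All rates are assumed constants.
    """
    queue = 0
    reviewed_total = 0
    queue_history = []
    for _ in range(days):
        queue += prs_opened_per_day                   # new work arrives
        reviewed = min(queue, review_capacity_per_day)  # review is capped
        queue -= reviewed
        reviewed_total += reviewed
        queue_history.append(queue)
    return reviewed_total, queue_history


# Before: coding is slower than review, so nothing piles up.
before_done, before_queue = simulate(20, 5, 6)

# After: coding speeds up, review capacity stays the same.
after_done, after_queue = simulate(20, 8, 6)

print(before_done, before_queue[-1])
print(after_done, after_queue[-1])
```

In the "after" run, 160 pull requests are opened over 20 days but only 120 get through review, and 40 are sitting in the queue at the end. Output rose a little because review was no longer starved, and then it hit the ceiling. Everything past the ceiling became inventory.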

If that's what's happening, then the interesting question isn't whether AI writes good code. The interesting question is whether coding was ever the limiting constraint in the first place.

The Theory of Constraints tells us that accelerating a non-constraint creates inventory. 

Reinertsen tells us that inventory increases our lead time. 

Little's Law tells us that increasing work-in-process has consequences, whether we acknowledge them or not.
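The Little's Law arithmetic is short enough to write down. In its usual form, average lead time equals average work-in-process divided by average throughput, and it describes long-run averages in a reasonably stable system. The numbers below are assumptions chosen for easy division, not measurements.

```python
def average_lead_time(wip, throughput_per_day):
    """Little's Law: average lead time = average WIP / average throughput.

    Holds for long-run averages in a stable system; the inputs used
    below are illustrative assumptions, not measured values.
    """
    if throughput_per_day <= 0:
        raise ValueError("throughput must be positive")
    return wip / throughput_per_day


# Same finishing rate, different amounts of work in process.
print(average_lead_time(wip=12, throughput_per_day=6))
print(average_lead_time(wip=30, throughput_per_day=6))
```

If the team still finishes 6 items a day but carries 30 items in process instead of 12, the average item takes 5 days instead of 2. Nobody got slower. There is just more in the system.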

The Faros report doesn't prove any of those explanations; it only makes them hard to ignore.

Does Quality Really Matter?

Defects doubled in the study. That's significant. Note, too, that those are escaped defects. How many more were spotted and corrected at coding time, at review time, in automated tests, or in manual testing?

When a defect is discovered, it is returned to the developer's individual work queue (since most people haven't caught on to co-creative work). Those round-trips add up. A returned item must repeat parts of its journey to completion, which delays that item.

But it's worse than that. 

When development on feature B stops to remediate feature A, B is also slowed, as are all the features waiting in the queue behind B. 
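A small sketch of the knock-on effect, assuming one developer, three features of three days each, and a two-day fix for A that arrives one day into B. The durations are invented, and the model ignores the cost of context switching, which would only make things worse.

```python
def finish_days(segments):
    """Walk a single developer's schedule, one work segment at a time.

    segments is a list of (item_name, days). An item's finish day is
    the end of its last segment, so rework pushes the finish later.
    """
    clock = 0
    finished = {}
    for name, days in segments:
        clock += days
        finished[name] = clock  # a later segment overwrites an earlier one
    return finished


# No rework: A, B, C are worked in order.
clean = finish_days([("A", 3), ("B", 3), ("C", 3)])

# A comes back one day into B and takes two days to fix (assumed).
with_rework = finish_days([("A", 3), ("B", 1), ("A", 2), ("B", 2), ("C", 3)])

print(clean)
print(with_rework)
```

Without rework the features finish on days 3, 6, and 9. With one returned defect, A is really done on day 6, B on day 8, and C on day 11. The defect was in A, but B and C each paid two days for it.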

Low quality disrupts and delays work. This has been known for many decades, and I suspect it is obvious, but I still hear arguments that it doesn't matter as long as the coding is quick.

Maybe people have a misguided notion that AIs don't make mistakes? That would explain why this didn't seem obvious and inevitable to them.

Quality, meaning internal quality as well as the absence of bugs, is flow protection. As long as you have significant rework, you cannot have predictable work. Flow is turbulent and unpredictable.

Why Collaboration?

Suppose the work is highly dependent. 

Suppose we're building a feature that spans a service, a database, a UI, tests, deployment, and a few things nobody remembered to mention during planning. We can divide that work and coordinate later, or we can coordinate earlier and divide less of it.

Neither choice eliminates coordination.

The coordination exists either way.

The difference is when it happens, where it happens, and what happens while we're waiting for it.

And, perhaps, whether we are approaching it in an intentional and systematic way. How long should a team keep being "accidentally" disrupted by "interruptions" of integration when they happen every week, or many times per month? When does it stop being a surprise and start being simple negligence? 

People argue that adding people to a task will make it late, and cite Brooks' Law, but that's a mistake.

Brooks' Law holds that adding new people to a late project will create drag that slows overall progress and delays the work. I agree and salute the formulation.

Collaborative work is something different. It means adding knowledgeable people who are familiar to the team and fully briefed, and adding them early, at the very start of the task, so that everyone can contribute the best of their skill to a full and complete result without late integration surprises.

Collaboration is also flow protection.

That's why I have a hard time seeing collaborative work as a "mere preference". In some situations, it probably is. If we're updating unrelated websites, there may be little reason to work together.

But software delivery systems are full of work that appears independent until it encounters a shared constraint. By the time those dependencies become visible, the economics have already changed.

The story may not be that AI is making developers less effective. It may be that AI is exposing parts of the system that were already limiting flow. If that's true, then the results in the Faros report are to be expected, and possibly inevitable.

At some point, the conversation must stop being about code and start being about the system that surrounds the code.

That's usually where we start looking.
