The Faros Whiplash and The Systems View

The Faros report on AI-assisted development has been rattling around in my head for a few days.

The story it tells is a strange one:

Teams are producing more code.
More tasks are being completed.
More pull requests are being created.

And yet:

waiting times are up,
review times are up,
lead times are up,
incidents are up, and
bugs are up.

The picture shows a development process that's getting busier without getting faster.

My first reaction was the same as everyone else's. Maybe the AI-generated code just isn't very good. The report certainly contains evidence that quality is suffering. But the more I looked at the numbers, the more I focused on waiting times.

I've spent enough years looking at value stream maps to have a habit of looking for queues.

A surprising amount of "developer behaviour" turns out to be queue behaviour in disguise.

A review problem turns out to be a queue.
An approval problem turns out to be a queue.
A testing problem turns out to be a queue.
A dependency problem often turns out to be several queues standing on each other's shoulders.

The Faros report shows that production increases dramatically, but review latency increases even more dramatically. More work enters the system, but much more seems to be waiting in it. People are touching more tasks and more pull requests, and switching context more often.

That doesn't look like a coding problem.

It looks like a system that has become better at starting work than finishing it.

The Waiting is the Hardest Part

I've seen something similar before in a non-agentic space: a feature gets split into several tickets. The database work goes one way. The service work goes another way. Somebody takes the UI. Somebody else takes the tests. Everybody is busy. If you walk around asking people how things are going, the answers are encouraging. Progress is being made everywhere. Each person's tickets are progressing...

...and yet the feature doesn't seem to finish.

There's always one more thing waiting for another thing.

The API is ready, but the UI isn't.
The UI is ready, but testing isn't.
Testing is done, but deployment isn't.
Deployment is done, but something failed and has come back.

Most of the feature's life was spent not being worked on at all. It was spent waiting for other pieces of itself.

Years ago I started thinking about that as a scatter-gather problem.

We talk a lot about the scatter because that's the part where people expect to become productive. We don't spend nearly as much time talking about the gathering, which is where all the dependencies, assumptions, misunderstandings, and timing problems finally converge.

The moment we split the work, we also created an obligation to put it back together. Every "scatter" requires a "gather."
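
A minimal sketch of the gather side. It assumes, hypothetically, that a feature counts as done only when its last piece is ready, and that we know the day each piece was finished locally. The function name `gather` and the day numbers are illustrative. They do not come from the report.

```python
def gather(ready_day: dict[str, int]) -> tuple[int, dict[str, int]]:
    """A feature is done only when its last piece is ready (the gather).

    Returns the feature's completion day and how many days each
    locally "done" piece sat waiting for the rest of the feature.
    """
    done = max(ready_day.values())
    waiting = {piece: done - day for piece, day in ready_day.items()}
    return done, waiting


# Hypothetical split of one feature; each number is the day that piece
# was finished locally.
done, waiting = gather({"database": 2, "service": 5, "ui": 9, "tests": 12})
print(done)     # 12
print(waiting)  # {'database': 10, 'service': 7, 'ui': 3, 'tests': 0}
```

Three of the four pieces were "done" well before the feature was. The database work waited ten days for the rest of itself.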

AI makes it easier to start work. A developer can explore more ideas, create more code, open more pull requests, and move more quickly from a blank screen to something that looks finished. This is generally a good thing for prototypes, but it's not necessarily a good thing for production code.

It runs up some metrics. The cycle time is faster, the ticket closure rate is higher, and the per-person "velocity" (for the gullible who track such things) increases.

Software has a peculiar property: a thing can be finished locally and unfinished globally at the same time. Tickets are not features, after all.

Some code exists, but its feature doesn't.
Some branches exist, but their delivery doesn't.
The ticket is done, but no value is evidenced.

More code was being written, but the downstream parts of the system seemed to be struggling. The system appeared to be accumulating inventory faster than it could absorb it.
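
To make the inventory idea concrete, here is a deterministic toy model. It is not drawn from the Faros data. A single review stage has a fixed, hypothetical capacity of 5 pull requests per day and is fed by coding at 4 per day, then at 8 per day. The function name `simulate` and every rate are assumptions chosen for illustration.

```python
def simulate(days: int, code_rate: int, review_capacity: int) -> tuple[list[int], int]:
    """Toy two-stage pipeline: coding feeds a review queue each day.

    Returns the review queue length at the end of each day and the
    total number of pull requests that made it through review.
    """
    queue = 0
    reviewed_total = 0
    history = []
    for _ in range(days):
        queue += code_rate                         # new PRs arrive from coding
        reviewed = min(queue, review_capacity)     # review can only absorb so many
        queue -= reviewed
        reviewed_total += reviewed
        history.append(queue)
    return history, reviewed_total


before = simulate(days=5, code_rate=4, review_capacity=5)
after = simulate(days=5, code_rate=8, review_capacity=5)
print(before)  # ([0, 0, 0, 0, 0], 20)
print(after)   # ([3, 6, 9, 12, 15], 25)
```

Doubling coding output raises delivered throughput only from 20 to 25 reviews over the five days, because review capacity caps it. The other 15 pull requests sit in the queue as inventory.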

If that's what's happening, then the interesting question isn't whether AI writes good code. The interesting question is whether coding was ever the limiting constraint in the first place.

The Theory of Constraints tells us that accelerating a non-constraint creates inventory.

Reinertsen tells us that inventory increases our lead time.

Little's Law tells us that average lead time equals work-in-process divided by throughput. Increasing work-in-process without increasing throughput lengthens lead time, whether we acknowledge it or not.
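
The arithmetic behind those claims fits in a few lines. Little's Law is exact for long-run averages in a stable system. To show how queues behave as a stage gets busier, the sketch swaps in the textbook M/M/1 queue. That model is a stand-in chosen here, not one from the report or from Reinertsen's text. The function names and numbers are hypothetical.

```python
def littles_law_lead_time(wip: float, throughput_per_day: float) -> float:
    """Average time in system, W = L / lambda (Little's Law).

    Holds for long-run averages in a stable system where arrivals
    and departures balance.
    """
    if throughput_per_day <= 0:
        raise ValueError("throughput must be positive")
    return wip / throughput_per_day


def mm1_queue_wait(utilization: float, service_time: float) -> float:
    """Average wait before service starts in an M/M/1 queue.

    Wq = rho / (1 - rho) * service_time, valid only for 0 <= rho < 1.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return utilization / (1 - utilization) * service_time


# Hypothetical numbers: 5 reviews completed per day.
print(littles_law_lead_time(wip=30, throughput_per_day=5))  # 6.0 days
print(littles_law_lead_time(wip=60, throughput_per_day=5))  # 12.0 days

# Queue wait, in units of one review's duration, as reviewers get busier.
for rho in (0.5, 0.8, 0.9):
    print(rho, round(mm1_queue_wait(rho, service_time=1.0), 2))
```

With these made-up numbers, doubling the review backlog at constant throughput doubles average lead time. Pushing reviewer utilization from 50% to 90% multiplies queue wait roughly ninefold.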

The Faros report doesn't prove any of those explanations; it only makes them hard to ignore.

Does Quality Really Matter?

Defects doubled in the study. That's significant. Note, too, that those are escaped defects. How many more were spotted and corrected at coding time, at review time, in automated tests, or in manual testing?

When a defect is discovered, it is returned to the developer's individual work queue (since most people haven't caught on to co-creative work). Those round-trips add up. When an item is returned for work, it must repeat parts of the journey to completion, delaying that item.

But it's worse than that.

When development on feature B stops to remediate feature A, B is also slowed, as are all the features waiting in the queue behind B.
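
A minimal sketch of that knock-on effect. It assumes one developer working a first-in, first-out queue strictly one item at a time, with returned rework jumping to the front. The item names and durations are hypothetical. Context-switching cost is ignored here, and including it would only make the slip worse.

```python
def finish_times(queue: list[tuple[str, int]]) -> dict[str, int]:
    """Day on which each item finishes, for one person working a
    FIFO queue strictly one item at a time."""
    day = 0
    finished = {}
    for name, effort_days in queue:
        day += effort_days
        finished[name] = day
    return finished


planned = finish_times([("B", 3), ("C", 3), ("D", 3)])
# Rework on feature A comes back and is handled first.
with_rework = finish_times([("A rework", 2), ("B", 3), ("C", 3), ("D", 3)])

slip = {name: with_rework[name] - planned[name] for name in planned}
print(slip)  # {'B': 2, 'C': 2, 'D': 2}
```

Two days of rework on A delays every item queued behind it by the same two days.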

Low quality disrupts and delays work. This has been known for many decades, and I suspect it is obvious, but I still hear arguments that it doesn't matter as long as the coding is quick.

Maybe people have a misguided notion that AIs don't make mistakes? That would explain why this didn't seem obvious and inevitable to them.

The key concept here is First Time Through. If code is of acceptable quality when it is submitted to the quality gate "gauntlet", then back-flow is controlled. It might be that back-flow is eliminated entirely, or it may be reduced enough that it ceases to be a major issue.

Quality, meaning internal quality as well as the absence of bugs, is flow protection. As long as you have significant rework, you cannot have predictable work. Flow is turbulent and unpredictable.
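
A small calculation shows why rework hurts predictability as well as speed. Assume, purely for illustration, that every submission independently passes the quality gate with the same probability, the First Time Through (FTT) rate. The number of passes then follows a geometric distribution with mean 1/FTT and variance (1 - FTT)/FTT². Real rework is rarely that tidy, and the function name `gate_passes` is hypothetical.

```python
import math


def gate_passes(ftt: float) -> tuple[float, float]:
    """Mean and standard deviation of the number of passes an item makes
    through the quality gate, assuming each pass independently succeeds
    with probability `ftt` (a geometric distribution)."""
    if not 0 < ftt <= 1:
        raise ValueError("ftt must be in (0, 1]")
    mean = 1 / ftt
    std = math.sqrt(1 - ftt) / ftt
    return mean, std


for ftt in (1.0, 0.9, 0.7, 0.5):
    mean, std = gate_passes(ftt)
    print(f"FTT {ftt:.0%}: {mean:.2f} passes on average, std dev {std:.2f}")
```

At 50% FTT, an item averages two trips through the gate with a standard deviation of about 1.4 trips. Lead time becomes both longer and harder to predict.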

Why Collaboration?

Collaboration is one of the many ways to improve First Time Through without lowering the quality bar.

Suppose the work is highly dependent.

Suppose we're building a feature that spans a service, a database, a UI, tests, deployment, and a few things nobody remembered to mention during planning.

We can divide that work (scatter) and coordinate later (gather), or we can coordinate earlier and work it together.

Neither choice eliminates coordination. The coordination exists either way.

The difference is when it happens, where it happens, and what happens while we're waiting for it.

We need to approach it in an intentional and systematic way.

How long should a team keep being "accidentally" disrupted by "interruptions" of integration when they happen every week, or many times per month? When does it stop being a surprise and start being simple negligence?

People argue that adding people to a task will make it late, and cite Brooks' Law, but that's a mistake. Brooks' Law holds that adding new people to a late project will create drag, slowing overall progress and delaying the work. I agree and salute the formulation.

But this isn't that situation: having enough people with the right skills to begin with is just good resourcing.

When we talk about collaborative work, we mean adding knowledgeable people who are familiar with the team and fully briefed on a task, and doing it early (at the very start) so that everyone may contribute the best of their skills to completing the task without late integration surprises.

Collaboration is flow protection. It eliminates backflow of work by raising initial quality and by integrating the tasks one by one, rather than "brick building" for a huge integration at the end.

That's why I have a hard time seeing collaborative work as a "mere preference". In some situations, it probably is.

There is no reason to avoid doing perfectly independent work in parallel. This is one of the best reasons to have multiple teams and multiple products.

The reason organisations struggle with poorly organised systems (think "God Classes") is that they make truly independent work a rare and lucky circumstance.

Software delivery systems are full of work that appears independent until it encounters the shared constraint. By the time dependencies become visible, the economics have already changed for the worse.

What Does That Mean?

The story does not indicate that AI is making developers less effective or slower at task completion.

It does seem to suggest that any gains in speed are lost to downstream queues, activities, and possibly back-flow.

It also suggests that agentic code has more hard-to-detect errors.

Perhaps agentic coding is exposing where the system already limits flow and comprehension. If that's true, then the results in the Faros report are to be expected, and possibly inevitable.

At some point, the conversation must stop being about speed of coding and start being about the system that surrounds the code.

That's where we need to start looking.

Agile Otter Blog