DORA Metrics Explained: SPACE & Productivity

Developer productivity has become a polite name for the metric a CIO is allowed to ask about. DORA and SPACE are useful, but only after the harder conversation has been had: whether the work the team is measuring is the right work, and whether the velocity it is chasing buys back the audit cost it generates.

Bugni Labs

Developer productivity has become a polite name for the metric a CIO is allowed to ask about. Velocity is the impolite name. Both are usually the wrong question.

We spend more time than we would like helping engineering leaders pick a productivity framework before we have established what they are actually trying to measure. DORA, SPACE, flow efficiency, cycle time: the shelf is full of credible-looking options.

The work is upstream of the shelf.

In regulated financial services, the question is not which framework. It is what the metric is supposed to defend.

What DORA actually measures, and what it doesn't

The four DORA metrics (deployment frequency, lead time for changes, change failure rate, and time to restore) are a coherent statement about delivery flow. They were always meant to be system-level signals, not individual scorecards.

We find that the teams who use them well treat them as smoke detectors. A change in any of the four prompts a question, not a verdict.

The teams who use them badly print them on dashboards and rank engineers against them.

DORA is silent on whether you built the right thing. It measures the speed of the production line. If the line is producing the wrong product quickly, DORA will tell you the team is elite.

In a regulated environment, that is a meaningful gap. Elite delivery of a model the regulator will not let you put in production is not a win.
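
As a minimal sketch of what "system-level signal" means in practice, the four metrics can be summarised from a list of deployment records for one system over one window. The `Deployment` record shape, its field names, and the `dora_summary` helper are assumptions made for illustration, not a standard schema or an official DORA calculation. Note that nothing in the record identifies an individual engineer.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; field names are assumptions for illustration.
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_summary(deployments: list[Deployment], window_days: int) -> dict:
    """Summarise the four DORA metrics for one system over one window."""
    if not deployments:
        raise ValueError("no deployments in window")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    # Time to restore is measured here from the failing deployment to recovery.
    restores = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deploys_per_day": len(deployments) / window_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "median_time_to_restore": median(restores) if restores else None,
    }


t0 = datetime(2024, 1, 1, 9, 0)
sample = [
    Deployment(t0, t0 + timedelta(hours=4)),
    Deployment(t0, t0 + timedelta(hours=8), failed=True,
               restored_at=t0 + timedelta(hours=9)),
    Deployment(t0, t0 + timedelta(hours=6)),
    Deployment(t0, t0 + timedelta(hours=2)),
]
print(dora_summary(sample, window_days=2))
```

The summary says how fast and how safely the line is running. It has no field for whether the change was the right one to make, which is the gap described above.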

What SPACE adds, and what it costs

SPACE was a response to exactly that gap. It is broader by design: satisfaction and well-being, performance, activity, communication and collaboration, efficiency and flow.

The value of SPACE is that it forces leaders to instrument the human side. Satisfaction and communication are not soft metrics. They are leading indicators of the kind of attrition and miscommunication that show up later as failed audits.

The cost of SPACE is that some of its inputs are survey data. Survey data degrades quickly under measurement.

Teams told they are being scored on satisfaction tend to report higher satisfaction. The signal becomes the lie the team thinks the leader wants to hear.

We have started to think of SPACE less as a framework to score teams with, and more as a checklist for whether a productivity conversation is grown-up. If a leader is talking only about deployment frequency, they are missing four dimensions.

If they are scoring teams on a satisfaction percentile, they are misusing the framework.
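
The checklist reading of SPACE can be sketched as a coverage check: given the metrics a leader actually talks about, which dimensions are not represented at all? The mapping from metric names to dimensions below is our own assumption for illustration, and the metric names are hypothetical. Real mappings are debatable and should be agreed with the team.

```python
SPACE_DIMENSIONS = (
    "satisfaction",
    "performance",
    "activity",
    "communication",
    "efficiency",
)

# Hypothetical mapping from metric names to the SPACE dimension each informs.
METRIC_DIMENSION = {
    "deployment_frequency": "activity",
    "change_failure_rate": "performance",
    "lead_time": "efficiency",
    "satisfaction_survey": "satisfaction",
    "review_turnaround": "communication",
}


def missing_dimensions(tracked_metrics: list[str]) -> list[str]:
    """Return the SPACE dimensions that no tracked metric covers."""
    covered = {
        METRIC_DIMENSION[m] for m in tracked_metrics if m in METRIC_DIMENSION
    }
    return [d for d in SPACE_DIMENSIONS if d not in covered]


# A leader who talks only about deployment frequency covers one dimension.
print(missing_dimensions(["deployment_frequency"]))
```

The function deliberately returns gaps rather than a score. It answers whether the conversation is complete, not how any team ranks.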

The AI coding tax nobody is reporting

The thing we keep watching teams not reckon with is what AI coding assistants are doing to their numbers.

Adoption raises throughput metrics measurably. Lead time falls.

Deployment frequency rises. The dashboard looks better.

Change failure rate rises too. Review quality degrades. Bugs that a human author would have caught at write-time slip through to production.

The lift in apparent productivity is partly real, and partly a transfer of work from authoring into incident response.

In regulated financial services, that transfer matters more than in most industries. An incident in a deposits ledger or a credit-decisioning engine is not a Jira ticket. It is a regulator letter.

The teams we see handling this well are the ones who explicitly separate the throughput metrics from the stability metrics, watch them both, and refuse to celebrate one while the other moves the wrong way.

The teams that celebrate the deployment-frequency lift and then quietly absorb the stability hit are the ones we expect to feature in next year's incident reports.
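
One way to make that separation explicit is to compare two periods and refuse to report a throughput gain on its own when stability has moved the wrong way. This is a sketch under stated assumptions: the `Snapshot` fields, the 5% tolerance, and the result labels are all hypothetical choices, not thresholds drawn from DORA or any regulator.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    # Hypothetical per-period summary for one system.
    deploys_per_week: float
    lead_time_hours: float
    change_failure_rate: float
    restore_hours: float


def review_period(before: Snapshot, after: Snapshot, tolerance: float = 0.05) -> str:
    """Classify a period-over-period change, keeping throughput and stability apart.

    `tolerance` is an assumed noise band: moves smaller than 5% are ignored.
    """
    throughput_up = (
        after.deploys_per_week > before.deploys_per_week * (1 + tolerance)
        or after.lead_time_hours < before.lead_time_hours * (1 - tolerance)
    )
    stability_down = (
        after.change_failure_rate > before.change_failure_rate * (1 + tolerance)
        or after.restore_hours > before.restore_hours * (1 + tolerance)
    )
    if throughput_up and stability_down:
        # Faster delivery alongside more failures: possibly work moved
        # from authoring into incident response.
        return "investigate_transfer"
    if stability_down:
        return "investigate_stability"
    if throughput_up:
        return "throughput_gain_stability_held"
    return "no_material_change"


before = Snapshot(deploys_per_week=10, lead_time_hours=48,
                  change_failure_rate=0.10, restore_hours=4)
after = Snapshot(deploys_per_week=14, lead_time_hours=30,
                 change_failure_rate=0.16, restore_hours=4)
print(review_period(before, after))
```

The labels are prompts for a question, not verdicts. A result of `investigate_transfer` says the two families of metrics disagree and someone should look at why before the dashboard is celebrated.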

What we actually measure

When a CIO asks us how to measure productivity in a regulated platform, we push them back to three questions.

What is the system supposed to do, in a sentence the regulator could read? Without that, no metric stops the team running fast in the wrong direction.

What does a good day on this platform look like, measured? That is where DORA earns its keep (fast lead times, low change failure rate, fast restore), but only against a clear product baseline.

What does a good year on this team look like, measured? That is where SPACE earns its keep: engineer retention, signals of communication health, and evidence that the people who built the thing are still able to build the next thing.

If any of the three is missing, the productivity framework is decorative.

The CIOs we find most useful to work with are the ones who already know which of the three answers they cannot give. The framework is the easy part. Knowing what good actually looks like for a specific regulated platform is the work that takes the year.

The honest answer

The framework debate is a substitute for the harder conversation. Most engineering leaders know roughly where their teams sit on DORA. Many know roughly where they sit on SPACE.

What they are less sure of is whether the work they are measuring is the right work, and whether the velocity they are chasing buys back the audit cost it generates.

Those are the productivity questions worth having.

A team that ships code quickly on a system the bank cannot defend in front of a regulator has not been productive. It has been busy.

The frameworks are useful, but only after that distinction has been made. The compass calibration matters more than the speedometer, and most productivity debates are speedometer debates in disguise.

