Building an Internal Developer Platform in 4 Months
How we built an internal developer platform that shipped 20 microservices for a UK neobank in 4 months — 12-15 deploys/day, a 47-minute lead time, and a 3.8% change failure rate.
The internal developer platform we built for a UK neobank was supposed to be about speed. Ship faster. Deploy more frequently. Reduce lead time. The usual DORA metrics pitch.
It did all of that. Twenty microservices from blank sheet to production in four months. Twelve to fifteen deployments per day. Lead time under an hour. The numbers were good.
But the thing that actually changed the programme was not speed. It was the argument it eliminated.
The argument that disappears
Before the platform existed, every new service triggered the same debate. Which CI pipeline template do we use? Where do the secrets live? Who provisions the staging environment? How do we satisfy the compliance team's audit requirements? Each question was reasonable. Together they consumed two to three weeks before a single line of domain logic was written.
The platform did not answer these questions. It made them irrelevant. A new service - repository, CI pipeline, monitoring dashboards, compliance documentation, environment configuration - was provisioned in fifteen minutes. Not because we automated the answers. Because we encoded the decisions.
I think most organisations misunderstand what a platform does. They think it is automation. It is not. It is decision materialisation. Every configuration choice, every security baseline, every compliance gate that used to require a meeting now exists as code that runs without asking permission.
The team stopped debating infrastructure. They started debating domain models. That was the shift that mattered.
The team structure that worked (and why it almost didn't)
We started with fourteen engineers. The instinct - mine, initially - was to organise around layers. A platform team. A services team. An integration team. Clean separation of concerns.
We tried it for two weeks. It produced clean architecture documents and almost no working software. The platform team built infrastructure nobody had asked for. The services team waited for platform capabilities that were not ready. The integration team had nothing to integrate.
We reorganised around domains. Four teams of three to four engineers, each owning a bounded context in the credit decisioning flow: application intake, eligibility assessment, credit scoring, and decisioning orchestration. Each team owned their services end-to-end - from domain logic through deployment to production monitoring.
The platform became a thin, shared layer that the domain teams consumed. Not a team that other teams depended on. A product that other teams used. The distinction sounds semantic. In practice, it changed everything. When a platform is a dependency, teams wait. When it is a product, they file feature requests and work around gaps.
Two of the fourteen engineers maintained the platform. The other twelve shipped domain services. The ratio felt wrong at the time - surely the platform needed more investment. It turned out that a platform maintained by two engineers who use it daily stays lean. A platform maintained by a dedicated team that does not ship domain services grows features nobody needs.
What the internal developer platform actually contained
Less than we planned. More than we expected to need.
Service templating. A single command that creates a repository from a blueprint - Go service with event-driven interfaces, Dockerfile, CI pipeline, Kubernetes manifests, monitoring dashboards, and a compliance metadata file. Fifteen minutes from command to first deployment in staging. We spent two days building the first blueprint. Every service after that started there.
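The templating step can be sketched as a blueprint of templated files that a single command instantiates for a new service. This is a minimal illustration, not the actual blueprint: the file contents, field names, and the `Render` helper are all invented for the example.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// Hypothetical blueprint: templated files keyed by repository path.
// Instantiating a service substitutes its name, team, and data
// classification into every file.
var blueprint = map[string]string{
	"Dockerfile":          "FROM golang:1.22\nCOPY . /src/{{.Name}}\n",
	"k8s/deployment.yaml": "metadata:\n  name: {{.Name}}\n  labels:\n    team: {{.Team}}\n",
	"compliance.yaml":     "service: {{.Name}}\nowner: {{.Team}}\ndata-classification: {{.DataClass}}\n",
}

type Service struct {
	Name, Team, DataClass string
}

// Render instantiates every blueprint file for one service.
func Render(svc Service) (map[string]string, error) {
	out := make(map[string]string, len(blueprint))
	for path, tmpl := range blueprint {
		t, err := template.New(path).Parse(tmpl)
		if err != nil {
			return nil, err
		}
		var buf bytes.Buffer
		if err := t.Execute(&buf, svc); err != nil {
			return nil, err
		}
		out[path] = buf.String()
	}
	return out, nil
}

func main() {
	files, err := Render(Service{Name: "eligibility", Team: "credit-risk", DataClass: "restricted"})
	if err != nil {
		panic(err)
	}
	fmt.Print(files["compliance.yaml"])
}
```

The point of the pattern is that the blueprint, not the engineer, decides what a service looks like; adding a file to the map changes every future service at once.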
Environment management. Three environments - development, staging, production - with identical configurations enforced by infrastructure-as-code. Staging ran in the same cloud region as production, with the same network policies, the same secrets management, the same compliance gates. We caught eleven issues in staging that would have been production incidents without parity. Two of them were data residency violations that would have triggered regulatory attention.
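Parity between environments can be checked mechanically once the IaC tooling flattens each environment's configuration into key/value pairs. A sketch under that assumption (the key names and region are illustrative):

```go
package main

import (
	"fmt"
	"sort"
)

// ParityDiff returns the keys on which two environment configs disagree:
// values that differ, or keys present in one environment but not the other.
func ParityDiff(a, b map[string]string) []string {
	var diff []string
	for k, va := range a {
		if vb, ok := b[k]; !ok || vb != va {
			diff = append(diff, k)
		}
	}
	for k := range b {
		if _, ok := a[k]; !ok {
			diff = append(diff, k)
		}
	}
	sort.Strings(diff)
	return diff
}

func main() {
	// Illustrative flattened configs; real keys come from the IaC state.
	staging := map[string]string{"region": "eu-west-2", "network_policy": "strict", "secrets_backend": "vault"}
	prod := map[string]string{"region": "eu-west-2", "network_policy": "strict", "secrets_backend": "vault"}
	fmt.Println("parity holds:", len(ParityDiff(staging, prod)) == 0)
}
```

Running a check like this in CI turns "staging matches production" from a convention into a gate.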
Governed CI/CD. Every deployment passed through the same pipeline: build, behavioural validation against a test corpus, policy-as-code checks for data handling and access controls, and a deployment gate. Low-risk changes - configuration updates, non-breaking API changes - deployed automatically with a canary window. High-risk changes - schema migrations, new data classification scopes - required a human sign-off within a two-hour window.
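The deployment gate logic amounts to a risk classification. A hedged sketch — the change taxonomy here is a simplification of the real policy, and the type names are invented:

```go
package main

import "fmt"

type ChangeKind int

const (
	ConfigUpdate ChangeKind = iota // low risk
	NonBreakingAPI                 // low risk
	SchemaMigration                // high risk
	NewDataScope                   // high risk
)

type Decision struct {
	AutoDeploy   bool // deploy without human involvement
	Canary       bool // route a small traffic slice first
	NeedsSignOff bool // block until a human approves
}

// Gate encodes the policy described above: low-risk changes deploy
// automatically behind a canary; high-risk changes wait for sign-off.
func Gate(kind ChangeKind) Decision {
	switch kind {
	case ConfigUpdate, NonBreakingAPI:
		return Decision{AutoDeploy: true, Canary: true}
	case SchemaMigration, NewDataScope:
		return Decision{NeedsSignOff: true}
	}
	// Anything unclassified fails closed rather than open.
	return Decision{NeedsSignOff: true}
}

func main() {
	fmt.Printf("config update: %+v\n", Gate(ConfigUpdate))
	fmt.Printf("schema migration: %+v\n", Gate(SchemaMigration))
}
```

The useful property is the default branch: a change the classifier does not recognise is treated as high risk, so the gate fails closed.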
Observability from minute one. Every service got golden-signal dashboards (latency, traffic, errors, saturation) automatically. Not configured by the team. Provisioned by the blueprint. This was not optional and not customisable. When a service existed, its monitoring existed. No exceptions.
We explicitly did not build a service mesh. We evaluated Istio and Linkerd in the first week and rejected both. The operational complexity was disproportionate to the value for twenty services. Direct service-to-service communication with mTLS and structured logging gave us what we needed. If the service count reaches a hundred, the decision may change. At twenty, a mesh would have been resume-driven engineering.
The metrics
Three months after the platform was operational, we measured:
| Metric | Week 1 | Month 3 |
|---|---|---|
| Deployment frequency | 2/week | 12-15/day |
| Lead time (commit to prod) | 3 days | 47 minutes |
| Change failure rate | 18% | 3.8% |
| Mean time to recovery | 4 hours | 22 minutes |
| Time to onboard new engineer | 2 weeks | 2 days |
| Time to create new service | 2-3 weeks | 15 minutes |
The onboarding number surprised us most. A new engineer joining the team could deploy a change to production on their second day. Not because we had written extensive documentation - we had not. Because the platform eliminated the tribal knowledge that normally gates productivity. The blueprints encoded how things work here. The CI pipeline enforced the quality bar. The monitoring dashboards showed what was happening. A new engineer did not need to know the history. They needed to read the blueprint.
What went wrong
The worst mistake was building the eligibility rules engine as a platform capability instead of a domain service. We reasoned that multiple services would need eligibility logic, so it belonged in the platform. What actually happened: each domain team needed slightly different eligibility semantics, the shared engine became a negotiation surface, and changes to eligibility rules required coordination across three teams. We extracted it back into the decisioning orchestration domain after six weeks. Six weeks wasted.
The second mistake was more subtle. We built the platform with the assumption that teams would adopt it voluntarily because it was better. One team did not. They had a senior engineer who preferred their own CI setup - bash scripts, manual deployments, a monitoring dashboard they had built themselves. Their service worked. It was also the service that caused the only production incident in the programme - a deployment that bypassed the policy-as-code checks because the custom pipeline did not include them.
We made platform adoption mandatory after that. Not by removing choice, but by making the alternative explicit. You can use your own pipeline. You must also pass the same policy-as-code checks the platform pipeline runs. If you want to maintain those checks yourself, that is your engineering time. Every team chose the platform.
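A policy check of this kind is easy to make pipeline-agnostic: it takes deployment metadata and returns violations, regardless of which pipeline invokes it. The rule set below is invented for the sketch; the real checks covered data handling and access controls.

```go
package main

import "fmt"

// Deployment is the metadata every pipeline must submit before shipping.
// Field names and the region value are illustrative assumptions.
type Deployment struct {
	Service        string
	DataClass      string // e.g. "public", "internal", "restricted"
	Region         string
	AccessReviewed bool
}

// PolicyViolations returns every rule the deployment breaks;
// an empty result means the gate opens.
func PolicyViolations(d Deployment) []string {
	var v []string
	if d.DataClass == "" {
		v = append(v, "missing data classification")
	}
	if d.DataClass == "restricted" && d.Region != "eu-west-2" {
		v = append(v, "restricted data outside the approved region")
	}
	if !d.AccessReviewed {
		v = append(v, "access controls not reviewed")
	}
	return v
}

func main() {
	d := Deployment{Service: "scoring", DataClass: "restricted", Region: "us-east-1"}
	for _, msg := range PolicyViolations(d) {
		fmt.Println("policy violation:", msg)
	}
}
```

Decoupling the checks from the pipeline is what made the "use your own pipeline if you like" offer credible: the policy was the constant, not the tooling.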
The thing nobody told us
The hardest part of building an internal developer platform is not the technology. It is the politics of encoding decisions.
Every configuration choice in the blueprint represents a decision that someone used to make. Secrets management approach. Logging format. Error handling conventions. Deployment strategy. When those decisions lived in people's heads, they were negotiable. When they live in code, they are visible. Visible decisions attract opinions. Opinions attract debate. Debate, if not managed, produces paralysis.
We handled this by making the blueprint opinionated and changeable. The initial choices were ours - informed by the bank's existing standards where they existed, by our own engineering judgment where they did not. Teams could propose changes through a lightweight RFC process. Most did not. Having a default that works is more valuable to most engineers than having the freedom to choose.
The platform shipped twenty microservices in four months. Every service handles credit decisions that are explainable and auditable. One production incident across the whole programme, and it came from the single service that bypassed the platform pipeline.
But the metric I keep coming back to is the one we did not plan to measure. Before the platform, infrastructure decisions consumed roughly 30% of engineering time. After, that number dropped to under 5%. The recovered time went into domain logic - the work that actually differentiates the bank's product.
A platform is not infrastructure. It is the crystallisation of engineering judgment into something that scales without requiring the judgment to be repeated.
Bugni Labs