
10 Days to Hours: Screening Platform

Discover how a major UK bank's screening platform moved to an event-driven, vendor-agnostic architecture, cutting commercial customer onboarding from 10 days to a matter of hours with zero unplanned incidents.


How a Major UK Bank's Screening Platform Went From 10-Day Onboarding to Hours

We were brought in to replace the commercial customer screening platform at a major UK bank. The existing system had been in production for over a decade. It worked - in the sense that it produced results and nobody had been fined - but it had become the single largest bottleneck in the bank's commercial onboarding process.

New commercial customers took an average of ten business days to onboard. Of those ten days, six were consumed by screening - sanctions checks, PEP matching, adverse media searches, and the manual remediation that followed when the system produced ambiguous results. Which it did constantly.

The bank's relationship managers had started warning prospective clients about the wait. Some prospects walked. The cost of that lost business was hard to quantify, but the head of commercial banking described it as "the most expensive system we never think about."

What the legacy system looked like

The screening platform was a monolith. A single Java application, deployed on-premise, connected to four external screening providers via point-to-point integrations. Each provider had its own connector - bespoke code, written at different times, by different teams, with different error handling conventions.

The architecture looked roughly like this:

Onboarding request
  → Screening orchestrator (single-threaded, sequential)
    → Provider A connector (SOAP/XML, synchronous, 8s avg response)
    → Provider B connector (REST/JSON, synchronous, 3s avg response)
    → Provider C connector (batch file upload, 45-min processing window)
    → Provider D connector (REST/JSON, synchronous, 12s avg response)
  → Result aggregation (custom scoring, hard-coded thresholds)
  → Case creation (manual review queue)

The orchestrator called each provider sequentially. If Provider C was running its batch window, the entire pipeline stalled. A single customer screening took between 4 and 90 minutes depending on Provider C's queue position. For batch onboarding - when the bank acquired a portfolio or onboarded a corporate group - screening alone could take days.

The result aggregation layer was the worst part. Each provider returned results in a different format. The aggregation logic normalised them into a common schema, applied a scoring algorithm, and decided whether the case needed manual review. The scoring algorithm had been modified 47 times over its lifetime. Nobody fully understood the interaction effects between the rules. The team referred to it as "the hairball."

False positive rates were running at 38%. More than a third of all screening results flagged for manual review turned out to be clean. Each false positive consumed 25 to 40 minutes of an analyst's time.

The decisions we made

We had twelve weeks. The bank wanted the new platform running in parallel with the legacy system before the end of the quarter.

Decision 1: Event-driven, not request-response

The legacy system was synchronous and sequential. We replaced it with an event-driven architecture built on Apache Kafka. Every screening request becomes an event. Provider calls happen in parallel. Results flow back as events. The orchestration logic is stateless - it reacts to events rather than managing a workflow.

The core event flow:

ScreeningRequested
  → [parallel]
      ProviderAQueryDispatched → ProviderAResultReceived
      ProviderBQueryDispatched → ProviderBResultReceived
      ProviderCQueryDispatched → ProviderCResultReceived
      ProviderDQueryDispatched → ProviderDResultReceived
  → AllProvidersResolved
  → ScreeningScored
  → CaseCreated | ScreeningCleared

Each provider adapter publishes its result as an event. A scoring service consumes all provider results for a given screening request, applies the scoring rules, and emits either a CaseCreated or ScreeningCleared event. The downstream case management system subscribes to CaseCreated events.

This eliminated the Provider C bottleneck entirely. Provider C still runs batch processing, but the platform no longer blocks on it. When Provider C's results arrive - minutes or hours later - they flow through the same event pipeline and update the screening result. If the other three providers have already cleared the customer, onboarding proceeds with a provisional clearance, and Provider C's results are reconciled asynchronously.
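A minimal sketch of that aggregation step, assuming kafkajs as the client; the topic names, the in-memory pending map, and the scoring stub are illustrative rather than the production topology:

import { Kafka } from "kafkajs";

// Illustrative event shape; the real platform shares schemas via a registry.
type ProviderResultEvent = { screeningId: string; providerId: string; matchConfidence: number };

const kafka = new Kafka({ clientId: "scoring-service", brokers: ["kafka:9092"] });
const consumer = kafka.consumer({ groupId: "scoring-service" });
const producer = kafka.producer();

// Results collected per screening request. The production service keeps this
// state in a durable store, not in process memory.
const pending = new Map<string, ProviderResultEvent[]>();

const SYNC_PROVIDERS = ["provider-a", "provider-b", "provider-d"]; // Provider C reconciles later
const THRESHOLD = 0.65;

// Placeholder for the weighted ensemble described under Decision 4.
const scoreResults = (results: ProviderResultEvent[]) =>
  results.reduce((max, r) => Math.max(max, r.matchConfidence), 0);

async function run() {
  await consumer.connect();
  await producer.connect();
  await consumer.subscribe({ topics: ["screening.provider-results"] });

  await consumer.run({
    eachMessage: async ({ message }) => {
      const event = JSON.parse(message.value!.toString()) as ProviderResultEvent;
      const results = [...(pending.get(event.screeningId) ?? []), event];
      pending.set(event.screeningId, results);

      // Provisional clearance: score once the synchronous providers have resolved;
      // Provider C's batch result flows through the same pipeline when it arrives.
      const resolved = new Set(results.map(r => r.providerId));
      if (!SYNC_PROVIDERS.every(id => resolved.has(id))) return;

      const score = scoreResults(results);
      const outcome = score >= THRESHOLD ? "CaseCreated" : "ScreeningCleared";
      await producer.send({
        topic: "screening.outcomes",
        messages: [{ key: event.screeningId, value: JSON.stringify({ screeningId: event.screeningId, outcome, score }) }],
      });
      pending.delete(event.screeningId);
    },
  });
}

run().catch(console.error);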

Decision 2: Pluggable provider adapters

Each provider got a standardised adapter interface. The adapter handles authentication, request formatting, response parsing, retry logic, and circuit breaking. The core platform knows nothing about provider-specific protocols.

interface ScreeningProvider {
  id: string;
  query(subject: ScreeningSubject): Promise<ProviderResult>;
  healthCheck(): Promise<HealthStatus>;
  capabilities(): ProviderCapabilities;
}

interface ProviderResult {
  providerId: string;
  matchConfidence: number;    // 0.0 - 1.0
  matches: MatchRecord[];
  metadata: {
    latencyMs: number;
    apiVersion: string;
    queryTimestamp: string;
  };
}

Switching providers - something the bank had been wanting to do for two years but could not justify the engineering cost - became a configuration change. During the project, the bank evaluated a fifth provider. We integrated it in three days. Under the old architecture, the estimate had been eight weeks.
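To make that concrete, here is a sketch of what an adapter for a hypothetical fifth provider might look like against the interface above; the endpoint, auth scheme, and response fields are invented for illustration:

// Supporting types assumed from the platform's shared model (shapes illustrative).
type ScreeningSubject = { entityType: "corporate" | "individual"; name: string; jurisdiction: string; registrationNumber?: string };
type MatchRecord = { listName: string; matchedName: string };
type HealthStatus = "HEALTHY" | "DEGRADED" | "DOWN";
type ProviderCapabilities = { sanctions: boolean; pep: boolean; adverseMedia: boolean };

// Hypothetical REST adapter for a fifth provider. Retry and circuit breaking
// are omitted here to keep the sketch short.
class ProviderEAdapter implements ScreeningProvider {
  readonly id = "provider-e";

  constructor(private readonly baseUrl: string, private readonly apiKey: string) {}

  async query(subject: ScreeningSubject): Promise<ProviderResult> {
    const started = Date.now();
    const response = await fetch(`${this.baseUrl}/v1/screen`, {
      method: "POST",
      headers: { Authorization: `Bearer ${this.apiKey}`, "Content-Type": "application/json" },
      body: JSON.stringify({ name: subject.name, jurisdiction: subject.jurisdiction }),
    });
    if (!response.ok) throw new Error(`Provider E returned ${response.status}`);
    const body = await response.json();

    // Map the provider-specific payload into the platform's common schema.
    return {
      providerId: this.id,
      matchConfidence: body.confidence ?? 0,
      matches: (body.hits ?? []).map((hit: { list: string; name: string }) => ({ listName: hit.list, matchedName: hit.name })),
      metadata: { latencyMs: Date.now() - started, apiVersion: "v1", queryTimestamp: new Date().toISOString() },
    };
  }

  async healthCheck(): Promise<HealthStatus> {
    const res = await fetch(`${this.baseUrl}/health`);
    return res.ok ? "HEALTHY" : "DOWN";
  }

  capabilities(): ProviderCapabilities {
    return { sanctions: true, pep: true, adverseMedia: false };
  }
}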

Decision 3: Distributed search with local caching

The legacy system hit provider APIs on every screening request, even when the same entity had been screened hours earlier. We added a local search layer - an Elasticsearch cluster that caches provider results with a configurable TTL.

For re-screening (the bank re-screens existing customers on a rolling 90-day cycle), the local cache hit rate runs at 72%. A cached screening resolves in under 200 milliseconds. A full provider round-trip, in parallel, averages 4.2 seconds. A resolved screening record looks like this:

{
  "screening_id": "scr_8f2a1b3c",
  "subject": {
    "entity_type": "corporate",
    "name": "Acme Holdings Ltd",
    "jurisdiction": "GB",
    "registration_number": "12345678"
  },
  "resolution": {
    "outcome": "CLEARED",
    "score": 0.12,
    "threshold": 0.65,
    "providers_consulted": 4,
    "cache_hits": 2,
    "total_latency_ms": 4217
  }
}
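A simplified sketch of the cache-first lookup; a Map stands in for the Elasticsearch index here, and the TTL value is illustrative:

// Cache-first screening. The production layer is an Elasticsearch index with a
// configurable TTL; a Map stands in to keep the sketch self-contained.
type CachedResult = { result: ProviderResult; cachedAt: number };

const CACHE_TTL_MS = 24 * 60 * 60 * 1000; // illustrative 24-hour TTL
const cache = new Map<string, CachedResult>();

async function screenWithCache(subjectKey: string, subject: ScreeningSubject, providers: ScreeningProvider[]) {
  const results: ProviderResult[] = [];
  const misses: ScreeningProvider[] = [];

  for (const provider of providers) {
    const entry = cache.get(`${provider.id}:${subjectKey}`);
    if (entry && Date.now() - entry.cachedAt < CACHE_TTL_MS) {
      results.push(entry.result); // cache hit: no provider call needed
    } else {
      misses.push(provider);
    }
  }

  // Only providers without a fresh cached result are queried, in parallel.
  const fresh = await Promise.all(misses.map(p => p.query(subject)));
  for (const result of fresh) {
    cache.set(`${result.providerId}:${subjectKey}`, { result, cachedAt: Date.now() });
    results.push(result);
  }
  return results;
}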

Decision 4: Transparent scoring

We replaced the hairball. The new scoring engine uses a weighted ensemble model with explainable outputs. Every score includes a breakdown - which providers contributed, which match fields drove the score, and why.
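The shape of that breakdown, sketched with invented weights and thresholds (the bank's tuned values are not reproduced here):

// Weighted ensemble scoring with an explainable breakdown.
type ScoreContribution = { providerId: string; weight: number; matchConfidence: number; contribution: number };
type ScoringResult = { score: number; threshold: number; outcome: "CASE_CREATED" | "CLEARED"; breakdown: ScoreContribution[] };

const PROVIDER_WEIGHTS: Record<string, number> = {
  "provider-a": 0.35,
  "provider-b": 0.25,
  "provider-c": 0.25,
  "provider-d": 0.15,
};

function scoreScreening(results: ProviderResult[], threshold = 0.65): ScoringResult {
  const breakdown = results.map(r => {
    const weight = PROVIDER_WEIGHTS[r.providerId] ?? 0;
    return { providerId: r.providerId, weight, matchConfidence: r.matchConfidence, contribution: weight * r.matchConfidence };
  });

  // Normalise by the weights actually present, so a provider that has not yet
  // responded (e.g. Provider C in its batch window) does not silently lower the score.
  const totalWeight = breakdown.reduce((sum, c) => sum + c.weight, 0) || 1;
  const score = breakdown.reduce((sum, c) => sum + c.contribution, 0) / totalWeight;

  return { score, threshold, outcome: score >= threshold ? "CASE_CREATED" : "CLEARED", breakdown };
}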

This was not technically difficult. The difficult part was getting the compliance team to agree on new thresholds. The old thresholds had been tuned over years of operational experience. Nobody wanted to sign off on new ones.

We solved this by running the new scoring engine in shadow mode for four weeks. Every screening request went through both the old and new systems. We compared outcomes. The new system produced a 14% lower false positive rate with zero increase in false negatives - no true matches were missed. The compliance team reviewed 200 randomly sampled discrepancies. They approved the cutover.
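A sketch of the shadow-mode wiring: both engines score every request, the legacy outcome remains authoritative, and disagreements are logged for review. The function names here are illustrative:

type Outcome = "CASE_CREATED" | "CLEARED";

async function scoreInShadowMode(
  screeningId: string,
  results: ProviderResult[],
  legacyEngine: (r: ProviderResult[]) => Promise<Outcome>,
  candidateEngine: (r: ProviderResult[]) => Promise<Outcome>,
  recordDiscrepancy: (entry: { screeningId: string; legacy: Outcome; candidate: Outcome }) => Promise<void>,
): Promise<Outcome> {
  const [legacy, candidate] = await Promise.all([legacyEngine(results), candidateEngine(results)]);
  if (legacy !== candidate) {
    await recordDiscrepancy({ screeningId, legacy, candidate });
  }
  return legacy; // the legacy decision stays authoritative until cutover
}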

Operational results

The numbers after three months in production:

Metric | Legacy | New platform
Median screening time | 47 minutes | 3.8 seconds
P95 screening time | 6.2 hours | 28 seconds
End-to-end onboarding | 10 business days | 4.1 hours
False positive rate | 38% | 24%
Provider integration time | 8 weeks | 3 days
Re-screening throughput | 800/day | 12,000/day
Analyst case load | 340 cases/day | 215 cases/day

The median screening time improvement is dramatic - 47 minutes to 3.8 seconds - but it is almost entirely explained by two changes: parallel provider calls and the elimination of the Provider C batch bottleneck. The architecture was not clever. It was obvious. The legacy system just never got refactored.

The false positive reduction from 38% to 24% came from the new scoring model, but also from better entity resolution. The legacy system matched on name similarity alone. The new system uses a composite match that includes jurisdiction, registration number, date of birth (for individuals), and known aliases. More matching dimensions means fewer ambiguous results.
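A sketch of the composite match idea, with invented weights; the production matcher also handles fuzzy name matching and alias expansion:

// Composite entity match: several dimensions contribute, so a name collision
// alone rarely clears the review threshold on its own.
type CandidateMatch = {
  nameSimilarity: number;        // 0.0 - 1.0, from fuzzy name matching
  jurisdictionMatch: boolean;
  registrationNumberMatch: boolean;
  dateOfBirthMatch?: boolean;    // individuals only
  aliasMatch: boolean;
};

function compositeMatchScore(m: CandidateMatch): number {
  let score = 0.4 * m.nameSimilarity;
  if (m.jurisdictionMatch) score += 0.15;
  if (m.registrationNumberMatch) score += 0.25;
  if (m.dateOfBirthMatch) score += 0.1;
  if (m.aliasMatch) score += 0.1;
  return Math.min(score, 1);
}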

The analyst case load dropped by 37%. The bank redeployed four analysts from screening to enhanced due diligence - higher-value work that had been under-resourced.

What did not go smoothly

Provider C's batch API had an undocumented rate limit. When we started sending requests in parallel rather than sequentially, we hit the limit within the first hour of production traffic. Their API returned a 200 OK with an empty result set - no error code, no rate limit header. We spent two hours diagnosing what looked like a data quality issue before we spotted the pattern.

The fix was a client-side rate limiter in the Provider C adapter - 50 requests per minute, with exponential backoff. We also added a reconciliation job that re-checks any screening where Provider C returned zero matches, because we can no longer trust that zero matches means "no matches found" versus "rate limited silently."
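A sketch of that limiter; the windowing here is simplified, and a token bucket or an off-the-shelf limiter works just as well:

// Client-side rate limiting for Provider C: at most 50 requests per minute,
// with exponential backoff when the window is exhausted.
const MAX_REQUESTS_PER_MINUTE = 50;
let windowStart = Date.now();
let requestsInWindow = 0;

const sleep = (ms: number) => new Promise(resolve => setTimeout(resolve, ms));

async function withRateLimit<T>(call: () => Promise<T>, attempt = 0): Promise<T> {
  const now = Date.now();
  if (now - windowStart >= 60_000) {
    windowStart = now;
    requestsInWindow = 0;
  }
  if (requestsInWindow >= MAX_REQUESTS_PER_MINUTE) {
    // Back off exponentially (1s, 2s, 4s, ...) and retry within the next window.
    await sleep(Math.min(1000 * 2 ** attempt, 60_000));
    return withRateLimit(call, attempt + 1);
  }
  requestsInWindow += 1;
  return call();
}

// Usage inside the Provider C adapter: await withRateLimit(() => providerC.query(subject));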

The Kafka cluster sizing was wrong. We estimated throughput based on average daily screening volume. We did not account for the bank's monthly re-screening batch, which generates 30x the normal daily volume on the first Monday of every month. The cluster fell behind on consumer lag during the first re-screening run. We scaled from 6 to 12 partitions on the screening-results topic and added two consumer instances. It has not fallen behind since.

What we learned

The biggest performance gain came from the simplest architectural change - making provider calls parallel instead of sequential. We spent weeks designing the event-driven architecture, the pluggable adapters, the scoring engine. All of that mattered. But the single change that moved onboarding from days to hours was removing the sequential bottleneck.

When you find a system that processes things one at a time and waits for each step to finish before starting the next, the first question is always: do these steps actually depend on each other? In this case, they did not. They never had. The sequential design was an accident of the original implementation, not a requirement.
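In code terms the change is small; a sketch against the ScreeningProvider interface from earlier:

// Sequential: total latency is the sum of every provider's response time,
// and a slow provider stalls everything behind it.
async function screenSequentially(providers: ScreeningProvider[], subject: ScreeningSubject) {
  const results: ProviderResult[] = [];
  for (const provider of providers) {
    results.push(await provider.query(subject));
  }
  return results;
}

// Parallel: total latency is the slowest single provider.
async function screenInParallel(providers: ScreeningProvider[], subject: ScreeningSubject) {
  return Promise.all(providers.map(provider => provider.query(subject)));
}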

The second lesson is about shadow mode. Running the old and new systems in parallel for four weeks cost engineering time and infrastructure spend. It also made the compliance sign-off straightforward. In regulated environments, the cost of proving equivalence is always worth paying. Nobody wants to be the person who approved a new screening system that missed a sanctioned entity.

Build the proof into the migration plan. Not after. Not as an afterthought. From the start.

Tags: event-driven architecture, financial services, screening, platform engineering, Kafka, regulated delivery
