Multi-Agent Financial Crime Detection Architecture
Multi-agent financial crime systems need role separation, shared evidence, and a deterministic orchestration layer to stay defensible.
We designed a multi-agent pattern for financial crime detection around one principle: agents can disagree, but the orchestration layer must stay deterministic.
The workflow involved several specialist agents. One assembled transaction evidence. One compared customer behaviour to peer groups. One reviewed policy indicators. One drafted a case summary for human review.
Each agent had a role. None owned the final decision.
The architecture
Events carried the case through the workflow. Each agent consumed the event it was allowed to see, produced a structured output, and attached evidence references.
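A minimal sketch of that contract, assuming Python. The event types, agent names, and the `AgentOutput` shape are hypothetical; the article does not specify them. It shows an agent refusing events outside its allowed set and always attaching evidence references to what it produces.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical mapping of agent role -> event types it is allowed to see.
VISIBLE_EVENTS = {
    "evidence_agent": {"case.opened"},
    "behaviour_agent": {"evidence.assembled"},
    "policy_agent": {"evidence.assembled"},
    "summary_agent": {"policy.reviewed"},
}


@dataclass(frozen=True)
class AgentOutput:
    """Structured output: immutable, and always tied to evidence."""
    case_id: str
    agent: str
    finding: str
    evidence_refs: Tuple[str, ...]


def handle(agent: str, event: dict) -> Optional[AgentOutput]:
    """Return a structured output, or None if the agent may not see the event."""
    if event["type"] not in VISIBLE_EVENTS.get(agent, set()):
        return None
    refs = tuple(event.get("evidence_refs", ()))
    if not refs:
        # An output with no evidence cannot be reviewed, so reject it early.
        raise ValueError("output must cite at least one evidence reference")
    finding = f"{agent} processed {event['type']}"  # placeholder for real analysis
    return AgentOutput(event["case_id"], agent, finding, refs)
```

Freezing the dataclass is a design choice in this sketch: once an output enters the record, nothing downstream can quietly alter it.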
The orchestrator combined outputs through deterministic rules. It handled thresholds, conflicts, escalation, and routing. It also preserved the full evidence trail for review.
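The combining step could look roughly like the sketch below. The risk-score scale, both thresholds, and the route names are assumptions made for illustration, not the values used in the real system. The point is that routing is a pure function of the agent outputs, so the same inputs always produce the same route and the same evidence trail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assessment:
    agent: str
    risk_score: float      # assumed scale: 0.0 (benign) to 1.0 (high risk)
    evidence_refs: tuple


ESCALATE_AT = 0.7          # illustrative threshold, not from the article
CONFLICT_SPREAD = 0.4      # illustrative: agents this far apart count as a conflict


def route(assessments):
    """Deterministic routing over agent assessments."""
    if not assessments:
        raise ValueError("at least one assessment is required")
    # Sort by agent name so arrival order cannot change the result.
    ordered = sorted(assessments, key=lambda a: a.agent)
    scores = [a.risk_score for a in ordered]
    trail = [ref for a in ordered for ref in a.evidence_refs]

    if max(scores) - min(scores) >= CONFLICT_SPREAD:
        decision = "human_review_conflict"   # agents disagree: a person decides
    elif max(scores) >= ESCALATE_AT:
        decision = "escalate"
    else:
        decision = "standard_queue"
    return {"route": decision, "evidence_trail": trail}
```

Checking for conflict before checking the escalation threshold is deliberate in this sketch: disagreement between agents is treated as a finding in its own right rather than averaged away.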
This separation prevented a common failure: using one persuasive model answer as if it were a governed decision.
What we measured
We measured agreement between agents, evidence completeness, reviewer override rate, and time to case preparation.
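One way to compute these four measures from closed cases is sketched below. The record fields (`agent_labels`, `evidence_present`, `system_label`, `reviewer_label`, `prep_minutes`) are hypothetical names chosen for the example, and the definitions are assumptions: agreement means all agents gave the same label, and completeness means every expected evidence item was present.

```python
from statistics import median


def review_metrics(cases):
    """Summarise agent agreement, evidence completeness, overrides, and prep time."""
    n = len(cases)
    if n == 0:
        raise ValueError("no cases to measure")
    # A case counts as agreed when every agent produced the same label.
    agreed = sum(1 for c in cases if len(set(c["agent_labels"])) == 1)
    # A case counts as complete when every expected evidence item is present.
    complete = sum(1 for c in cases if all(c["evidence_present"].values()))
    # An override is any case where the reviewer changed the system's label.
    overridden = sum(1 for c in cases if c["reviewer_label"] != c["system_label"])
    return {
        "agent_agreement": agreed / n,
        "evidence_completeness": complete / n,
        "override_rate": overridden / n,
        "median_prep_minutes": median(c["prep_minutes"] for c in cases),
    }
```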
The most useful signal was reviewer confidence. Analysts trusted the system when they could see what each agent contributed, why, and where uncertainty remained.
The lesson
A multi-agent system is not safer because it has more agents. It is safer when each agent has a narrow role and the system around them records, challenges, and escalates their work.
The fabric matters more than the model.
Rejected option
We rejected a single generalist agent for the case workflow. It could produce a fluent summary, but the path was hard to interrogate.
When the answer was wrong, we could not tell whether the failure came from evidence gathering, risk interpretation, policy mapping, or summarisation.
Role separation made failures easier to find.
What we changed
Each agent produced a structured artefact. Evidence extraction produced cited facts. Behaviour comparison produced anomalies. Policy review produced mapped indicators. Summary generation produced a draft narrative.
The orchestrator held the decision logic. It compared artefacts, routed disagreements, and escalated uncertainty.
That gave reviewers a trail they could challenge.
Production lesson
Multi-agent architecture should make accountability clearer. If adding agents makes responsibility harder to see, the design is moving in the wrong direction.
The best version felt almost boring: small roles, structured outputs, and a deterministic control layer.
The operating rule
The rule we kept was simple: the system should make the accountable path the default path.
That meant no hidden side channel, no manual exception that escaped the evidence record, and no output that could not be replayed later. If a reviewer changed the result, the change became part of the same record. If a threshold moved, the previous cases could be replayed before the change reached production.
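The replay step for a threshold change might be sketched as follows. The single-score `decide` rule and the field names are simplifying assumptions; a real decision function would be the orchestrator's full rule set. The idea is to run stored cases through both the current and the proposed threshold and list every case whose outcome would change.

```python
def decide(score, threshold):
    """Stand-in for the deterministic decision rule (assumed for this sketch)."""
    return "escalate" if score >= threshold else "clear"


def replay(cases, current_threshold, proposed_threshold):
    """Return the stored cases whose outcome would differ under the proposal."""
    changes = []
    for case in cases:
        before = decide(case["score"], current_threshold)
        after = decide(case["score"], proposed_threshold)
        if before != after:
            changes.append(
                {"case_id": case["case_id"], "before": before, "after": after}
            )
    return changes


# Usage: review the list of changed cases before promoting the new threshold.
history = [
    {"case_id": "C-1", "score": 0.50},
    {"case_id": "C-2", "score": 0.65},
    {"case_id": "C-3", "score": 0.90},
]
changed = replay(history, current_threshold=0.7, proposed_threshold=0.6)
```

This only works because the decision layer is deterministic: replaying the same case under the same threshold must give the same answer every time.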
This added a little ceremony. It removed a larger amount of ambiguity. Engineers knew what evidence the platform expected. Reviewers knew where to look. Operators knew which signal would trigger rollback.
The result was calmer delivery. The team still moved quickly, but each step left a trail strong enough for someone else to inspect weeks later.
We also wrote the failure mode into the runbook. That small step mattered. When the next exception appeared, the team did not have to rediscover the reasoning. They could see the original decision, the rejected alternative, the signal to watch, and the rollback path. That is the level of memory regulated delivery needs.
The practical value came from making the decision visible at the point where work changed hands. Engineers could see the boundary they were protecting. Reviewers could see the evidence they were accepting. Operators could see the rollback path before production pressure arrived. That shared view reduced the amount of trust the process had to borrow from memory. Each agent stayed accountable. That kept review clear.
Bugni Labs
R&D Engine
The R&D engine powering our advanced software engineering practices: platform engineering, AI-native architectures, and AI-Native Engineering methodologies for enterprise clients.