Multi-Agent Crime Detection - Field Note

We designed a multi-agent pattern for financial crime detection around one principle: agents can disagree, but the orchestration layer must stay deterministic.

The workflow involved several specialist agents. One assembled transaction evidence. One compared customer behaviour to peer groups. One reviewed policy indicators. One drafted a case summary for human review.

Each agent had a role. None owned the final decision.

The architecture

Events carried the case through the workflow. Each agent consumed the event it was allowed to see, produced a structured output, and attached evidence references.
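A minimal sketch of that contract, assuming a simple allow-list per agent. The names here (CaseEvent, AgentOutput, ALLOWED_EVENTS, the event type strings, and the peer-ratio rule standing in for the real agent logic) are illustrative assumptions, not the system described in this note.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class CaseEvent:
    case_id: str
    event_type: str           # hypothetical names, e.g. "evidence.assembled"
    payload: dict[str, Any]


@dataclass(frozen=True)
class AgentOutput:
    case_id: str
    agent: str
    finding: str
    score: float
    evidence_refs: tuple[str, ...]   # pointers back to the evidence used


# Assumed allow-list: the event types each agent is permitted to consume.
ALLOWED_EVENTS = {
    "evidence_agent": {"case.opened"},
    "behaviour_agent": {"evidence.assembled"},
    "policy_agent": {"evidence.assembled"},
    "summary_agent": {"findings.collected"},
}


def can_consume(agent: str, event: CaseEvent) -> bool:
    return event.event_type in ALLOWED_EVENTS.get(agent, set())


def behaviour_agent(event: CaseEvent) -> AgentOutput:
    """Stand-in for the peer-group comparison; the ratio rule is a placeholder."""
    if not can_consume("behaviour_agent", event):
        raise PermissionError(f"behaviour_agent may not consume {event.event_type}")
    ratio = event.payload["monthly_total"] / event.payload["peer_median"]
    return AgentOutput(
        case_id=event.case_id,
        agent="behaviour_agent",
        finding="anomalous" if ratio > 3 else "typical",
        score=round(ratio, 2),
        evidence_refs=tuple(event.payload["evidence_refs"]),
    )
```

The point of the shape is that an agent cannot return a bare opinion: every output names its case, its author, and the evidence it relied on.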

The orchestrator combined outputs through deterministic rules. It handled thresholds, conflicts, escalation, and routing. It also preserved the full evidence trail for review.
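One way such a rule layer could look, as a sketch only. The thresholds, the route names, and the Finding and Decision types are assumptions made for illustration; the note does not state the actual rules or values.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    STANDARD_QUEUE = "standard_queue"
    PRIORITY_QUEUE = "priority_queue"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class Finding:
    agent: str
    risk: float                      # 0.0 to 1.0, as reported by the agent
    evidence_refs: tuple[str, ...]


@dataclass(frozen=True)
class Decision:
    route: Route
    reason: str
    evidence_trail: tuple[str, ...]


# Assumed values, chosen only to make the example concrete.
PRIORITY_THRESHOLD = 0.5
ESCALATE_THRESHOLD = 0.8
MAX_SPREAD = 0.4   # largest tolerated gap between agent risk scores


def orchestrate(findings: list[Finding]) -> Decision:
    # Sort by agent name so the result does not depend on arrival order.
    ordered = sorted(findings, key=lambda f: f.agent)
    trail = tuple(ref for f in ordered for ref in f.evidence_refs)
    if not ordered:
        return Decision(Route.ESCALATE, "no agent outputs", trail)
    if any(not f.evidence_refs for f in ordered):
        return Decision(Route.ESCALATE, "finding without evidence", trail)
    risks = [f.risk for f in ordered]
    if max(risks) - min(risks) > MAX_SPREAD:
        return Decision(Route.ESCALATE, "agents disagree", trail)
    if max(risks) >= ESCALATE_THRESHOLD:
        return Decision(Route.ESCALATE, "risk above escalation threshold", trail)
    if max(risks) >= PRIORITY_THRESHOLD:
        return Decision(Route.PRIORITY_QUEUE, "risk above priority threshold", trail)
    return Decision(Route.STANDARD_QUEUE, "risk below priority threshold", trail)
```

Every route in this sketch still ends with a human reviewer. The function has no model call and no randomness, so the same findings always produce the same decision, reason, and evidence trail.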

This separation prevented a common failure: using one persuasive model answer as if it were a governed decision.

What we measured

We measured agreement between agents, evidence completeness, reviewer override rate, and time to case preparation.
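These four measures can be computed from per-case records. The sketch below assumes a hypothetical CaseRecord shape and picks one plausible definition for each metric (pairwise label agreement, attached over required evidence, share of cases where the reviewer changed the route, median preparation time); the note does not specify the exact definitions used.

```python
import statistics
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class CaseRecord:
    agent_labels: dict[str, str]   # agent name -> risk label it assigned
    evidence_required: int
    evidence_attached: int
    system_route: str
    reviewer_route: str
    prep_minutes: float


def pairwise_agreement(labels: dict[str, str]) -> float:
    """Fraction of agent pairs that assigned the same label."""
    pairs = list(combinations(sorted(labels), 2))
    if not pairs:
        return 1.0
    return sum(labels[a] == labels[b] for a, b in pairs) / len(pairs)


def summarise(cases: list[CaseRecord]) -> dict[str, float]:
    if not cases:
        raise ValueError("no cases to summarise")
    n = len(cases)
    required = sum(c.evidence_required for c in cases)
    return {
        "mean_agreement": sum(pairwise_agreement(c.agent_labels) for c in cases) / n,
        "evidence_completeness": sum(c.evidence_attached for c in cases) / required,
        "override_rate": sum(c.system_route != c.reviewer_route for c in cases) / n,
        "median_prep_minutes": statistics.median(c.prep_minutes for c in cases),
    }
```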

The most useful signal was reviewer confidence. Analysts trusted the system when they could see why each agent contributed its part and where uncertainty remained.

The lesson

A multi-agent system is not safer because it has more agents. It is safer when each agent has a narrow role and the system around them records, challenges, and escalates their work.

The fabric matters more than the model.

Rejected option

We rejected a single generalist agent for the case workflow. It could produce a fluent summary, but the path it took to reach that summary was hard to interrogate.

When the answer was wrong, we could not tell whether the failure came from evidence gathering, risk interpretation, policy mapping, or summarisation.

Role separation made failures easier to find.

What we changed

Each agent produced a structured artefact. Evidence extraction produced cited facts. Behaviour comparison produced anomalies. Policy review produced mapped indicators. Summary generation produced a draft narrative.

The orchestrator held the decision logic. It compared artefacts, routed disagreements, and escalated uncertainty.

That gave reviewers a trail they could challenge.

Production lesson

Multi-agent architecture should make accountability clearer. If adding agents makes responsibility harder to see, the design is moving in the wrong direction.

The best version felt almost boring: small roles, structured outputs, and a deterministic control layer.

The operating rule

The rule we kept was simple: the system should make the accountable path the default path.

That meant no hidden side channel, no manual exception that escaped the evidence record, and no output that could not be replayed later. If a reviewer changed the result, the change became part of the same record. If a threshold moved, the previous cases could be replayed before the change reached production.
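The replay step can be sketched as a diff between recorded routes and the routes a proposed threshold would produce. The RecordedCase shape, the two route names, and the single-threshold rule are simplifying assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordedCase:
    case_id: str
    risk_score: float
    final_route: str   # route stored in the record, including any reviewer change


def route_for(risk_score: float, threshold: float) -> str:
    return "escalate" if risk_score >= threshold else "standard"


def replay(cases: list[RecordedCase], proposed_threshold: float) -> list[tuple[str, str, str]]:
    """Return (case_id, recorded_route, new_route) for every case that would change."""
    changed = []
    for case in cases:
        new_route = route_for(case.risk_score, proposed_threshold)
        if new_route != case.final_route:
            changed.append((case.case_id, case.final_route, new_route))
    return changed


history = [
    RecordedCase("C-1", 0.65, "standard"),
    RecordedCase("C-2", 0.85, "escalate"),
    RecordedCase("C-3", 0.40, "standard"),
]
print(replay(history, proposed_threshold=0.6))
```

A non-empty result is not a failure by itself. It is the list of past cases someone has to look at before the new threshold reaches production.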

This added a little ceremony. It removed a larger amount of ambiguity. Engineers knew what evidence the platform expected. Reviewers knew where to look. Operators knew which signal would trigger rollback.

The result was calmer delivery. The team still moved quickly, but each step left a trail strong enough for someone else to inspect weeks later.

We also wrote the failure mode into the runbook. That small step mattered. When the next exception appeared, the team did not have to rediscover the reasoning. They could see the original decision, the rejected alternative, the signal to watch, and the rollback path. That is the level of memory regulated delivery needs.

The practical value came from making the decision visible at the point where work changed hands. Engineers could see the boundary they were protecting. Reviewers could see the evidence they were accepting. Operators could see the rollback path before production pressure arrived. That shared view reduced the amount of trust the process had to borrow from memory. Each agent stayed accountable for its own output, and that kept review clear.
