Detecting AI-Generated Identity Documents in a KYC Pipeline
AI-generated identity documents require evidence-led detection, human escalation, and replayable checks inside the KYC workflow.
We added AI-generated identity document checks to a KYC pipeline after the review team began seeing documents that looked plausible at thumbnail size and failed only under closer inspection.
The first instinct was to buy another detector. We chose a narrower approach: add a detection layer that produced evidence, not a single magic score.
The pattern
Each document passed through four layers.
The image layer looked for texture, compression, font, and alignment irregularities. The data layer compared fields across the document, the application, and the bureau response. The behavioural layer checked upload timing and retry patterns. The review layer presented the evidence to a human analyst when confidence fell below a set threshold.
No check was decisive alone. The system combined signals and explained why the case moved to review.
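As a minimal sketch, the combination step could look like the following. The `Signal` type, the layer weights, and the review threshold are hypothetical and chosen only for illustration. The sketch expresses the result as a risk score, so "confidence below threshold" becomes "risk at or above threshold".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    layer: str    # "image", "data", or "behaviour"
    name: str     # e.g. "font_irregularity"
    score: float  # 0.0 (benign) to 1.0 (suspicious)
    detail: str   # human-readable explanation shown to the analyst


# Hypothetical weights and threshold, for illustration only.
LAYER_WEIGHTS = {"image": 0.4, "data": 0.4, "behaviour": 0.2}
REVIEW_THRESHOLD = 0.5


def assess(signals: list[Signal]) -> dict:
    """Combine per-layer signals and keep the reasons next to the result."""
    # Keep the strongest signal per layer so a layer that emits many
    # signals is counted once.
    strongest: dict[str, Signal] = {}
    for s in signals:
        if s.layer not in strongest or s.score > strongest[s.layer].score:
            strongest[s.layer] = s

    combined = sum(
        LAYER_WEIGHTS.get(layer, 0.0) * s.score for layer, s in strongest.items()
    )
    # Only signals that are individually suspicious are listed as reasons.
    reasons = [
        f"{s.layer}: {s.name} ({s.detail})"
        for s in strongest.values()
        if s.score >= 0.5
    ]
    return {
        "combined": round(combined, 3),
        "route": "review" if combined >= REVIEW_THRESHOLD else "continue",
        "reasons": reasons,
    }
```

The point of the shape is that the routing decision and its explanation come out of the same function, so a case cannot reach review without its reasons.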
What broke first
The first version produced too many false positives on low-quality mobile uploads, which created reviewer fatigue. We changed the model's output from a binary pass/fail decision to a risk band with evidence categories.
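One way to picture the change from a binary decision to a banded output is sketched below. The band names, cut-offs, and category labels are assumptions made for this example, not values from the pipeline described here.

```python
from enum import Enum


class RiskBand(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical cut-offs, ordered from highest to lowest.
BAND_CUTOFFS = [(0.7, RiskBand.HIGH), (0.4, RiskBand.MEDIUM)]


def to_band(score: float) -> RiskBand:
    """Map a 0..1 risk score to a band instead of a pass/fail flag."""
    for cutoff, band in BAND_CUTOFFS:
        if score >= cutoff:
            return band
    return RiskBand.LOW


def classify(score: float, categories: set[str]) -> dict:
    """Return the band together with the evidence categories behind it."""
    return {
        "band": to_band(score).value,
        "evidence_categories": sorted(categories),
    }
```

A reviewer who sees `medium` with the single category `image_quality` can treat the case differently from `medium` with `field_mismatch`, which a bare fail flag would not allow.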
We also added replay. When thresholds changed, we replayed previous cases to understand how review volume would move before changing production behaviour.
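A replay of this kind can be sketched in a few lines, assuming each stored case keeps the score it received at the time. The `case_id` and `score` keys and the threshold values are hypothetical.

```python
def replay(cases: list[dict], current: float, proposed: float) -> dict:
    """Compare review volume under the current and proposed thresholds.

    Each case is a dict with a "case_id" and the stored risk "score".
    A case goes to review when its score is at or above the threshold.
    """
    before = sum(1 for c in cases if c["score"] >= current)
    after = sum(1 for c in cases if c["score"] >= proposed)
    # Cases whose routing would change, listed so they can be inspected.
    newly_flagged = [c["case_id"] for c in cases if proposed <= c["score"] < current]
    newly_cleared = [c["case_id"] for c in cases if current <= c["score"] < proposed]
    return {
        "review_before": before,
        "review_after": after,
        "newly_flagged": newly_flagged,
        "newly_cleared": newly_cleared,
    }


history = [
    {"case_id": "c1", "score": 0.30},
    {"case_id": "c2", "score": 0.45},
    {"case_id": "c3", "score": 0.60},
    {"case_id": "c4", "score": 0.80},
]
report = replay(history, current=0.5, proposed=0.4)
```

Listing the individual cases that would move, and not only the totals, lets the review team sample them before the threshold changes.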
The lesson
Synthetic identity risk is best handled by a workflow that preserves evidence, keeps human review at the right point, and learns from disputed cases.
The detector matters. The audit path matters more.
Rejected option
We rejected a single detector score as the production decision. It was attractive because it simplified routing. It was also brittle.
A forged document can fail for several reasons: visual artefacts, inconsistent metadata, mismatched application data, unusual submission behaviour, or a pattern seen in previous disputes. Compressing that into one score removed the explanation the review team needed.
What we added
We added an evidence packet for each escalated case.
The packet included the suspicious regions, field mismatches, confidence bands, prior similar cases, and the specific rule that triggered review. Analysts could agree, override, or mark the trigger as low value.
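The packet described above could be modelled roughly as follows. This is a sketch under stated assumptions: every type and field name is hypothetical, and the region format (pixel bounding boxes) is a guess at one reasonable representation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class AnalystDecision(Enum):
    AGREE = "agree"
    OVERRIDE = "override"
    LOW_VALUE = "low_value"  # the trigger was judged not useful


@dataclass
class FieldMismatch:
    field_name: str
    document_value: str
    application_value: str


@dataclass
class EvidencePacket:
    case_id: str
    trigger_rule: str                                    # the rule that sent the case to review
    suspicious_regions: list[tuple[int, int, int, int]]  # (x, y, width, height) in pixels
    field_mismatches: list[FieldMismatch]
    confidence_bands: dict[str, str]                     # evidence category -> band
    similar_case_ids: list[str] = field(default_factory=list)
    analyst_decision: Optional[AnalystDecision] = None

    def decide(self, decision: AnalystDecision) -> None:
        """Record the analyst's response on the packet itself."""
        self.analyst_decision = decision


packet = EvidencePacket(
    case_id="case-001",
    trigger_rule="dob_mismatch",
    suspicious_regions=[(120, 340, 200, 40)],
    field_mismatches=[FieldMismatch("date_of_birth", "1990-01-02", "1990-02-01")],
    confidence_bands={"image": "medium", "data": "high"},
)
packet.decide(AnalystDecision.AGREE)
```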
Those reviewer decisions fed threshold tuning. We did not let the model tune itself directly. Human review stayed part of the learning path.
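To illustrate keeping a person in the tuning loop, the sketch below only summarises reviewer decisions into proposals and never changes a threshold itself. The decision labels, the minimum case count, and the disputed-rate limit are all assumptions for this example.

```python
from collections import Counter, defaultdict


def summarise_feedback(
    decisions: list[tuple[str, str]],
    min_cases: int = 20,
    max_disputed_rate: float = 0.5,
) -> list[dict]:
    """Turn (rule, decision) pairs into tuning proposals for human approval.

    A decision is one of "agree", "override", or "low_value".
    Nothing is applied here; the output is a list for a person to review.
    """
    per_rule: dict[str, Counter] = defaultdict(Counter)
    for rule, decision in decisions:
        per_rule[rule][decision] += 1

    proposals = []
    for rule, counts in sorted(per_rule.items()):
        total = sum(counts.values())
        disputed = counts["override"] + counts["low_value"]
        # Only propose a change when there is enough evidence and
        # reviewers disagreed with the trigger most of the time.
        if total >= min_cases and disputed / total > max_disputed_rate:
            proposals.append({
                "rule": rule,
                "cases": total,
                "disputed_rate": round(disputed / total, 2),
                "status": "needs_human_approval",
            })
    return proposals
```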
Production lesson
Identity verification is an evidence workflow. Detection is only one part of it.
The system became more useful when it helped reviewers make consistent decisions, not when it tried to remove them from the process.
The operating rule
The rule we kept was simple: the system should make the accountable path the default path.
That meant no hidden side channel, no manual exception that escaped the evidence record, and no output that could not be replayed later. If a reviewer changed the result, the change became part of the same record. If a threshold moved, the previous cases could be replayed before the change reached production.
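A minimal sketch of "the change became part of the same record" is an append-only event list per case, where the current result is always derived from the events. The class names, event kinds, and actor labels are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Event:
    kind: str    # e.g. "system_decision" or "reviewer_override"
    actor: str   # system component or reviewer identifier
    result: str  # e.g. "review", "approved", "rejected"
    reason: str
    at: datetime


@dataclass
class CaseRecord:
    case_id: str
    events: list[Event] = field(default_factory=list)

    def record(self, kind: str, actor: str, result: str, reason: str) -> None:
        """Append an event; earlier events are never edited or removed."""
        self.events.append(
            Event(kind, actor, result, reason, datetime.now(timezone.utc))
        )

    @property
    def current_result(self) -> Optional[str]:
        """The latest result, derived from the event history."""
        return self.events[-1].result if self.events else None


case = CaseRecord("case-001")
case.record("system_decision", "risk-engine", "review", "dob_mismatch triggered")
case.record("reviewer_override", "analyst-17", "approved", "mismatch was a typo in the application")
```

Because an override is another event and not an edit, the original system decision stays visible next to the reviewer's reasoning.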
This added a little ceremony. It removed a larger amount of ambiguity. Engineers knew what evidence the platform expected. Reviewers knew where to look. Operators knew which signal would trigger rollback.
The result was calmer delivery. The team still moved quickly, but each step left a trail strong enough for someone else to inspect weeks later.
We also wrote the failure mode into the runbook. That small step mattered. When the next exception appeared, the team did not have to rediscover the reasoning. They could see the original decision, the rejected alternative, the signal to watch, and the rollback path. That is the level of memory regulated delivery needs.
The practical value came from making the decision visible at the point where work changed hands. Engineers could see the boundary they were protecting. Reviewers could see the evidence they were accepting. Operators could see the rollback path before production pressure arrived. That shared view reduced the amount of trust the process had to borrow from memory.