Field Note · Engineering · Intermediate · 3 min read

Detecting AI-Generated Identity Documents in a KYC Pipeline

AI-generated identity documents require evidence-led detection, human escalation, and replayable checks inside the KYC workflow.

Bugni Labs

We added AI-generated identity document checks to a KYC pipeline after the review team began seeing documents that looked plausible at thumbnail size and failed only under closer inspection.

The first instinct was to buy another detector. We chose a narrower approach: add a detection layer that produced evidence, not a single magic score.

The pattern

Each document passed through four layers of checks.

The image layer looked for texture, compression, font, and alignment irregularities. The data layer compared fields across the document, application, and bureau response. The behavioural layer checked upload timing and retry patterns. The review layer presented the evidence to a human analyst when confidence fell below the review threshold.

No check was decisive alone. The system combined signals and explained why the case moved to review.
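A minimal sketch of how layer signals might be combined while keeping the explanation attached to the routing result. The `Signal` type, the 0-to-1 risk scale, the mean-based confidence, and the 0.7 threshold are all assumptions made for illustration; the production combination rule is not described here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Signal:
    layer: str    # "image", "data", or "behaviour"
    name: str     # hypothetical signal name, e.g. "font_alignment"
    risk: float   # 0.0 (looks genuine) to 1.0 (looks forged); assumed scale
    detail: str   # plain-language explanation for the analyst


def route(signals: List[Signal], review_threshold: float = 0.7) -> Dict[str, object]:
    """Combine layer signals and keep the reasons alongside the routing result."""
    if not signals:
        raise ValueError("at least one signal is required")
    # Assumed combination rule: confidence is one minus the mean risk,
    # so no single layer decides the outcome on its own.
    confidence = 1.0 - sum(s.risk for s in signals) / len(signals)
    # Strongest signals first, so the analyst reads the main reason at the top.
    ranked = sorted(signals, key=lambda s: s.risk, reverse=True)
    return {
        "confidence": confidence,
        "needs_review": confidence < review_threshold,
        "reasons": [f"{s.layer}/{s.name}: {s.detail}" for s in ranked],
    }
```

The point of the shape is that `reasons` travels with `needs_review`, so a case never reaches an analyst as a bare number.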

What broke first

The first version produced too many false positives on low-quality mobile uploads, which created reviewer fatigue. We changed the output from a binary decision to a risk band with evidence categories.
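The move from a binary decision to a banded result could look something like the sketch below. The three band names and the cut-offs `medium_at` and `high_at` are hypothetical values chosen for the example, not the thresholds used in production.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class RiskBand(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class Assessment:
    band: RiskBand
    categories: FrozenSet[str]  # which kinds of evidence fired, e.g. {"image", "data"}


def assess(risk: float, categories: FrozenSet[str],
           medium_at: float = 0.4, high_at: float = 0.75) -> Assessment:
    """Map a risk value to a band and carry the evidence categories with it."""
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must be between 0 and 1")
    if risk >= high_at:
        band = RiskBand.HIGH
    elif risk >= medium_at:
        band = RiskBand.MEDIUM
    else:
        band = RiskBand.LOW
    return Assessment(band=band, categories=frozenset(categories))
```

A reviewer who sees a MEDIUM band backed only by image evidence can weigh it differently from one backed by image and data evidence together, which a yes/no flag cannot express.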

We also added replay. When thresholds changed, we replayed previous cases to understand how review volume would move before changing production behaviour.
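Replay can be sketched as a comparison of two thresholds over stored cases. The example assumes each stored case keeps the combined confidence it was originally scored with; `StoredCase` and the field names are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StoredCase:
    case_id: str
    confidence: float  # combined confidence recorded when the case was first scored


def replay(cases: List[StoredCase], current: float, proposed: float) -> Dict[str, object]:
    """Compare which stored cases would go to review under each threshold."""
    before = {c.case_id for c in cases if c.confidence < current}
    after = {c.case_id for c in cases if c.confidence < proposed}
    return {
        "review_count_current": len(before),
        "review_count_proposed": len(after),
        "newly_escalated": sorted(after - before),
        "no_longer_escalated": sorted(before - after),
    }
```

Listing the case identifiers that change sides, not only the counts, lets the review team inspect a sample before the threshold moves.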

The lesson

Synthetic identity risk is not solved by a detector alone. It is managed by a workflow that preserves evidence, keeps human review at the right point, and learns from disputed cases.

The detector matters. The audit path matters more.

Rejected option

We rejected a single detector score as the production decision. It was attractive because it simplified routing. It was also brittle.

A forged document can fail for several reasons: visual artefacts, inconsistent metadata, mismatched application data, unusual submission behaviour, or a pattern seen in previous disputes. Compressing that into one score removed the explanation the review team needed.

What we added

We added an evidence packet for each escalated case.

The packet included the suspicious regions, field mismatches, confidence bands, prior similar cases, and the specific rule that triggered review. Analysts could agree, override, or mark the trigger as low value.
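One way to represent such a packet is an immutable record with a field for each item listed above. This is a sketch only: the field types (for example, regions as pixel rectangles) and the `record_decision` helper are assumptions, not the production schema.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Dict, Optional, Tuple


class ReviewerDecision(Enum):
    AGREE = "agree"
    OVERRIDE = "override"
    LOW_VALUE_TRIGGER = "low_value_trigger"


@dataclass(frozen=True)
class EvidencePacket:
    case_id: str
    # Assumed representation: (x, y, width, height) rectangles on the document image.
    suspicious_regions: Tuple[Tuple[int, int, int, int], ...]
    field_mismatches: Tuple[str, ...]
    confidence_bands: Dict[str, str]        # per-layer band, e.g. {"image": "high"}
    prior_similar_cases: Tuple[str, ...]    # identifiers of earlier cases
    triggering_rule: str
    reviewer_decision: Optional[ReviewerDecision] = None


def record_decision(packet: EvidencePacket, decision: ReviewerDecision) -> EvidencePacket:
    """Return a copy of the packet with the analyst's decision attached."""
    return replace(packet, reviewer_decision=decision)
```

Because the packet is frozen, attaching a decision produces a new value and leaves the evidence the analyst saw unchanged.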

Those reviewer decisions fed threshold tuning. We did not let the model tune itself directly. Human review stayed part of the learning path.
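The feedback loop can be illustrated as a report that summarises reviewer outcomes per rule and flags candidates for a human to look at, without changing any threshold itself. The outcome labels, the `min_cases` floor, and the 50% dispute rate are hypothetical choices for this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

OUTCOMES = ("agree", "override", "low_value")


def trigger_report(decisions: Iterable[Tuple[str, str]],
                   min_cases: int = 20,
                   dispute_rate: float = 0.5) -> List[Dict[str, object]]:
    """Summarise reviewer outcomes per rule; the report proposes, a person decides."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: {o: 0 for o in OUTCOMES})
    for rule, outcome in decisions:
        if outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome}")
        counts[rule][outcome] += 1

    report = []
    for rule in sorted(counts):
        c = counts[rule]
        total = sum(c.values())
        disputed = (c["override"] + c["low_value"]) / total
        report.append({
            "rule": rule,
            "total": total,
            "disputed_rate": disputed,
            # Only flag a rule once enough cases exist to say anything about it.
            "flag_for_tuning": total >= min_cases and disputed > dispute_rate,
        })
    return report
```

The function returns a flag, not a new threshold, which keeps the tuning decision with a person.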

Production lesson

Identity verification is an evidence workflow. Detection is only one part of it.

The system became more useful when it helped reviewers make consistent decisions, not when it tried to remove them from the process.

The operating rule

The rule we kept was simple: the system should make the accountable path the default path.

That meant no hidden side channel, no manual exception that escaped the evidence record, and no output that could not be replayed later. If a reviewer changed the result, the change became part of the same record. If a threshold moved, the previous cases could be replayed before the change reached production.
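As an illustration of keeping reviewer changes in the same record, the sketch below uses an append-only list of events and derives the current result from the latest one. The class, the event kinds, and the rule that an override needs a stated reason are assumptions for the example; timestamps and storage are left out to keep it short.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    actor: str    # system component or reviewer identifier
    kind: str     # "system_decision" or "reviewer_override"
    result: str   # e.g. "review", "accept", "reject"
    reason: str


class CaseRecord:
    """Append-only history for one case; nothing is edited in place."""

    def __init__(self, case_id: str) -> None:
        self.case_id = case_id
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        # Assumed policy: an override without a reason is not accepted.
        if event.kind == "reviewer_override" and not event.reason.strip():
            raise ValueError("an override must state a reason")
        self._events.append(event)

    @property
    def events(self) -> Tuple[Event, ...]:
        return tuple(self._events)  # read-only view of the full trail

    def current_result(self) -> Optional[str]:
        return self._events[-1].result if self._events else None
```

An override changes `current_result()` but the original system decision stays in `events`, so the path to the final outcome can be inspected later.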

This added a little ceremony. It removed a larger amount of ambiguity. Engineers knew what evidence the platform expected. Reviewers knew where to look. Operators knew which signal would trigger rollback.

The result was calmer delivery. The team still moved quickly, but each step left a trail strong enough for someone else to inspect weeks later.

We also wrote the failure mode into the runbook. That small step mattered. When the next exception appeared, the team did not have to rediscover the reasoning. They could see the original decision, the rejected alternative, the signal to watch, and the rollback path. That is the level of memory regulated delivery needs.

The practical value came from making the decision visible at the point where work changed hands. Engineers could see the boundary they were protecting. Reviewers could see the evidence they were accepting. Operators could see the rollback path before production pressure arrived. That shared view reduced the amount of trust the process had to borrow from memory.
