Observability · Testability · Runtime Integrity · intermediate · 5 min read
When a Model Upgrade Breaks Production
A Gemini 2.5 Pro upgrade caused a regression in our evidence extraction pipeline. Context adherence dropped. Structured outputs degraded. The benchmarks said it was better. Our production data said otherwise.
Bugni Labs