Field Note · Platform · Intermediate · 13 min read

Running the agent-fabric Locally with Docker Compose

How to run agent-fabric locally with Docker Compose while keeping the same gateway, auth, agent, and web code paths used in production.

Raghu Vennam
Series: agent-fabric · Part 2

Five services on localhost, no GCP account, a working multi-provider chat by the end of this post.

Code: github.com/FintelligenX/agent-fabric

In post 1 we walked through the design tenets behind agent-fabric: zero trust between services, guardrails enforced at multiple layers, domain logic factored into agent.yaml, Gemini as the default LLM so a fresh deployment needs only a GCP project. This post shows what those tenets look like running on a developer laptop.

The goal is opinionated: production code paths, local credentials. The same gateway-svc, agent-svc, auth-broker-svc, and web-ui-svc that run behind a global load balancer in production also run on localhost:8082/8083/8081/8080. The only things that change are where secrets come from and how JWTs are verified, and both are controlled by environment variables. Nothing is forked, nothing is mocked at a layer that matters.

This is the part most prototype-to-platform projects get wrong. The local stack ends up being a Flask app and a mock_llm.py, the cloud stack ends up being something else entirely, and you ship the bugs that live in the gaps between them. agent-fabric treats local dev as a configuration of the production binary.

By the end of this post, you'll have:

  • Five containers on your machine, healthy, talking to each other over Docker's bridge network
  • A real OAuth client registered to a public PKCE flow
  • A working /v1/chat call against Gemini using a Gemini Developer API key
  • The same call against Anthropic, gated behind an opt-in secret
  • The React SPA at localhost:8080 running through the same auth + chat flow it uses in production

You won't need a GCP project, an Anthropic account, or anything beyond Docker Desktop, Node.js 20 (for one optional npm step), and about ten minutes.

The trick: env-driven mode switches

The platform has exactly two production-vs-local mode switches, and they're both env vars:

Env var                Production                       Local
LOCAL_SECRETS_DIR      unset (uses GCP Secret Manager)  /local-secrets (mounted from platform/local-secrets/)
SKIP_JWT_VERIFICATION  unset (RS256 against JWKS)       true for gateway-svc (decodes without verifying)

That's it. No if DEBUG: branches, no mock_anthropic.py, no second Dockerfile. secret_client.fetch_secret_with_version() checks LOCAL_SECRETS_DIR first; if a file at that path exists, it reads from there. Otherwise it falls through to the real Secret Manager client. gateway-svc/jwt_validator.py looks at SKIP_JWT_VERIFICATION; if true, it decodes the JWT payload but skips signature checking: every other behaviour (extracting sub, deriving x-identity, audience checks, expiry) runs the same code as production.
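The file-first lookup described above can be sketched in a few lines. This is a minimal illustration, not the repo's secret_client: the function name fetch_secret, the remote_fetch callback, and the env parameter are hypothetical stand-ins for the real Secret Manager client.

```python
import os
from pathlib import Path
from typing import Callable, Mapping, Optional


def fetch_secret(name: str, remote_fetch: Callable[[str], str],
                 env: Optional[Mapping[str, str]] = None) -> str:
    """Return a secret, preferring a file under LOCAL_SECRETS_DIR when one exists."""
    env = os.environ if env is None else env
    local_dir = env.get("LOCAL_SECRETS_DIR")
    if local_dir:
        candidate = Path(local_dir) / name
        if candidate.is_file():
            # Local mode: the mounted file stands in for Secret Manager.
            return candidate.read_text().strip()
    # Variable unset, or no such file: fall through to the remote client.
    # remote_fetch is a hypothetical callback, not the real client API.
    return remote_fetch(name)
```

The point of the shape is that callers never know which source served the value, so no call site needs a local-versus-production branch.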

This works because the platform was designed around it. Workload Identity for agent-sa reading Vertex AI? Skipped locally: the GeminiProvider falls back to API-key mode via LOCAL_SECRETS_DIR. mTLS termination for partner integrations? Skipped locally: the partner pattern isn't wired in. Cloud NAT egress IP for Anthropic? Doesn't matter: local Docker has its own default route. Everything that can be production-equivalent locally is. Everything that fundamentally can't be (Workload Identity Federation needs a real federated pool) degrades gracefully to an analogous local pattern.

The result is that a bug you can reproduce locally is a bug you've reproduced in the production code path. That's worth more than any amount of polished tooling.

What the local stack wires together

platform/docker-compose.yml declares five containers:

Reference snippettext
localhost:8080  web-ui-svc       React SPA + nginx
localhost:8081  auth-broker-svc  JWT issuance + JWKS
localhost:8082  gateway-svc      JWT validation + guardrails + routing
localhost:8083  agent-svc        internal-only; called via gateway
localhost:6379  redis            rate limit + token budget counters

The compose file mounts ./local-secrets into every service that needs it, hard-codes the per-service env vars to match the local hostnames, and chains healthchecks so the start order is deterministic. Redis comes up first; auth-broker-svc and agent-svc wait for it; gateway-svc waits for both; web-ui-svc waits for gateway-svc. By the time docker compose up reports every container healthy, the full request path from browser → SPA → gateway → agent → LLM is reachable.
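The healthcheck chain is a small dependency graph, and the start order is one topological sort of it. The sketch below models the wait-for relationships stated above with Python's standard graphlib; the DEPENDS_ON dictionary is a hand-written illustration, not parsed from docker-compose.yml.

```python
from graphlib import TopologicalSorter

# Each service maps to the services it waits on, as described in the text.
DEPENDS_ON = {
    "redis": set(),
    "auth-broker-svc": {"redis"},
    "agent-svc": {"redis"},
    "gateway-svc": {"auth-broker-svc", "agent-svc"},
    "web-ui-svc": {"gateway-svc"},
}


def start_order(depends_on):
    """Return one valid start order in which every dependency comes first."""
    return list(TopologicalSorter(depends_on).static_order())


order = start_order(DEPENDS_ON)
print(order)
```

Redis is always first and web-ui-svc always last; the relative order of auth-broker-svc and agent-svc is free, since neither depends on the other.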

Two compose-only details worth knowing:

  • The agent-svc build context is the repo root, not platform/agent-svc/. The Dockerfile copies agents/ into the image because the platform reads agent.yaml and the system prompt from those paths. Building from platform/agent-svc/ would produce an image without any agent to serve. The compose file gets this right (context: ..); manual docker build invocations have to specify --file platform/agent-svc/Dockerfile . from the repo root.
  • web-ui-svc is built with a multi-stage Dockerfile: vite produces a static bundle in stage one, nginx:alpine serves it in stage two. A docker-entrypoint.sh renders /usr/share/nginx/html/config.js from runtime env vars at container start, so the same image runs against any environment. Locally that means AUTH_BROKER_URL=http://localhost:8081 and GATEWAY_URL=http://localhost:8082 are written into config.js at container start rather than baked into the bundle at image build.
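To make the runtime-config idea concrete, here is a rough Python sketch of what rendering config.js from env vars amounts to. The real entrypoint is a shell script, and the global name window.__APP_CONFIG__ is a hypothetical choice for illustration; only the four env var names come from the compose setup.

```python
import json

# Env vars the SPA reads at runtime, per the compose configuration.
RUNTIME_KEYS = ("AUTH_BROKER_URL", "GATEWAY_URL", "USER_CLIENT_ID", "OPS_CLIENT_ID")


def render_config_js(env):
    """Build the text of config.js from whichever runtime keys are set."""
    config = {key: env[key] for key in RUNTIME_KEYS if key in env}
    # json.dumps handles quoting, so URLs land in the file as valid JS strings.
    # window.__APP_CONFIG__ is an assumed name, not taken from the repo.
    return "window.__APP_CONFIG__ = " + json.dumps(config, sort_keys=True) + ";\n"


print(render_config_js({
    "AUTH_BROKER_URL": "http://localhost:8081",
    "GATEWAY_URL": "http://localhost:8082",
}))
```

Because the bundle only reads this one generated file, pointing the same image at another environment is a matter of changing env vars and restarting the container.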

The five files you need in local-secrets/

The contents of platform/local-secrets/ are the only thing you have to create by hand. The directory is .gitignored, lives only on your machine, and holds the secrets the cloud version reads from Secret Manager.

Code samplebash
cd /path/to/agent-fabric
mkdir -p platform/local-secrets

1. RS256 signing key: auth-broker-svc signs every JWT with this. Locally we generate a fresh one; in production it's seeded out-of-band into Secret Manager.

Code samplebash
openssl genrsa -out platform/local-secrets/oauth-signing-key 2048

2. Gemini Developer API key: the default LLM path. Go to https://aistudio.google.com/app/apikey and create one (the free tier covers all the dev usage you'll need). Drop it into a one-line file:

Code samplebash
echo -n "AIza..." > platform/local-secrets/gemini-api-key

This is the key the GeminiProvider reads when GEMINI_API_KEY_SECRET=gemini-api-key (which docker-compose.yml sets by default). In production this same code path would fetch from Secret Manager via Workload Identity; locally it reads the file. Same code, different source.

3. Anthropic API key: optional. If you want to test the claude-* model path locally, drop one in:

Code samplebash
echo -n "sk-ant-..." > platform/local-secrets/anthropic-api-key

If the file isn't there, a request with model: "claude-haiku-4-5-20251001" returns 503 provider_not_configured: a deliberate behaviour we'll look at in post 4. A Gemini-only local stack is a valid configuration and works without an Anthropic key.

4. Local PKCE client registrations: auth-broker-svc needs to know about the SPA's OAuth clients. Two small JSON files, which you can copy verbatim:

Code samplebash
cat > platform/local-secrets/oauth-client-local-ui-user <<'EOF'
{"client_id":"local-ui-user","identity_prefix":"user","secret_hash":null,"redirect_uris":["http://localhost:8080/callback"]}
EOF
cat > platform/local-secrets/oauth-client-local-ui-ops <<'EOF'
{"client_id":"local-ui-ops","identity_prefix":"admin","secret_hash":null,"redirect_uris":["http://localhost:8080/callback"]}
EOF

secret_hash: null is the marker that this is a public PKCE client: no client secret is expected, and the proof is the PKCE code verifier, checked against the code challenge sent at the start of the flow. There are two clients because the SPA has two roles: User (chat, own usage) and Ops (all-identity usage, read-only clients view). The identity_prefix decides which prefix the JWT sub gets: user:<email> or admin:<email>.
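As a sketch of how a broker might read these registrations, the snippet below parses one of the JSON files and derives the JWT subject. The helper names is_public_client and subject_for are hypothetical, and the example address is a placeholder; only the JSON shape comes from the files above.

```python
import json


def is_public_client(registration):
    """An explicit null secret_hash marks a public PKCE client."""
    return "secret_hash" in registration and registration["secret_hash"] is None


def subject_for(registration, principal):
    """Prefix the authenticated principal with the client's identity_prefix."""
    return f"{registration['identity_prefix']}:{principal}"


raw = ('{"client_id":"local-ui-ops","identity_prefix":"admin",'
       '"secret_hash":null,"redirect_uris":["http://localhost:8080/callback"]}')
client = json.loads(raw)

# someone@example.com is a placeholder principal for illustration only.
print(is_public_client(client), subject_for(client, "someone@example.com"))
```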

5. Google OAuth client secret: optional. Only needed if you want to drive the SPA login through a real Google account. For token-only tests (which we'll use below), you can skip this. If you want it:

Code samplebash
echo -n "GOCSPX-..." > platform/local-secrets/google-oauth-client-secret

A Google OAuth client comes from the Google Cloud Console → Credentials. The Console page has a "Create OAuth client ID" button; pick "Web application", add http://localhost:8081/oauth/google/callback as an authorised redirect URI, and the page gives you a Client ID and a Client Secret string. The Client Secret goes into the file above; the Client ID is passed as an env var when you start compose:

Code samplebash
GOOGLE_CLIENT_ID=<your-client-id> docker compose up --build

The Client ID lands in auth-broker-svc's env; the Client Secret is read from the local-secrets file at request time. Both can be omitted for non-SPA testing.

Bringing it up

Code samplebash
cd platform
docker compose up --build

First run is slow: Python dependencies download, the Vite bundle builds, the nginx image pulls. Subsequent runs are seconds. The terminal scrolls past healthchecks; once everything settles you'll see:

Reference snippettext
agent-svc-1        | INFO ... agent-svc started: agent=infra-agent tools=0 ...
gateway-svc-1      | INFO ... models_config_loaded: default=gemini-2.5-flash count=13
auth-broker-svc-1  | INFO ... auth-broker-svc started
web-ui-svc-1       | nginx: ... listening on 0.0.0.0:8080

Smoke-test from another terminal:

Code samplebash
curl -s http://localhost:8081/oauth/jwks | python3 -m json.tool | head
curl -s http://localhost:8082/health

The first prints the auth-broker's JWKS: a JSON-formatted RSA public key. The second prints {"status":"ok","services":{"gateway":"ok",...}}: the gateway's aggregated health check reaches every other service.

First chat: machine token, Gemini path

The platform has six identity types, but for a quick test we'll use the simplest: a service-integration JWT. The repo ships a helper script:

Code samplebash
TOKEN=$(PUBLIC_URL=http://localhost:8081 \
  ./scripts/get-non-human-token.sh local-svc local-svc-secret)
echo "$TOKEN" | cut -c1-80

(You'll need to have registered local-svc first: ./scripts/register-oauth-client.sh local-svc --type svc --secret local-svc-secret. The script writes to local-secrets/.)

Then send a chat:

Code samplebash
curl -s -X POST http://localhost:8082/v1/chat \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "agent": "infra-agent",
    "messages": [{"role": "user", "content": "What is a VPC peering in one sentence?"}],
    "usage": true
  }' | python3 -m json.tool

A second or two later:

Config snippetjson
{
  "content": [{"type": "text", "text": "VPC peering creates a private network connection..."}],
  "usage": {
    "model": "gemini-2.5-flash",
    "input_tokens": 1370,
    "output_tokens": 38,
    "charged_tokens": 1408
  }
}

The "usage": true field opts the response into the usage block. Without it, the gateway strips usage from the response by default (the gateway still records it server-side for quota purposes; the opt-in is just about whether the caller wants to see it). Most of the 1,370 input tokens are the system prompt ("you are the infra-agent" plus the agent-JSON suffix); the user message itself is closer to 10. The model field confirms what served the request.
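For callers that are not shell scripts, the same request can be built with Python's standard library. This is a minimal sketch mirroring the curl call: build_chat_request and charged_tokens are hypothetical helper names, and the sum in charged_tokens is only an observation about the sample response (1370 + 38 = 1408), not a documented billing rule.

```python
import json
import urllib.request


def build_chat_request(gateway_url, token, prompt, model=None, want_usage=True):
    """Build (but do not send) a POST to /v1/chat with the same body as the curl call."""
    body = {
        "agent": "infra-agent",
        "messages": [{"role": "user", "content": prompt}],
        "usage": want_usage,
    }
    if model is not None:
        body["model"] = model  # when omitted, the sample run was served by the default model
    return urllib.request.Request(
        gateway_url.rstrip("/") + "/v1/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def charged_tokens(usage):
    """In the sample response, charged_tokens equals input plus output tokens."""
    return usage["input_tokens"] + usage["output_tokens"]
```

Passing the returned object to urllib.request.urlopen sends it; keeping construction separate makes the request easy to inspect or test without a running stack.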

Switching providers, observing the failure modes

The whole point of the provider abstraction is that switching is a one-field change. Re-run the same curl with "model": "claude-haiku-4-5-20251001":

If you seeded the Anthropic key, you get a 200 with "model": "claude-haiku-4-5-20251001" in the usage block. The system prompt token count goes up by ~30 (different tokenizer), but every other response field has the same shape. That is the canonical Anthropic-shaped response dict every provider returns.

If you didn't seed it:

Config snippetjson
{
  "error": "provider_not_configured",
  "message": "Model 'claude-haiku-4-5-20251001' requires the Anthropic provider, but ANTHROPIC_SECRET_NAME is unset."
}

HTTP 503. This is the platform's deliberate behaviour: a missing provider isn't a 500 (server error) or a 400 (caller's fault). It is a 503, telling the caller "this deployment does not currently support this model id". A web UI surfaces this as a clear configuration message. A pipeline retries with a fallback model. An ops user knows immediately what tfvar to flip.

The third case worth seeing is when the model id matches nothing in the registry:

Code samplebash
curl -s -X POST http://localhost:8082/v1/chat ... -d '{"model": "gpt-5", ...}'

Config snippetjson
{"error": "unknown_model"}

HTTP 400. The gateway validates the model against models_config.json (loaded from a Secret Manager secret in prod, a mounted file locally) before forwarding. Unknown ids don't reach agent-svc; the platform tells the caller upfront that the model isn't supported.

These three responses, 200, 503, and 400, let a caller distinguish "all good", "this deployment can't serve this", and "no deployment can serve this". Each maps to a different remediation. That's a small thing that matters a lot when an automated pipeline is on the other end.
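A pipeline on the other end might act on the three outcomes roughly like this. The sketch is an assumption about caller behaviour, not code from the repo: chat_with_fallback is a hypothetical helper, and send is a caller-supplied function returning a status code and a parsed body.

```python
def chat_with_fallback(send, models):
    """Try each model in order; move on only when the deployment lacks the provider.

    `send(model)` is a caller-supplied function returning (status_code, body_dict).
    """
    for model in models:
        status, body = send(model)
        if status == 200:
            return model, body
        if status == 503 and body.get("error") == "provider_not_configured":
            continue  # this deployment cannot serve the model; try the next one
        if status == 400 and body.get("error") == "unknown_model":
            # No point retrying: the id is not in the registry at all.
            raise ValueError(f"model id {model!r} is not in the registry")
        raise RuntimeError(f"unexpected response {status}: {body}")
    raise RuntimeError("no configured provider could serve the request")
```

The 503 is the only case that falls through to the next model; the 400 fails fast because the model id needs fixing rather than a retry.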

The SPA: same auth, same chat, in a browser

Open http://localhost:8080. The landing page has two sign-in buttons: User and Ops. Clicking either kicks off the OAuth Auth Code + PKCE flow: the SPA computes a code verifier in memory, redirects to auth-broker-svc/oauth/authorize, the broker redirects to Google's OAuth screen, you authenticate, Google redirects back to the broker, the broker mints a JWT, the SPA receives it and stores it in sessionStorage (deliberately not localStorage: JWTs are session-scoped, conversations persist across sessions).

If you didn't seed google-oauth-client-secret and GOOGLE_CLIENT_ID, the Google redirect will fail with Google's own error page. That's fine: the SPA can also accept a hand-pasted JWT for local testing. The same get-non-human-token.sh you used above prints a JWT you can paste into the dev-tools console: sessionStorage.setItem('ad_access_token', '<jwt>') and refresh.

Either way, once you're signed in:

  • Chat at /chat: multi-conversation sidebar (per-identity localStorage), model dropdown showing every model in models_config.json (Gemini first, sorted cheapest to most expensive), token-usage footer toggleable per chat.
  • Usage at /usage: the calling identity's daily and weekly token consumption, broken down by input vs output.
  • Clients at /clients (Ops only): read-only view of all registered OAuth clients.

This is the same SPA that runs in production. The only difference is the runtime config, rendered by the nginx entrypoint from compose env vars: AUTH_BROKER_URL, GATEWAY_URL, USER_CLIENT_ID, OPS_CLIENT_ID. The React bundle is identical.

Where the local stack helpfully diverges from production

Five places the local stack is intentionally not like production:

  1. JWT signature verification is skipped in gateway-svc. SKIP_JWT_VERIFICATION=true. The platform decodes the JWT payload and extracts sub and azp, but doesn't check the RS256 signature. This means a hand-rolled HS256 JWT with any signing key (the integration tests use local-test-key-for-integration-tests-only) is accepted. Don't deploy this configuration; the env var has to be unset for production gateway-svc to start.
  2. agent-svc does no Bearer validation locally. In production, Cloud Run IAM enforces caller identity: only callers with roles/run.invoker on the agent-svc get through. In practice, that means the gateway-sa presenting a fresh GCP OIDC token whose audience matches the agent-svc URL. Locally there's no Cloud Run IAM, so agent-svc accepts the request unchecked. The forwarder still attaches a Bearer header (set via LOCAL_AGENT_SVC_BEARER=local-dev-token) so the request shape is identical to production, but no code on the agent-svc side reads it.
  3. Workload Identity is bypassed for LLM calls. The Gemini provider falls back to API-key mode (generativelanguage.googleapis.com) instead of Vertex ADC. The agent-svc code is the same: if GEMINI_API_KEY_SECRET is set, it uses that key; otherwise it uses ADC. Compose sets the variable, so the API-key branch runs locally, while production calls land at aiplatform.googleapis.com.
  4. Rate limits are extremely generous. The GUARDRAIL_CONFIG_JSON env var on the gateway container in platform/docker-compose.yml ships 200 requests/minute and effectively infinite daily/weekly token quotas. Edit that block (and docker compose up -d --force-recreate gateway-svc) if you want to test guardrail behaviour locally.
  5. No GCLB, no Cloud Armor, no managed certs. The SPA talks directly to gateway-svc on http://localhost:8082 and auth-broker-svc on http://localhost:8081. CORS is configured for http://localhost:8080 via the UI_ORIGIN env var.
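Item 1 is easier to appreciate with a concrete token. The sketch below mints an HS256 JWT by hand with the standard library and then reads its claims without ever checking the signature, which is roughly what skipping verification means. It is an illustration only: the function names and the claim values are hypothetical, and the real jwt_validator.py is not reproduced here.

```python
import base64
import hashlib
import hmac
import json


def b64url(data):
    """Base64url-encode bytes without padding, as JWT segments require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=")


def b64url_decode(segment):
    """Decode a base64url segment, restoring the stripped padding first."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(claims, key):
    """Mint an HS256 JWT by hand, as a test fixture might."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signing_input = header + b"." + payload
    signature = b64url(hmac.new(key, signing_input, hashlib.sha256).digest())
    return (signing_input + b"." + signature).decode("ascii")


def decode_unverified(token):
    """Return the payload claims; the signature segment is never inspected."""
    _header, payload, _signature = token.split(".")
    return json.loads(b64url_decode(payload))


# Hypothetical claim values; any key at all yields an "accepted" token.
token = sign_hs256({"sub": "svc:local-svc", "azp": "local-svc"}, b"any-key-at-all")
print(decode_unverified(token))
```

Two tokens signed with different keys decode to the same claims here, which is exactly why this mode must never reach a deployed gateway.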

Everything else in the local stack runs identically to production: the multi-conversation localStorage shape, the per-provider summary model auto-derivation, the prompt-caching cache_control annotations, the tool-use loop bound, the agent.yaml parse logic.

Closing the loop

A practical exercise: with the stack running, edit agents/infra-agent/system-prompt.txt, change the first sentence, and run another chat through the gateway. The agent-svc container caches the system prompt for 24 hours by default, so you'll need to run docker compose restart agent-svc for the change to take effect. In production, the reload happens automatically every 60 seconds: a gcloud secrets versions add to the infra-agent-system-prompt secret rolls out to every instance without a new Cloud Run revision. That's the model: fast local iteration with a clear restart signal, hands-off rotation in production.
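One way to read the two cadences (24 hours locally, 60 seconds in production) is as the same cache with a different time-to-live. That reading is an assumption, and the TtlCache class below is a hypothetical sketch rather than the agent-svc implementation; the injectable clock exists only so the behaviour can be tested without waiting.

```python
import time


class TtlCache:
    """Cache one loaded value and reload it once the TTL has elapsed."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader      # callable that fetches the fresh value
        self._ttl = ttl_seconds    # e.g. 60 for a fast refresh, 86400 for a day
        self._clock = clock
        self._value = None
        self._loaded_at = None

    def get(self):
        """Return the cached value, reloading first if it is missing or stale."""
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._value = self._loader()
            self._loaded_at = now
        return self._value
```

With a long TTL, restarting the process is the only practical way to force a reload, which matches the restart step in the exercise.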

This is what "local dev runs the production code paths" buys you: the muscle memory transfers. When you eventually push to GCP in post 3, the only new thing is the deployment plumbing. The platform behaviour you tuned on your laptop is the same platform behaviour serving real traffic.

Next: From localhost to live HTTPS on GCP with one terraform apply.

For the full local-dev runbook with troubleshooting, see docs/build-with-me.md stages 1–5.

Raghu Vennam

Guest Contributor

Guest contributor to Bugni Labs field notes, writing about agentic AI platform architecture, GCP, and production operations.