From Localhost to Live HTTPS on GCP with One Terraform Apply
How agent-fabric moves from localhost to live HTTPS on GCP with Terraform-managed Cloud Run, load balancing, secrets, DNS, and observability.
Two stages, zero manual GCP Console clicks, and an LLM-powered agent reachable at https://api-stage.example.com/v1/chat in roughly twenty minutes.
Code: github.com/FintelligenX/agent-fabric
In post 1 we walked through the design tenets. In post 2 we ran the whole stack on a laptop. This post takes it to production-equivalent infrastructure: a real GCP project, a real domain, a real Cloud HTTPS Load Balancer with a managed certificate, and Cloud Run services running the same container images you just exercised locally.
The shape of the deploy story is the part of agent-fabric most teams find counter-intuitive: there are no deploy scripts. No deploy.sh, no Makefile, no Helm chart. There is a terraform/ directory and a docker push. Everything from resource creation, configuration, and IAM binding through DNS A-record provisioning, the managed-cert lifecycle, and traffic routing comes down to two terraform apply invocations (platform, then root; one more if Terraform also creates the project), and secret seeding is three gcloud secrets versions add commands.
This isn't laziness. It's a deliberate property: the GCP project is bootstrapped by Terraform, not the other way around. The Terraform state is the canonical record of every resource that exists. There is no clickops drift to debug. When someone says "what's running in stage?", the answer is terraform state list, not "let me check the Console".
By the end of this post:
- A GCP project created and configured by Terraform
- Four Cloud Run services (gateway-svc, auth-broker-svc, agent-svc, web-ui-svc) running pinned to image SHAs
- A Global HTTPS Load Balancer fronting them on api-stage.<your-domain> and web-stage.<your-domain>
- Cloud Armor WAF, Cloud Monitoring custom metrics, structured Cloud Logging
- One real chat request reaching Vertex AI Gemini over Private Google Access
- Total monthly cost in stage: roughly £85, dominated by Memorystore Redis and the load balancer
Why Terraform-only, and why two stages
Two-stage applies aren't an accident of the project's history. They reflect a real ordering constraint: shared infrastructure has to exist before per-service infrastructure can reference it. Memorystore Redis has to be provisioned before gateway-svc can connect to it; the VPC has to exist before Direct VPC egress can use it; the Artifact Registry repository has to be created before any image can be pushed and then resolved by SHA.
The split is along the natural seam:
- modules/platform/: VPC, subnets with Private Google Access enabled, Memorystore Redis (AUTH-enabled), Cloud NAT, Artifact Registry, Cloud DNS managed zones, API enablement (run.googleapis.com, aiplatform.googleapis.com, secretmanager.googleapis.com, etc.), shared IAM service accounts.
- Root module: gateway-svc, auth-broker-svc, agent-svc (via modules/agent-domain/), web-ui-svc, the GCLB (managed cert, URL map, backend services), Cloud Armor policy, DNS A records.
Apply the platform module first. Push your images to the Artifact Registry it created. Apply the root module: Terraform's data.google_artifact_registry_docker_image resolves the :develop tag to a SHA at plan time, creates Cloud Run services that pin to that SHA, and the GCLB starts routing.
The single most useful property of this split: the platform layer is stable. Once it's applied, you re-run the root apply on every code change without re-validating networking or DNS. The platform module hasn't changed in six months on this project; the root module runs every time someone pushes a new image. The root state therefore changes far more often than the platform state, which matches how the two layers actually evolve.
Bootstrap: turning an empty GCP project into the platform
The very first apply needs a project to apply into. agent-fabric supports two paths, creating a new project or adopting an existing one, and a single variable decides which one you're on.
Path A: Terraform creates the project. Leave existing_project_id = "". The modules/project/ module creates a new project with a 4-char random suffix, links the billing account, and enables the bare minimum APIs the next module needs. Requires roles/resourcemanager.projectCreator on the parent folder or org plus roles/billing.user on the billing account.
Path B: Bring your own project. Set existing_project_id = "my-project-1234". Terraform skips project creation entirely: project_name, billing_account, folder_id, org_id are all ignored. A data "google_project" lookup substitutes for module.project so downstream resources still get the project number they need for IAM bindings. This is the standalone-project path for users who don't have an org or folder to create projects under.
# terraform/envs/stage/main.tf: the gate
module "project" {
  count           = var.existing_project_id == "" ? 1 : 0
  source          = "../../modules/project"
  project_name    = var.project_name
  billing_account = var.billing_account
  folder_id       = var.folder_id
  org_id          = var.org_id
}

data "google_project" "existing" {
  count      = var.existing_project_id == "" ? 0 : 1
  project_id = var.existing_project_id
}

locals {
  project_id     = var.existing_project_id == "" ? module.project[0].project_id : var.existing_project_id
  project_number = var.existing_project_id == "" ? module.project[0].project_number : data.google_project.existing[0].number
}

Two other knobs follow the same "default off, opt-in when you have the parent infra" pattern: enable_dns_delegation (writes NS records into a separate parent DNS project: off unless you actually own one) and manage_org_policies (writes project-level overrides for Cloud Run ingress/egress org policies: off unless you actually have an enforcing parent org). A stranger cloning the repo gets a self-contained setup that doesn't trip over permissions they don't have.
If your org has restrictive Org Policies (iam.allowedPolicyMemberDomains, compute.requireOsLogin, etc.), the apply will tell you which one needs an exception. The error messages are good; the failure mode is "Terraform says exactly which API or policy is the problem."
cd terraform/envs/stage
terraform init # downloads providers + reads remote state from gs://<project>-tfstate
# Path A only: skipped automatically in Path B
terraform apply -target=module.project
terraform apply -target=module.platform

Two targeted applies (one in Path B), then a final full apply once images are pushed. The applies themselves take about ten minutes the first time; the slow step is GCLB managed-cert provisioning, which only starts after the full apply and is covered in the DNS and certs section.
The image-by-SHA deploy cycle
Cloud Run revisions on this platform are pinned to image digests, not tags. This is the single most operationally important deployment decision and it deserves explaining.
A Docker image tag like :develop is mutable. Anyone with push permissions can move it to a new SHA. If your Cloud Run service references agent-svc:develop, then a "redeploy" after pushing new content to that tag does nothing: the image reference is the same string, so Terraform plans no change and Cloud Run creates no new revision. Worse: if someone else pushes a new :develop while you're applying, your Cloud Run service might or might not pick it up, depending on when the tag was last resolved to a digest.
agent-fabric's images.tf resolves the tag-to-digest mapping at Terraform plan time:
data "google_artifact_registry_docker_image" "agent" {
  project       = var.project_id
  location      = "europe"
  repository_id = "platform"
  image_name    = var.agent_image # "agent-svc:develop"
}

module "infra_agent" {
  source      = "../../modules/agent-domain"
  agent_image = data.google_artifact_registry_docker_image.agent.self_link
  # → europe-docker.pkg.dev/.../agent-svc@sha256:eb55b228...
  ...
}

When you push a new image and terraform apply, the data source re-reads the tag, gets the new SHA, the Cloud Run resource sees its image field changed, and a new revision is created. The Cloud Run rollout itself takes ~30 seconds. The deploy cycle is exactly: push, apply, wait for revision healthy. Nothing else.
For production this gets pinned harder: tfvars use agent-svc@sha256:eb55b228... directly, never :develop. A prod deploy is "rebuild from main, tag with the resulting SHA, update tfvars to that SHA, apply." Slower (every prod deploy is a tfvars change), safer (no ambient image can drift in).
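For the prod flow, the digest that goes into tfvars has to come from somewhere. A minimal sketch, assuming the gcloud CLI is authenticated against the registry and that gcloud artifacts docker images describe reports the digest-qualified name under image_summary.fully_qualified_digest. The helper name resolve_image_ref is hypothetical and not part of agent-fabric:

```shell
# resolve_image_ref IMAGE_WITH_TAG
# Prints the digest-pinned reference (repo/image@sha256:...) that the
# given tag points at right now. Hypothetical helper, not part of the repo.
resolve_image_ref() {
  local ref
  ref=$(gcloud artifacts docker images describe "$1" \
    --format='value(image_summary.fully_qualified_digest)') || return 1
  # An empty result means the tag did not resolve; treat it as a failure.
  [ -n "$ref" ] || return 1
  printf '%s\n' "$ref"
}

# Example (PROJECT_ID is assumed to be set in the environment):
# resolve_image_ref "europe-docker.pkg.dev/$PROJECT_ID/platform/agent-svc:develop"
```

The printed value is the kind of string a prod tfvars file would carry in place of the :develop tag.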
Stage uses :develop because the convenience of "push and apply" outweighs the immutability story when no real users depend on the deployment. The two-tier convention is intentional: optimise stage for iteration speed, optimise prod for auditability.
Cloud Run service shapes that aren't obvious
The four Cloud Run services each have slightly different ingress and scaling configs. They're worth knowing.
gateway-svc: INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER. The service refuses any HTTP request that doesn't arrive via the GCLB. Even a teammate with roles/run.invoker cannot curl it directly: the Cloud Run front-end rejects the connection at the L7 layer. The only path in is the load balancer with its Cloud Armor policy attached. Stage runs with min-instances = 0 (scale-to-zero to keep idle stage cheap); prod runs with min-instances = 1 (a warm instance to absorb the first request of the day). Both knobs live in each env's terraform.tfvars.
auth-broker-svc: Same INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER. Has no IAM access to gateway-svc or agent-svc. Its only job is to take an OAuth Auth Code or Client Credentials request and return a JWT, plus serve the JWKS that gateway-svc uses to verify those JWTs. The blast radius of compromising the broker is exactly: an attacker can mint JWTs. Without access to the gateway, those JWTs don't gain them anything inside the platform: they have to use them through the public edge like anyone else, where rate limiting, token budgets, and quota tracking all apply.
agent-svc: INGRESS_TRAFFIC_INTERNAL_ONLY. Not behind the load balancer. The only caller it accepts is the platform's gateway-sa service account presenting a fresh GCP OIDC token whose audience is the agent-svc URL. Another Cloud Run service in the same project that connects over the VPC without that identity is rejected too. There is no other path. This is by design: agent-svc calls the LLM, which means it has the API key. Everything else flows through gateway-svc's validation.
web-ui-svc: Same INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER posture as the API services. It's a static site behind nginx with no secrets in the bundle, but routing browsers through the same GCLB means Cloud Armor and the WAF policy apply to SPA requests too, and nobody can bypass the load balancer to hit the Cloud Run revision directly. allUsers is granted roles/run.invoker so the GCLB's Serverless NEG can forward through. The SPA then turns around and calls the gateway with the user's JWT: back through the same authenticated pipeline.
This is the third tenet from post 1 made concrete: identity flows through every hop, and the identity required at each hop is a different identity. A leaked SPA bundle gains nothing because there are no secrets in it. A leaked broker JWT gains nothing because the gateway still applies rate limits and token budgets. A leaked gateway OIDC token gains the ability to call agent-svc directly, but agent-svc only knows how to do one thing: call the LLM with the platform's API key. Even that worst case is well-contained.
Networking: PGA does most of the work, NAT only the rest
Cloud Run with Direct VPC egress sends all outbound traffic through the VPC. From there, where it goes depends on the destination:
- *.googleapis.com: Vertex AI for Gemini, Secret Manager for credentials, Cloud Logging for structured logs. Traffic reaches Google's backbone via Private Google Access. PGA is a subnet-level setting; the platform's subnet enables it in one line of Terraform. No NAT involved, no external IP, never on the public internet.
- api.anthropic.com (when Anthropic is enabled): non-Google host, traverses Cloud NAT with the platform's static egress IP.
The setup of these two is intentionally asymmetric. Vertex AI traffic is "free" in the sense that PGA has no per-request or per-byte cost. Anthropic traffic goes through Cloud NAT which costs about £25/month per environment: small in absolute terms, but a fixed cost regardless of whether anyone uses the Anthropic path.
This produces a useful optimisation for a Gemini-only deployment: you can decommission Cloud NAT entirely. Set the Cloud Run egress mode from ALL_TRAFFIC to PRIVATE_RANGES_ONLY, remove the NAT module from the platform Terraform, and your platform is now strictly Google-internal for all egress. The trade-off is that any future claude-* request returns a network error instead of 503 provider_not_configured: you've made the deployment definitively Gemini-only at the network layer.
agent-fabric doesn't make this decision automatically. The default terraform.tfvars keeps NAT enabled because (a) the default anthropic_secret_name is still set in stage and (b) the network architecture is the kind of decision that benefits from explicit confirmation. The plumbing is there to turn it off when you're confident: docs/finops.md walks through the steps.
Seeding secrets: three you always need, one you only sometimes need
After the platform module applies, the root apply creates Secret Manager stubs: empty secrets with the right names, IAM bindings, and replication settings. The values themselves are seeded out-of-band:
# OAuth RS256 signing key: always required
openssl genrsa -out /tmp/signing.pem 2048
gcloud secrets versions add oauth-signing-key \
  --project="$PROJECT_ID" --data-file=/tmp/signing.pem
shred -u /tmp/signing.pem # shred is GNU coreutils; use rm -f where it is unavailable

# Google OAuth client secret: always required (the SPA login flow)
echo -n "GOCSPX-..." | gcloud secrets versions add google-oauth-client-secret \
  --project="$PROJECT_ID" --data-file=-

# infra-agent system prompt: always required
gcloud secrets versions add infra-agent-system-prompt \
  --project="$PROJECT_ID" --data-file=agents/infra-agent/system-prompt.txt

That's the full mandatory list for a fresh deployment. No Anthropic key. Gemini on Vertex AI works immediately after this seeding: agent-svc's service account already has roles/aiplatform.user from the Terraform apply, and Vertex Gemini publisher models at location=global are reachable from any project whose service account holds that role.
The system prompt seeding is the one that catches people. Why isn't this in Terraform? Because the prompt text is the agent's behaviour. It changes more often than the infrastructure, and rotating it shouldn't require a terraform apply. The platform reads the latest version from Secret Manager on a 60-second polling interval and hot-reloads in-place. To update production behaviour: gcloud secrets versions add infra-agent-system-prompt --data-file=..., wait a minute, done. No Cloud Run revision.
(Worth noting: a Terraform-managed system prompt is also supported. If agent_config_raw and system_prompt_content are passed as Terraform variables from file("..."), the Terraform apply writes a new secret version. Either pattern works; the polling reload makes them equivalent. The choice is "do you want the prompt in source control with PR review" vs "do you want to edit a single source-of-truth file." Same tradeoff as everywhere else.)
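The polling reload can be pictured as a compare-and-swap on the secret's content. The following is a sketch of the idea in shell, not the platform's implementation (which lives inside agent-svc and is not shown in this post). The function poll_once and its state file are hypothetical; gcloud secrets versions access latest is the standard CLI call for reading the newest version:

```shell
# poll_once SECRET_NAME STATE_FILE
# Reads the latest secret version and prints "reloaded" if its checksum
# differs from the one stored in STATE_FILE, otherwise "unchanged".
# Hypothetical illustration of poll-and-hot-reload.
poll_once() {
  local secret="$1" state_file="$2" content new_sum old_sum
  content=$(gcloud secrets versions access latest --secret="$secret") || return 2
  new_sum=$(printf '%s' "$content" | cksum)
  old_sum=$(cat "$state_file" 2>/dev/null || true)
  if [ "$new_sum" = "$old_sum" ]; then
    echo "unchanged"
  else
    # Remember what we loaded so the next poll can detect a change.
    printf '%s\n' "$new_sum" > "$state_file"
    echo "reloaded"
  fi
}

# Example loop on the 60-second interval described above:
# while true; do poll_once infra-agent-system-prompt /tmp/prompt.sum; sleep 60; done
```

Because only the content is compared, it makes no difference whether the new version was written by gcloud or by a Terraform apply.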
The Anthropic secret stub is conditional. If anthropic_secret_name in tfvars is empty, the Terraform creates no secret and no IAM binding: Gemini-only deployment. If it's non-empty, the stub gets created and you seed it the same way:
# Read anthropic_secret_name from tfvars (empty when Anthropic is disabled).
# sed is used instead of grep -P so this also works with BSD tools on macOS.
ANTHROPIC_SECRET=$(sed -n 's/^anthropic_secret_name[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' terraform.tfvars)
if [ -n "$ANTHROPIC_SECRET" ]; then
  echo -n "sk-ant-..." | gcloud secrets versions add "$ANTHROPIC_SECRET" \
    --project="$PROJECT_ID" --data-file=-
fi

This is the small but meaningful difference between agent-fabric's stage and a typical prototype's stage: whether Anthropic is enabled is a tfvar, not a code branch. The platform is the same; the deployment makes the choice.
DNS, certs, and the longest wait
The most visible "I just got started with agent-fabric" failure mode is "Terraform applied successfully but the cert hasn't provisioned yet, and https://api-stage.example.com returns SSL errors." This is normal. Managed certificates take 5–60 minutes to provision after DNS A records are reachable. There's no Terraform incantation that speeds this up; it's a Google-side state machine waiting for DNS to propagate and for the cert authority to issue.
The flow:
- The platform module created a Cloud DNS managed zone for your domain: terraform output dns_name_servers prints four ns-cloud-*.googledomains.com. values.
- Configure your registrar to delegate the domain (or subdomain) to those NS servers. This is a one-time clickop outside Terraform: most registrars don't have first-class APIs for this.
- The root module created A records pointing api-stage.<domain> and web-stage.<domain> to the GCLB's static IP.
- The root module created google_compute_managed_ssl_certificate resources for each hostname.
Wait. gcloud compute ssl-certificates describe <name> --global --format="value(managed.status)" returns PROVISIONING, a failure state such as PROVISIONING_FAILED or PROVISIONING_FAILED_PERMANENTLY, or ACTIVE. You want ACTIVE. The most common failure is "DNS not propagated yet": dig +short api-stage.example.com from your laptop should return the GCLB IP before the cert can succeed.
Once it's ACTIVE, the platform is live. curl https://api-stage.example.com/health works. The SPA at https://web-stage.example.com loads. Real users can sign in with Google. Real chat requests reach real LLMs.
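The wait can be scripted rather than watched. A minimal sketch, assuming the same gcloud and dig commands quoted above; the function names, the default of sixty one-minute attempts, and the certificate name and IP in the examples are placeholders rather than values from the repo:

```shell
# dns_points_at HOSTNAME IP
# Succeeds when one of the A records returned by dig equals IP exactly.
dns_points_at() {
  dig +short "$1" | grep -qxF "$2"
}

# wait_for_cert CERT_NAME [MAX_ATTEMPTS] [SLEEP_SECONDS]
# Polls the managed certificate status. Returns 0 on ACTIVE, 1 on a
# permanent failure, 2 if MAX_ATTEMPTS polls pass without ACTIVE.
wait_for_cert() {
  local name="$1" max="${2:-60}" pause="${3:-60}" status i=1
  while [ "$i" -le "$max" ]; do
    status=$(gcloud compute ssl-certificates describe "$name" \
      --global --format='value(managed.status)')
    case "$status" in
      ACTIVE)
        echo "certificate $name is ACTIVE"
        return 0 ;;
      PROVISIONING_FAILED_PERMANENTLY)
        echo "certificate $name failed permanently" >&2
        return 1 ;;
    esac
    echo "attempt $i/$max: status=${status:-unknown}, waiting ${pause}s" >&2
    sleep "$pause"
    i=$((i + 1))
  done
  echo "timed out waiting for $name" >&2
  return 2
}

# Example (placeholder IP and certificate name):
# dns_points_at api-stage.example.com 203.0.113.10 && wait_for_cert api-stage-cert
```

Checking DNS first avoids polling a certificate that cannot succeed yet.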
The first production-shaped curl
Same shape as the local one, different URL:
# Replace <client_secret> with the real value; it is quoted so the shell
# does not treat the angle brackets as redirections.
TOKEN=$(PUBLIC_URL=https://api-stage.example.com \
  ./scripts/get-non-human-token.sh web-chat "<client_secret>")
curl -s -X POST https://api-stage.example.com/v1/chat \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "agent": "infra-agent",
    "messages": [{"role": "user", "content": "ping"}],
    "usage": true
  }'

This returns a 200, with model: "gemini-2.5-flash" in the usage block and a charged-tokens count between 1300 and 1500. The request went: your laptop → GCLB IP (35.x.x.x) → Cloud Armor → gateway-svc Cloud Run revision → over INTERNAL_ONLY ingress to agent-svc Cloud Run revision → out via Private Google Access to aiplatform.googleapis.com (Vertex AI's generateContent endpoint at location=global) → back the same way. Every hop authenticated, every hop logged, no external API key seeded anywhere in the chain.
The token consumption was recorded against svc:web-chat in Redis. Hit /v1/usage with the same JWT to see the running daily total. Keep sending chat requests (say, ten thousand in a day) and the gateway will start emitting Cloud Monitoring alerts at 75%, then 90%, then 95% of the daily cap. At 95% new requests get a 429 with a reset-at timestamp; requests already in flight finish to avoid mid-conversation truncation.
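The threshold ladder is simple integer arithmetic. A sketch of how the 75/90/95 bands described above map onto a running total; the function name, the state labels, and the choice to floor the percentage are assumptions for illustration, not the gateway's actual code:

```shell
# budget_state USED_TOKENS DAILY_CAP
# Prints which band the current usage falls into. Percentages are floored,
# so 949 of 1000 is still 94% and below the cut-off.
budget_state() {
  local used="$1" cap="$2" pct
  pct=$(( used * 100 / cap ))
  if   [ "$pct" -ge 95 ]; then echo "blocked"   # new requests would get a 429
  elif [ "$pct" -ge 90 ]; then echo "warn-90"
  elif [ "$pct" -ge 75 ]; then echo "warn-75"
  else                         echo "ok"
  fi
}

# Example: budget_state 750000 1000000
```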
Day-two: redeploy without thinking
A code change to agent-svc:
docker build --platform=linux/amd64 \
  -f platform/agent-svc/Dockerfile \
  -t europe-docker.pkg.dev/$PROJECT_ID/platform/agent-svc:develop .
docker push europe-docker.pkg.dev/$PROJECT_ID/platform/agent-svc:develop
cd terraform/envs/stage
terraform apply

Three commands. Terraform notices the resolved SHA changed, plans an in-place update to the google_cloud_run_v2_service resource, applies it (~30 seconds for the Cloud Run revision to spin up + healthcheck), and the new revision starts serving traffic. The old revision drains. No downtime in the steady state.
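After an apply it can be reassuring to confirm that the service really references a digest rather than a tag. A minimal sketch, assuming gcloud run services describe exposes the container image at spec.template.spec.containers[0].image; the function name is hypothetical, and the region is a parameter because this post does not state one:

```shell
# is_digest_pinned SERVICE REGION
# Returns 0 when the deployed image reference contains @sha256:, 1 when it
# is tag-based, 2 when the service could not be described.
is_digest_pinned() {
  local image
  image=$(gcloud run services describe "$1" --region="$2" \
    --format='value(spec.template.spec.containers[0].image)') || return 2
  case "$image" in
    *@sha256:*)
      echo "pinned: $image"
      return 0 ;;
    *)
      echo "not pinned: $image" >&2
      return 1 ;;
  esac
}

# Example (REGION is a placeholder for wherever the service runs):
# is_digest_pinned agent-svc "$REGION"
```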
System prompt change:
gcloud secrets versions add infra-agent-system-prompt \
  --project=$PROJECT_ID --data-file=agents/infra-agent/system-prompt.txt

One command. Hot-reloaded in ~60 seconds, no Terraform, no revision rollover.
Tfvar change (e.g. enabling Anthropic):
# edit terraform.tfvars: anthropic_secret_name = "anthropic-api-key"
terraform apply
echo -n "sk-ant-..." | gcloud secrets versions add anthropic-api-key \
  --project="$PROJECT_ID" --data-file=-

Two commands plus an editor. Apply provisions the secret + IAM, then you seed the value. agent-svc picks up the env-var change on the next Cloud Run revision (Terraform creates one as part of the apply because the env var set changed).
All of these have one thing in common: none of them require remembering anything beyond terraform apply. No "and then update this file in this other place." No "wait, did you also run the deploy script?" No tribal knowledge.
This is what production-ready feels like. The platform's most boring property, that it does what it says on the tin and nothing more, is also its most valuable one.
What you can build on top of this
A few things this same Terraform plus an agents/<name>/agent.yaml would now happily host:
- A second domain agent (a "finance agent", a "support agent"): add agents/finance-agent/, add a module "finance_agent" { source = "../../modules/agent-domain" ... } block, add the URL to AGENT_REGISTRY_JSON, apply. One PR. Zero platform-code changes.
- An async pipeline integration that writes results to GCS and publishes "result available" on Pub/Sub: the pipeline: identity type already routes to a different code path in gateway-svc, the platform just needs the GCS bucket and topic in Terraform.
- A partner integration over mTLS: separate route on the load balancer, dedicated partner: identity prefix, isolated log sink for compliance.
These all live as future-work items in the project's docs/todo.md. The point isn't that they're already built. It is that they fit the existing shape. Adding them is engineering, not architecture.
Post 4 digs into what running this looks like once you have real traffic flowing: the provider router internals, the cost-vs-capability decisions, the lifecycle of a second agent shipping alongside the first.
Next: Multi-provider, multi-agent: scaling the agent-fabric in production.
For the day-by-day deploy procedures, see docs/deploy.md. For the cost numbers behind the £85/month stage figure, docs/finops.md.
Raghu Vennam
Guest Contributor
Guest contributor to Bugni Labs field notes, writing about agentic AI platform architecture, GCP, and production operations.