
Claude apps gateway brings enterprise controls to GCP

Manaal Khan | July 2, 2026 at 12:47 AM | 5 min read

Key Takeaways

  • The Claude apps gateway removes per-developer credential management by routing all Claude Code traffic through a single Cloud Run service identity
  • Spend limits can be set per user, group, or org with automatic 429 responses when caps are hit
  • Policy enforcement happens server-side, making local managed-settings.json edits ineffective

Anthropic has released a Claude apps gateway for Google Cloud that solves the biggest headache of rolling out Claude Code across an organization: per-developer credential management. The gateway sits between local Claude Code clients and GCP, centralizing identity, policy, spend controls, and telemetry in a single self-hosted service.

Individual developers have been able to point Claude Code at Vertex AI for a while. Set CLAUDE_CODE_USE_VERTEX=1, grant the aiplatform.user role, and inference stays inside your GCP perimeter. Simple. But scaling that to hundreds of engineers means distributing service account keys, pushing config files over MDM, and having zero visibility into who's burning through tokens.

https://storage.googleapis.com/gweb-cloudblog-publish/images/1_FY2cRbt.max-1200x1200.png

What does the gateway actually do?

The gateway handles five jobs that platform teams otherwise cobble together from scratch: identity, policy, telemetry, spend limits, and routing.

For identity, sign-in requests route through your existing IdP, whether that is Google Workspace or any other OIDC provider. The gateway exchanges the IdP token for a short-lived session, so no service account keys or API keys land on developer laptops. Onboarding means adding someone to an IdP group. Offboarding means removing them, after which their next session refresh fails.
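The session lifecycle described above can be sketched in a few lines. This is a minimal illustration, not the gateway's actual code: the names `Session`, `issue_session`, `is_valid`, and `SessionRefreshDenied` are hypothetical, and the 15-minute lifetime is an assumption (the article only says "short-lived").

```python
from __future__ import annotations

import time
from dataclasses import dataclass

# Assumed lifetime for illustration only; the real value is not stated.
SESSION_TTL_SECONDS = 900


class SessionRefreshDenied(Exception):
    """Raised when the user is no longer in any group the gateway allows."""


@dataclass(frozen=True)
class Session:
    email: str
    groups: tuple[str, ...]
    expires_at: float


def issue_session(email, idp_groups, allowed_groups, now=None):
    """Issue (or refresh) a session from group membership reported by the IdP."""
    now = time.time() if now is None else now
    matched = tuple(sorted(set(idp_groups) & set(allowed_groups)))
    if not matched:
        # Offboarding: the user was removed from the IdP group, so refresh fails.
        raise SessionRefreshDenied(f"{email} is not in an allowed group")
    return Session(email=email, groups=matched, expires_at=now + SESSION_TTL_SECONDS)


def is_valid(session, now=None):
    """A session is usable only until its expiry time."""
    now = time.time() if now is None else now
    return now < session.expires_at
```

In this sketch a refresh is simply another call to `issue_session` with a fresh group lookup, which is why removing someone from the IdP group takes effect at their next refresh rather than mid-session.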

Policy lives in a single gateway.yaml file, resolved per group and enforced server-side. The gateway checks availableModels on every /v1/messages call. Developers can edit their local managed-settings.json all they want. It changes nothing. Rule updates propagate to the entire fleet within the hour.
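To make the server-side check concrete, here is a rough sketch of a per-group model allowlist. It is an assumption-heavy illustration: the `POLICY` table, the model names, the choice to union permissions across groups, and the 403 status code are all hypothetical and do not reflect the real gateway.yaml schema or the gateway's actual responses.

```python
from dataclasses import dataclass, field


@dataclass
class GroupPolicy:
    available_models: set = field(default_factory=set)


# Hypothetical in-memory stand-in for policy loaded from a server-side file.
POLICY = {
    "eng": GroupPolicy({"model-large", "model-small"}),
    "contractors": GroupPolicy({"model-small"}),
}


def allowed_models(groups):
    """Resolve the allowlist for a user (assumption: union over their groups)."""
    allowed = set()
    for group in groups:
        policy = POLICY.get(group)
        if policy is not None:
            allowed |= policy.available_models
    return allowed


def authorize(groups, requested_model):
    """Return an HTTP-style status for one request (403 is an assumed code)."""
    return 200 if requested_model in allowed_models(groups) else 403
```

Note that nothing the client sends, other than the requested model, is an input to `authorize`. That is the property the article describes: local settings cannot widen what the server permits.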

https://storage.googleapis.com/gweb-cloudblog-publish/images/2_MvuTCiS.max-700x700.png

Telemetry ships over OTLP/HTTP to whatever collector you run. Cloud Monitoring, Grafana, Datadog. The key difference: every claude_code.token.usage metric carries the verified email and groups from the session JWT. Not the spoofable OTEL_RESOURCE_ATTRIBUTES that clients can set themselves.
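The difference between verified and client-supplied attributes can be shown with a small sketch. The function name `metric_attributes` and the attribute keys `user.email` and `user.groups` are hypothetical, chosen for illustration; the article does not name the keys the gateway actually uses.

```python
def metric_attributes(client_attrs, session_claims):
    """Build metric attributes, letting verified session claims win.

    client_attrs: whatever the client attached (untrusted, may be spoofed).
    session_claims: claims taken from the verified session token.
    """
    attrs = dict(client_attrs)
    # Verified identity always overwrites anything the client claimed.
    attrs["user.email"] = session_claims["email"]
    attrs["user.groups"] = ",".join(sorted(session_claims["groups"]))
    return attrs
```

A client that sets its own `user.email` gains nothing here, because the value from the session replaces it before the metric is exported.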

Spend limits get set per user, group, or org through an admin API. The gateway meters tokens against a Cloud SQL ledger and returns a 429 when someone hits their cap. One caveat: costs calculate at list price. Committed-use discounts and negotiated rates don't factor in. Treat it as a runaway-usage guardrail, not a billing reconciliation tool.
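The metering logic can be approximated as follows. This is a sketch under stated assumptions: an in-memory dictionary stands in for the Cloud SQL ledger, the prices in `LIST_PRICE_PER_MTOK` are made-up placeholders rather than real list prices, and the `SpendLedger` class and its key format (`user:...`, `group:...`) are hypothetical.

```python
from decimal import Decimal

# Placeholder prices in USD per million tokens: (input, output). Not real rates.
LIST_PRICE_PER_MTOK = {"model-small": (Decimal("1.00"), Decimal("5.00"))}


class SpendLedger:
    """In-memory stand-in for a persistent spend ledger."""

    def __init__(self, caps):
        self.caps = dict(caps)  # key -> Decimal cap in USD
        self.spent = {}         # key -> Decimal spent so far

    def cost(self, model, input_tokens, output_tokens):
        """Price a request at list price; discounts are deliberately ignored."""
        in_price, out_price = LIST_PRICE_PER_MTOK[model]
        total = in_price * input_tokens + out_price * output_tokens
        return total / Decimal(1_000_000)

    def check(self, keys):
        """Return 429 if any applicable cap (user, group, org) is reached."""
        for key in keys:
            cap = self.caps.get(key)
            if cap is not None and self.spent.get(key, Decimal(0)) >= cap:
                return 429
        return 200

    def record(self, keys, model, input_tokens, output_tokens):
        """Charge one request against every key it counts toward."""
        amount = self.cost(model, input_tokens, output_tokens)
        for key in keys:
            self.spent[key] = self.spent.get(key, Decimal(0)) + amount
        return amount
```

Because `cost` only knows list prices, the totals in this sketch drift from an actual invoice whenever discounts apply, which is the reason to treat the cap as a guardrail.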

How does the architecture work?

A developer's local (or deployed) Claude process sends inference traffic to the gateway over HTTPS. The gateway runs as a stateless container on Cloud Run. On each request it validates the session bearer token it issued at sign-in, checks policy, and forwards the request to Agent Platform using the Cloud Run service account.

Cloud SQL holds two things: device-code sign-in state and the spend ledger. Google Workspace gets contacted only at sign-in and token refresh, not on every request. Inference stays in your GCP project, so quota, DPA, and billing are all unchanged.

https://storage.googleapis.com/gweb-cloudblog-publish/images/3_nlczWOp.max-1100x1100.png

For routing, you can set region: global for Agent Platform's global endpoint or add multiple upstreams entries to fail over on 5xx, 429, or timeout in list order.
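The list-order failover rule can be sketched like this. It is an illustration only: `send_with_failover`, `should_fail_over`, and `UpstreamTimeout` are hypothetical names, and reporting a timeout as a 504 when every upstream fails is an assumption rather than documented gateway behavior.

```python
class UpstreamTimeout(Exception):
    """Raised by the send callable when an upstream does not answer in time."""


def should_fail_over(status):
    """Fail over on 429 and on any 5xx, as the article describes."""
    return status == 429 or 500 <= status <= 599


def send_with_failover(upstreams, send):
    """Try upstreams in list order and return (upstream, status, body).

    send(upstream) must return (status, body) or raise UpstreamTimeout.
    If every upstream fails, the last failure is returned to the caller.
    """
    if not upstreams:
        raise ValueError("no upstreams configured")
    last = None
    for upstream in upstreams:
        try:
            status, body = send(upstream)
        except UpstreamTimeout:
            # Assumed mapping: surface a timeout as 504 if nothing else succeeds.
            last = (upstream, 504, f"timeout from {upstream}")
            continue
        if not should_fail_over(status):
            return upstream, status, body
        last = (upstream, status, body)
    return last
```

Client errors such as 400 are returned immediately in this sketch, since retrying the same bad request against another upstream would not help.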


Setting up the gateway on GCP

The deployment breaks into two steps. First, provision the GCP foundation: enable Agent Platform, Cloud SQL, and Secret Manager APIs. Create a claude-gateway service account with roles/aiplatform.user. Spin up a small Cloud SQL Postgres instance for state.

The gateway authenticates to Agent Platform as the Cloud Run service identity, so you don't create a service account key. Then create a new OAuth client (Web application type) in the Google Cloud console. The gateway authenticates developers against Google Workspace as an OIDC relying party, and creating this client gives you the client_id and client_secret for that handshake. Those values feed the oidc block in gateway.yaml.

https://storage.googleapis.com/gweb-cloudblog-publish/original_images/image1_mZfc8Bn.gif

Second, configure the gateway itself. Write gateway.yaml pointing at your Google Workspace OIDC client, the Postgres connection string, and Agent Platform as the upstream. Store it in Secret Manager along with the OIDC client secret, the Postgres URL, and a JWT signing key.

The full gcloud command sequence and complete gateway.yaml reference live in Anthropic's Claude apps gateway on Google Cloud docs.

Who should deploy this?

Teams with more than a handful of Claude Code users. The individual developer flow works fine for small teams where everyone manages their own credentials. Once you're distributing configs over MDM and trying to track who's using what, the overhead tips toward running the gateway.

Regulated industries will care about the identity flow. No credentials on laptops means less surface area for credential theft. The spend caps matter less for compliance than for finance teams who've watched AI costs balloon without attribution.

Logicity's Take

This release reflects where enterprise AI tooling is headed: managed infrastructure that looks like traditional SaaS but runs in your cloud. The comparison here is with tools like Copilot for Business, which handles identity through Microsoft Entra but doesn't give you the same routing and spend control flexibility. Anthropic is betting that platform teams want control, even at the cost of running their own gateway. The Cloud SQL dependency adds operational overhead, but it's a fair trade for organizations that already run Postgres workloads. Pricing for the gateway itself isn't specified. It ships with the claude binary, suggesting no separate license, but inference costs still flow through your Vertex AI commitment.

Frequently Asked Questions

Does the Claude apps gateway require a separate license?

The gateway ships with the standard claude binary. Inference costs still apply through your Vertex AI billing, but no separate gateway license has been announced.

Can the gateway work with non-Google identity providers?

Yes. While the examples use Google Workspace, the gateway supports any OIDC-compliant identity provider.

How accurate are the spend limits?

Spend limits calculate at list price. Committed-use discounts and negotiated rates don't factor in, so treat them as guardrails against runaway usage rather than exact billing reconciliation.

What happens if the gateway goes down?

The gateway is stateless and runs on Cloud Run, so it scales and recovers automatically. However, Cloud SQL availability affects sign-in state and spend ledger access.

Source: Cloud Blog

Manaal Khan

Tech & Innovation Writer

Produced with AI assistance and reviewed by the Logicity editorial team. Learn more in our Editorial Policy.
