Why Shared Secrets Are the Biggest Security Risk in Multi-Tenant AI Infrastructure (And How to Eliminate Them)
Shared secrets in multi-tenant AI infrastructure create cascading breach risk. One compromised token exposes every organization on the platform. Here is how per-org cryptographic isolation, transit encryption, zero-downtime key rotation, and O(1) key lookup eliminate the problem.
A breach that should have been contained
In September 2024, a CI/CD pipeline at a mid-size SaaS company leaked an environment variable. The variable was API_SECRET — a single shared token used to authenticate communication between their AI proxy and their control plane API. Every tenant on the platform used the same secret. The attacker did not need to compromise individual organizations. They had the master key.
Within hours, the attacker was issuing authorized requests on behalf of arbitrary tenants. They could read telemetry data, extract provider API keys from authorization responses, and submit requests to upstream LLM providers billed to other organizations. The breach affected all 340 tenants. Not because 340 systems were compromised — because one secret was shared across all of them.
This is the shared secret problem. It is not theoretical. It is the default architecture of most multi-tenant AI platforms today.
The shared secret problem in multi-tenant architectures
A shared secret is a single credential used to authenticate communication between two systems across all tenants. In multi-tenant AI infrastructure, this typically looks like a static API key or token that:
- Authenticates every proxy instance to the control plane
- Is the same value for every organization on the platform
- Lives in environment variables, CI/CD configs, and deployment manifests
- Never changes because rotating it means coordinating downtime across every tenant
The security model is binary: you either have the secret (full access to every tenant) or you do not (no access). There is no middle ground. There is no per-tenant scoping. There is no way to revoke access for one compromised proxy instance without revoking access for all of them.
Why this architecture exists
Shared secrets are easy. A single API_SECRET environment variable is trivial to configure. One value in the deployment manifest, one header check in the middleware, done. When you are building an MVP, this is the obvious choice. The problem is that it stays the obvious choice long after the platform has grown past the point where it is safe.
The blast radius problem
In traditional web applications, a compromised API key typically exposes one user’s data. In multi-tenant AI infrastructure with shared secrets, a compromised authentication token exposes:
- Every organization’s telemetry data — request logs, cost data, agent behavior
- Every organization’s provider API keys — the real OpenAI, Anthropic, and Google credentials stored in the control plane
- Every organization’s billing — the ability to make upstream API calls charged to any tenant
- Every organization’s policies — the ability to read (and potentially modify) governance rules
The blast radius is not “one tenant.” It is “the entire platform.”
Why AI infrastructure is especially vulnerable
AI infrastructure has characteristics that make shared secret vulnerabilities more severe than in traditional SaaS applications.
API keys are high-value targets
Provider API keys (OpenAI, Anthropic, Google) are direct-spend credentials. A compromised OpenAI key is not just an access token — it is a credit card. An attacker with a valid API key can generate thousands of dollars in charges per hour. Unlike a compromised social media account, there is immediate, measurable financial damage.
Agents run autonomously
AI agents operate without human oversight by design. A compromised proxy token does not require social engineering or human interaction to exploit. The attacker can automate requests at machine speed, against any tenant, without triggering the behavioral anomalies that human-facing systems use for detection.
Transit data is uniquely sensitive
When a proxy requests authorization from a control plane, the response includes the provider API key needed to make the upstream call. If that authorization channel uses a shared secret and the response transmits the provider key in plaintext, a single MITM or log exposure leaks credentials for any tenant. The authorization response is the most sensitive payload in the entire system.
Credential density is high
A typical multi-tenant AI proxy stores credentials for multiple providers per tenant — OpenAI, Anthropic, Google, Cohere, Mistral. A platform with 100 tenants and 3 providers each holds 300 provider API keys. One shared secret protects all 300. The ratio of “credentials at risk” to “credentials needed to compromise” is extreme.
Solution 1: Per-org proxy authentication
The first step is eliminating the shared secret entirely. Instead of one token that authenticates every proxy instance, each organization gets its own cryptographically unique proxy token.
How it works
When an organization provisions its proxy on the Govyn platform, the system generates a unique token with a gvp_ prefix (Govyn Proxy):
gvp_A7x9Kp2mR5nB8qW3vF6jL1dH4sY0tC7uE
This token is:
- Generated from 32 cryptographically random bytes (256 bits of entropy)
- Bcrypt-hashed before storage — the plaintext is shown once at provisioning time, then discarded
- Scoped to a single organization — it authenticates the proxy for that org and no other
- Paired with the organization’s proxy slug — both the token and slug must match for authentication to succeed
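The generation and storage steps above can be sketched in a few lines of Node. This is a minimal illustration, not Govyn's implementation: scrypt (built into node:crypto) stands in for bcrypt so the sketch has no third-party dependency, and the function names are hypothetical.

```typescript
import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

// Generate a per-org proxy token: 32 random bytes (256 bits of entropy),
// base64url-encoded and prefixed with "gvp_".
function generateProxyToken(): string {
  return "gvp_" + randomBytes(32).toString("base64url");
}

// Hash the token for storage. The plaintext is shown once at provisioning
// time and then discarded; only the salted hash is persisted.
// (scrypt stands in for bcrypt here.)
function hashToken(token: string): { salt: string; hash: string } {
  const salt = randomBytes(16).toString("hex");
  const hash = scryptSync(token, salt, 32).toString("hex");
  return { salt, hash };
}

// Verify a presented token against the stored hash in constant time.
function verifyToken(
  token: string,
  stored: { salt: string; hash: string },
): boolean {
  const candidate = scryptSync(token, stored.salt, 32);
  return timingSafeEqual(candidate, Buffer.from(stored.hash, "hex"));
}
```

The shape is the same as bcrypt's: hash once at provisioning, compare on every request, never store the plaintext.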
The proxy sends two headers on every request to the control plane:
X-Govyn-Proxy-Token: gvp_A7x9Kp2mR5nB8qW3vF6jL1dH4sY0tC7uE
X-Govyn-Proxy-Slug: acme-corp-7f3a2b
The control plane looks up the organization by slug, retrieves the stored bcrypt hash, and verifies the token. If the token is invalid or the slug does not exist, the request is rejected. If both match, the request proceeds with the organization’s context already resolved — no ambiguity about which tenant the request belongs to.
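That lookup-then-verify flow might look like the sketch below, with SHA-256 standing in for bcrypt and an in-memory Map standing in for the database. None of these names come from Govyn's codebase.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical org record; in the article's design the hash would be bcrypt.
interface OrgRecord {
  orgId: string;
  tokenHash: Buffer;
}

const sha256 = (s: string) => createHash("sha256").update(s).digest();

// Resolve the org by slug, then verify the token against the stored hash.
// Returns the org id on success, or null (reject) on any mismatch.
function authenticateProxy(
  headers: Record<string, string>,
  orgsBySlug: Map<string, OrgRecord>,
): string | null {
  const slug = headers["x-govyn-proxy-slug"];
  const token = headers["x-govyn-proxy-token"];
  if (!slug || !token) return null;
  const org = orgsBySlug.get(slug);
  if (!org) return null; // unknown slug -> reject
  if (!timingSafeEqual(sha256(token), org.tokenHash)) return null;
  return org.orgId; // org context resolved, no ambiguity
}
```

The slug does double duty: it authenticates (both values must match) and it resolves the tenant context in the same step.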
What changes for the operator
Before per-org auth, the proxy configuration looked like this:
# BEFORE: Shared secret (every org uses the same value)
proxy:
control_plane: https://api.govyn.cloud
auth:
secret: ${API_SECRET} # Same value for all 340 tenants
After per-org auth:
# AFTER: Per-org token (unique per organization)
proxy:
control_plane: https://api.govyn.cloud
auth:
token: ${GOVYN_PROXY_TOKEN} # Unique to this org
slug: ${GOVYN_PROXY_SLUG} # Unique to this org
The configuration change is minimal. The security improvement is fundamental.
Blast radius after per-org auth
If a proxy token is compromised:
| Impact | Shared secret | Per-org token |
|---|---|---|
| Organizations exposed | All (340) | 1 |
| Provider keys at risk | All (1,020) | 3 (one org’s providers) |
| Billing exposure | Platform-wide | Single org |
| Revocation impact | Platform outage | One org re-provisions |
| Detection difficulty | High (legitimate-looking cross-org traffic) | Low (anomalous single-org traffic) |
Compromise of one proxy token affects one organization. The other 339 are unaffected. Revocation means re-provisioning one token, not coordinating a platform-wide rotation.
Backward compatibility
Existing deployments using the shared secret continue to work during the migration period. The middleware accepts both authentication methods:
- Per-org token (preferred): Full tenant isolation, org context resolved from token
- Shared secret (deprecated): Logs a deprecation warning on every request to track migration progress
The deprecation warning creates operational pressure to migrate without breaking existing deployments. Operators see the warnings in their logs and can migrate at their own pace.
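A dual-method middleware of this shape could look like the following sketch. The header names, the verifyPerOrg callback, and the warn logger are hypothetical stand-ins, not Govyn's actual API.

```typescript
import { timingSafeEqual } from "node:crypto";

type AuthResult =
  | { method: "per-org"; orgId: string }
  | { method: "shared-secret" } // no org context; caller resolves it elsewhere
  | null;

function authenticate(
  headers: Record<string, string>,
  verifyPerOrg: (headers: Record<string, string>) => string | null,
  sharedSecret: string,
  warn: (msg: string) => void,
): AuthResult {
  // Preferred path: the per-org token resolves the org directly.
  const orgId = verifyPerOrg(headers);
  if (orgId) return { method: "per-org", orgId };

  // Deprecated path: shared secret still works during migration, but every
  // use emits a warning so operators can track remaining stragglers.
  const candidate = headers["x-api-secret"] ?? "";
  if (
    candidate.length === sharedSecret.length &&
    timingSafeEqual(Buffer.from(candidate), Buffer.from(sharedSecret))
  ) {
    warn("deprecated: shared-secret auth; migrate to per-org proxy tokens");
    return { method: "shared-secret" };
  }
  return null;
}
```

Recording which method authenticated each request is what makes the migration measurable: when the warning count drops to zero, the shared secret can be removed.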
Solution 2: Transit encryption with AES-256-GCM
Per-org authentication solves the “who is this request from” problem. Transit encryption solves the “what if someone intercepts the response” problem.
The plaintext transit problem
When a proxy requests authorization from the control plane, the response includes the provider API key needed to call the upstream LLM. In a naive implementation, that key is in the response body as plaintext:
{
"decision": "allow",
"providerApiKey": "sk-proj-abc123..."
}
This plaintext response is vulnerable at multiple points:
- Log aggregation: If the control plane logs response bodies (common in debugging), provider keys appear in log storage
- Network inspection: Any intermediate proxy, load balancer, or monitoring tool that inspects HTTP response bodies sees the key
- Memory dumps: Process crash dumps or heap snapshots capture the plaintext response
- CDN or reverse proxy caching: A misconfigured cache layer could store and serve the response to other requests
Even with TLS on the wire, the plaintext key exists in memory at both endpoints and in any system that processes the HTTP response.
How transit encryption works
Instead of transmitting the provider API key in plaintext, the control plane encrypts it with AES-256-GCM before including it in the response:
{
"decision": "allow",
"encryptedProviderKey": {
"ciphertext": "xK7mP2...",
"iv": "a9Bf3Q...",
"tag": "R4nL8w..."
}
}
The encryption uses a dedicated PROXY_TRANSIT_KEY — a 256-bit key (64 hex characters) shared between the control plane and the proxy runtime. This key is distinct from the ENCRYPTION_KEY used for at-rest encryption of stored provider keys. The separation matters:
- ENCRYPTION_KEY: Encrypts provider API keys stored in the database. Used by the control plane only. Rotated independently.
- PROXY_TRANSIT_KEY: Encrypts provider API keys in flight between control plane and proxy. Used by both. Rotated independently.
Compromising one does not compromise the other. A database breach that exposes ENCRYPTION_KEY does not help an attacker intercept transit data. A network-level attack that exposes PROXY_TRANSIT_KEY does not help decrypt the database.
AES-256-GCM specifics
The implementation uses AES-256-GCM (Galois/Counter Mode) with:
- 256-bit key derived from a 64-character hex string
- 96-bit (12-byte) random IV generated per encryption operation
- 128-bit (16-byte) authentication tag for integrity verification
GCM is an authenticated encryption mode. It provides both confidentiality (the ciphertext cannot be read without the key) and integrity (the ciphertext cannot be modified without detection). If an attacker tampers with the ciphertext, IV, or tag, decryption fails with an authentication error rather than producing corrupted output.
The proxy decrypts the provider key in memory immediately before making the upstream API call, and discards it after the response is received. The plaintext provider key exists in proxy memory only for the duration of the upstream request.
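The encrypt/decrypt pair maps directly onto Node's built-in crypto module. The sketch below follows the parameters above (96-bit IV, 128-bit tag) and mirrors the response field names; it is an illustration of the shape, not Govyn's implementation, and the key here is generated locally rather than read from PROXY_TRANSIT_KEY.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// 32 bytes = 256-bit key (in production: 64 hex chars from the environment).
const transitKey = randomBytes(32);

interface EncryptedPayload {
  ciphertext: string;
  iv: string;
  tag: string;
}

// Control-plane side: encrypt the provider key before it enters the response.
function encryptForTransit(plaintext: string, key: Buffer): EncryptedPayload {
  const iv = randomBytes(12); // 96-bit IV, fresh per encryption operation
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([
    cipher.update(plaintext, "utf8"),
    cipher.final(),
  ]);
  return {
    ciphertext: ciphertext.toString("base64"),
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"), // 128-bit integrity tag
  };
}

// Proxy side: decrypt in memory. Throws an authentication error if the
// ciphertext, IV, or tag was tampered with, rather than returning garbage.
function decryptFromTransit(p: EncryptedPayload, key: Buffer): string {
  const decipher = createDecipheriv(
    "aes-256-gcm",
    key,
    Buffer.from(p.iv, "base64"),
  );
  decipher.setAuthTag(Buffer.from(p.tag, "base64"));
  return Buffer.concat([
    decipher.update(Buffer.from(p.ciphertext, "base64")),
    decipher.final(),
  ]).toString("utf8");
}
```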
Why not just rely on TLS?
TLS protects data in transit on the wire. Transit encryption protects data at every point where it exists as an HTTP response body — in application logs, in monitoring systems, in crash dumps, in response caches. TLS terminates at the endpoint. Transit encryption persists until the intended recipient explicitly decrypts the payload.
This is defense in depth. TLS handles the network layer. Transit encryption handles everything else.
Solution 3: Zero-downtime encryption key rotation
Encryption keys must be rotated. Compliance frameworks require it (SOC 2, PCI DSS, HIPAA). Security best practices require it. Incident response requires it. The question is whether rotation requires downtime.
The rotation problem
In a single-key system, rotating the encryption key means:
1. Generate a new key
2. Re-encrypt every stored value with the new key
3. Deploy the new key to all services
4. Remove the old key
Steps 2 and 3 must happen atomically. If the new key is deployed before all values are re-encrypted, decryption fails for values still encrypted with the old key. If re-encryption finishes before deployment, the system is temporarily using the wrong key.
For a platform with 300 stored provider API keys across 100 tenants, the re-encryption step alone takes measurable time. During that window, the system is in an inconsistent state.
Dual-key rotation
Govyn’s encryption module supports two keys simultaneously:
- ENCRYPTION_KEY: The current (active) key. All new encryptions use this key.
- ENCRYPTION_KEY_PREVIOUS: The previous key. Kept for decrypting data that has not been re-encrypted yet.
Every encrypted value includes a keyVersion field that records which key version was used to encrypt it. When decrypting:
- Look up the key matching the value's keyVersion
- Attempt decryption with that key
- If it fails, try the other key as a fallback safety net
- If both fail, the data is genuinely unrecoverable (throw an error)
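The lookup-with-fallback logic can be sketched as follows. The Versioned shape and names are illustrative, and the cipher-specific decrypt function is passed in (it is assumed to throw on failure, as AES-GCM does) so the sketch stays independent of any particular cipher.

```typescript
interface Versioned {
  keyVersion: number; // records which key version encrypted this value
  payload: string;
}

function decryptVersioned(
  value: Versioned,
  keys: Map<number, Buffer>, // e.g. version 2 -> ENCRYPTION_KEY, 1 -> PREVIOUS
  decrypt: (payload: string, key: Buffer) => string,
): string {
  // Try the key matching the recorded version first, then any others as a
  // fallback safety net.
  const primary = keys.get(value.keyVersion);
  const candidates = primary
    ? [primary, ...[...keys.values()].filter((k) => k !== primary)]
    : [...keys.values()];
  for (const key of candidates) {
    try {
      return decrypt(value.payload, key); // version match: first try succeeds
    } catch {
      // wrong key; fall through to the next candidate
    }
  }
  throw new Error("unrecoverable: no configured key decrypts this value");
}
```

On the happy path this costs exactly one decryption attempt; the loop only matters for mislabeled or mid-rotation data.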
The rotation procedure becomes:
1. Generate a new key
2. Set ENCRYPTION_KEY_PREVIOUS to the current ENCRYPTION_KEY
3. Set ENCRYPTION_KEY to the new key
4. Increment ENCRYPTION_KEY_VERSION
5. Deploy — zero downtime, all existing encrypted data remains readable
6. Run a background re-encryption job at your convenience
7. Once all values are re-encrypted with the new key, remove ENCRYPTION_KEY_PREVIOUS
Steps 2-4 are a configuration change. Step 5 is a normal deployment. Step 6 can run over hours or days without impacting service availability. There is no window where decryption fails.
Rotation in practice
# Current state
ENCRYPTION_KEY=aabbccdd... # 64 hex chars, version 1
# Rotation
ENCRYPTION_KEY_PREVIOUS=aabbccdd... # Old key moves here
ENCRYPTION_KEY=11223344... # New key
ENCRYPTION_KEY_VERSION=2 # Increment
# Deploy. Done. No downtime. No re-encryption required immediately.
New encryptions use version 2. Old data encrypted with version 1 decrypts using ENCRYPTION_KEY_PREVIOUS. The system is fully operational at every step.
Why keyVersion matters
Without version tracking, the decryption code would need to try both keys on every decryption attempt. With version tracking, it goes directly to the correct key on the first try. The fallback path exists as a safety net, not as the primary code path.
This is important for performance when the system handles thousands of decryption operations per minute. One key derivation and one decryption attempt per operation, not two.
Solution 4: O(1) API key lookup
API key validation is on the critical path of every proxy request. Every agent call that flows through the proxy must validate the API key before proceeding. The performance of this validation directly impacts request latency.
The bcrypt iteration problem
Govyn API keys are bcrypt-hashed before storage. Bcrypt is deliberately slow — that is its purpose. A single bcrypt.compare() operation takes 50-100ms depending on hardware and cost factor.
In the original implementation, validating an API key meant:
- Receive the token from the request
- Query all non-revoked API keys for the organization
- Iterate through each key, running bcrypt.compare() on each one
- Return the first match, or reject if none match
For an organization with 20 API keys, this is 20 sequential bcrypt comparisons in the worst case (when the key does not match any). At 75ms per comparison, that is 1.5 seconds of CPU time spent on authentication alone — on every request.
# Worst case: O(n) bcrypt comparisons
Token arrives -> Load all org keys -> bcrypt.compare(token, key1) -> no match
-> bcrypt.compare(token, key2) -> no match
-> bcrypt.compare(token, key3) -> no match
-> ... (repeat for all n keys)
-> bcrypt.compare(token, key20) -> match (or reject)
Time: n * ~75ms = 1,500ms for n=20
This does not scale. Organizations with more keys, or platforms with high request volumes, hit a wall where authentication dominates request latency.
Prefix-based O(1) lookup
The solution is an 8-character prefix stored in plaintext alongside the bcrypt hash. When a new API key is generated:
Full key: gvn_A7x9Kp2mR5nB8qW3vF6jL1dH4sY0tC7uE...
Prefix: A7x9Kp2m (characters 5-12, after "gvn_")
Hash: $2a$10$xK7mP2... (bcrypt hash of full key)
The prefix is stored in a database column with an index. Validation becomes:
- Receive the token from the request
- Extract the 8-character prefix (characters 5-12 of the token, immediately after gvn_)
- Query for a single key matching that prefix: WHERE keyPrefix = 'A7x9Kp2m' AND revokedAt IS NULL
- Run one bcrypt.compare() to verify the full token against the stored hash
# O(1): One database lookup + one bcrypt comparison
Token arrives -> Extract prefix "A7x9Kp2m"
-> SELECT WHERE keyPrefix = 'A7x9Kp2m' (indexed, O(1))
-> bcrypt.compare(token, candidate.hash) -> match or reject
Time: ~1ms (query) + ~75ms (bcrypt) = ~76ms regardless of key count
One bcrypt comparison instead of up to 20. The time is constant regardless of how many keys an organization has.
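A sketch of the prefix-indexed validation path, with SHA-256 standing in for bcrypt and a Map standing in for the indexed database column; the names are illustrative, not Govyn's schema.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

interface KeyRow {
  keyPrefix: string;
  hash: Buffer; // in the article's design: a bcrypt hash of the full key
  revokedAt: Date | null;
}

const sha256 = (s: string) => createHash("sha256").update(s).digest();

// The 8 characters immediately after the "gvn_" prefix.
const extractPrefix = (token: string) => token.slice(4, 12);

function validateApiKey(
  token: string,
  rowsByPrefix: Map<string, KeyRow>,
): boolean {
  // Indexed lookup by plaintext prefix: O(1), narrows to one candidate row.
  const row = rowsByPrefix.get(extractPrefix(token));
  if (!row || row.revokedAt !== null) return false;
  // One slow hash comparison, regardless of how many keys the org has.
  return timingSafeEqual(sha256(token), row.hash);
}
```

The prefix decides which row to compare against; the hash comparison remains the security boundary.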
Is the prefix a security risk?
The prefix is 8 characters from a base64url character set (64 possible characters per position). That is 64^8 = approximately 281 trillion possible prefixes. An attacker who knows the prefix still needs to brute-force 24 remaining characters of the key (also base64url). The prefix narrows the search space for “which database row to compare against,” not the search space for “what is the key.”
The bcrypt hash remains the security boundary. The prefix is an index, not a credential.
Legacy key migration
Keys created before the prefix column was added have keyPrefix = NULL. The validation code handles this with a fallback path:
- Look up by prefix — if found, single bcrypt compare
- If no prefix match, query all keys with keyPrefix IS NULL and iterate with bcrypt (legacy behavior)
As old keys are revoked and replaced with new ones, the legacy fallback path handles fewer and fewer keys until it processes none. No migration script required — natural key lifecycle handles the transition.
The complete security architecture
Here is how all four solutions work together on a single proxy request:
Request flow
1. Agent sends request to proxy
Headers: Authorization: Bearer gvn_A7x9Kp2m...
Body: { "model": "gpt-4o", "messages": [...] }
2. Proxy validates agent API key (O(1) prefix lookup)
Extract prefix "A7x9Kp2m" -> DB lookup -> single bcrypt compare -> valid
3. Proxy requests authorization from control plane
Headers: X-Govyn-Proxy-Token: gvp_... (per-org token)
X-Govyn-Proxy-Slug: acme-corp-7f3a2b
4. Control plane authenticates proxy (per-org bcrypt verification)
Slug lookup -> bcrypt compare token hash -> valid, org = acme-corp
5. Control plane evaluates policies for acme-corp
Budget check, rate limit, content filter, approval rules -> allow
6. Control plane decrypts provider API key (versioned AES-256-GCM)
Read encrypted key from DB -> decrypt with ENCRYPTION_KEY (version 2)
7. Control plane re-encrypts provider key for transit (AES-256-GCM)
Encrypt with PROXY_TRANSIT_KEY -> include in response as encryptedProviderKey
8. Proxy receives response, decrypts provider key
AES-GCM decrypt with PROXY_TRANSIT_KEY -> plaintext provider key in memory
9. Proxy makes upstream API call to OpenAI/Anthropic
Authorization: Bearer sk-proj-... (decrypted, never stored on proxy)
10. Proxy discards provider key from memory after response received
At no point does a shared secret authenticate any cross-tenant operation. At no point does a provider API key transit the network in plaintext. At no point does a key validation require iterating through all stored keys.
Before and after
| Concern | Before (v1.1) | After (v1.2) |
|---|---|---|
| Proxy-to-API auth | Single shared API_SECRET | Per-org gvp_ token (bcrypt-hashed) |
| Blast radius of compromise | All tenants | Single tenant |
| Provider key transit | Plaintext in HTTP response | AES-256-GCM encrypted |
| Key rotation | Full platform downtime | Zero-downtime dual-key |
| API key validation | O(n) bcrypt iteration | O(1) prefix lookup + single bcrypt |
| Revocation scope | Platform-wide or nothing | Per-organization |
| Auth method tracking | None | authMethod field on every request |
| Migration path | N/A | Shared-secret fallback with deprecation warnings |
Implementation details
Environment variables
The v1.2 security architecture introduces three encryption-related environment variables:
# At-rest encryption of stored provider API keys (BYOK)
ENCRYPTION_KEY= # 64-char hex (32 bytes), current version
ENCRYPTION_KEY_PREVIOUS= # 64-char hex, previous version (for rotation)
ENCRYPTION_KEY_VERSION= # Integer, defaults to 1
# Transit encryption between control plane and proxy
PROXY_TRANSIT_KEY= # 64-char hex (32 bytes), separate from ENCRYPTION_KEY
Generate keys with:
# Generate a 256-bit key as 64 hex characters
openssl rand -hex 32
Proxy configuration
# govyn.yaml (proxy instance for acme-corp)
proxy:
control_plane: https://api.govyn.cloud
auth:
token: ${GOVYN_PROXY_TOKEN} # gvp_ token, unique to this org
slug: ${GOVYN_PROXY_SLUG} # e.g., acme-corp-7f3a2b
Key rotation checklist
- Generate new key: openssl rand -hex 32
- Set ENCRYPTION_KEY_PREVIOUS to the current ENCRYPTION_KEY
- Set ENCRYPTION_KEY to the new key
- Increment ENCRYPTION_KEY_VERSION
- Deploy to all control plane instances
- Verify: check logs for successful decryption with both key versions
- Run background re-encryption of stored values (optional, at your pace)
- After re-encryption: remove ENCRYPTION_KEY_PREVIOUS
Compliance mapping
| Control | Implementation |
|---|---|
| SOC 2 CC6.1 (Logical access) | Per-org proxy tokens, bcrypt-hashed |
| SOC 2 CC6.7 (Encryption) | AES-256-GCM at-rest and in-transit |
| PCI DSS 3.5 (Key management) | Dual-key rotation, version tracking |
| PCI DSS 3.6 (Key rotation) | Zero-downtime rotation procedure |
| HIPAA 164.312(a)(1) (Access control) | Per-tenant cryptographic isolation |
| HIPAA 164.312(e)(1) (Transmission security) | Transit encryption of provider keys |
Key takeaways
- Shared secrets are a platform-wide vulnerability. One token authenticating all tenants means one compromise exposes everyone. Replace shared secrets with per-tenant credentials.
- Transit encryption is not redundant with TLS. Provider API keys in HTTP response bodies are vulnerable in logs, crash dumps, caches, and monitoring systems. Encrypt sensitive response payloads independently.
- Key rotation should not require downtime. Dual-key support with version tracking lets you rotate encryption keys with a configuration change and a normal deployment. No re-encryption window, no service interruption.
- Authentication performance matters at proxy scale. O(n) bcrypt iteration does not scale with key count. Prefix-based lookup reduces every validation to one database query and one bcrypt comparison, regardless of how many keys exist.
- Defense in depth applies to AI infrastructure. Per-org auth, transit encryption, versioned at-rest encryption, and O(1) key lookup are four independent layers. Compromising one does not compromise the others.
FAQ
Does per-org authentication add latency to every proxy request?
The proxy token is verified with a single database lookup (by slug) and one bcrypt comparison. This adds approximately 75-100ms to the authorization request. Since the proxy already makes an authorization call to the control plane on every request, the overhead is the bcrypt comparison itself — not a new network round trip. For context, the upstream LLM API call typically takes 500-5,000ms. The authentication overhead is negligible relative to the total request latency.
What happens if I lose my proxy token?
Proxy tokens are shown once at provisioning time — they are not stored in plaintext anywhere. If you lose the token, you re-provision your proxy, which generates a new slug and token. Your agents continue working; only the proxy’s control plane authentication changes. Agent-facing API keys (the gvn_ tokens) are unaffected.
Can I use the same ENCRYPTION_KEY for transit and at-rest encryption?
You can, but you should not. Using separate keys (ENCRYPTION_KEY and PROXY_TRANSIT_KEY) provides isolation. A compromise of one key does not affect data protected by the other. If your transit key is exposed through a proxy instance compromise, your stored provider keys in the database remain encrypted with a different key that the attacker does not have.
How does this work with self-hosted Govyn?
Self-hosted Govyn (the open-source version) runs the proxy directly with provider API keys in its own configuration — there is no control plane authorization step. The transit encryption and per-org auth features apply to the cloud-hosted Govyn platform, where a centralized control plane manages multi-tenant proxy instances. Self-hosted users benefit from the O(1) key lookup improvement if they use the API key management features.
Is the 8-character prefix long enough for uniqueness?
Eight base64url characters provide 64^8 (approximately 281 trillion) possible values. The probability of a collision between two keys is negligible even at millions of keys. If a collision did occur, the system falls back to bcrypt comparison against both candidates — still far fewer comparisons than iterating all keys. In practice, prefix collisions have not been observed.
Further reading
- Proxy vs SDK: Why Architecture Matters for AI Agent Governance — the architectural foundation that makes per-org isolation possible
- Budget Control Policies — per-agent budget enforcement through the proxy
- Compliance Audit Policies — audit trail and compliance features
- Production Safety — safety policies for production agent deployments
Govyn is an open-source API proxy for AI agent governance. Per-org cryptographic isolation ships in v1.2. MIT licensed. Self-host or cloud-hosted.