Why Shared Secrets Are the Biggest Security Risk in Multi-Tenant AI Infrastructure (And How to Eliminate Them)


Shared secrets in multi-tenant AI infrastructure create cascading breach risk. One compromised token exposes every organization on the platform. Here is how per-org cryptographic isolation, transit encryption, zero-downtime key rotation, and O(1) key lookup eliminate the problem.


A breach that should have been contained

In September 2024, a CI/CD pipeline at a mid-size SaaS company leaked an environment variable. The variable was API_SECRET — a single shared token used to authenticate communication between their AI proxy and their control plane API. Every tenant on the platform used the same secret. The attacker did not need to compromise individual organizations. They had the master key.

Within hours, the attacker was issuing authorized requests on behalf of arbitrary tenants. They could read telemetry data, extract provider API keys from authorization responses, and submit requests to upstream LLM providers billed to other organizations. The breach affected all 340 tenants. Not because 340 systems were compromised — because one secret was shared across all of them.

This is the shared secret problem. It is not theoretical. It is the default architecture of most multi-tenant AI platforms today.


The shared secret problem in multi-tenant architectures

A shared secret is a single credential used to authenticate communication between two systems across all tenants. In multi-tenant AI infrastructure, this typically looks like a static API key or token that:

  • Authenticates every proxy instance to the control plane
  • Is the same value for every organization on the platform
  • Lives in environment variables, CI/CD configs, and deployment manifests
  • Never changes because rotating it means coordinating downtime across every tenant

The security model is binary: you either have the secret (full access to every tenant) or you do not (no access). There is no middle ground. There is no per-tenant scoping. There is no way to revoke access for one compromised proxy instance without revoking access for all of them.

Why this architecture exists

Shared secrets are easy. A single API_SECRET environment variable is trivial to configure. One value in the deployment manifest, one header check in the middleware, done. When you are building an MVP, this is the obvious choice. The problem is that it stays the obvious choice long after the platform has grown past the point where it is safe.

The blast radius problem

In traditional web applications, a compromised API key typically exposes one user’s data. In multi-tenant AI infrastructure with shared secrets, a compromised authentication token exposes:

  • Every organization’s telemetry data — request logs, cost data, agent behavior
  • Every organization’s provider API keys — the real OpenAI, Anthropic, and Google credentials stored in the control plane
  • Every organization’s billing — the ability to make upstream API calls charged to any tenant
  • Every organization’s policies — the ability to read (and potentially modify) governance rules

The blast radius is not “one tenant.” It is “the entire platform.”


Why AI infrastructure is especially vulnerable

AI infrastructure has characteristics that make shared secret vulnerabilities more severe than in traditional SaaS applications.

API keys are high-value targets

Provider API keys (OpenAI, Anthropic, Google) are direct-spend credentials. A compromised OpenAI key is not just an access token — it is a credit card. An attacker with a valid API key can generate thousands of dollars in charges per hour. Unlike a compromised social media account, there is immediate, measurable financial damage.

Agents run autonomously

AI agents operate without human oversight by design. A compromised proxy token does not require social engineering or human interaction to exploit. The attacker can automate requests at machine speed, against any tenant, without triggering the behavioral anomalies that human-facing systems use for detection.

Transit data is uniquely sensitive

When a proxy requests authorization from a control plane, the response includes the provider API key needed to make the upstream call. If that authorization channel uses a shared secret and the response transmits the provider key in plaintext, a single MITM or log exposure leaks credentials for any tenant. The authorization response is the most sensitive payload in the entire system.

Credential density is high

A typical multi-tenant AI proxy stores credentials for multiple providers per tenant — OpenAI, Anthropic, Google, Cohere, Mistral. A platform with 100 tenants and 3 providers each holds 300 provider API keys. One shared secret protects all 300. The ratio of “credentials at risk” to “credentials needed to compromise” is extreme.


Solution 1: Per-org proxy authentication

The first step is eliminating the shared secret entirely. Instead of one token that authenticates every proxy instance, each organization gets its own cryptographically unique proxy token.

How it works

When an organization provisions its proxy on the Govyn platform, the system generates a unique token with a gvp_ prefix (Govyn Proxy):

gvp_A7x9Kp2mR5nB8qW3vF6jL1dH4sY0tC7uE

This token is:

  • Generated from 32 cryptographically random bytes (256 bits of entropy)
  • Bcrypt-hashed before storage — the plaintext is shown once at provisioning time, then discarded
  • Scoped to a single organization — it authenticates the proxy for that org and no other
  • Paired with the organization’s proxy slug — both the token and slug must match for authentication to succeed
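As a rough illustration, a token with these properties could be minted in a few lines of Node.js. This is a sketch, not Govyn's actual code: the function name and the base64url encoding are assumptions; the post only specifies 32 random bytes and the gvp_ prefix.

```typescript
import { randomBytes } from "node:crypto";

// Illustrative sketch: mint a per-org proxy token from 32 cryptographically
// random bytes (256 bits of entropy). base64url keeps it header-safe.
// Function name and encoding are assumptions, not Govyn's implementation.
function generateProxyToken(): string {
  return "gvp_" + randomBytes(32).toString("base64url");
}

const token = generateProxyToken();
console.log(token.startsWith("gvp_")); // true
```

Only the bcrypt hash of this value would be persisted; the plaintext is shown once and discarded.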

The proxy sends two headers on every request to the control plane:

X-Govyn-Proxy-Token: gvp_A7x9Kp2mR5nB8qW3vF6jL1dH4sY0tC7uE
X-Govyn-Proxy-Slug: acme-corp-7f3a2b

The control plane looks up the organization by slug, retrieves the stored bcrypt hash, and verifies the token. If the token is invalid or the slug does not exist, the request is rejected. If both match, the request proceeds with the organization’s context already resolved — no ambiguity about which tenant the request belongs to.
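The slug-then-verify flow can be sketched as follows. Node's built-in scrypt stands in for bcrypt here purely to keep the example free of third-party dependencies — Govyn stores bcrypt hashes — and the store and function names are illustrative.

```typescript
import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

// Hypothetical store: slug -> { salt, hash } of the org's proxy token.
// scrypt stands in for bcrypt so the sketch runs without external deps.
const orgs = new Map<string, { salt: Buffer; hash: Buffer }>();

function provision(slug: string, token: string): void {
  const salt = randomBytes(16);
  orgs.set(slug, { salt, hash: scryptSync(token, salt, 32) });
}

// Both the slug and the token must match; a failure reveals nothing
// about whether the slug or the token was wrong.
function verifyProxy(slug: string, token: string): boolean {
  const org = orgs.get(slug);
  if (!org) return false;
  return timingSafeEqual(scryptSync(token, org.salt, 32), org.hash);
}
```

On success, the organization context is already resolved from the slug, so no further tenant disambiguation is needed downstream.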

What changes for the operator

Before per-org auth, the proxy configuration looked like this:

# BEFORE: Shared secret (every org uses the same value)
proxy:
  control_plane: https://api.govyn.cloud
  auth:
    secret: ${API_SECRET}  # Same value for all 340 tenants

After per-org auth:

# AFTER: Per-org token (unique per organization)
proxy:
  control_plane: https://api.govyn.cloud
  auth:
    token: ${GOVYN_PROXY_TOKEN}  # Unique to this org
    slug: ${GOVYN_PROXY_SLUG}    # Unique to this org

The configuration change is minimal. The security improvement is fundamental.

Blast radius after per-org auth

If a proxy token is compromised:

| Impact | Shared secret | Per-org token |
|---|---|---|
| Organizations exposed | All (340) | 1 |
| Provider keys at risk | All (1,020) | 3 (one org's providers) |
| Billing exposure | Platform-wide | Single org |
| Revocation impact | Platform outage | One org re-provisions |
| Detection difficulty | High (legitimate-looking cross-org traffic) | Low (anomalous single-org traffic) |

Compromise of one proxy token affects one organization. The other 339 are unaffected. Revocation means re-provisioning one token, not coordinating a platform-wide rotation.

Backward compatibility

Existing deployments using the shared secret continue to work during the migration period. The middleware accepts both authentication methods:

  1. Per-org token (preferred): Full tenant isolation, org context resolved from token
  2. Shared secret (deprecated): Logs a deprecation warning on every request to track migration progress

The deprecation warning creates operational pressure to migrate without breaking existing deployments. Operators see the warnings in their logs and can migrate at their own pace.

[Figure: Per-org isolation architecture]


Solution 2: Transit encryption with AES-256-GCM

Per-org authentication solves the “who is this request from” problem. Transit encryption solves the “what if someone intercepts the response” problem.

The plaintext transit problem

When a proxy requests authorization from the control plane, the response includes the provider API key needed to call the upstream LLM. In a naive implementation, that key is in the response body as plaintext:

{
  "decision": "allow",
  "providerApiKey": "sk-proj-abc123..."
}

This plaintext response is vulnerable at multiple points:

  • Log aggregation: If the control plane logs response bodies (common in debugging), provider keys appear in log storage
  • Network inspection: Any intermediate proxy, load balancer, or monitoring tool that inspects HTTP response bodies sees the key
  • Memory dumps: Process crash dumps or heap snapshots capture the plaintext response
  • CDN or reverse proxy caching: A misconfigured cache layer could store and serve the response to other requests

Even with TLS on the wire, the plaintext key exists in memory at both endpoints and in any system that processes the HTTP response.

How transit encryption works

Instead of transmitting the provider API key in plaintext, the control plane encrypts it with AES-256-GCM before including it in the response:

{
  "decision": "allow",
  "encryptedProviderKey": {
    "ciphertext": "xK7mP2...",
    "iv": "a9Bf3Q...",
    "tag": "R4nL8w..."
  }
}

The encryption uses a dedicated PROXY_TRANSIT_KEY — a 256-bit key (64 hex characters) shared between the control plane and the proxy runtime. This key is distinct from the ENCRYPTION_KEY used for at-rest encryption of stored provider keys. The separation matters:

  • ENCRYPTION_KEY: Encrypts provider API keys stored in the database. Used by the control plane only. Rotated independently.
  • PROXY_TRANSIT_KEY: Encrypts provider API keys in flight between control plane and proxy. Used by both. Rotated independently.

Compromising one does not compromise the other. A database breach that exposes ENCRYPTION_KEY does not help an attacker intercept transit data. A network-level attack that exposes PROXY_TRANSIT_KEY does not help decrypt the database.

AES-256-GCM specifics

The implementation uses AES-256-GCM (Galois/Counter Mode) with:

  • 256-bit key derived from a 64-character hex string
  • 96-bit (12-byte) random IV generated per encryption operation
  • 128-bit (16-byte) authentication tag for integrity verification

GCM is an authenticated encryption mode. It provides both confidentiality (the ciphertext cannot be read without the key) and integrity (the ciphertext cannot be modified without detection). If an attacker tampers with the ciphertext, IV, or tag, decryption fails with an authentication error rather than producing corrupted output.

The proxy decrypts the provider key in memory immediately before making the upstream API call, and discards it after the response is received. The plaintext provider key exists in proxy memory only for the duration of the upstream request.
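A minimal sketch of the transit roundtrip with Node's built-in crypto, matching the parameters above (12-byte IV, 16-byte tag). The hex encoding of the ciphertext/iv/tag fields is an assumption for the sketch; the post does not specify the wire encoding.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

type EncryptedPayload = { ciphertext: string; iv: string; tag: string };

// Control plane side: encrypt the provider key under the shared transit key.
// A fresh 12-byte IV is generated per operation, per the GCM parameters above.
function encryptForTransit(key: Buffer, plaintext: string): EncryptedPayload {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ct = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return {
    ciphertext: ct.toString("hex"),
    iv: iv.toString("hex"),
    tag: cipher.getAuthTag().toString("hex"),
  };
}

// Proxy side: decrypt in memory just before the upstream call.
// Any tampering with ciphertext, IV, or tag makes final() throw.
function decryptFromTransit(key: Buffer, p: EncryptedPayload): string {
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(p.iv, "hex"));
  decipher.setAuthTag(Buffer.from(p.tag, "hex"));
  return Buffer.concat([
    decipher.update(Buffer.from(p.ciphertext, "hex")),
    decipher.final(),
  ]).toString("utf8");
}

// PROXY_TRANSIT_KEY: 64 hex chars = 32 bytes. Random here for the demo.
const transitKey = randomBytes(32);
const payload = encryptForTransit(transitKey, "sk-proj-abc123");
console.log(decryptFromTransit(transitKey, payload)); // "sk-proj-abc123"
```

Note that a tampered payload fails loudly at `decipher.final()` rather than yielding garbage — this is the integrity half of authenticated encryption.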

Why not just rely on TLS?

TLS protects data in transit on the wire. Transit encryption protects data at every point where it exists as an HTTP response body — in application logs, in monitoring systems, in crash dumps, in response caches. TLS terminates at the endpoint. Transit encryption persists until the intended recipient explicitly decrypts.

This is defense in depth. TLS handles the network layer. Transit encryption handles everything else.


Solution 3: Zero-downtime encryption key rotation

Encryption keys must be rotated. Compliance frameworks require it (SOC 2, PCI DSS, HIPAA). Security best practices require it. Incident response requires it. The question is whether rotation requires downtime.

The rotation problem

In a single-key system, rotating the encryption key means:

  1. Generate a new key
  2. Re-encrypt every stored value with the new key
  3. Deploy the new key to all services
  4. Remove the old key

Steps 2 and 3 must happen atomically. If the new key is deployed before all values are re-encrypted, decryption fails for values still encrypted with the old key. If re-encryption finishes before deployment, the system is temporarily using the wrong key.

For a platform with 300 stored provider API keys across 100 tenants, the re-encryption step alone takes measurable time. During that window, the system is in an inconsistent state.

Dual-key rotation

Govyn’s encryption module supports two keys simultaneously:

  • ENCRYPTION_KEY: The current (active) key. All new encryptions use this key.
  • ENCRYPTION_KEY_PREVIOUS: The previous key. Kept for decrypting data that has not been re-encrypted yet.

Every encrypted value includes a keyVersion field that records which key version was used to encrypt it. When decrypting:

  1. Look up the key matching the value’s keyVersion
  2. Attempt decryption with that key
  3. If it fails, try the other key as a fallback safety net
  4. If both fail, the data is genuinely unrecoverable (throw an error)
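The version-aware decryption path above might be sketched like this, reusing AES-256-GCM from the previous section. The `Stored` shape and identifiers are illustrative; in production the key map would be built from ENCRYPTION_KEY and ENCRYPTION_KEY_PREVIOUS.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

type Stored = { keyVersion: number; iv: string; tag: string; ciphertext: string };

// Helper for the demo: encrypt a value and record which key version was used.
function encryptWith(key: Buffer, version: number, plaintext: string): Stored {
  const iv = randomBytes(12);
  const c = createCipheriv("aes-256-gcm", key, iv);
  const ct = Buffer.concat([c.update(plaintext, "utf8"), c.final()]);
  return {
    keyVersion: version,
    iv: iv.toString("hex"),
    tag: c.getAuthTag().toString("hex"),
    ciphertext: ct.toString("hex"),
  };
}

// Try the key recorded in keyVersion first; fall back to the others.
// GCM authentication failure (wrong key) throws at final(), which we catch.
function decryptVersioned(keys: Map<number, Buffer>, v: Stored): string {
  const others = Array.from(keys.keys()).filter((k) => k !== v.keyVersion);
  for (const version of [v.keyVersion, ...others]) {
    const key = keys.get(version);
    if (!key) continue;
    try {
      const d = createDecipheriv("aes-256-gcm", key, Buffer.from(v.iv, "hex"));
      d.setAuthTag(Buffer.from(v.tag, "hex"));
      return Buffer.concat([
        d.update(Buffer.from(v.ciphertext, "hex")),
        d.final(),
      ]).toString("utf8");
    } catch {
      // Authentication failed with this key; try the next one.
    }
  }
  throw new Error("data unrecoverable with any configured key");
}
```

Because `keyVersion` routes each value to the correct key on the first try, the fallback loop almost never iterates — it exists as a safety net.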

The rotation procedure becomes:

  1. Generate a new key
  2. Set ENCRYPTION_KEY_PREVIOUS to the current ENCRYPTION_KEY
  3. Set ENCRYPTION_KEY to the new key
  4. Increment ENCRYPTION_KEY_VERSION
  5. Deploy — zero downtime, all existing encrypted data remains readable
  6. Run a background re-encryption job at your convenience
  7. Once all values are re-encrypted with the new key, remove ENCRYPTION_KEY_PREVIOUS

Steps 2-4 are a configuration change. Step 5 is a normal deployment. Step 6 can run over hours or days without impacting service availability. There is no window where decryption fails.

Rotation in practice

# Current state
ENCRYPTION_KEY=aabbccdd...  # 64 hex chars, version 1

# Rotation
ENCRYPTION_KEY_PREVIOUS=aabbccdd...  # Old key moves here
ENCRYPTION_KEY=11223344...           # New key
ENCRYPTION_KEY_VERSION=2             # Increment

# Deploy. Done. No downtime. No re-encryption required immediately.

New encryptions use version 2. Old data encrypted with version 1 decrypts using ENCRYPTION_KEY_PREVIOUS. The system is fully operational at every step.

Why keyVersion matters

Without version tracking, the decryption code would need to try both keys on every decryption attempt. With version tracking, it goes directly to the correct key on the first try. The fallback path exists as a safety net, not as the primary code path.

This is important for performance when the system handles thousands of decryption operations per minute. One key derivation and one decryption attempt per operation, not two.

[Figure: Key rotation timeline]


Solution 4: O(1) API key lookup

API key validation is on the critical path of every proxy request. Every agent call that flows through the proxy must validate the API key before proceeding. The performance of this validation directly impacts request latency.

The bcrypt iteration problem

Govyn API keys are bcrypt-hashed before storage. Bcrypt is deliberately slow — that is its purpose. A single bcrypt.compare() operation takes 50-100ms depending on hardware and cost factor.

In the original implementation, validating an API key meant:

  1. Receive the token from the request
  2. Query all non-revoked API keys for the organization
  3. Iterate through each key, running bcrypt.compare() on each one
  4. Return the first match, or reject if none match

For an organization with 20 API keys, this is 20 sequential bcrypt comparisons in the worst case (when the key does not match any). At 75ms per comparison, that is 1.5 seconds of CPU time spent on authentication alone — on every request.

# Worst case: O(n) bcrypt comparisons
Token arrives -> Load all org keys -> bcrypt.compare(token, key1) -> no match
                                   -> bcrypt.compare(token, key2) -> no match
                                   -> bcrypt.compare(token, key3) -> no match
                                   -> ... (repeat for all n keys)
                                   -> bcrypt.compare(token, key20) -> match (or reject)

Time: n * ~75ms = 1,500ms for n=20

This does not scale. Organizations with more keys, or platforms with high request volumes, hit a wall where authentication dominates request latency.

Prefix-based O(1) lookup

The solution is an 8-character prefix stored in plaintext alongside the bcrypt hash. When a new API key is generated:

Full key:  gvn_A7x9Kp2mR5nB8qW3vF6jL1dH4sY0tC7uE...
Prefix:    A7x9Kp2m  (characters 5-12, after "gvn_")
Hash:      $2a$10$xK7mP2... (bcrypt hash of full key)

The prefix is stored in a database column with an index. Validation becomes:

  1. Receive the token from the request
  2. Extract the 8-character prefix (characters 5-12, immediately after "gvn_")
  3. Query for a single key matching that prefix: WHERE keyPrefix = 'A7x9Kp2m' AND revokedAt IS NULL
  4. Run one bcrypt.compare() to verify the full token against the stored hash

# O(1): One database lookup + one bcrypt comparison
Token arrives -> Extract prefix "A7x9Kp2m"
             -> SELECT WHERE keyPrefix = 'A7x9Kp2m' (indexed, O(1))
             -> bcrypt.compare(token, candidate.hash) -> match or reject

Time: ~1ms (query) + ~75ms (bcrypt) = ~76ms regardless of key count

One bcrypt comparison instead of up to 20. The time is constant regardless of how many keys an organization has.
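In miniature, the prefix index looks like this. A plain Map stands in for the indexed database column, and SHA-256 stands in for bcrypt so the sketch runs without dependencies — in the real system the final comparison is a salted bcrypt.compare(), not hash equality. Identifiers are illustrative.

```typescript
import { createHash } from "node:crypto";

const sha256 = (s: string) => createHash("sha256").update(s).digest("hex");

// keyPrefix -> stored hash, mimicking an indexed database column.
// SHA-256 stands in for bcrypt purely to keep the sketch dependency-free.
const byPrefix = new Map<string, { hash: string }>();

function storeKey(fullKey: string): void {
  // Characters 5-12 (slice indices 4..12), the 8 chars after "gvn_".
  byPrefix.set(fullKey.slice(4, 12), { hash: sha256(fullKey) });
}

// O(1): one indexed lookup plus one hash check, regardless of key count.
function validateKey(token: string): boolean {
  const candidate = byPrefix.get(token.slice(4, 12));
  return candidate !== undefined && candidate.hash === sha256(token);
}
```

The prefix narrows the candidate set to one row; the (bcrypt) hash comparison remains the actual security check.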

Is the prefix a security risk?

The prefix is 8 characters from a base64url character set (64 possible characters per position). That is 64^8 = approximately 281 trillion possible prefixes. An attacker who knows the prefix still needs to brute-force 24 remaining characters of the key (also base64url). The prefix narrows the search space for “which database row to compare against,” not the search space for “what is the key.”

The bcrypt hash remains the security boundary. The prefix is an index, not a credential.

Legacy key migration

Keys created before the prefix column was added have keyPrefix = NULL. The validation code handles this with a fallback path:

  1. Look up by prefix — if found, single bcrypt compare
  2. If no prefix match, query all keys with keyPrefix IS NULL and iterate with bcrypt (legacy behavior)

As old keys are revoked and replaced with new ones, the legacy fallback path handles fewer and fewer keys until it processes none. No migration script required — natural key lifecycle handles the transition.


The complete security architecture

Here is how all four solutions work together on a single proxy request:

Request flow

1. Agent sends request to proxy
   Headers: Authorization: Bearer gvn_A7x9Kp2m...
   Body: { "model": "gpt-4o", "messages": [...] }

2. Proxy validates agent API key (O(1) prefix lookup)
   Extract prefix "A7x9Kp2m" -> DB lookup -> single bcrypt compare -> valid

3. Proxy requests authorization from control plane
   Headers: X-Govyn-Proxy-Token: gvp_... (per-org token)
            X-Govyn-Proxy-Slug: acme-corp-7f3a2b

4. Control plane authenticates proxy (per-org bcrypt verification)
   Slug lookup -> bcrypt compare token hash -> valid, org = acme-corp

5. Control plane evaluates policies for acme-corp
   Budget check, rate limit, content filter, approval rules -> allow

6. Control plane decrypts provider API key (versioned AES-256-GCM)
   Read encrypted key from DB -> decrypt with ENCRYPTION_KEY (version 2)

7. Control plane re-encrypts provider key for transit (AES-256-GCM)
   Encrypt with PROXY_TRANSIT_KEY -> include in response as encryptedProviderKey

8. Proxy receives response, decrypts provider key
   AES-GCM decrypt with PROXY_TRANSIT_KEY -> plaintext provider key in memory

9. Proxy makes upstream API call to OpenAI/Anthropic
   Authorization: Bearer sk-proj-... (decrypted, never stored on proxy)

10. Proxy discards provider key from memory after response received

At no point does a shared secret authenticate any cross-tenant operation. At no point does a provider API key transit the network in plaintext. At no point does a key validation require iterating through all stored keys.

Before and after

| Concern | Before (v1.1) | After (v1.2) |
|---|---|---|
| Proxy-to-API auth | Single shared API_SECRET | Per-org gvp_ token (bcrypt-hashed) |
| Blast radius of compromise | All tenants | Single tenant |
| Provider key transit | Plaintext in HTTP response | AES-256-GCM encrypted |
| Key rotation | Full platform downtime | Zero-downtime dual-key |
| API key validation | O(n) bcrypt iteration | O(1) prefix lookup + single bcrypt |
| Revocation scope | Platform-wide or nothing | Per-organization |
| Auth method tracking | None | authMethod field on every request |
| Migration path | N/A | Shared-secret fallback with deprecation warnings |

Implementation details

Environment variables

The v1.2 security architecture introduces three encryption-related environment variables:

# At-rest encryption of stored provider API keys (BYOK)
ENCRYPTION_KEY=          # 64-char hex (32 bytes), current version
ENCRYPTION_KEY_PREVIOUS= # 64-char hex, previous version (for rotation)
ENCRYPTION_KEY_VERSION=  # Integer, defaults to 1

# Transit encryption between control plane and proxy
PROXY_TRANSIT_KEY=       # 64-char hex (32 bytes), separate from ENCRYPTION_KEY

Generate keys with:

# Generate a 256-bit key as 64 hex characters
openssl rand -hex 32

Proxy configuration

# govyn.yaml (proxy instance for acme-corp)
proxy:
  control_plane: https://api.govyn.cloud
  auth:
    token: ${GOVYN_PROXY_TOKEN}  # gvp_ token, unique to this org
    slug: ${GOVYN_PROXY_SLUG}    # e.g., acme-corp-7f3a2b

Key rotation checklist

  1. Generate new key: openssl rand -hex 32
  2. Set ENCRYPTION_KEY_PREVIOUS = current ENCRYPTION_KEY
  3. Set ENCRYPTION_KEY = new key
  4. Increment ENCRYPTION_KEY_VERSION
  5. Deploy to all control plane instances
  6. Verify: check logs for successful decryption with both key versions
  7. Run background re-encryption of stored values (optional, at your pace)
  8. After re-encryption: remove ENCRYPTION_KEY_PREVIOUS

Compliance mapping

| Control | Implementation |
|---|---|
| SOC 2 CC6.1 (Logical access) | Per-org proxy tokens, bcrypt-hashed |
| SOC 2 CC6.7 (Encryption) | AES-256-GCM at-rest and in-transit |
| PCI DSS 3.5 (Key management) | Dual-key rotation, version tracking |
| PCI DSS 3.6 (Key rotation) | Zero-downtime rotation procedure |
| HIPAA 164.312(a)(1) (Access control) | Per-tenant cryptographic isolation |
| HIPAA 164.312(e)(1) (Transmission security) | Transit encryption of provider keys |

Key takeaways

  1. Shared secrets are a platform-wide vulnerability. One token authenticating all tenants means one compromise exposes everyone. Replace shared secrets with per-tenant credentials.

  2. Transit encryption is not redundant with TLS. Provider API keys in HTTP response bodies are vulnerable in logs, crash dumps, caches, and monitoring systems. Encrypt sensitive response payloads independently.

  3. Key rotation should not require downtime. Dual-key support with version tracking lets you rotate encryption keys with a configuration change and a normal deployment. No re-encryption window, no service interruption.

  4. Authentication performance matters at proxy scale. O(n) bcrypt iteration does not scale with key count. Prefix-based lookup reduces every validation to one database query and one bcrypt comparison, regardless of how many keys exist.

  5. Defense in depth applies to AI infrastructure. Per-org auth, transit encryption, versioned at-rest encryption, and O(1) key lookup are four independent layers. Compromising one does not compromise the others.


FAQ

Does per-org authentication add latency to every proxy request?

The proxy token is verified with a single database lookup (by slug) and one bcrypt comparison. This adds approximately 75-100ms to the authorization request. Since the proxy already makes an authorization call to the control plane on every request, the overhead is the bcrypt comparison itself — not a new network round trip. For context, the upstream LLM API call typically takes 500-5,000ms. The authentication overhead is negligible relative to the total request latency.

What happens if I lose my proxy token?

Proxy tokens are shown once at provisioning time — they are not stored in plaintext anywhere. If you lose the token, you re-provision your proxy, which generates a new slug and token. Your agents continue working; only the proxy’s control plane authentication changes. Agent-facing API keys (the gvn_ tokens) are unaffected.

Can I use the same ENCRYPTION_KEY for transit and at-rest encryption?

You can, but you should not. Using separate keys (ENCRYPTION_KEY and PROXY_TRANSIT_KEY) provides isolation. A compromise of one key does not affect data protected by the other. If your transit key is exposed through a proxy instance compromise, your stored provider keys in the database remain encrypted with a different key that the attacker does not have.

How does this work with self-hosted Govyn?

Self-hosted Govyn (the open-source version) runs the proxy directly with provider API keys in its own configuration — there is no control plane authorization step. The transit encryption and per-org auth features apply to the cloud-hosted Govyn platform, where a centralized control plane manages multi-tenant proxy instances. Self-hosted users benefit from the O(1) key lookup improvement if they use the API key management features.

Is the 8-character prefix long enough for uniqueness?

Eight base64url characters provide 64^8 (approximately 281 trillion) possible values. The probability of a collision between two keys is negligible even at millions of keys. If a collision did occur, the system falls back to bcrypt comparison against both candidates — still far fewer comparisons than iterating all keys. In practice, prefix collisions have not been observed.



Govyn is an open-source API proxy for AI agent governance. Per-org cryptographic isolation ships in v1.2. MIT licensed. Self-host or cloud-hosted.
