How to Audit AI Agent Activity for SOC 2 and EU AI Act Compliance
A fintech team failed their first SOC 2 Type II audit on a single finding. They had agent activity logs. They had retention policies. What they did not have was a record of which version of which policy file had been in force when an autonomous research agent pulled customer records at 03:14 UTC on a Wednesday in February. The auditor wanted the policy bundle hash, the agent identity, the tool call parameters, the human approval (or its absence), and the egress destination. The team had a JSON blob with a prompt and a response. Remediation cost a six-figure consulting engagement and a delayed enterprise contract worth roughly $480,000 in first-year ARR. The gap was not the agent. The gap was the audit log.
This is the pattern across every compliance-grade AI deployment in 2026. The regulations finally name what auditors are allowed to ask for, and most agent log schemas are not designed to answer the question. SOC 2 CC7.2 expects continuous monitoring with one year of evidence. The EU AI Act’s Article 12 expects automatically generated logs that survive the lifetime of the high-risk system, are accessible to national authorities, and capture events that would let an inspector reconstruct what happened. ISO/IEC 42001 expects model design records, performance logs, and data audit trails as ongoing evidence rather than annual artifacts. The fields they ask for overlap, but not completely. The retention windows differ. The accessibility requirements differ. The remediation cost when you find out at audit time can fund a small engineering team for a year.
This post maps each requirement to a specific telemetry field, gives you a minimum viable audit log schema that satisfies SOC 2 and EU AI Act simultaneously, and explains where the schema needs to live so that the agent itself cannot rewrite it.
What does an auditor actually ask for in AI agent logs?
An AI agent audit log is a tamper-evident, append-only record of every action an agent takes. Each event must answer seven questions: who initiated it, what action was attempted, when it occurred (with a timezone-aware timestamp), why it was authorized (or denied), what input data the action consumed, what tools and parameters it used, and what data left the boundary as a result.
That is the working definition. It is the question pattern auditors run regardless of whether the framework is SOC 2, ISO 27001, ISO 42001, the EU AI Act, NIST AI RMF, or HIPAA. The framework determines retention period, accessibility, and which fields are mandatory versus advisory. The fields themselves are stable.
In practice, an auditor will run six query patterns against your logs. If your schema cannot answer all six, you have a finding. The patterns are: enumerate every agent action that touched a specific data category in a date range; reconstruct the full decision chain for a specific incident; show every policy change between two dates and which agents ran under each version; identify every action that bypassed human review or that should have been escalated; list cross-tenant or cross-environment access events; produce a tamper-evidence proof that no log entries were modified or deleted.
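As a rough illustration of the first pattern (every action that touched a data category in a date range), here is a minimal Python sketch. It assumes events are already parsed into dictionaries with `occurred_at` and `input.data_classifications` fields; the helper names and sample events are hypothetical.

```python
from datetime import datetime, timezone

def parse_ts(value: str) -> datetime:
    """Parse an RFC 3339 timestamp; 'Z' is rewritten so older Pythons accept it."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def actions_touching(events, data_class, start, end):
    """Return events whose input carried `data_class` within [start, end)."""
    return [
        e for e in events
        if data_class in e["input"]["data_classifications"]
        and start <= parse_ts(e["occurred_at"]) < end
    ]

# Hypothetical sample events, trimmed to the fields this query needs.
events = [
    {"event_id": "evt_1", "occurred_at": "2026-02-04T03:14:00.000Z",
     "input": {"data_classifications": ["pii_present", "financial"]}},
    {"event_id": "evt_2", "occurred_at": "2026-02-05T10:00:00.000Z",
     "input": {"data_classifications": ["public"]}},
    {"event_id": "evt_3", "occurred_at": "2026-03-01T00:00:00.000Z",
     "input": {"data_classifications": ["pii_present"]}},
]
february = (datetime(2026, 2, 1, tzinfo=timezone.utc),
            datetime(2026, 3, 1, tzinfo=timezone.utc))
hits = actions_touching(events, "pii_present", *february)
print([e["event_id"] for e in hits])  # ['evt_1']
```

In a real deployment this would be a query against the indexed log store rather than an in-memory filter. The point is that the schema has to carry both the classification and a timezone-aware timestamp for the question to be answerable at all.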
Most agent observability stacks were designed for debugging, not auditing. They optimize for developer ergonomics: pretty traces, span hierarchies, latency distributions. Auditors do not care about latency distributions. They care about whether your logs would hold up as evidence in an enforcement action under the EU AI Act’s August 2026 high-risk provisions or as control evidence in a Type II opinion. The two purposes share infrastructure but differ in their schema and retention requirements. Treat them as one system with two output formats.
How does SOC 2 map to AI agent telemetry?
SOC 2’s Trust Services Criteria do not name AI directly. They name controls. Five control areas (the CC6 series, CC7.2, CC8.1, CC9.1, and CC9.2) translate cleanly to agent telemetry fields, and the first three carry most of the weight in any AI-focused audit. The AICPA Trust Services Criteria are the authoritative source; auditor expectations on retention and field coverage have hardened around the patterns below.
CC6.1 governs logical access security. For agents, the relevant fields are agent identity (a unique, non-rotating identifier separate from the API key), the principal who deployed the agent, role assignments, and the credential the agent used to authenticate to its proxy or policy engine. CC6.6 covers boundary protection: every external destination an agent reaches, classified by trust level. CC6.7 covers data transmission: every payload that crossed an environment boundary, with a hash and a destination identity.
CC7.2 is the criterion most agent log schemas fail. It requires monitoring of system components for anomalies and the response to them. For agents, this means continuous capture of policy decisions (allow, deny, modify, escalate), threshold breaches (rate limit, budget cap, tool blocklist), and the policy version in force when each decision was made. A SOC 2 auditor expects a full year of these records and queryable evidence that someone actually reviews them. GitHub’s default 90-day audit log retention is a known gap; teams routinely export logs to long-term storage for SOC 2 specifically because of this. Agent platforms that retain only the last 30 days of traces are starting from the same gap.
CC8.1 governs change management. Every policy change, agent deployment, model version pin, and tool registration is a change. The audit log must capture who proposed it, who approved it, what the diff was, when it took effect, and what previous version it replaced. If your agent reads its prompt from a file in a git repo and the file changes in a hotfix, that is a change event under CC8.1. If a tool description in an MCP server is updated and your agent picks up the new description, that is also a change event. The auditor will ask for change records that line up with policy version identifiers in the runtime decision logs. They have to reconcile.
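The reconciliation check can be sketched in a few lines of Python. This assumes runtime events carry a `policy.bundle_id` and that change records carry a matching `bundle_id` plus an `approved_by` field; the record shapes and function name are illustrative assumptions, not a prescribed format.

```python
def unreconciled_bundles(runtime_events, change_records):
    """Return bundle IDs seen at runtime that lack an approved change record."""
    approved = {c["bundle_id"] for c in change_records if c.get("approved_by")}
    seen = {e["policy"]["bundle_id"] for e in runtime_events}
    return sorted(seen - approved)

# Hypothetical data: one bundle was deployed without a recorded approval.
change_records = [
    {"bundle_id": "policies-2026-04-22-signed", "approved_by": "secops-lead"},
    {"bundle_id": "policies-2026-04-29-hotfix", "approved_by": None},
]
runtime_events = [
    {"event_id": "evt_1", "policy": {"bundle_id": "policies-2026-04-22-signed"}},
    {"event_id": "evt_2", "policy": {"bundle_id": "policies-2026-04-29-hotfix"}},
]
print(unreconciled_bundles(runtime_events, change_records))
```

Any bundle ID this returns is a candidate CC8.1 finding: a policy version that made runtime decisions without a matching approval.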
The CC9 series, particularly CC9.1 and CC9.2, becomes relevant when agents call vendor MCP servers or third-party APIs. CC9.1 expects evidence of risk mitigation against vendor failures; CC9.2 expects vendor-management-grade controls on third-party access. Every external tool an agent calls is, for SOC 2 purposes, a vendor relationship in miniature. The audit log must capture the vendor identity, the data shared, and the control basis for sharing it.
Retention is where SOC 2 gets practical. The AICPA does not specify a duration. Auditors do, and they have converged on twelve months of audit log evidence as the practical minimum for a Type II opinion. The reasoning is mechanical: a Type II observation period is six to twelve months, and the auditor needs evidence that controls operated effectively across that whole window. If your logs only retain ninety days, you have automatically failed the evidence test for any Type II covering a longer period. Plan for thirteen months minimum; export aggressively from any platform that defaults to less.
How does the EU AI Act Article 12 map to log retention?
Article 12 of the EU AI Act requires that high-risk AI systems “technically allow for the automatic recording of events (logs) over the lifetime of the system.” That is the lifetime, not the audit period. Article 19, which governs how providers must retain those logs, sets the floor at six months unless applicable Union or national law provides otherwise (in particular, data protection law for any logs that contain personal data). Article 18 separately requires technical documentation retention for ten years after the system is placed on the market or put into service. Read together: the documents about the system live ten years; the runtime logs live at least six months but must be capable of being generated and retained over the system’s full lifetime. The official Article 12 text is the authoritative reference; the practical retention reading sits in Article 19.
Article 12 specifies what events the logs must capture. The text references three categories: events relevant to identifying risks under Article 79(1), events relevant to post-market monitoring under Article 72, and events relevant to monitoring the operation of high-risk systems under Article 26(5). The phrasing is deliberate. It pushes the question of “what to log” back to the provider, who must be able to defend the choice on the basis that an inspector could reconstruct the system’s behavior in any of those three contexts.
For high-risk systems used in remote biometric identification (Annex III, point 1(a)), Article 12 names exact fields. The logs must record the start and end timestamps of each use, the reference database against which input data was checked, the input data that produced a match, and the identity of natural persons involved in verification. This is the only place in the regulation where field-level requirements are spelled out. The pattern matters for other high-risk categories because authorities are likely to argue that the spirit of Article 12 demands equivalent specificity in any high-risk deployment, even where the text does not enumerate it.
The accessibility requirement is the stricter half. Logs must be accessible to national competent authorities upon request. This rules out encrypted-at-rest storage where only the deployer holds the key, end-to-end encrypted client-only logging, and aggressively obfuscated formats. The logs must be tamper-evident, timestamped, and independently verifiable. In practice, this means signed log entries (HMAC or append-only Merkle structures), hash-chained sequence numbers, and a documented procedure for producing an authority-ready export within reasonable time.
| Regulation | Minimum retention | Maximum scope | Accessibility | Notes |
|---|---|---|---|---|
| SOC 2 (CC7.2 evidence) | 12 months (auditor expectation) | Type II observation period | Auditor and security team | AICPA does not mandate; auditors enforce |
| EU AI Act Article 19 | 6 months | Lifetime of high-risk system | National competent authorities | Unless Union or national law provides otherwise |
| EU AI Act Article 18 | 10 years | Technical documentation only | Authorities on request | Documentation, not runtime logs |
| GDPR (personal data in logs) | Necessary and proportionate | Bounded by data minimization | Subject access requests apply | Often shortens log retention |
| ISO 27001 (good practice) | 12 months | Whole ISMS scope | Auditor | Aligned with PCI DSS 4.0 norm |
| PCI DSS 4.0 | 12 months total, 3 months immediate | Cardholder data environments | QSA | Useful as a benchmark |
| HIPAA (security rule) | 6 years | Covered systems | OCR | Where PHI flows through agents |
The conflict between Article 19 and GDPR is the operational headache. An agent log that contains a customer’s name, address, or message content is personal data. GDPR says you keep personal data only as long as necessary for the purpose, and the purpose of an audit log is bounded. If you retain a year of logs because SOC 2 auditors want a year, but ten percent of those logs contain personal data that the customer asked you to delete, you are out of compliance with GDPR even while you are compliant with SOC 2. The standard resolution is structural: separate the personal data from the audit metadata. Hash, redact, or vault the personal data with a short retention; keep the audit metadata (who did what, when, with which policy version) with longer retention. The audit log proves the action; the vaulted payload, when still available, proves the content.
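A minimal Python sketch of that structural split follows. It assumes the raw payload is available as bytes at capture time. The `split_event` helper, the vault entry shape, and the 30-day default are all hypothetical choices for illustration.

```python
import hashlib

def split_event(event: dict, payload: bytes, vault_retention_days: int = 30):
    """Split one captured action into long-lived metadata and a short-lived vault entry."""
    digest = "sha256:" + hashlib.sha256(payload).hexdigest()
    # The metadata keeps only a hash and a size, never the payload itself.
    metadata = dict(event, input={"prompt_hash": digest,
                                  "prompt_byte_size": len(payload)})
    vault_entry = {"key": digest, "payload": payload,
                   "retention_days": vault_retention_days}
    return metadata, vault_entry

event = {"event_id": "evt_demo", "occurred_at": "2026-05-05T14:32:11.418Z"}
payload = b"Customer Jane Doe asks about her loan application."
metadata, vault_entry = split_event(event, payload)
print(metadata["input"]["prompt_hash"] == vault_entry["key"])
```

The hash is the join key between the two stores. Once the vault entry is deleted on schedule, the metadata still proves that an action with that exact input occurred.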
Deployers should not assume the provider’s logs are sufficient. Article 26(5) places the monitoring obligation on the deployer. If you are using a third-party agent service, you need a logging story that survives the provider failing or refusing to produce records. This is one of the reasons proxy-layer governance has gained traction over the last year: when policy enforcement and logging live in your infrastructure, the provider’s audit posture becomes a backup, not a single point of failure.
What is the minimum viable audit log schema?
A minimum viable audit log captures one event per agent action with twelve required fields. Fewer than twelve and you cannot satisfy the union of SOC 2 CC7.2 and EU AI Act Article 12. More is fine; fewer leaves gaps an auditor will find.
The twelve fields cover seven questions: who, what, when, why, with what input, with what tool, and what egressed. The schema below is a JSON Lines record format, with field names aligned where possible to the OpenTelemetry GenAI semantic conventions so existing tracing infrastructure can produce compliant audit records without a parallel ingestion path.
```json
{
  "event_id": "evt_01HRWQ8K9X3M2P5N7Z4Y8B6T1F",
  "occurred_at": "2026-05-05T14:32:11.418Z",
  "agent": {
    "id": "agent_credit_review_v3",
    "name": "credit-review-agent",
    "version": "3.4.1",
    "deployer_principal": "team-lending@company.com"
  },
  "session": {
    "id": "sess_4j2kP9wq3xL",
    "tenant_id": "tenant_acme_corp",
    "trigger": "scheduled"
  },
  "action": {
    "operation": "invoke_agent",
    "tool_name": "database.query",
    "tool_target": "postgres://prod-credit-db.internal:5432",
    "parameters_hash": "sha256:c1ad...e7f9"
  },
  "policy": {
    "bundle_id": "policies-2026-04-22-signed",
    "version": "v1.4.7",
    "verdict": "allow",
    "reason": "matched: lending_team_read_allowed",
    "evaluated_rules": ["tenant_isolation", "data_class_check", "rate_limit"]
  },
  "input": {
    "prompt_hash": "sha256:9e2c...4d11",
    "prompt_byte_size": 1842,
    "data_classifications": ["pii_present", "financial"]
  },
  "model": {
    "provider": "anthropic",
    "id": "claude-sonnet-4-7",
    "request_id": "req_AbC123XyZ"
  },
  "tokens": {
    "input": 1240,
    "output": 387,
    "total_cost_usd": 0.0234
  },
  "egress": {
    "destination_classification": "internal_database",
    "payload_hash": "sha256:6f0a...88b2",
    "bytes_out": 4096,
    "rows_returned": 17
  },
  "oversight": {
    "human_required": false,
    "human_reviewer": null,
    "approval_token": null
  },
  "integrity": {
    "sequence": 184729841,
    "previous_hash": "sha256:e8d5...c1aa",
    "entry_signature": "ed25519:a14c...77be"
  }
}
```
Each top-level field carries audit weight. event_id is the single identifier you give an auditor when they ask “show me the record for this incident.” occurred_at is RFC 3339 with millisecond precision and timezone, never local time. agent.id is stable across deployments and survives version bumps; agent.version changes when code or prompt changes. session.tenant_id is the field SOC 2 CC6.1 and the EU AI Act Article 14 (human oversight at the deployer level) both depend on; missing tenant binding is the single most common multi-tenant audit finding.
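A structural check on these fields might look like the Python sketch below. It only verifies that the twelve top-level fields are present, that `occurred_at` parses with a timezone, and that the policy verdict is one of the four decision verbs. The `validate_event` helper is a hypothetical illustration, not a full schema validator.

```python
from datetime import datetime

REQUIRED_FIELDS = (
    "event_id", "occurred_at", "agent", "session", "action", "policy",
    "input", "model", "tokens", "egress", "oversight", "integrity",
)
VERDICTS = {"allow", "deny", "modify", "escalate"}

def validate_event(event: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = [f"missing field: {name}"
                for name in REQUIRED_FIELDS if name not in event]
    ts = event.get("occurred_at")
    if ts is not None:
        try:
            parsed = datetime.fromisoformat(ts.replace("Z", "+00:00"))
            if parsed.tzinfo is None:
                problems.append("occurred_at has no timezone")
        except ValueError:
            problems.append("occurred_at is not a valid timestamp")
    if "policy" in event and event["policy"].get("verdict") not in VERDICTS:
        problems.append(f"unknown verdict: {event['policy'].get('verdict')!r}")
    return problems

# A structurally complete record (nested contents elided) and a broken one.
complete = {name: {} for name in REQUIRED_FIELDS}
complete.update(event_id="evt_demo",
                occurred_at="2026-05-05T14:32:11.418Z",
                policy={"verdict": "allow"})
broken = {"event_id": "evt_demo", "occurred_at": "2026-05-05T14:32:11"}

print(validate_event(complete))      # []
print(len(validate_event(broken)))   # 11
```

Running a check like this at the capture point, before the record is sealed, is cheaper than discovering a missing tenant binding during audit fieldwork.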
policy.bundle_id is the linkage point between runtime decisions and the change log under CC8.1. The bundle ID maps to a signed git tag or a release artifact. An auditor pulls the bundle ID, reads the change log, and reconstructs which approved policy version was in force. policy.verdict is one of allow, deny, modify, escalate, mirroring the policy engine decision verbs. policy.reason is the human-readable rule name that produced the verdict; this is what an auditor reads to confirm the decision was deterministic and not a model judgment call.
input.prompt_hash and egress.payload_hash solve the GDPR-versus-SOC 2 retention conflict. The hash is non-personal metadata you can keep for a year. The payload itself, if it contains personal data, lives in a separate vault with shorter retention and access controls. When an auditor needs to verify that two logged actions had identical inputs (a common pattern in fraud investigation), the hashes are sufficient. When an authority needs the actual content under EU AI Act Article 12, you produce it from the vault if it still exists, or you produce a documented hash and the deletion record if it does not.
model.request_id is the Anthropic or OpenAI request ID returned by the provider. This is your linkage point into the provider’s own logs. Anthropic’s Compliance API and OpenAI’s Audit Logs API both index by request ID. You do not need to fetch the provider’s logs for routine audits, but when your log shows a verdict of allow and your downstream data shows an unexpected outcome, the provider’s request ID is how you join the timelines.
integrity.sequence, integrity.previous_hash, and integrity.entry_signature make the log tamper-evident. The sequence number is monotonic per stream. The previous_hash chains entries so that modification and deletion are both detectable: alter or remove one entry and the next entry’s previous_hash no longer matches, so hiding the change would mean rewriting every entry after it. The signature is per-entry, signed with a key the agent does not hold. Article 12’s tamper-evidence requirement is technically satisfied by hash chains alone; in practice, signatures help when you need to prove the log was produced by your infrastructure and not fabricated after the fact.
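The chaining and per-entry signing can be sketched with the Python standard library. Note one deliberate substitution: the schema shows an Ed25519 signature, but this sketch uses HMAC-SHA256 as a stand-in because it needs no third-party package. The `seal` and `verify` helpers, the genesis value, and the canonical JSON encoding are assumptions of the sketch.

```python
import hashlib
import hmac
import json

GENESIS = "sha256:" + "0" * 64  # assumed starting value for the first entry

def _digest(body: dict) -> str:
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return "sha256:" + hashlib.sha256(canonical).hexdigest()

def _tag(body: dict, key: bytes) -> str:
    return "hmac-sha256:" + hmac.new(key, _digest(body).encode(), hashlib.sha256).hexdigest()

def seal(entries, key: bytes):
    """Attach sequence, previous_hash, and a keyed tag to each entry, in order."""
    sealed, previous = [], GENESIS
    for sequence, entry in enumerate(entries, start=1):
        body = dict(entry, integrity={"sequence": sequence, "previous_hash": previous})
        record = dict(body, integrity=dict(body["integrity"],
                                           entry_signature=_tag(body, key)))
        sealed.append(record)
        previous = _digest(body)  # the next entry chains to this one
    return sealed

def verify(sealed, key: bytes) -> bool:
    """Recompute the chain; False on any altered, reordered, or removed interior entry."""
    previous = GENESIS
    for expected_seq, record in enumerate(sealed, start=1):
        integrity = dict(record["integrity"])
        signature = integrity.pop("entry_signature")
        body = dict(record, integrity=integrity)
        if (integrity["sequence"] != expected_seq
                or integrity["previous_hash"] != previous
                or not hmac.compare_digest(signature, _tag(body, key))):
            return False
        previous = _digest(body)
    return True

key = b"demo-key-held-outside-the-agent"
entries = [{"event_id": "evt_1", "verdict": "allow"},
           {"event_id": "evt_2", "verdict": "deny"},
           {"event_id": "evt_3", "verdict": "allow"}]
sealed = seal(entries, key)
print(verify(sealed, key))                   # True
print(verify(sealed[:1] + sealed[2:], key))  # False: an interior entry was removed
```

This sketch does not detect truncation from the tail of the log. That gap is what externally stored checkpoints (Merkle roots or signed checkpoints) are meant to close.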
Three optional fields address common gaps without adding routine cost. error captures structured error metadata when an action fails; auditors look for failure patterns. cost aggregates token, tool, and infrastructure cost into a single number, which CC9 controls expect when fraud or runaway-cost incidents are reviewed. correlation_id joins multi-step agent workflows so an auditor can pull the entire decision tree of a single business transaction with one query.
How long should you retain AI agent logs?
Retention is set by the strictest applicable regulation, not the median. For most enterprise deployments the binding window is twelve months for SOC 2 evidence, six months for EU AI Act runtime logs, ten years for EU AI Act technical documentation, and as-needed-and-proportionate for GDPR personal data within logs. The technical answer is: retain audit metadata for thirteen months minimum, retain personal-data payloads for the shortest justifiable window, and retain technical documentation indefinitely.
The thirteen-month metadata target is mechanical. Type II audit observation periods are six to twelve months. The auditor wants evidence covering the full window plus a buffer. Twelve months is the floor; thirteen is the operating point. PCI DSS 4.0’s twelve-month requirement, ISO 27001’s twelve-month guidance, and HIPAA’s six-year requirement all point in the same direction for security logs: a year is the minimum credible retention.
| Data class | Minimum retention | Storage tier | Accessibility |
|---|---|---|---|
| Audit metadata (no personal data) | 13 months | Hot tier, queryable | Security and audit team |
| Audit metadata (older) | 7 years | Warm tier, indexed | On-demand, hours not days |
| Hashed payloads | 13 months | Hot tier | Linked to metadata |
| Personal-data payloads | Shortest justifiable | Vault, encrypted | Subject to GDPR controls |
| Technical documentation (Article 18) | 10 years | Cold tier, immutable | Authorities on request |
| Tamper-evidence proofs | Indefinite | Immutable, distributed | Always available |
| Change log entries | 13 months minimum | With audit metadata | Reconcilable to runtime logs |
Two anti-patterns recur in the deployments that fail audits. The first is undifferentiated retention: a single TTL on the entire log stream. This forces the team to choose between losing audit coverage and accumulating unbounded personal data. Tier the retention; do not set one number for everything. The second is implicit deletion: logs are pruned when an index grows too large, with no record of what was pruned. Article 12’s tamper-evidence and SOC 2’s CC8.1 both require deletion to be a logged event with its own audit trail. Run a deletion job and you must retain a record that the job ran, what time window it covered, and how many records it removed, because “we ran out of space” is not a defense to a regulator.
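A deletion job that leaves its own trail can be sketched as follows. The `prune` helper and the fields of the deletion event are hypothetical. The only requirement taken from the text is that the record shows the job ran, what time window it covered, and how many records it removed.

```python
from datetime import datetime, timezone

def _ts(record: dict) -> datetime:
    return datetime.fromisoformat(record["occurred_at"].replace("Z", "+00:00"))

def prune(records, cutoff: datetime, now: datetime):
    """Drop records older than `cutoff`; return (kept, deletion_event)."""
    removed = [r for r in records if _ts(r) < cutoff]
    kept = [r for r in records if _ts(r) >= cutoff]
    deletion_event = {
        "operation": "retention_prune",
        "ran_at": now.isoformat(),
        "window_start": min(map(_ts, removed)).isoformat() if removed else None,
        "window_end": cutoff.isoformat(),
        "records_removed": len(removed),
    }
    return kept, deletion_event

records = [
    {"event_id": "evt_1", "occurred_at": "2025-01-10T00:00:00.000Z"},
    {"event_id": "evt_2", "occurred_at": "2025-03-15T00:00:00.000Z"},
    {"event_id": "evt_3", "occurred_at": "2026-04-01T00:00:00.000Z"},
]
cutoff = datetime(2025, 4, 1, tzinfo=timezone.utc)
now = datetime(2026, 5, 5, tzinfo=timezone.utc)
kept, deletion_event = prune(records, cutoff, now)
print(deletion_event["records_removed"], [r["event_id"] for r in kept])
```

The deletion event itself belongs in the audit stream, sealed like any other entry, so that a gap in the log can be matched to a documented job.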
The right operational model is tiered retention with explicit deletion. Hot tier holds audit metadata and hashed payloads for thirteen months; warm or cold tier holds them for seven years. Personal-data payloads live in a separate vault with the shortest justifiable retention, often thirty to ninety days, with formal deletion on schedule and a deletion event in the audit log. Tamper-evidence proofs (Merkle roots, signed checkpoints) live indefinitely and are cheap to keep. Technical documentation lives in immutable storage for ten years per Article 18. Each tier has its own access controls, its own export procedure, and its own retention test.
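The tier assignment can be expressed as a small lookup. This Python sketch takes its windows from the paragraph above. The data-class labels, the 396-day approximation of thirteen months, and the 30-day vault default are assumptions for illustration.

```python
THIRTEEN_MONTHS_DAYS = 396    # rough approximation of 13 months (assumption)
SEVEN_YEARS_DAYS = 7 * 365

def storage_tier(data_class: str, age_days: int, vault_days: int = 30) -> str:
    """Map a record to its storage tier, or 'delete' once its window has passed."""
    if data_class == "personal_payload":
        return "vault" if age_days <= vault_days else "delete"
    if data_class in ("audit_metadata", "hashed_payload"):
        if age_days <= THIRTEEN_MONTHS_DAYS:
            return "hot"
        return "warm" if age_days <= SEVEN_YEARS_DAYS else "delete"
    if data_class == "technical_documentation":
        return "cold"        # kept at least ten years; never auto-deleted here
    if data_class == "tamper_evidence_proof":
        return "immutable"   # kept indefinitely
    raise ValueError(f"unclassified data class: {data_class}")

for data_class, age in [("audit_metadata", 90), ("audit_metadata", 800),
                        ("personal_payload", 45), ("tamper_evidence_proof", 5000)]:
    print(data_class, age, storage_tier(data_class, age))
```

Raising on an unclassified data class is a deliberate choice in this sketch: a record with no assigned tier is the undifferentiated-retention anti-pattern in miniature.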
Where should agent audit logs live?
Audit logs should live in infrastructure the agent cannot modify. That is the single architectural decision that determines whether the rest of the schema is auditable or merely advisory. There are three deployment patterns for where the logs originate, and they have meaningfully different gaps.
The application-layer SDK pattern, in which the agent process writes logs directly through a logging library, is the most common starting point and the weakest audit posture. The agent itself controls the logger. A buggy or compromised agent can drop entries, modify them in flight, or skip logging entirely. There is no tamper-evidence on the write path. Different teams implement the schema differently. An auditor cannot prove the log is complete, only that it shows what it shows. This is acceptable for development. It is not acceptable for production high-risk systems under the EU AI Act, and it produces consistent CC7.2 findings under SOC 2.
The proxy-layer pattern, in which all agent traffic flows through a network proxy or admission controller that produces the audit log, is the architecture that survives agent compromise. The agent cannot disable the proxy because the agent does not hold the credentials needed to reach the model or tool directly. Logs are written by infrastructure the agent does not control. Tamper-evidence is applied at the source. The schema is consistent across every agent that connects through the proxy. This is the architecture that satisfies CC7.2’s continuous-monitoring requirement and Article 12’s tamper-evidence and accessibility requirements with the smallest engineering surface area. The architectural argument for this pattern is identical to the case for proxy-based policy enforcement, because policy enforcement and audit logging are two outputs of the same intercept point.
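The shape of that intercept point can be shown in a toy Python sketch. Everything here is hypothetical (the `make_proxy` helper, the demo policy, the demo tool). It only illustrates the two properties described above: the credential lives with the proxy rather than the agent, and the audit record is written before the verdict is acted on.

```python
def make_proxy(policy, tool, audit_log: list, credential: str):
    """Return the only callable the agent gets; the credential never reaches the agent."""
    def call(agent_id: str, params: dict):
        verdict, reason = policy(agent_id, params)
        # The record is written whether or not the call goes ahead.
        audit_log.append({"agent_id": agent_id, "params": params,
                          "verdict": verdict, "reason": reason})
        if verdict != "allow":
            raise PermissionError(reason)
        return tool(credential, params)
    return call

def demo_policy(agent_id: str, params: dict):
    if params.get("table") == "customers":
        return "deny", "matched: customer_table_blocked"
    return "allow", "matched: default_read_allowed"

def demo_tool(credential: str, params: dict) -> str:
    return f"queried {params['table']}"

audit_log = []
query = make_proxy(demo_policy, demo_tool, audit_log,
                   credential="secret-held-by-proxy")
print(query("agent_demo", {"table": "rates"}))
try:
    query("agent_demo", {"table": "customers"})
except PermissionError as exc:
    print("blocked:", exc)
print([entry["verdict"] for entry in audit_log])
```

A real proxy sits on the network path rather than in the agent’s process, which is what makes the separation enforceable. An in-process closure like this one only shows the control flow.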
The provider-only pattern, in which you rely on the model vendor’s audit logs (Anthropic’s Compliance API, OpenAI’s Audit Logs API), captures only what the vendor sees. Tool calls outside the model API, MCP server interactions, downstream side effects, and tenancy mappings live entirely outside the vendor’s visibility. The vendor’s logs are useful corroboration but cannot serve as the primary audit record for any deployment with non-trivial tool use. They also tie your retention to the vendor’s commercial terms, which can change.
The practical recommendation is proxy-primary, vendor-secondary, with optional SDK enrichment for in-process telemetry that the proxy cannot see (token counts before serialization, latency breakdowns inside agent code, prompt-construction context that does not cross the proxy boundary). The proxy is the system of record. The vendor logs and SDK telemetry add detail. The audit posture is determined by the system of record.
What about ISO 42001 and other emerging standards?
ISO/IEC 42001:2023 is the AI management system standard that operationalizes the controls SOC 2 names abstractly and the EU AI Act names sectorally. Clause 9 requires continuous performance monitoring and audit. Clause 8 requires evidence of model design decisions, accuracy and performance monitoring, data audit trails, and product launch approvals retained as ongoing records, not annual artifacts. The audit field set ISO 42001 expects is a near-superset of what SOC 2 CC7.2 and EU AI Act Article 12 require: model design records, performance logs, change records, decision logs, data lineage, and incident response records, all auditable on demand. The ISO 42001 standard text is the authoritative reference.
The NIST AI Risk Management Framework’s MEASURE and MANAGE functions overlap heavily. NIST does not specify field-level requirements; it expects evidence that risk is monitored and managed throughout the lifecycle. Translating this to audit logs means: incident response records linked to the underlying decision log, drift detection logs linked to the model identity, and human-oversight records linked to the actions they reviewed. These are joins across the schema fields the EU AI Act and SOC 2 already require, not new fields.
NIST SP 800-218A, the secure software development framework as applied to AI, is starting to influence vendor due diligence. Its expectations on supply chain integrity for models, datasets, and agent code translate to audit fields capturing model provenance hash, training data registry reference, and code commit ID per agent version. Adding these fields is incremental work for the schema if they are not already present.
The pattern across all four frameworks is convergence, not divergence. SOC 2 names the controls. The EU AI Act names the events that must be loggable. ISO 42001 names the management discipline. NIST names the risk-management processes. The audit log fields that satisfy one framework, with documented retention and access controls, satisfy the others with marginal additions. Designing the schema once, against the strictest applicable framework, is cheaper than re-architecting the log stream when the next regulation lands. The next regulation is probably the successor framework to the United States’ Executive Order 14179, anticipated in late 2026 or early 2027 and widely expected to mirror the EU AI Act’s logging structure with American enforcement priorities.
How do you implement compliant logging without slowing agents?
The naive implementation, in which every agent action blocks on a synchronous write to a tamper-evident log, adds latency that operators eventually disable. The compliant-but-fast implementation separates the hot path from the durable path, accepts a small ingestion lag, and sets retention tiers that match access patterns. The performance budget is real, and the cost is dominated by storage rather than compute.
A production-grade audit pipeline runs four stages with explicit budgets. Stage one is in-proxy capture: the proxy serializes the audit event and writes to a local ring buffer in under one millisecond. Stage two is async ingestion: a background worker drains the buffer to the durable log every few hundred milliseconds with batch compression. Stage three is integrity sealing: a separate process computes the hash chain and per-entry signatures, often on a five-second cadence, and writes the sealed batch to immutable storage. Stage four is the queryable index: the indexed copy in the hot tier, optimized for the auditor query patterns, can lag the immutable log by minutes without compliance impact.
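The split between stage one and stage two can be sketched with a bounded in-memory buffer. This Python example is single-threaded for clarity, and the `AuditBuffer` class is hypothetical. It also makes one design assumption worth flagging: when the buffer is full it refuses the event and counts the drop rather than overwriting older entries, because silent overwrites would undermine completeness.

```python
from collections import deque

class AuditBuffer:
    """Stage one: a bounded buffer the proxy appends to on the hot path."""

    def __init__(self, capacity: int):
        self._events = deque()
        self._capacity = capacity
        self.dropped = 0  # surfaced so operators can alert on lost events

    def capture(self, event: dict) -> bool:
        """Constant-time append; returns False when the buffer is full."""
        if len(self._events) >= self._capacity:
            self.dropped += 1
            return False
        self._events.append(event)
        return True

    def drain(self, max_batch: int) -> list:
        """Stage two: remove up to `max_batch` events in arrival order."""
        batch = []
        while self._events and len(batch) < max_batch:
            batch.append(self._events.popleft())
        return batch

buffer = AuditBuffer(capacity=3)
accepted = [buffer.capture({"event_id": f"evt_{i}"}) for i in range(4)]
durable_log = []                       # stand-in for the durable store
durable_log.extend(buffer.drain(max_batch=2))
durable_log.extend(buffer.drain(max_batch=2))
print(accepted, buffer.dropped, [e["event_id"] for e in durable_log])
```

In production the drain would run on a background worker on a timer, and a full buffer might block the action instead of dropping the event. Which behavior is correct depends on whether availability or completeness is the binding constraint.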
The latency at stage one is the only latency the agent sees. Targets under one millisecond are routine on commodity hardware with memory-mapped append buffers; the token count manipulation detection pipeline (see Further reading) uses a similar fire-and-forget pattern for the same reason. Latency at stages two and three affects the time-to-evidence window for forensic investigation but does not affect the agent itself. Stage four latency is purely an operator convenience; auditors do not query in real time.
Sampling is a tempting but rarely correct optimization. SOC 2 CC7.2 expects continuous monitoring; sampled logs are not continuous. EU AI Act Article 12 expects events relevant to risk identification; a sampled log can miss the rare anomalous event that is exactly what the regulator wants to see. The compliant operating point is full event capture with selective payload retention. Every event gets a metadata record. The full request and response bodies are retained for a sample of events plus all events flagged by policy as high-risk. Hashes link the metadata to the (possibly-deleted) payloads in either case. This satisfies the continuous monitoring requirement while keeping storage bounded.
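The payload-retention decision can be made deterministic, so the same event always gets the same answer on replay. In this Python sketch the `policy.flags` field, the `high_risk` label, and the 5% default rate are hypothetical. Metadata is assumed to be recorded for every event regardless of what this function returns.

```python
import hashlib

def retain_payload(event: dict, sample_rate: float = 0.05) -> bool:
    """Decide whether to keep full request/response bodies for this event."""
    if "high_risk" in event.get("policy", {}).get("flags", []):
        return True  # policy-flagged events always keep their payloads
    # Hash the event ID into [0, 1) so sampling is stable across replays.
    digest = hashlib.sha256(event["event_id"].encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate

flagged = {"event_id": "evt_a", "policy": {"flags": ["high_risk"]}}
routine = {"event_id": "evt_b", "policy": {"flags": []}}
print(retain_payload(flagged, sample_rate=0.0))   # True
print(retain_payload(routine, sample_rate=0.0))   # False
print(retain_payload(routine, sample_rate=1.0))   # True
```

Hash-based sampling avoids a random number generator on the hot path and lets an auditor reproduce why a given payload was or was not kept.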
Cost at scale is dominated by hot-tier storage of payload contents, not by metadata. A 500-agent deployment producing five million actions per month at one kilobyte of metadata per action generates roughly five gigabytes per month of metadata, indexed and queryable. Payload retention with average two-kilobyte requests and four-kilobyte responses adds thirty gigabytes per month if everything is retained, three to five gigabytes per month with selective retention plus high-risk flags. At commodity object-storage prices, the audit pipeline costs in the low hundreds of dollars per month for that scale. The dominant operational cost is the engineering time to build and maintain the pipeline, which is why the proxy-layer architecture wins on TCO: one pipeline serves every agent.
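The storage arithmetic above can be checked directly. This sketch uses decimal units (1 KB = 1,000 bytes, 1 GB = 10^9 bytes), which is an assumption about how the figures were rounded.

```python
ACTIONS_PER_MONTH = 5_000_000
METADATA_BYTES = 1_000                       # one kilobyte of metadata per action
REQUEST_BYTES, RESPONSE_BYTES = 2_000, 4_000
GB = 1_000_000_000

metadata_gb = ACTIONS_PER_MONTH * METADATA_BYTES / GB
payload_gb = ACTIONS_PER_MONTH * (REQUEST_BYTES + RESPONSE_BYTES) / GB

# Share of payloads kept that corresponds to the 3-5 GB selective-retention range.
low_fraction, high_fraction = 3 / payload_gb, 5 / payload_gb

print(f"metadata: {metadata_gb} GB/month")        # metadata: 5.0 GB/month
print(f"all payloads: {payload_gb} GB/month")     # all payloads: 30.0 GB/month
print(f"selective retention keeps {low_fraction:.0%} to {high_fraction:.0%} of payloads")
```

Read this way, the three-to-five-gigabyte figure implies keeping roughly a tenth to a sixth of payloads, counting both sampled events and policy-flagged ones.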
FAQ
What audit logs does SOC 2 require for AI?
SOC 2 does not explicitly name AI, but the Common Criteria apply to any system that processes information. CC7.2 requires continuous monitoring of system components for anomalies, which translates to logs of every agent action, every policy decision, and every threshold breach. CC8.1 requires change management records, including agent deployments, prompt updates, and policy bundle changes. CC6.1 requires authentication and authorization records, including agent identity and the principal that deployed each agent. The auditor expects twelve months of retained logs plus tamper-evidence, queryability, and reconcilability between change records and runtime decision logs. The AICPA does not mandate a specific retention duration; the practical expectation in 2026 audits is twelve months minimum, often thirteen months for safety margin against Type II observation periods.
How long must EU AI Act records be retained?
Article 19 sets the floor at six months for automatically generated runtime logs of high-risk systems, unless applicable Union or national law provides otherwise (notably data protection law for personal data within the logs). Article 18 separately requires technical documentation retention for ten years after the system is placed on the market or put into service, including model design, quality management documentation, approved changes, and the EU declaration of conformity. Article 12 requires that the logging capability cover the lifetime of the system, meaning you must be able to generate logs throughout the system’s operational life, even if older logs have been archived. Most enterprise deployments retain runtime audit metadata for thirteen months to satisfy SOC 2 simultaneously, retain personal-data payloads for the shortest justifiable window under GDPR, and retain technical documentation for at least the ten years Article 18 requires.
Can you audit agent activity without intercepting model traffic?
Partially, but not enough for high-risk deployments. SDK-level logging captures what the agent code chooses to log; an agent with a bug or a compromised dependency can omit or rewrite entries. Provider-side logs from Anthropic or OpenAI capture model interactions but miss tool calls, MCP server interactions, tenancy mappings, and downstream side effects. The compliance-grade pattern is proxy-layer interception, in which network traffic between the agent and any external destination flows through a control point that produces the audit log. The proxy holds credentials the agent does not, so the agent has no path around it. SOC 2 CC7.2 and EU AI Act Article 12 are both materially easier to satisfy with proxy-layer logging than with application-layer logging, and the proxy doubles as the policy enforcement point that prevents the failures the audit log would otherwise document.
Does prompt logging satisfy compliance requirements?
No. Prompt logging is a subset of audit logging. A prompt log captures the model input and (sometimes) output. A compliance-grade audit log captures the prompt as one of twelve required fields, alongside agent identity, policy version, decision verdict, tool calls, egress destinations, and integrity proofs. Prompt logging without policy decision logging cannot answer “why was this allowed?” Prompt logging without tool call logging cannot answer “what did the agent actually do with the response?” Prompt logging without tenancy fields cannot answer “did this action cross a customer boundary?” The standard auditor question list has six common patterns; prompt logs alone answer at most one of them. Treat prompts as one input to the audit record, not the audit record itself.
What are the gaps between SOC 2 and EU AI Act audit requirements?
Three gaps recur. First, retention: SOC 2’s de facto twelve-month expectation is longer than the EU AI Act’s six-month floor for runtime logs but shorter than Article 18’s ten-year requirement for technical documentation. Plan for the union, not the intersection. Second, accessibility: the EU AI Act requires logs to be accessible to national competent authorities upon request, which constrains encryption strategies and requires a documented export procedure; SOC 2 does not impose this directly. Third, scope: SOC 2 covers the whole organization; the EU AI Act focuses on high-risk systems specifically, which means deployments must classify which agents fall under high-risk and apply the Article 12 schema to those, while non-high-risk agents may run a lighter schema. The simpler operational choice is to apply the high-risk schema to every production agent regardless of classification, because the marginal cost of additional fields is low and the cost of misclassification at audit time is high.
Further reading
- The EU AI Act Takes Effect in August. Here’s What Your AI Infrastructure Needs to Do.: the broader EU AI Act compliance posture, including Articles 9, 11, 13, and 14.
- What is an AI Agent Policy Engine? Definition, Architecture, and How It Differs from Guardrails: the policy enforcement layer that produces the decision verdicts captured in the audit log.
- Trust but Verify: How to Detect Token Count Manipulation in AI API Pipelines: the fire-and-forget telemetry pattern that scales to compliance-grade audit volumes.
- EU AI Act Article 12 (official text) and Article 19 (retention): the authoritative regulatory references.
- AICPA Trust Services Criteria: the SOC 2 Common Criteria, including CC6, CC7, CC8, and CC9.
- OpenTelemetry GenAI Semantic Conventions: the field naming standard the audit schema can align to without breaking existing tracing tooling.
- ISO/IEC 42001:2023: the AI management system standard whose audit requirements form a near-superset of the EU AI Act and SOC 2 fields.
Disclosure: Govyn is an open-source AI governance proxy that produces the kind of tamper-evident, policy-aware audit log this post describes. We build the infrastructure. The regulatory analysis here is grounded in published regulation text, AICPA Trust Services Criteria, and the ISO and OpenTelemetry standards, all cited inline. We have a commercial interest in proxy-layer audit logging, and we believe the architectural case stands on its own. Evaluate the evidence independently.