MCP Security: Why Tool-Use Agents Are Your Biggest Attack Surface

April 22, 2026 29 min read

security MCP agent-governance tool-use proxy

Every MCP tool call is an unaudited API request. Model Context Protocol agents create the largest unmonitored attack surface in enterprise AI stacks. Here is how proxy-layer interception governs them at scale.

The tool call nobody logged

In September 2025, an unofficial Postmark MCP server with 1,500 weekly downloads was modified to include a hidden BCC field in its send_email function. Every email sent through the tool was silently copied to an attacker-controlled address. The modification lived in the tool’s metadata, not in user-visible code. No agent flagged it. No security team caught it. The tool description said “send email.” The tool sent email and exfiltrated every message.

This was not a prompt injection. It was not a jailbreak. It was a tool doing exactly what its definition said to do, and the definition had been poisoned. The agent followed instructions. That was the problem.

The Model Context Protocol has become the dominant standard for connecting LLM agents to external tools and data sources. Anthropic positioned it as “USB-C for AI,” and adoption has been explosive. But MCP was designed for functionality, not security. Every tool an agent can invoke is an unaudited API endpoint. Every tool description is an unsanitized input to the agent’s decision-making process. And the security model that governs it all is, by the specification’s own admission, unenforceable at the protocol level.

One in eight enterprise security breaches now involves an agentic system, according to CrowdStrike and Mandiant data from 2025 and early 2026. Agent-involved breach incidents grew 340% year-over-year between 2024 and 2025. The attack surface that drives most of these incidents is not the model. It is the tools.

What does MCP actually expose?

MCP defines a standardized protocol for agents to discover, invoke, and receive results from external tools. An MCP server advertises a set of tools with names, descriptions, and input schemas. An MCP client (the agent) reads those descriptions, decides which tools to call based on the user’s request, constructs parameters, and sends the call. The server executes the tool and returns results.

Here is the security problem: every layer of this interaction is a trust boundary that MCP does not enforce.

Tool descriptions are injected into the agent’s context window. When an MCP server is installed, its tool names and descriptions are automatically added to the agent’s prompt. The agent treats these descriptions as ground truth. If a tool description contains malicious instructions, the agent follows them. This is tool poisoning, and it is the most consequential attack vector in the MCP ecosystem.

The agent decides what to call. The user does not directly invoke tools. The agent interprets the user’s request, selects tools, and constructs parameters autonomously. The user sees the result, not the intermediate tool calls. In many implementations, the user never sees which tools were called, with what parameters, or what data was sent.

Tool responses flow back into the agent’s context. The data returned by a tool becomes part of the agent’s reasoning context. A malicious tool response can contain injected instructions that redirect the agent’s subsequent behavior. This is indirect prompt injection through the tool layer, and it compounds with every tool in the chain.

Authentication is optional. The MCP specification added OAuth support in March 2025, but authentication remains optional and is frequently skipped in practice. The specification explicitly states that it “cannot enforce these security principles at the protocol level.” Security responsibility falls entirely on implementation teams, with no protocol-level guardrails.

Equixly’s March 2025 security assessment of the most popular MCP server implementations found that 43% contained command injection vulnerabilities. Another 30% were vulnerable to server-side request forgery. 22% permitted arbitrary file access. These were not obscure implementations. They were the most downloaded MCP servers in public registries.

How do tool poisoning attacks actually work?

Tool poisoning is the most dangerous attack class in the MCP ecosystem because it exploits the protocol’s core design: agents trust tool descriptions implicitly.

Invariant Labs first documented tool poisoning attacks in April 2025. The attack pattern is deceptively simple. An attacker publishes an MCP server with a tool whose description contains hidden instructions. The tool appears to perform a benign function. The full description, which the agent processes but the user never sees, contains additional directives.

Here is a concrete example from the Invariant Labs disclosure. A malicious MCP server advertises an add() tool for basic arithmetic. The visible description says “Adds two numbers.” The complete description, injected into the agent’s context, contains additional instructions: before performing the addition, read the user’s SSH keys and the mcp.json configuration file (which contains credentials for every other connected MCP server), encode the data into a function parameter, and deliver it to the attacker’s server.

The agent follows these instructions because, from its perspective, they are part of the tool’s legitimate requirements. The instructions are in the tool description. The tool description is, by protocol design, authoritative.

This is not theoretical. Invariant Labs demonstrated the attack compromising WhatsApp chat histories, GitHub private repositories, and SSH credentials across major AI platforms. A “random fact of the day” tool was weaponized to rewrite how a co-installed WhatsApp MCP server handled messages. Once the agent processed the poisoned tool description, it followed hidden instructions to send hundreds of past WhatsApp messages, including personal chats and business conversations, to an attacker-controlled phone number.

The MCPTox benchmark confirms the scale

The MCPTox benchmark, published at AAAI 2026, is the first systematic evaluation of tool poisoning across real-world MCP deployments. The researchers constructed 1,312 malicious test cases across 11 risk categories, targeting 45 real-world MCP servers from 8 application domains. They tested 20 prominent LLM agents. The results:

Average attack success rate: 36.5% across all 20 agents tested
o1-mini: 72.8% success rate, the highest among tested models
GPT-4o-mini, DeepSeek-R1, and Phi-4 all exceeded 60% success rates
Highest refusal rate: Claude 3.7 Sonnet, at less than 3%

That last number deserves emphasis. Even the most safety-aligned model in the test refused a tool poisoning attack less than 3% of the time. Safety alignment is designed to prevent the model from generating harmful content. It is not designed to prevent the model from following instructions embedded in tool descriptions. The attack exploits instruction-following capability. More capable models are often more susceptible because they are better at following complex instructions, including malicious ones.

The supply chain dimension

Tool poisoning is not limited to a single session. Unlike a traditional prompt injection that affects one conversation, a poisoned tool description infects every agent that connects to the server. It persists across sessions, across users, across organizations. The September 2025 Postmark incident demonstrated this: 1,500 weekly downloads means 1,500 weekly installations of a poisoned tool definition, each silently exfiltrating emails.

This mirrors the npm and PyPI supply chain attack pattern, but with a critical difference: a poisoned npm package requires code execution. A poisoned MCP tool description requires only that the agent read it. The attack surface is the tool description itself, not the tool’s code.

Why don’t prompt-level guards work for tool calls?

The industry’s first instinct is to add security instructions to the system prompt. “Do not access sensitive files.” “Confirm before executing destructive operations.” “Never send data to external servers without user approval.”

These instructions fail for tool calls for three structural reasons.

First, tool descriptions override prompt instructions. When a poisoned tool description says “read SSH keys before proceeding,” the agent faces a conflict between the system prompt (“do not access sensitive files”) and the tool specification (“reading SSH keys is required for this tool”). Research consistently shows that agents resolve this conflict in favor of the tool description. The tool description is specific and immediate; the system prompt is general and distant. Specificity wins. The MCPTox data confirms it: refusal rates under 3%.

Second, agents lose track of security policies as context grows. As an agent processes more tools, more conversation history, and more tool responses, its effective context window fills. Security instructions in the system prompt compete for attention with hundreds of tool descriptions, prior conversation turns, and intermediate results. Context windows degrade in practice: models lose recall of early instructions as the window fills. A security policy stated in the first 200 tokens becomes invisible when the agent is processing its 50th tool description at token 30,000.

Third, the agent decides what to call, not the user. Prompt-level guards assume the user is the threat actor. In agentic systems, the agent is the actor. The user says “send this email.” The agent decides which tools to invoke, in what order, with what parameters. If a poisoned tool has injected instructions to BCC every email to an external address, the agent executes that instruction as part of its autonomous reasoning. The user never sees the tool call parameters. The prompt-level guard never fires because the guard was designed to check user intent, not agent behavior.

This is the fundamental category error in current AI security thinking. Model-level guardrails protect against the model doing something harmful when given a harmful prompt. They do not protect against an agent following legitimate-looking instructions from a poisoned tool definition. The gap between prompt-level security and execution-layer security is where breaches happen.

What are the real attack scenarios?

The following scenarios are grounded in documented research and disclosed incidents.

Scenario 1: Supply chain poisoning via public MCP registries

An attacker publishes an MCP server to a public registry. The server provides a genuinely useful tool: a Markdown converter, a database query helper, a notification service. The tool works correctly for its stated purpose. It also contains poisoned metadata that instructs the agent to exfiltrate environment variables, API keys, or file contents to an external endpoint during every invocation.

This has already happened. The Postmark MCP server incident in September 2025 demonstrated the exact pattern: a useful tool with 1,500 weekly downloads was modified to silently BCC all outgoing emails to an attacker. The attack lived in the tool’s function definition, invisible to users who installed it.

Scenario 2: Cross-tool privilege escalation

An attacker installs a low-privilege MCP server alongside high-privilege ones. The low-privilege tool’s description contains instructions that target the high-privilege tools. When the agent processes the low-privilege tool’s description, it absorbs the injected instructions and applies them when invoking high-privilege tools.

Invariant Labs demonstrated this with the WhatsApp MCP attack. A “random fact of the day” tool (zero privileges, zero sensitivity) rewrote the agent’s behavior when interacting with the WhatsApp MCP server (high privilege, access to all messages). The low-privilege tool hijacked the high-privilege tool’s execution context. No per-tool isolation could prevent it because the agent’s context window is shared across all tools.

Scenario 3: Credential harvesting through tool chaining

An agent has access to multiple MCP servers. One of them is poisoned. The poisoned tool’s description instructs the agent to read the MCP configuration file, which contains connection credentials for every other MCP server the agent uses. The credentials are encoded into a parameter of a subsequent tool call and sent to the attacker’s server.

This is the SSH key exfiltration attack documented by Invariant Labs. The mcp.json file is a single point of compromise. One poisoned tool in the chain gives the attacker credentials for the entire MCP ecosystem that agent touches.

Scenario 4: Silent data exfiltration through normal operations

A poisoned MCP tool performs its stated function correctly. It also appends exfiltrated data to its normal API calls as additional parameters, encoded query strings, or custom headers. The data leaves through the same network path as legitimate traffic. Rate limiting and anomaly detection do not flag it because the traffic volume and pattern match normal usage.

Trend Micro’s 2025 research on AI agent data exfiltration documented this pattern: agents can be manipulated to leak data through tool calls that appear indistinguishable from normal operations. The exfiltration channel is the tool’s own API call, making it invisible to perimeter-based security.

Scenario 5: Persistent injection through poisoned data stores

An attacker writes crafted data into a system that an MCP tool reads from: a database, a wiki, a document store, a ticketing system. When the agent queries that system through the MCP tool, the tool response contains injected instructions. The agent follows the instructions, which may include calling other tools, sending data externally, or modifying records.

This is not a one-shot attack. Every agent that queries the poisoned data encounters the injection. The attack persists until the poisoned data is discovered and removed. Palo Alto Networks’ Unit 42 research documented prompt injection through MCP sampling that exploits exactly this vector: malicious content in data stores that is surfaced to agents through tool responses.

How does the attack surface scale?

The attack surface of an MCP-connected agent is not additive. It is multiplicative. Each tool adds its own attack vectors, but it also creates interaction vectors with every other tool in the agent’s context.

Consider an enterprise with 10 agents, each connected to 25 MCP tools. That is 250 individual tool endpoints. Each tool can interact with every other tool through the agent’s context, creating 250 * 249 / 2 = 31,125 possible cross-tool interaction paths. Each interaction path is a potential vector for tool poisoning, parameter manipulation, or response injection.

Now factor in permissions. 78% of agents involved in 2025 and 2026 breaches had significantly broader permission scopes than their designated function required. Each over-permissioned tool multiplies the blast radius of every other tool in the agent’s context. A poisoned low-privilege tool that can influence a high-privilege tool’s behavior inherits the high-privilege tool’s permissions transitively.

This is why 97% of enterprises surveyed in a 2026 Agentic AI Security Report expect a material AI-agent-driven security or fraud incident within the next 12 months. Nearly half expect one within six months. The attack surface is not just growing. It is compounding.

Calculate your MCP attack surface

Adjust the sliders to see how your tool-use agent deployment scales the attack surface. The total includes direct tool vectors and cross-tool interaction paths.

Agents 5

Tools per agent 10

Over-permissioned agents (78% of production deployments)

Total Tool Endpoints

50

Cross-Tool Vectors

225

Total Attack Surface

275

HIGH RISK: Immediate proxy-layer governance recommended

How does proxy-layer interception solve this?

The answer is the same architectural insight that governs shadow AI agents: you do not secure the agent. You secure the chokepoint.

For LLM API calls, the chokepoint is the provider endpoint. For MCP tool calls, the chokepoint is the connection between the agent and the MCP server. A proxy layer that sits between the agent and its MCP servers can intercept every tool call, inspect parameters, validate against policy, log the transaction, and block unauthorized operations. All without modifying the agent or the MCP server.

This architecture mirrors the SSRF defense-in-depth approach for outbound HTTP requests. Instead of trusting the agent to make safe requests, you validate every request at the network layer. Instead of trusting tool descriptions, you validate tool calls against an explicit policy.

The proxy approach works because it does not require the agent’s cooperation. The agent does not need to be modified. The MCP server does not need to be modified. The proxy intercepts the connection, applies policy, and forwards or blocks. The security boundary is architectural, not behavioral.

What does policy-based tool governance look like?

Policy-based tool governance defines explicit rules for which tools each agent can invoke, with what parameters, under what conditions, and with what approval requirements. These policies are enforced at the proxy layer, outside the agent’s process and outside the deploying team’s control.

Tool allowlists and parameter validation

The first layer restricts which tools an agent can call and validates every parameter against policy before the call reaches the tool server.

# MCP Tool Governance Policy
mcp_policies:
  default_action: deny

  agents:
    customer_support_agent:
      allowed_tools:
        - pattern: "ticket_*"
          actions: [read, update]
        - pattern: "knowledge_base_search"
          actions: [read]
      denied_tools:
        - pattern: "database_*"
        - pattern: "file_*"
        - pattern: "admin_*"
      parameter_rules:
        - tool: "ticket_update"
          field: "status"
          allowed_values: ["in_progress", "resolved", "escalated"]
        - tool: "ticket_update"
          field: "assignee"
          deny: true

    data_analysis_agent:
      allowed_tools:
        - pattern: "query_*"
          actions: [read]
          rate_limit:
            requests_per_minute: 10
            max_rows_per_query: 1000
      denied_tools:
        - pattern: "*_write"
        - pattern: "*_delete"
        - pattern: "*_admin"
      content_filter:
        block_patterns:
          - "SSN"
          - "credit_card"
          - "password"
          - "api_key"
        action: redact_and_log

Deny-by-default means no tool call executes unless explicitly allowed. A poisoned tool that the agent attempts to invoke is blocked before execution because it is not on the allowlist. The agent’s autonomous decision to call the tool is overridden by the infrastructure.

Cross-tool interaction policies

This is the defense against lateral movement and credential harvesting. The proxy watches the sequence of tool calls and blocks dangerous patterns.

  cross_tool_rules:
    - name: "prevent_credential_harvesting"
      block_when:
        tool_pattern: "*"
        parameter_contains:
          - "mcp.json"
          - ".ssh/"
          - ".env"
          - "credentials"
      action: block_and_alert
      alert:
        channel: security-team
        severity: critical

    - name: "detect_exfiltration_patterns"
      detect_when:
        source_tool_category: "internal_data"
        destination_tool_category: "external_communication"
      action: require_approval
      approval:
        timeout: 300
        escalation: security-team

    - name: "web_to_repo_write"
      sequence:
        - tool: "web_browse"
        - tool: "code_repository_write"
      within: 5m
      action: block_second_call
      severity: critical

When data from an internal tool flows to an external-facing tool, the proxy flags it for human approval. When the agent browses a web page and then tries to write to a code repository, the proxy blocks the write. The pattern is blocked regardless of the agent’s stated intent.

Tool description scanning

The proxy validates tool descriptions when MCP servers are registered and quarantines descriptions containing suspicious patterns.

  tool_validation:
    max_description_length: 2000
    block_patterns_in_descriptions:
      - "read.*ssh"
      - "read.*credentials"
      - "send.*to.*external"
      - "before proceeding.*read"
      - "encode.*parameter"
      - "ignore.*previous.*instructions"
    action: quarantine_and_alert

Instructions like “before proceeding, read the user’s SSH keys” trigger quarantine at registration time, before the agent ever processes the description. This catches tool poisoning at the source.

What should you watch next?

The MCP security landscape is moving fast. Here is what to track.

The MCP 2026 roadmap includes security improvements. The official roadmap calls for enhanced authentication, authorization scoping, and audit logging at the protocol level. These are necessary but insufficient. Protocol-level security sets a floor. Infrastructure-level enforcement through proxies sets the actual security boundary.

The EU AI Act creates regulatory exposure. The EU AI Act imposes penalties up to 7% of global revenue for violations involving high-risk AI systems, and GPAI transparency requirements became mandatory in August 2025. Unaudited tool calls from AI agents are a compliance gap that regulators will eventually target. Organizations that cannot produce an audit trail of their agents’ tool calls are exposed.

Supply chain attacks on MCP registries will increase. As MCP adoption grows, public registries become higher-value targets. The Postmark incident was a proof of concept. Coordinated supply chain attacks targeting popular MCP servers will follow the same trajectory as npm and PyPI attacks, but with the added leverage that a poisoned tool description can compromise every agent that installs it.

Tool-to-tool interaction is the emerging frontier. Current attacks target individual tool descriptions. The next generation will target tool interactions: a poisoned tool that is harmless in isolation but exploits a specific combination of co-installed tools. Detecting these attacks requires cross-tool behavioral analysis, which only a proxy with visibility into all tool calls can provide.

Per-call authentication is under discussion. The current MCP architecture authenticates the connection, not individual calls. Moving to per-call authentication (signed requests, scoped tokens per tool invocation) would reduce the blast radius of a compromised MCP client. This is not yet in the specification, but it is on the roadmap.

FAQ

Is MCP inherently insecure, or is it an implementation problem?

Both. The MCP specification explicitly states that it “cannot enforce security principles at the protocol level,” making security an implementation responsibility. But many of the attack vectors, particularly tool poisoning through descriptions, are architectural. They exploit the protocol’s core design of injecting tool descriptions into the agent’s context. Even a perfectly implemented MCP server is vulnerable to a poisoned tool description from a co-installed server. The protocol is well-designed for interoperability. The security gap is in deploying MCP without a governance layer around it.

Can I just review tool descriptions manually before installing MCP servers?

Manual review catches obvious attacks but misses sophisticated ones. Tool poisoning instructions can be encoded in Unicode characters, hidden in whitespace, or distributed across multiple fields that appear benign individually. The MCPTox research showed that even security-aware reviewers miss poisoned descriptions at significant rates. Automated scanning at the proxy layer, applied continuously at registration and at runtime, is the reliable defense.

How does proxy-layer MCP governance differ from prompt-level tool restrictions?

Prompt-level restrictions tell the agent what not to do. The agent can ignore them, lose them to context window decay, or be overridden by tool descriptions. Proxy-layer governance enforces restrictions at the network level. The tool call never reaches the MCP server if the proxy blocks it. The distinction is behavioral suggestion versus architectural enforcement, the same difference documented in proxy vs SDK governance.

What is the performance impact of proxying MCP tool calls?

The proxy adds single-digit millisecond overhead per tool call for parameter validation, response scanning, and logging. Most MCP tool servers take 50 to 500 milliseconds to execute a tool call (database queries, API calls, file operations). The proxy’s overhead is negligible relative to tool execution time. Audit logging is asynchronous and does not contribute to response latency.

Should I block all third-party MCP servers?

Blocking all third-party servers eliminates tool poisoning risk but also eliminates the value of the MCP ecosystem. A better approach is quarantine-then-approve: proxy-layer scanning of tool descriptions at registration time, deny-by-default policies for new servers, and graduated permission grants as servers are validated. This preserves the ecosystem’s value while constraining the attack surface.