The confused deputy problem in AI agents and MCP
A confused deputy is a privileged program tricked into misusing its authority for a less-privileged caller. In AI agents and MCP, the agent holds broad credentials, so prompt injection or a malicious tool can make it act with its own authority on the attacker's behalf. Mitigate with capability-based access, per-request scoping, user-context propagation, and least privilege.
Independent SEO consultant & AI practitioner who builds and tests these tools.
The confused deputy problem in AI agents and MCP
A confused deputy is a privileged program tricked into misusing its authority on behalf of a less-privileged caller. In AI agents and MCP, the agent itself is the deputy: it holds broad credentials, so a prompt injection or a malicious tool can make it act with its own authority for an attacker who holds nothing. The fix is structural, not behavioural: replace ambient authority with capability-based access, per-request scoping, and propagated user context so there is no broad authority left to confuse.
TL;DR:
- A confused deputy is a trusted program with broad authority that is fooled into using it for a caller who lacks that authority.
- In AI agents and MCP, the agent holds broad credentials, so prompt injection or a malicious tool borrows the agent’s authority without holding any of its own.
- MCP servers acting as intermediaries are a textbook case, especially with token passthrough or static client IDs, per the Model Context Protocol authorization spec.
- The durable mitigations are capability-based access, per-request scoping, user-context propagation, and least privilege.
- This sits in the same family as excessive agency; pair it with MCP security best practices and the guides library.
What is the confused deputy problem?
The confused deputy problem is a classic computer-security flaw, first named in Norm Hardy’s 1988 paper. A program with legitimate broad authority, the deputy, is manipulated into exercising that authority for a caller who does not hold it. The deputy is not compromised and not malicious; it is simply confused about whose request it is really serving. The original example was a compiler that could write to a privileged billing file: a user without that access could trick the compiler into overwriting the file by passing it as an output path, because the compiler used its own authority and never checked the user’s.
The root cause is ambient authority: the deputy holds standing permissions that apply to every request, with no per-request link between the requester’s rights and the action taken. Capability-security literature treats this as the canonical reason to bind authority to the request rather than to the actor.
Why are AI agents natural confused deputies?
AI agents are unusually exposed to this pattern because they combine three things: broad standing credentials, a tool-calling loop, and untrusted input flowing straight into their instructions. An agent is given wide authority on purpose, because calling tools and reaching other systems is what makes it useful, and that same breadth is exactly what a confused deputy attack needs.
Consider an agent with a database connection, an email tool, and a file store, all authenticated as a single high-privilege service account. The agent decides which tool to call based on text it reads, and some of that text is untrusted: web pages, emails, documents, or tool outputs. When a prompt injection plants an instruction inside that untrusted text, the agent treats it as a genuine task and runs it with its own broad credentials. The attacker never authenticates and never holds a permission; they simply borrow the deputy’s authority.
| Element | Classic confused deputy | AI agent or MCP version |
|---|---|---|
| The deputy | Privileged service or compiler | Agent or MCP server with broad credentials |
| The authority | Standing file or billing permissions | Service-account tokens, API keys, OAuth grants |
| The trick | Crafted filename or argument | Prompt injection or a malicious tool result |
| The attacker’s gain | Action they could not perform themselves | Reads, writes, or transactions under the agent’s identity |
| The root cause | Ambient authority, no requester check | Ambient authority, no user-context propagation |
How does a malicious tool trigger it?
Prompt injection is one trigger; a malicious or compromised tool is the other. In an agent or MCP setup, the model trusts the outputs of the tools it calls, and those outputs can carry instructions. A hostile tool can return text engineered to look like a system directive, redirecting the agent to call other, more sensitive tools with its standing authority. Because the agent aggregates many tools under one identity, one untrustworthy tool can reach the authority of all the others.
MCP makes the intermediary role explicit. An MCP server frequently sits between the client and a third-party API, holding credentials for that downstream service. Per the MCP authorization specification, such a server can become a confused deputy in two ways: token passthrough, where it forwards a token issued for one audience to a downstream service that wrongly trusts it; and static client IDs, where a proxy reuses a single registered client without obtaining per-request user consent, letting a stolen authorization code yield an access token without the user agreeing. The spec’s response is firm: a server MUST validate that a token was issued specifically for it, and MUST NOT pass through a client’s token to upstream APIs.
How do you mitigate the confused deputy problem in AI agents?
You mitigate it by removing the ambient authority that makes the deputy confusable in the first place. The controls below move authority from the actor onto the request:
- Capability-based access. Give each request a narrow, unforgeable capability that grants exactly the action allowed, instead of letting the agent hold broad standing permissions it can apply to anything. With no ambient authority, there is nothing for an injected instruction to commandeer.
- Per-request scoping. Mint short-lived, tightly scoped credentials for each task rather than a long-lived high-privilege key. The blast radius of any single confused action shrinks to that one scope.
- User-context propagation. Carry the end user’s identity and permissions through every tool call and into the downstream system, so authorisation is decided against the real requester, not the agent’s service account. This is the direct structural cure for the confused deputy: the deputy can no longer lend authority the caller never had.
- Least privilege. Default every tool and credential to the minimum access the task needs, and enforce that authorisation in the downstream system rather than trusting the agent to self-police. See the wider treatment in excessive agency.
- Token audience validation, no passthrough. For MCP and any intermediary, validate that inbound tokens were issued for you, require per-request consent for proxied flows, and never forward a client token upstream, exactly as the MCP best practices require.
The unifying idea is that authority should travel with the request, scoped to the original caller, never as a standing power the agent can be talked into spending. An agent that can only act within capabilities granted for the user in front of it has no broad authority to misuse, even when an upstream input is hostile.
Where to go next
The confused deputy is the attack; excessive agency is the over-provisioning that enables it, so read the two together. For the protocol-level controls, including token audience binding and the no-passthrough rule, see MCP security best practices and the Model Context Protocol docs. Browse the full guides library for related authority and least-privilege topics.
Frequently asked questions
What is the confused deputy problem in simple terms?
It is when a trusted program with broad authority is fooled into using that authority on behalf of someone who lacks it. The deputy is not malicious; it is confused about who it is really acting for, so it lends its own permissions to an attacker's request.
How does the confused deputy problem affect AI agents?
An agent holds broad credentials to do its job. If a prompt injection or a malicious tool plants an instruction, the agent executes it with its own authority, so the attacker borrows the agent's access without holding any credentials themselves.
How does it appear in MCP servers?
Per the Model Context Protocol authorization spec, an MCP server acting as an intermediary to third-party APIs can become a confused deputy, especially when it passes through tokens or reuses a static client ID without per-request user consent.
What is the main fix for the confused deputy problem?
Capability-based access. Instead of the deputy holding ambient authority it applies to any request, each request carries a narrow, unforgeable capability scoped to exactly what that caller is allowed to do, so a confused deputy has no broad authority to misuse.
Is the confused deputy problem the same as excessive agency?
They overlap. Excessive agency is the over-provisioning of an agent's functionality, permissions, or autonomy. The confused deputy is the attack pattern that exploits that over-provisioning, turning broad standing authority into damage on an attacker's behalf.