agenticaisecured

Supply-chain security for MCP servers: where the risk enters and how to control it

Supply-chain risk for MCP servers enters through third-party servers you do not control, model weights with weak provenance, and transitive npm or pip dependencies. A malicious or compromised server can poison tool descriptions, exfiltrate data, or run arbitrary code inside the agent. Pin versions, verify provenance, vet servers, sandbox them, and apply least privilege.

By Sunny Patel Updated

Independent SEO consultant & AI practitioner who builds and tests these tools.

Supply-chain security for MCP servers: where the risk enters and how to control it

Supply-chain risk for MCP servers enters through third-party servers you do not control, model weights with weak provenance, and transitive npm or pip dependencies that ship inside the server package. A malicious or compromised server can poison tool descriptions, exfiltrate data the agent passes it, or run arbitrary code on the host. The controls are well understood: pin versions, verify provenance, sign releases and produce an SBOM, vet third-party servers, sandbox them, and grant least privilege. This guide walks the attack surface and the defences in order.

TL;DR:

  • Where it enters: third-party MCP servers, model weights and their provenance, direct dependencies, tool packages, and transitive npm or pip dependencies.
  • The core trust problem: an agent trusts a server’s advertised tools, so a compromised server inherits that trust and can act inside the agent.
  • Top attacks: tool-description poisoning, output-borne prompt injection, data exfiltration, and arbitrary code execution from a local server process.
  • The controls: pin and lock versions, sign and SBOM, verify provenance, vet servers, sandbox, and apply least privilege.
  • Read alongside what is MCP, the OWASP LLM Top 10, and MCP security best practices.

Where does supply-chain risk enter an MCP system?

Supply-chain risk enters anywhere your agent runs code or loads data it did not write itself. In an MCP setup that means the server, the model, and everything those two pull in transitively. Per the OWASP Top 10 for LLM Applications, supply chain is its own category, covering compromised components, services, and datasets that undermine system integrity. Note the numbering: the current list files supply chain as LLM05, while LLM03 covers training-data poisoning, so cite the version you are reading.

The entry points break down like this.

Entry pointWhat it isWhy it is risky
Third-party MCP serversServers built by someone else that you connect to your hostYou run their code and trust their tool descriptions; you rarely audit either
Model weights and provenanceThe model file you load or the API you callA tampered or backdoored model can be hard to detect without verifiable provenance
Direct dependenciesLibraries the server author choseA vulnerable or malicious package becomes part of your server
Tool packagesPre-built tool integrations you installEach package adds code and credentials into the agent’s reach
Transitive npm or pip dependenciesPackages your dependencies depend onThe deepest and least visible layer; a compromise here is silent

The transitive layer is the one teams underrate. A single MCP server can pull dozens of indirect packages, any one of which can be typosquatted, hijacked, or shipped with a malicious post-install script. You did not choose those packages and you probably never read them.

How can a malicious or compromised MCP server attack an agent?

A malicious server attacks the agent by abusing the trust the agent places in its advertised capabilities. Because MCP standardises how a host discovers and calls tools, a compromised server can present poisoned tools, return weaponised output, or, as a local process, simply run code on your machine. The Model Context Protocol documentation describes servers exposing tools, resources, and prompts to the host; each of those is an attack vector if the server is hostile.

The main attack patterns:

  1. Tool-description poisoning. The model reads each tool’s description to decide when to call it. A server can hide instructions in those descriptions to steer the model toward leaking data or calling a dangerous tool.
  2. Output-borne prompt injection. A server returns content that the host feeds back to the model, smuggling injected instructions through what looks like ordinary tool output.
  3. Data exfiltration. Whatever the agent passes a tool, the server sees. A compromised server quietly forwards arguments, context, or credentials to an attacker.
  4. Arbitrary code execution. Local servers run as child processes of the host with the user’s privileges, so a malicious package can read files, reach the network, and persist.
  5. Rug-pull updates. A server that behaved well on day one ships a malicious update later, after you have already granted it trust and broad scopes.

For deeper context on how the trust model works, see what is MCP, which sets out the host, client, and server roles.

What controls reduce MCP supply-chain risk?

You reduce the risk by knowing exactly what code runs, proving where it came from, and limiting what it can do. The controls split into provenance, pinning, vetting, and containment, and they reinforce each other. No single control is enough on its own.

How do pinning, signing, and SBOMs help?

Pin server versions and lock every transitive npm or pip dependency to an exact version with a committed lockfile, so a silent malicious update cannot reach your agent and builds stay reproducible. Review and test before you bump a pin.

Signing and a software bill of materials (SBOM) make the artifact auditable. A signed release proves it came from the real maintainer, and an SBOM lists every component inside so you can match it against known-vulnerable packages. Per the SLSA framework, provenance generation, documented evidence of how an artifact was built, is the recommended first step, advancing through higher levels over time. The NIST Secure Software Development Framework sets out matching practices for producers and consumers to communicate that integrity through the acquisition chain.

How do I verify provenance for model weights?

Verify provenance before you load any model file or trust any server release. For model weights that means confirming who published the file and that it has not been tampered with, using checksums, signatures, or attestations from the publisher. Without provenance you cannot distinguish a genuine model from a backdoored copy, so treat an unverifiable weight file the way you would treat an unsigned binary from an unknown source.

How do I vet and contain third-party servers?

Vetting and containment limit the damage a bad server can do even if it slips past the other checks:

  • Vet before use. Read the source, check the maintainer and publish history, confirm it is the official package and not a typosquat, and review the tools and scopes it requests.
  • Sandbox. Run servers in a container or restricted process with no access to the host filesystem, secrets, or network beyond what the job needs.
  • Least privilege. Grant each server the narrowest credentials and scopes that let it work, and require human approval for sensitive actions.
  • Monitor. Log tool calls and watch for a server reaching for data or endpoints outside its remit.

Together these mean a compromised server runs in a box, holds only narrow credentials, and gets caught when it misbehaves. For the full checklist, pair this with MCP security best practices and the OWASP LLM Top 10.

The bottom line

MCP makes integrations cheap, and that same standardisation makes a hostile server cheap to abuse. Supply-chain risk enters through the servers, models, and transitive dependencies you do not personally audit, and a compromised server can act with the agent’s full trust. The defence is layered: pin and lock versions, sign releases and ship an SBOM, verify provenance on weights and servers, vet third parties, then sandbox them under least privilege. None of it is exotic; it is ordinary supply-chain hygiene applied to a new and fast-moving surface.

Frequently asked questions

Is supply-chain risk LLM03 or LLM05 in the OWASP list?

It depends on the version. In the current OWASP Top 10 for LLM Applications, supply chain is listed as LLM05, while LLM03 covers training-data poisoning. Earlier drafts numbered supply chain differently, so cite the version you are reading. Either way, the control set is the same: provenance, pinning, and vetting.

How can a malicious MCP server attack an agent?

A compromised server can poison tool descriptions to manipulate the model, return crafted output that triggers prompt injection, exfiltrate any data the agent passes it, or, because servers often run as local processes, execute arbitrary code on the host. The agent trusts the server's advertised tools, so a bad server inherits that trust.

What is provenance and why does it matter for model weights?

Provenance is verifiable evidence of how an artifact was produced and where it came from. For model weights it answers who trained or published this file and has it been tampered with. Without provenance you cannot tell a genuine model from a backdoored copy, so verifying it before loading weights is essential.

Should I pin MCP server and dependency versions?

Yes. Pin server versions and lock transitive npm or pip dependencies to exact versions with a committed lockfile. Pinning stops a silent malicious update from reaching your agent and makes builds reproducible, so you know exactly what code ran. Review and test before bumping a pin.

How do I vet a third-party MCP server before using it?

Read the source, check the maintainer and publish history, confirm it is the official package and not a typosquat, review the tools and scopes it requests, and run it sandboxed with least privilege first. Prefer signed releases with an SBOM and provenance, and avoid servers that demand broad credentials for a narrow job.