agenticaisecured

LLM Security Risks: The OWASP LLM Top 10 Explained

LLM security is the practice of protecting applications built on large language models from attack and misuse. The main LLM security risks are catalogued in the OWASP LLM Top 10, covering prompt injection, sensitive information disclosure, supply chain flaws, data and model poisoning, improper output handling, excessive agency, and more.

By Sunny Patel Updated

Independent SEO consultant & AI practitioner who builds and tests these tools.

LLM Security Risks: The OWASP LLM Top 10 Explained

LLM security is the practice of protecting applications built on large language models from attack and misuse. The main LLM security risks are catalogued in the OWASP LLM Top 10, covering prompt injection, sensitive information disclosure, supply chain flaws, data and model poisoning, improper output handling, excessive agency, and more. This pillar names each risk, maps it to its OWASP identifier, and points to the deeper guide for each one.

TL;DR:

  • LLM security is the protection of large-language-model applications across the model, prompts, data, application, and any tools an agent can call.
  • The authoritative catalogue of LLM security risks is the OWASP GenAI Security Project LLM Top 10, identified LLM01 to LLM10.
  • For autonomous agents, the sharpest risks are prompt injection (LLM01), improper output handling (LLM05), and excessive agency (LLM06).
  • Govern the programme with the NIST AI RMF, then prove your controls cover each OWASP risk. Start at the OWASP LLM Top 10 hub.

What is LLM security?

LLM security is the discipline of protecting applications built on large language models from attack, abuse, and unintended behaviour. It is broader than securing the model file alone: it spans the prompts, the retrieval and data pipeline, the surrounding application code, and any tools or APIs an agent is allowed to call. The threat surface runs the whole path from user input, through the model, to the output and any downstream action that output triggers.

The reason LLM security needs its own discipline is that the model treats instructions and data as the same stream of tokens. A traditional application can usually separate code from input; an LLM cannot reliably tell a trusted instruction from a piece of untrusted text pasted into a document it is asked to summarise. That single property is the root of most LLM security risks.

What are the main LLM security risks?

The main LLM security risks are catalogued by the OWASP GenAI Security Project as the OWASP LLM Top 10, a prioritised list of the ten most critical risks in applications that use large language models. Each entry carries an identifier from LLM01 to LLM10, a definition, example attacks, and recommended mitigations. The list is revised periodically, so always confirm current wording and ordering against the source.

Below, each risk is named, briefly explained, and linked to its deeper cluster page where one exists.

OWASP IDRiskWhat it means in one line
LLM01Prompt InjectionUntrusted text overrides or hijacks the model’s instructions.
LLM02Sensitive Information DisclosureThe model leaks secrets, personal data, or proprietary content.
LLM03Supply ChainCompromised models, datasets, plugins, or dependencies.
LLM04Data and Model PoisoningTainted training or fine-tuning data corrupts model behaviour.
LLM05Improper Output HandlingOutput is trusted and acted on without validation or sanitisation.
LLM06Excessive AgencyThe model is granted too much autonomy, permission, or tool access.
LLM07System Prompt LeakageThe hidden system prompt and its secrets are extracted.
LLM08Vector and Embedding WeaknessesFlaws in retrieval, embeddings, or vector stores in RAG systems.
LLM09MisinformationThe model produces false or fabricated content presented as fact.
LLM10Unbounded ConsumptionUncontrolled resource use drives cost, denial of service, or abuse.

What is LLM01 Prompt Injection?

Prompt injection is untrusted input overriding the instructions the model was given. Because the model cannot reliably separate trusted instructions from untrusted data, an attacker can embed commands in a web page, a document, or an email that the model then follows. It is the top-ranked risk and the foundation of most agent compromises. See prompt injection explained for attack patterns and defences.

What is LLM02 Sensitive Information Disclosure?

Sensitive information disclosure is the model revealing data it should not, such as secrets, personal data, or proprietary content. This can happen through the model’s own output, through retrieved context, or through tool responses surfaced back to a user. For agents that can read files or query systems, this overlaps with data exfiltration risks for AI agents.

What is LLM03 Supply Chain?

Supply chain risk covers compromised or untrustworthy components in the LLM stack: base models, fine-tuned weights, datasets, plugins, and software dependencies. A poisoned model pulled from a public hub or a malicious package can undermine the whole application before a single prompt is sent, so provenance and integrity checks matter as much as runtime controls.

What is LLM04 Data and Model Poisoning?

Data and model poisoning is the deliberate tampering of training, fine-tuning, or retrieval data to manipulate model behaviour, plant backdoors, or degrade quality. Because the effect is baked into the weights or the knowledge base, it can be hard to detect at runtime, which is why data provenance and lifecycle controls are the primary defence.

What is LLM05 Improper Output Handling?

Improper output handling is treating model output as safe and acting on it without validation or sanitisation. If output is passed into a shell, a database query, a browser, or another system unchecked, the LLM becomes a route to classic injection flaws. For agents this is acute, because output frequently becomes the next tool call.

What is LLM06 Excessive Agency?

Excessive agency is granting the model too much autonomy, permission, or tool access relative to the task. The more an agent can do without a human in the loop, the more damage a single hijacked instruction can cause. Deciding how much authority an agent should hold is a governance question first and a technical one second. See excessive agency explained.

What are LLM07 to LLM10?

These four complete the list:

  • LLM07 System Prompt Leakage is the extraction of the hidden system prompt, which may expose secrets, rules, or logic an attacker can then exploit.
  • LLM08 Vector and Embedding Weaknesses are flaws in the retrieval, embedding, and vector-store layer of RAG systems, including poisoned or leaking embeddings.
  • LLM09 Misinformation is the model generating false or fabricated content that users or downstream systems trust as fact.
  • LLM10 Unbounded Consumption is uncontrolled resource use, leading to runaway cost, denial of service, or model abuse such as extraction.

Which LLM security risks matter most for AI agents?

For autonomous agents, the highest-priority LLM security risks are LLM01 Prompt Injection, LLM05 Improper Output Handling, and LLM06 Excessive Agency, because an agent acts on model output and calls tools. The dangerous chain is simple: injected instructions become trusted output, trusted output becomes a tool call, and excessive permissions let that tool call do real harm. Breaking any link in that chain reduces the blast radius.

This is why agent hardening focuses on the tool layer. Scoping every tool to least privilege, validating output before it is acted on, separating trusted instructions from untrusted data, and requiring human approval for high-impact actions are the controls that turn an abstract risk into a verified defence.

How do you turn this list into a defence?

Treat the OWASP LLM Top 10 as the checklist, not the programme. A workable sequence:

  1. Inventory every LLM, agent, RAG pipeline, and tool integration in scope.
  2. Map each component to the relevant LLM01 to LLM10 risks.
  3. Assign a named, testable control to each risk that applies.
  4. Govern the whole effort with a framework such as the NIST AI RMF, so accountability and process are documented.
  5. Audit and improve continuously, because models, dependencies, and attacks all change.

Naming a control is not the same as proving it works. The frameworks give you the discipline and the audit trail; the OWASP LLM Top 10 gives you the concrete threats those audits must cover.

Where to go next

Start with the OWASP LLM Top 10 hub to see each risk mapped to defensive tooling. For the two risks most tied to agents, read prompt injection explained and excessive agency explained. To understand how a hijacked agent leaks data, see data exfiltration risks for AI agents.

This article is maintained against primary sources: the OWASP LLM Top 10 from the OWASP GenAI Security Project, and the NIST AI Risk Management Framework.

Frequently asked questions

What is LLM security?

LLM security is the discipline of protecting applications built on large language models from attack, abuse, and unintended behaviour. It spans the model, the prompts, the data pipeline, the surrounding application, and any tools an agent can call, covering the full path from user input to model output to downstream action.

What are the main LLM security risks?

The main LLM security risks are catalogued in the OWASP LLM Top 10: prompt injection, sensitive information disclosure, supply chain weaknesses, data and model poisoning, improper output handling, excessive agency, system prompt leakage, vector and embedding weaknesses, misinformation, and unbounded consumption.

What is the OWASP LLM Top 10?

Per the OWASP GenAI Security Project, the OWASP LLM Top 10 is a prioritised list of the ten most critical security risks in applications that use large language models. Each entry, identified LLM01 to LLM10, carries a definition, example attacks, and recommended mitigations.

Is prompt injection the biggest LLM security risk?

Prompt injection, listed as LLM01 in the OWASP LLM Top 10, is the top-ranked risk and is widely treated as the foundational LLM threat because untrusted text can override instructions. For autonomous agents it pairs dangerously with excessive agency and improper output handling.

How do I reduce LLM security risks for an AI agent?

Scope every tool to least privilege, validate and sanitise model output before it is acted on, separate trusted instructions from untrusted data, and add human approval for high-impact actions. These controls directly counter the highest-priority agent risks of prompt injection, improper output handling, and excessive agency.