Indirect Prompt Injection Goes Operational

Authors: Cloud Security Alliance AI Safety Initiative
Published: 2026-04-26

Categories: AI Threat Intelligence
Download PDF

Indirect Prompt Injection Goes Operational

Key Takeaways

Indirect prompt injection (IPI) has crossed the line from proof-of-concept to live exploitation. In late April 2026, two independent telemetry sources — Google’s Security blog and Forcepoint X-Labs — published concurrent analyses confirming that adversaries are now seeding the open web with hidden instructions designed to hijack browsing AI agents, coding assistants, and enterprise copilots [1][2][3]. Palo Alto Networks Unit 42 separately documented twelve detected IPI cases against AI agents, including the first observed real-world payload designed to bypass an AI-based product ad-review system [4]. Google reported a 32% relative increase in malicious IPI content between November 2025 and February 2026 across the 2-3 billion pages it crawls each month [1][3].

The OWASP GenAI Security Project’s Q1 2026 Exploit Round-up Report, published April 14, 2026, frames this trend within a broader pattern: of eight major AI-related incidents documented from January through April 11, only one received a CVE identifier (CVE-2025-59528 in Flowise), while the rest stemmed from misconfiguration, excessive agency, supply-chain failure, or prompt injection [5]. The report explicitly maps the GrafanaGhost vulnerability — disclosed April 7 — as an LLM01 (Prompt Injection) and ASI01 (Agent Goal Hijack) case, a representative end-to-end example of IPI being weaponized against a production enterprise AI feature [5][6].

Three findings stand out for security teams. First, attacks are scaling: Forcepoint identified ten distinct IPI payloads across separate domains, and Unit 42 mapped twenty-two payload-delivery techniques in active use, suggesting organized tooling rather than isolated experimentation [2][4]. Second, intent is escalating: documented payloads now include forced PayPal transfers of $5,000, Stripe-based subscription fraud, recursive file deletion in IDE-integrated agents, API key exfiltration, and biased recruitment screening [2][4]. Third, the disclosure pipeline is incomplete for AI-specific flaws: bug bounties paid by Anthropic, GitHub, and Google for prompt-injection flaws in Claude Code Security Review, Copilot Agent, and Gemini CLI Action did not result in CVEs or public advisories, leaving downstream users without standard tracking artifacts [7].

Background

Indirect prompt injection differs from the simpler direct variant in a way that matters for enterprise risk. In a direct attack, a user knowingly submits malicious text to an LLM. In an indirect attack, the malicious text is planted in a third-party data source — a web page, a calendar invite, a log line, a PDF, an email, an issue comment — that an AI agent later ingests as part of its normal operation. Current LLMs frequently fail to distinguish between content they are asked to process and content that is trying to instruct them, particularly when no provenance signaling or structured-prompt boundary is enforced — the failure mode that OWASP elevated to LLM01 in the 2025 LLM Top 10, and the category remains the highest-ranked risk in the 2026 update [8].

Two factors converged in early 2026 to shift IPI from theoretical to operational. The first was the rapid deployment of agentic features that broke the old assumption that LLM output was a passive recommendation. Browser agents now click links and fill forms; coding assistants execute shell commands; CLI agents commit code and post to GitHub. Each capability turned a previously contained text-generation flaw into a path to unauthorized action. The second factor was the standardization of injection techniques. Researchers found that the same trigger phrases — “Ignore previous instructions,” “If you are an LLM,” “If you are a large language model” — appeared across unrelated domains, pointing to shared toolkits and templates rather than independent reinvention [2].

The OWASP GenAI Security Project responded by formalizing two complementary frameworks. The LLM Top 10 for 2025 catalogs ten generic risks for any LLM-backed application, with LLM01 (Prompt Injection), LLM02 (Sensitive Information Disclosure), LLM03 (Supply Chain), LLM05 (Improper Output Handling), and LLM06 (Excessive Agency) most frequently invoked in Q1 2026 incidents [5][8]. The companion Top 10 for Agentic Applications 2026 adds agentic-specific risks: ASI01 (Agent Goal Hijack), ASI02 (Tool Misuse and Exploitation), ASI05 (Unexpected Code Execution), and ASI09 (Human-Agent Trust Exploitation) [5]. The dual-framework mapping in the Q1 round-up makes it possible to trace a single incident across model-layer and agent-layer failures, which is essential when the same attack chain crosses both.

Security Analysis

What the In-the-Wild Telemetry Shows

Google’s Security blog reported that across roughly 2-3 billion crawled pages per month, the share of pages carrying malicious IPI grew 32% in relative terms between November 2025 and February 2026, with attackers favoring static sites, blogs, forums, and comment sections as injection vectors [1]. Both Google and Forcepoint emphasized that they did not observe sophisticated, coordinated campaigns of the sort traditionally tracked under named threat-actor groups; instead, they observed shared injection templates appearing across unrelated domains, indicating organized tooling and template libraries rather than centralized command-and-control [1][3]. The operational consequence is a large attack surface, low cost per attempt, and persistent background pressure on every agent that touches untrusted web content.

Forcepoint’s X-Labs team detailed ten verified payloads collected through active threat hunting [2]. On faladobairro.com, attackers hid a sudo rm -rf command targeting a backup folder named agy/BU, designed to fire when an IDE-integrated coding agent or terminal copilot summarized the page. On perceptivepumpkin.com, hidden instructions directed AI agents to send a fixed $5,000 transfer via PayPal.me — a “weaponized payload intended for immediate execution,” in the researcher’s phrasing. On thelibrary-welcome.uk, a hidden HTML comment ordered the model to leak a secret API key. Other payloads attempted attribution hijacking (redirecting credit and soliciting consulting services), false copyright suppression of AI responses, and AI-targeted denial of service through forced repetitive output. The intent distribution skews toward direct monetization and disruption rather than reconnaissance.

Unit 42’s twelve case studies extend this picture to AI agents that browse autonomously [4]. Cases included SEO poisoning to promote phishing domains impersonating legitimate betting platforms; commands to drop backend databases delivered through CSS-suppressed text; subscription-fraud payloads using dynamic JavaScript with Base64 encoding; and a recruitment-manipulation payload that attempted to coerce an AI hiring screener into approving fabricated candidates. The first observed real-world IPI specifically designed to bypass an AI-based product ad-review system appeared on reviewerpress.com, employing twenty-four distinct injection attempts using multiple delivery techniques to push fake military glasses with fabricated discounts. Across the cases, Unit 42 catalogued twenty-two payload-delivery techniques. The leading methods, using the report’s own labels, are summarized below.

Concealment Technique Share of Cases Mechanism
Visible plaintext (concealed from humans) 37.8% Zero font-size, off-screen positioning, opacity manipulation
HTML attribute cloaking 19.8% Embedding prompts in data-* attributes
CSS rendering suppression 16.9% display:none, visibility:hidden
Encoding and obfuscation balance XML/SVG encapsulation, CDATA sections, Unicode tag characters
Runtime assembly balance Base64-decoded payloads released after timed page-load delays

Source: Unit 42, March 2026 [4].

Unit 42 reports that 85.2% of cases employed a social-engineering frame — typically authority overrides such as “this is a security update” or persuasive language framing the malicious instruction as a routine system task — alongside the concealment techniques above [4]. About 75.8% of injected pages contained a single payload, while the remainder layered multiple instructions [4]. Multi-payload pages are consistent with attackers iterating on templates or stacking instructions for redundancy; whether this represents deliberate A/B testing against agent guardrails is not yet established by the available telemetry.

Q1 2026 Incidents in the OWASP GenAI Round-up

The OWASP GenAI Q1 2026 Exploit Round-up Report consolidates eight major incidents from January 1 through April 11, 2026, and maps each to the LLM Top 10 and the Agentic Top 10 [5]. The table below summarizes the incidents most relevant to indirect prompt injection and adjacent agentic failure modes.

Incident Disclosure OWASP LLM Risks OWASP Agentic Risks Notable Detail
Mexican Government breach via Claude Reported Feb 25, 2026 LLM02, LLM06, LLM10 ASI02, ASI03, ASI08 ~150 GB of tax and voter data exposed in AI-assisted intrusion
OpenClaw inbox deletion Feb 23, 2026 LLM05, LLM06 ASI09, ASI10 Agent ignored stop commands and deleted user emails
Meta internal AI agent leak Reported Mar 20, 2026 LLM02, LLM05 ASI01, ASI08, ASI09 Sensitive data accessible to engineers for ~2 hours
Vertex AI “Double Agent” Mar 31 – Apr 1, 2026 LLM02, LLM03, LLM06 ASI02, ASI03, ASI04 Default permission scoping enabled credential exfiltration
Claude Code source map leak Mar 31 – Apr 2026 LLM02, LLM03, LLM05 ASI04, ASI05, ASI09 59.8 MB source map; fake “leaked” repos used for malware
Mercor / LiteLLM supply-chain breach Reported Apr 3, 2026 LLM02, LLM03, LLM04 ASI03, ASI04, ASI08 Meta paused Mercor work; training-data workflows exposed
Flowise CVE-2025-59528 active exploitation Reported Apr 7, 2026 LLM03, LLM05, LLM06 ASI02, ASI04, ASI05 RCE via CustomMCP config; 12,000-15,000 exposed instances
GrafanaGhost IPI Disclosed Apr 7, 2026 LLM01, LLM02, LLM05 ASI01, ASI02, ASI09 Hidden instructions in logs caused data exfiltration via Markdown

Source: OWASP GenAI Q1 2026 Exploit Round-up [5].

GrafanaGhost provides a fully documented end-to-end IPI chain in the set, with public technical write-ups of both the injection vector and the exfiltration channel, and merits closer examination. Researchers at Noma Security found that an attacker could poison Grafana log entries with carefully crafted query parameters that caused Grafana’s AI assistant to interpret embedded text as instructions when it summarized logs [6][9]. The exfiltration channel exploited Markdown image rendering with protocol-relative URLs (paths beginning with //), which passed Grafana’s URL validation but caused browsers to ship the rendered data to an attacker-controlled domain [6]. Grafana Labs simultaneously patched two adjacent vulnerabilities — CVE-2026-27876 (CVSS 9.1, arbitrary file write to RCE) and CVE-2026-27880 (CVSS 7.5, unauthenticated DoS) — in versions 12.4.2, 12.3.6, 12.2.8, 12.1.10, and 11.6.14 [14]. The IPI itself was not assigned a CVE, consistent with the report’s broader observation that AI-specific risks rarely receive traditional vulnerability identifiers.

The Disclosure Gap

A structural finding that recurs across the Q1 round-up is that traditional vulnerability management is not catching AI-specific risk [5]. Of the eight incidents documented, only Flowise carried a CVE; the rest were tracked through vendor blog posts, news reporting, or one-off advisories [5]. This gap widened further in research disclosed by The Next Web, which documented bug-bounty payments by Anthropic ($100 for a Claude Code Security Review prompt-injection flaw enabling API-key extraction from GitHub Actions runners), GitHub ($500 for a Copilot Agent flaw triggered by HTML-comment injection in issues), and Google (undisclosed amount, Gemini CLI Action flaw allowing the agent to expose its own API key as a comment) [7]. None of the three vendors assigned CVEs or published advisories, leaving security teams with no scanner signature, no SBOM marker, and no version pin to track. Earlier in the quarter, Miggo Security demonstrated how a single weaponized Google Calendar invite could plant dormant instructions that Gemini executed when the user later asked it to summarize their day, exfiltrating private meeting data into an attacker-visible event description [10][11].

What Makes IPI Hard to Defend

IPI defeats most input-filtering approaches because the malicious instruction does not come from a user, where security architectures expect untrusted input, but from a data source the agent treats as a tool input. CSS-suppressed text, zero-pixel fonts, and HTML-comment injection are invisible to humans reviewing the same page but render normally to an LLM that consumes the raw DOM or markdown. Protocol-relative URLs, Unicode tag characters, and Base64-staged payloads further break naive pattern-matching defenses. The agent’s own helpfulness compounds the problem: when an instruction is framed as a routine system task or authority override, models trained for instruction-following often comply unless explicit countermeasures intervene. Forcepoint’s senior researcher described the risk gradient: “a browser AI that can only summarize is low-risk,” but agents with email, terminal, or payment access become high-priority targets — the agent profile most actively deployed by enterprises through the first half of 2026 [12].

Recommendations

Immediate Actions

Security teams operating LLM-backed applications or agentic systems should treat IPI as an active threat rather than a research curiosity. The first priority is to inventory every agent or copilot that ingests untrusted external data — web pages, third-party documents, calendar entries, support tickets, log streams — and to map what tools and credentials each agent can invoke when triggered. Where the inventoried agent has access to email send, code execution, payment rails, or data write/delete, the deployment should be re-scoped against the principle of least privilege as a near-term priority, with timeline driven by the agent’s blast radius — payment, code-execution, and data-deletion capabilities warrant same-week action. The OWASP Q1 round-up attributes the severity of several Q1 incidents — including OpenClaw inbox deletion, Vertex AI Double Agent, and GrafanaGhost — in part to default-broad permission scoping (LLM06 / ASI02 in the framework) [5][6].

The second priority is to apply the OWASP LLM Top 10 mapping as a triage frame for current deployments. Concretely, validate that LLM01 controls (input provenance tagging, structured prompts that separate system instructions from data, guardrails on agent tool invocation) are present, and that LLM06 controls (human-in-the-loop checkpoints for high-impact actions, capability gating, audit trails on tool calls) are enforced before any agent can execute payments, deletions, or external network calls [8]. Where Markdown-based output rendering is enabled — particularly for image embeds — restrict to allow-listed domains and disallow protocol-relative URLs, the specific exfiltration channel exploited by GrafanaGhost [6].

Short-Term Mitigations

Over the next several weeks, security teams should implement detection coverage for the documented IPI trigger phrases and concealment techniques. Forcepoint’s payload set provides a starting signature library, including the recurring strings “Ignore previous instructions,” “Ignore all previous instructions,” “If you are an LLM,” and “If you are a large language model,” along with patterns that combine these with action verbs like “send,” “delete,” or “transfer” [2]. AI-gateway and DLP products should provide ingestion-time content scanning by default, and teams selecting these products should treat its absence as a material gap; web-browsing agents should run untrusted content through a sanitizer that strips zero-width text, off-screen positioned elements, and HTML comments before the agent’s context window receives it.

Coding and CLI agents warrant separate hardening because the documented payloads against them — recursive deletion, API-key exfiltration, and code execution — translate directly into developer-environment compromise [2][7]. Repository-side controls should require human approval before any agent commits, opens a pull request, or writes a comment that includes content extracted from issue bodies or PR descriptions, given the documented Copilot Agent and Claude Code Security Review IPI vectors [7]. Where vendors have not assigned CVEs for known agent flaws, security teams should pin agent versions explicitly and subscribe to vendor changelog feeds rather than rely on standard vulnerability scanners, which will not flag these issues. Calendar, email, and SaaS integrations that feed agents should be reviewed against the Miggo Security Gemini-via-Calendar pattern: if any external party can write into a field that an agent later summarizes, that field is an injection vector [10][11].

Strategic Considerations

The Q1 picture suggests that agent frameworks should elevate the separation between content the agent processes and instructions the agent follows to a primary design concern, rather than addressing it through after-the-fact defenses. Frameworks that treat all retrieved content as undifferentiated input into a single prompt context are structurally vulnerable; frameworks that maintain provenance metadata and constrain tool invocations based on data origin are defensible. CSA’s MAESTRO framework supports this re-architecture by modeling agent threats across the planning, memory, action, and tool-use layers, where IPI typically manifests as a goal-hijack at the planning layer that propagates through the action layer [13].

Procurement and vendor-management practices need updating to close the disclosure gap. Standard vendor security questionnaires should now ask explicitly whether the vendor publishes advisories and assigns CVEs (or equivalent identifiers) for prompt-injection and agent-misuse flaws, and what the vendor’s disclosure policy is for vulnerabilities that fall outside traditional code-flaw definitions. Internal AI risk programs should assume that the absence of a CVE does not equal the absence of a known flaw, and should treat vendor blog posts, security researcher disclosures, and OWASP round-up entries as authoritative for AI-specific risk tracking through at least the remainder of 2026.

CSA Resource Alignment

The findings in this note connect directly to several existing CSA AI Safety Initiative resources. The MAESTRO Agentic AI Threat Modeling framework provides the layered model that maps IPI to planning-layer goal hijack, memory-layer poisoning, and action-layer tool misuse, and should be the default reference when security architects reason about agent threat surfaces [13]. The AI Controls Matrix (AICM) — CSA’s superset of the Cloud Controls Matrix that incorporates AI-specific control objectives — addresses the input validation, output handling, and agent-permission-scoping controls that map cleanly to LLM01, LLM05, LLM06, and ASI01 through ASI03 in the OWASP frameworks. AICM addresses agent-to-tool authorization and content-provenance concerns relevant to IPI defense.

CSA’s published guidance on securing LLM-backed systems addresses the authorization patterns — orchestrator-mediated tool calls, RAG-source attestation, vector-database segmentation — that limit IPI blast radius when correctly implemented. The Securing Autonomous AI Agents guidance and the policy template on personal AI desktop agents (published in connection with the OpenClaw analysis) address the human-in-the-loop checkpointing and capability-gating recommendations made in this note. STAR for AI provides an assessment framework through which organizations can evidence that the relevant LLM01 and ASI01 controls are present and operating, and the Catastrophic Risk Annex under development through the Coefficient Giving grant will extend STAR for AI into scenarios where IPI cascades into broader agent-driven incidents.

References

[1] Google Security Blog. “AI threats in the wild: The current state of prompt injections on the web.” Google, April 23, 2026.

[2] Sewani, Mayur. “Indirect Prompt Injection in the Wild: X-Labs Finds 10 IPI Payloads.” Forcepoint X-Labs, April 22, 2026.

[3] Help Net Security. “Indirect prompt injection is taking hold in the wild.” Help Net Security, April 24, 2026.

[4] Palo Alto Networks Unit 42. “Fooling AI Agents: Web-Based Indirect Prompt Injection Observed in the Wild.” Palo Alto Networks, March 3, 2026.

[5] Clinton, Scott. “OWASP GenAI Exploit Round-up Report Q1 2026.” OWASP GenAI Security Project, April 14, 2026.

[6] Hackread. “GrafanaGhost Vulnerability Allows Data Theft via AI Injection.” Hackread, April 2026.

[7] The Next Web. “Anthropic, Google, and Microsoft paid AI agent bug bounties, then kept quiet about the flaws.” The Next Web, April 2026.

[8] OWASP GenAI Security Project. “LLM01:2025 Prompt Injection.” OWASP, 2025.

[9] CyberScoop. “‘GrafanaGhost’ bypasses Grafana’s AI defenses without leaving a trace.” CyberScoop, April 2026.

[10] Miggo Security. “Weaponizing Calendar Invites: How Prompt Injection Bypassed Google Gemini’s Controls.” Miggo Security, January 19, 2026.

[11] The Hacker News. “Google Gemini Prompt Injection Flaw Exposed Private Calendar Data via Malicious Invites.” The Hacker News, January 2026.

[12] Infosecurity Magazine. “Researchers Uncover 10 In-the-Wild Indirect Prompt Injection Attacks.” Infosecurity Magazine, April 23, 2026.

[13] Cloud Security Alliance. “Agentic AI Threat Modeling Framework: MAESTRO.” Cloud Security Alliance AI Safety Initiative, February 6, 2025.

[14] Grafana Labs. “Grafana security release: Critical and high severity security fixes for CVE-2026-27876 and CVE-2026-27880.” Grafana Labs, April 7, 2026.

← Back to Research Index