Agentjacking: Sentry MCP Injection Hijacks AI Coding Agents

Authors: Cloud Security Alliance AI Safety Initiative
Published: 2026-06-14

Categories: Agentic AI Security, Developer Security, Model Context Protocol
Download PDF

Agentjacking: Sentry MCP Injection Hijacks AI Coding Agents

Key Takeaways

  • Researchers at Tenet Security documented a novel attack class called “agentjacking” in which adversaries inject malicious instructions into Sentry error events, causing AI coding agents that query Sentry via the Model Context Protocol (MCP) to autonomously execute attacker-controlled commands using the developer’s own system privileges [1].
  • The attack requires only a Sentry Data Source Name (DSN)—a public, write-only credential that Sentry intentionally embeds in frontend JavaScript and that attackers can discover at scale through website source inspection or GitHub search. Tenet Security identified at least 2,388 organizations with publicly exposed DSNs that could be used to inject malicious Sentry events [1][2].
  • Testing across Claude Code, Cursor, and OpenAI Codex CLI (OpenAI’s agentic coding assistant, released 2025) yielded an 85% exploitation success rate. Agents retrieved injected events from Sentry, interpreted them as authoritative diagnostic guidance, and executed attacker-supplied commands—exfiltrating AWS credentials, GitHub OAuth tokens, Kubernetes secrets, and other sensitive environment variables [1][7].
  • Because agents perform only authorized operations under developer identities, the attack bypasses endpoint detection and response (EDR), web application firewalls (WAF), identity and access management (IAM) controls, and VPN monitoring without triggering observable anomalies [1][2].
  • Sentry acknowledged the disclosure on June 3, 2026, but declined root-cause remediation, characterizing comprehensive platform-level fixes as “technically not defensible.” The company subsequently deployed a content filter addressing only the specific payload string identified during the research period [1].

Background

The Model Context Protocol, introduced by Anthropic in late 2024, provides a standardized interface through which AI coding agents connect to external tools, data sources, and services. MCP enables agents to query databases, browse documentation, inspect cloud infrastructure, and retrieve application telemetry—all without requiring developers to write bespoke integrations for each tool. The protocol has seen broad industry adoption: major development environments including Claude Code, Cursor, and OpenAI Codex CLI now support MCP natively, and a large and growing number of MCP server implementations are publicly available for services ranging from database clients to observability platforms [3].

Sentry is a widely deployed application monitoring and error-tracking platform used across the software industry. To accommodate the shift toward AI-assisted development, Sentry released an MCP server that allows coding agents to query error events, retrieve stack traces, analyze performance regressions, and invoke Sentry’s own AI-powered root-cause analysis tool, Seer. When a developer asks their AI coding agent to “investigate unresolved Sentry issues” or “fix the top error from this sprint,” the agent autonomously retrieves relevant error events through the Sentry MCP server and proceeds to interpret their contents as actionable diagnostic context [4].

Sentry’s event ingestion architecture is, by design, permissive. The DSN embedded in client-side JavaScript is a write-only credential whose public exposure is intentional and necessary for Sentry’s core functionality: applications in the field need to report errors to Sentry’s ingest endpoint without requiring an authenticated session. This architecture predates AI coding agents and was designed for a world where error events were read by human engineers, not interpreted and acted upon by autonomous software. Agentjacking exploits precisely this mismatch between the trust assumptions of an observability platform and the trust posture of an MCP-connected AI agent.

Security Analysis

The Attack Chain

The agentjacking attack unfolds in a sequence of six steps, each of which relies on capabilities that individually appear benign. The attack begins with DSN discovery. An adversary locates a Sentry DSN through routine reconnaissance—inspecting the JavaScript bundle of a target web application, querying Censys or Shodan for traffic to ingest.sentry.io, or searching GitHub code repositories with publicly known DSN patterns. No authentication is required: the DSN is a public credential, and Sentry’s ingest endpoint accepts POST requests from any source that presents it.

With the DSN in hand, the attacker crafts a synthetic Sentry error event and submits it directly to Sentry’s unauthenticated ingest endpoint using any standard HTTP client. The event’s message field and context key names are populated with carefully formatted markdown content designed to mimic the visual structure of Sentry’s own diagnostic templates. Critically, the “## Resolution” section of the fake event contains an npx command that fetches and executes an attacker-controlled npm package—or alternatively, a shell command that reads environment files and transmits them to an external collection server.

The attack then waits for a developer to engage their AI coding agent. When the developer asks the agent to review or resolve Sentry issues, the agent queries the Sentry MCP server. The MCP server returns the injected event alongside legitimate events, presenting all of them as structured tool output. During testing, Claude Code, Cursor, and OpenAI Codex CLI did not distinguish between data retrieved from an external source and operator instructions—a limitation common to most current agent frameworks—and therefore interpreted the injected markdown as authoritative guidance. The agent then executes the attacker-supplied command with the full privileges of the developer’s operating system account. Credential exfiltration—including AWS access keys, GitHub personal access tokens, Kubernetes cluster credentials, registry credentials from .npmrc and .docker/config.json, and the contents of CI/CD environment variables—completes without triggering alerts.

The Structural Vulnerability

Agentjacking is not a conventional injection vulnerability in Sentry’s software. No server-side code execution occurs at Sentry. The vulnerability is architectural: it arises from the combination of three independently reasonable design decisions that become dangerous in combination. First, Sentry’s ingest endpoint accepts arbitrary payloads from any presenter of a valid DSN. Second, the Sentry MCP server forwards those events to AI agents as tool output without applying content integrity checks. Third, AI coding agents treat tool output from connected MCP servers as trusted context, applying the same interpretive authority to MCP-returned data as they apply to operator system prompts or user instructions.

Broader MCP ecosystem research corroborates that this trust architecture is not unique to Sentry. Research reported by Elastic Security Labs, drawing on third-party analysis of publicly available MCP server implementations, found that 43% of implementations in the sample contained command injection flaws, and 30% permitted unrestricted URL fetching—conditions that create similar injection surfaces across many toolchains [5][10]. A benchmark study evaluating 45 live MCP servers against poisoned tool descriptions, as cited in Huang et al., found attack success rates above 60% across leading agents, with the highest exceeding 72% [8]. The agentjacking case demonstrates that the injection surface extends beyond the MCP server software itself to every external data source that the server exposes.

Evasion of Conventional Controls

What makes agentjacking particularly consequential from a security operations standpoint is that it succeeds entirely within the envelope of authorized behavior. The developer’s agent is doing exactly what the developer asked: investigating Sentry errors and proposing fixes. The commands the agent executes—running an npm package, reading environment files, making an outbound HTTPS request—are all operations a developer might legitimately perform. No malicious binary is dropped, no process injection technique is used, and no privilege escalation is required. Tenet Security found that in their tested environments, the attack produced no signal that EDR rules, WAF policies, IAM guardrails, or network monitoring tools were configured to detect—bypassing all four control categories across tested environments [1][2].

This evasion profile reflects a broader pattern in agentic AI threats: because agents operate with delegated authority and perform authorized actions, attacks that abuse that delegation tend to be invisible to controls designed around the concept of an adversary acting without authorization. Security operations teams have not historically needed to distinguish between “a developer ran an npm install” and “an agent ran an npm install on behalf of a developer in response to a malicious error event”—because the second scenario did not exist until AI coding agents became production tools.

Sentry’s Response and Its Limits

Sentry’s product leadership publicly acknowledged the Tenet Security disclosure on the day of publication, June 3, 2026. However, the company declined to pursue root-cause remediation, describing the problem as “technically not defensible” at the platform level [1]. The measure Sentry did implement—a content filter blocking the specific payload string identified during the research—addresses the disclosed proof-of-concept exploit but not the architectural pathway that enables injection. Any attacker who varies the payload structure, reformats the markdown, or routes instructions through indirect channels can circumvent a string-match filter. This situation reflects a challenge common in architectural vulnerabilities: when the root cause is embedded in fundamental design decisions rather than specific implementation errors, full remediation often requires re-engineering core product behaviors.

Defenders therefore cannot rely on Sentry’s filter as a durable control. The attack surface will remain open for as long as Sentry’s ingest endpoint accepts arbitrary payloads, its MCP server returns those payloads to agents without content validation, and AI coding agents lack mechanisms to distinguish data from instructions within tool output.

Recommendations

Immediate Actions

Organizations using AI coding agents connected to Sentry should audit all MCP server configurations across their developer environments to identify integrations that surface content originating from external or user-controlled inputs. The Sentry MCP integration should be assessed for operational necessity; where it is not required, it should be disabled. Where it remains enabled, teams should configure their coding agents to require explicit human approval before executing any shell command, installing any package, or making any outbound network request sourced from MCP-retrieved content. Teams should also rotate all Sentry DSNs for projects with confirmed DSN exposure in public-facing JavaScript bundles or public GitHub repositories.

Sentry DSN exposure can be mitigated independently of the MCP integration by routing client-side error reporting through a server-side Sentry relay or proxy. This architectural pattern allows applications to submit errors to Sentry without embedding a DSN in frontend code accessible to external parties, reducing the population of organizations susceptible to DSN-based injection.

Short-Term Mitigations

The principle of least privilege should be applied to agent execution environments. Developers should deploy AI coding agents in sandboxed containers or virtual machines that restrict access to credential files, cloud metadata service endpoints (such as 169.254.169.254), and environment variables containing secrets. Where agent sandboxing is not immediately feasible, developers should review their local development environments to ensure that sensitive credentials are not stored in plaintext files accessible to agent processes.

Organizations should maintain a formal inventory of all MCP servers connected to their AI coding agents, with documented assessments of whether each server surfaces data from external or user-controlled inputs. MCP server installations should go through a review process equivalent to the organization’s software dependency approval workflow—a practice OWASP now recommends explicitly in its MCP Tool Poisoning guidance [6]. Red-team exercises covering MCP injection and tool poisoning scenarios should be incorporated into existing agentic AI security testing programs. Human-in-the-loop checkpoints for agent actions with external side effects are a meaningful near-term safeguard while more durable mitigations mature.

Strategic Considerations

The agentjacking case reveals an accountability gap that will grow as organizations deploy AI coding agents more broadly. An organization that connects an AI agent to MCP-enabled observability platforms implicitly trusts that every platform enforces content integrity to the same standard the organization would apply to direct operator instructions. That trust is not warranted. Strategic AI deployment policy should explicitly define which categories of external data sources agents may query and should require that any MCP server surfacing external content implement verification controls—or be treated as an untrusted input channel with corresponding agent-side constraints.

At the industry level, the broader challenge is that MCP’s trust model was designed for tool invocation, not arbitrary data retrieval. MCP server implementations that expose external data—error events, user messages, web content, database records—introduce injection surface at the protocol level that tool-level controls cannot fully address. Protocol-level content integrity mechanisms, agent-side data-versus-instruction classifiers, and cryptographically verifiable MCP server identities are among the architectural directions the security community should pursue. CSA’s AI Working Groups are actively engaging these questions through the MAESTRO and AICM frameworks.

CSA Resource Alignment

Agentjacking fits within CSA’s MAESTRO threat modeling framework for agentic AI systems, which organizes threats across seven layers of an agentic architecture [9]. The attack instance most precisely fits the Agent Frameworks layer, where MAESTRO identifies “tool interface abuse” as a primary threat class. MAESTRO’s treatment of trust boundary violations—where an agent incorrectly applies operator-level trust to data retrieved from an external source—provides the conceptual foundation for assessing agentjacking risk across diverse MCP deployments.

The AI Controls Matrix (AICM) addresses the supply chain and trust dimensions of agentjacking through its AI Supply Chain Security domain and its controls requiring content integrity verification for data consumed by AI systems. Organizations performing AICM assessments should evaluate whether their MCP server deployments satisfy the framework’s controls for input validation at AI system boundaries and for human oversight of autonomous agent actions.

CSA’s Zero Trust guidance is directly applicable to MCP-connected agent deployments. The core zero-trust principle—that no data source should be trusted by default, and that trust must be explicitly established and continuously verified—translates naturally into a requirement that agents apply skeptical interpretation to MCP tool output, particularly when that output contains actionable instructions. CSA’s published Zero Trust guidance for LLM environments offers a starting framework for teams building out these controls.

CSA’s AI Safety Initiative tracks the emergence of agentic attack classes, including MCP injection and tool poisoning, as a priority focus area. Security teams and developers seeking ongoing guidance should monitor CSA Labs publications for updates as the MCP ecosystem matures and the threat landscape evolves.

References

[1] Tenet Security. “A Fake Bug Report Hijacks Your AI Coding Agent — and Nothing Catches It.” Tenet Security Blog, June 9, 2026.

[2] Cloud Security Alliance AI Safety Initiative. “Agentjacking: MCP Injection Hijacks AI Coding Agents.” CSA Labs, June 12, 2026.

[3] Ravie Lakshmanan. “Agentjacking Attack Tricks AI Coding Agents Into Running Malicious Code.” The Hacker News, June 2026.

[4] ZenML LLMOps Database. “Sentry: Model Context Protocol (MCP) Server for Error Monitoring and AI Observability.” ZenML, 2026.

[5] Elastic Security Labs. “MCP Tools: Attack Vectors and Defense Recommendations for Autonomous Agents.” Elastic, 2026.

[6] OWASP Foundation. “MCP Tool Poisoning.” OWASP, 2026.

[7] Infosecurity Magazine. “New ‘Agentjacking’ Attacks Could Hijack AI Coding Agents.” Infosecurity Magazine, June 2026.

[8] Charoes Huang et al. “Model Context Protocol Threat Modeling and Analyzing Vulnerabilities to Prompt Injection with Tool Poisoning.” arXiv:2603.22489, March 2026.

[9] Ken Huang, CSA AI Safety Working Group. “Agentic AI Threat Modeling Framework: MAESTRO.” Cloud Security Alliance, February 2025.

[10] OX Security. “MCP STDIO Command Injection: Full Vulnerability Advisory.” OX Security Blog, 2026.

← Back to Research Index