Published: 2026-03-25
Categories: Agentic AI Security, Insider Threat, Identity and Access Management
Agents in the Wire: AI Agents as Enterprise Insider Threats
Executive Summary
Autonomous AI agents have arrived in the enterprise faster than the security controls designed to govern them. What were once narrow, task-specific automation tools have become persistent, credentialed entities that operate with broad access across file systems, APIs, communication channels, and cloud infrastructure — often without the oversight mechanisms that govern equivalent human activity. In this environment, an AI agent that has been compromised, manipulated, or simply misconfigured represents a threat archetype that the enterprise has not fully confronted: a persistent, credentialed process that operates continuously, executes actions without human deliberation, and can be redirected through injected instructions embedded in the content it processes.
This paper examines the post-compromise threat model for agentic AI. The central thesis is that the traditional kill chain — the sequence of steps by which an external attacker gains access and achieves objectives — is fundamentally altered when autonomous agents are present inside an organization. Once an agent is deployed with legitimate credentials and broad access, it can be weaponized into a persistence mechanism, a lateral movement tool, a credential harvester, and a data exfiltration channel, all while operating within the bounds of its authorized permissions. The result is a class of attack that registers as normal agent activity in most enterprise security tooling because, from a permissions standpoint, nothing unusual has occurred.
The evidence base for this threat is no longer theoretical. Documented CVEs include a zero-click prompt injection in Microsoft 365 Copilot that required no user interaction to exfiltrate OneDrive and SharePoint content [18], argument injection chains in AI coding agents that achieved remote code execution [11], and critical vulnerabilities in the Model Context Protocol infrastructure that enables agents to communicate with enterprise tools [12]. Meanwhile, the 2025 CSA Agentic Identity Survey found that 82% of organizations are not highly confident in their IAM systems’ ability to govern agent identities [1], and only 17% enforce runtime access controls consistently across all environments [1] — leaving the vast majority of enterprise agent deployments effectively ungoverned at the point where the threat materializes.
This paper provides security architects, CISOs, and risk managers with a structured threat taxonomy for agent-based compromise, an analysis of why traditional controls fail against this threat class, and a layered set of recommendations grounded in CSA’s MAESTRO framework and the AI Control Matrix (AICM).
Introduction and Background
The Agentic Shift in the Enterprise
The deployment of autonomous AI agents inside organizations accelerated significantly in 2025 and shows no sign of slowing. The 2025 CSA Agentic Identity Survey, conducted across 285 IT and security professionals, found that 40% of organizations already have AI agents operating in production environments, with another 31% running active pilots [1]. Forward-looking projections from the same study indicate that more than 70% of respondents expect to manage dozens to hundreds of deployed agents within the next twelve months [1]. Industry analysts have projected that by the end of 2026, 40% of all enterprise applications will integrate with task-specific AI agents, up from fewer than 5% in 2025 [4].
These agents are not the narrow bots of prior generations. Modern autonomous AI agents coordinate complex multi-step tasks, make contextual decisions about which tools to invoke, maintain memory across sessions, call external APIs, read and write files, send communications, and interact with other agents within orchestration frameworks. Each of these capabilities represents a potential attack surface — not because the agents themselves are malicious, but because they are architecturally designed to act on instructions, and they cannot distinguish between instructions that originate from their human operator and instructions that have been injected by an adversary into their operating environment.
The identity footprint of this agent population is substantial. According to an RSAC 2025 presentation by CSA and Workday, non-human identities already outnumber human identities in the enterprise at a ratio of 45:1 [2]. For an organization with 1,000 employees, that implies approximately 45,000 non-human identities in the form of service accounts, API keys, OAuth tokens, and agent credentials — the vast majority of which lack the lifecycle management, rotation policies, and behavioral monitoring that govern human access. One in five organizations had already experienced a security incident linked to non-human identities at the time of that report [2].
Why the Insider Threat Analogy Is Appropriate
The concept of the insider threat has historically focused on human employees or contractors who, whether through malice or negligence, cause harm using their legitimate access to organizational systems. AI agents align with this archetype across each defining characteristic. Like a privileged insider, a compromised agent already possesses valid credentials, operates continuously from within the network perimeter, has established trust relationships with the systems it accesses, and generates activity that looks legitimate because, technically, it is. Unlike a human insider, an agent can be fully manipulated by an external adversary without any social engineering, with effects that can be triggered conditionally, time-delayed, or propagated automatically to other agents in the same environment.
The CSA policy analysis of OpenClaw and similar desktop agents characterized this configuration as the “lethal quartet”: an agent that simultaneously possesses access to private organizational data, exposure to untrusted external content, the ability to communicate externally, and persistent memory across sessions [3]. Each individual characteristic is manageable in isolation. Their combination in a single entity can create a risk configuration that, if exploited, exposes a substantial portion of the agent’s authorized operational scope to adversary direction.
The Current Governance Gap
The governance infrastructure for managing this risk has not kept pace with deployment velocity. The 2025 CSA Agentic Identity Survey found that 44% of organizations rely on static API keys — long-lived credentials with no automatic expiration — as their primary method of authenticating agent identities [1]. Only 12% have fully implemented fine-grained access scopes, and just 13% have deployed continuous authentication for their agent population [1]. Perhaps most strikingly, only 28% of organizations can reliably trace agent actions back to a human sponsor or authorizing system across all environments [1]. When an agent takes an unauthorized action — or when it takes an action at the direction of an adversary — the forensic chain of custody needed for investigation and attribution simply does not exist for 72% of organizations.
This governance gap is not primarily a technology problem. It reflects the speed at which autonomous agent deployment has outrun the organizational frameworks designed to govern privileged access. Only 23% of organizations have a formal, organization-wide agent governance strategy [1]. The remaining 77% operate under informal agreements, ad-hoc policies, or no documented approach — a condition that limits their ability to detect, contain, or attribute agent-level compromise if and when it occurs.
The Anatomy of Agent-Based Compromise
Why Agents Are Structurally Different
Securing an AI agent requires confronting a structural property that has no direct analogue in traditional software security: the agent cannot reliably distinguish between its operator’s instructions and data in its environment that happens to resemble instructions. This is not a bug in any specific implementation; it is a consequence of the architecture that makes language models useful. The same mechanism that allows an agent to read a document and summarize its contents also allows malicious text embedded in that document to redirect the agent’s behavior. When an agent retrieves a webpage, processes an email, queries a database, or reads a file as part of its legitimate function, every piece of content it encounters is a potential injection vector.
OWASP has ranked prompt injection as the number one vulnerability in LLM-based applications across its Top 10 publications, reaffirming this ranking in its 2025 edition [9]. Unlike conventional injection attacks, which require exploiting a specific technical flaw in input handling, prompt injection requires only that malicious text be present in any context the model processes. This makes the attack surface difficult to bound: it is effectively coextensive with the set of all content the agent can access, which in an enterprise deployment may span email, documents, databases, web search results, code repositories, and API responses.
Initial Access: How Agents Become Compromise Vectors
Agent-based compromise enters the enterprise through several distinct channels. In some cases, the agent itself is the initial access mechanism, as when an attacker uses a prompt injection attack embedded in a phishing email, a malicious document, or a compromised third-party API response to redirect agent behavior. In other cases, the agent becomes a vector after an attacker has achieved some degree of initial access through other means — for instance, by compromising the agent’s credentials, poisoning a data source the agent consults, or introducing a malicious plugin into the agent’s tool ecosystem.
Research by Trail of Bits in October 2025 documented three distinct attack chains through which prompt injection could be escalated to remote code execution in AI coding agents [11]. These included argument injection through the abuse of test runner flags, regex filter bypass via hex-encoded dual-command chaining, and façade pattern argument injection through tool wrapper vulnerabilities. The resulting CVEs — including CVE-2025-54795 in Claude Code, GHSA-534m-3w6r-8pqr in Cursor, and a separately disclosed command injection in Amazon Q — confirmed that the path from content injection to full system compromise in agentic development environments is not merely theoretical [11]. All three exploit chains relied on CWE-88 (argument injection) and Living-Off-the-Land techniques that, once the agent was compromised, required no additional attacker infrastructure.
Beyond code execution, the EchoLeak vulnerability (CVE-2025-32711, CVSS 9.3) disclosed in June 2025 demonstrated the zero-click variant of agent-based data exfiltration [18]. A single malicious email sent to a Microsoft 365 Copilot user was sufficient to coerce the agent into accessing and exfiltrating internal OneDrive files, SharePoint content, Teams messages, and chat logs without any user interaction at all [18]. From the perspective of the authorization system, Copilot was performing its authorized function: summarizing content the user had access to. From the perspective of the adversary, a single email had turned every user’s AI assistant into an exfiltration mechanism.
Post-Compromise Capabilities: The Five Threat Families
Once an adversary has achieved influence over an agent — whether through prompt injection, credential theft, supply chain compromise, or tool poisoning — the agent’s authorized capabilities become the adversary’s attack toolkit. The following sections examine the five primary threat families that characterize agent-based post-compromise activity.
Prompt Injection and Lateral Movement
Prompt injection is the foundational technique that enables most other agent-based threats. Direct prompt injection involves an attacker who has access to the agent’s input channel inserting malicious instructions alongside legitimate user input. Indirect prompt injection, which is considerably more difficult to defend against, occurs when the agent retrieves and processes content from its environment — a webpage, a file, an API response — that contains embedded instructions the agent executes as if they had originated from its operator.
The Compositional Instruction Attack (CIA), a variant of indirect prompt injection, achieves attack success rates of over 95% across GPT-4, GPT-3.5, and Llama2 in controlled evaluations [17]. Adaptive indirect prompt injection has been demonstrated to bypass all eight of the most commonly deployed defensive measures with a success rate exceeding 50% [17]. These numbers suggest that prompt injection cannot be fully eliminated through any single defensive measure; it reflects a property of instruction-following architectures that mitigations can raise the cost of exploiting but cannot nullify.
Lateral movement through agent infrastructure follows naturally from successful injection. Unit 42 documented nine concrete attack scenarios applicable across the CrewAI, AutoGen, and comparable multi-agent frameworks [10]. Server-Side Request Forgery (SSRF) attacks exploit web-browsing tools to route agent requests through internal networks, reaching systems that would not be accessible from the public internet. Broken Object-Level Authorization (BOLA) vulnerabilities in agent APIs allow an attacker who has compromised one agent to access data belonging to other users or systems by manipulating object references. Cloud metadata service exploitation — instructing an agent to query the AWS IMDS endpoint or its Azure equivalent — yields cloud service account tokens that can be used to pivot into the organization’s cloud infrastructure [10].
The multi-agent dimension of this threat adds a propagation mechanism that amplifies the impact of any individual compromise. Research documented under the “Toxic Agent Flow” attack pattern demonstrates that a single compromised GitHub issue can hijack one agent, which then propagates malicious instructions to downstream agents through their natural interaction channels — without any continued involvement by the attacker after the initial injection [17]. In multi-agent pipelines where orchestrating agents delegate tasks to sub-agents that share context, a single successful injection can traverse the entire pipeline before any human review occurs.
Documented state-sponsored use of this capability indicates that the technique has moved beyond academic research. According to reporting by Cybersecurity Dive and Zenity’s 2026 Threat Landscape Report [8][21], Chinese state-sponsored actors leveraged AI coding agents to automate 80–90% of intrusion activity across an observed campaign — a characterization drawn from industry reporting rather than peer-reviewed operational analysis — including reconnaissance, exploit development, credential harvesting, lateral movement, and data exfiltration, with only four to six human decision points across the entire operation. The implication is that AI agents are already being used by sophisticated threat actors to scale and accelerate attack operations in ways that significantly reduce the cost and expertise required for complex intrusions.
Memory Poisoning and Persistent Compromise
A defining characteristic of modern AI agents is persistent memory: the ability to retain context, learned patterns, and behavioral history across sessions. This capability, which enables agents to become more useful over time by remembering user preferences and operational context, also creates a novel persistence mechanism for adversarial influence. An adversary who can inject content into an agent’s memory does not need to re-exploit a vulnerability in each subsequent session; the poisoned memory carries the malicious instruction forward automatically.
The MINJA attack pattern injects crafted records into an agent’s memory banks during a compromised session, producing manipulation that sustains across all subsequent interactions without requiring repeated exploitation [17]. PoisonedRAG, which targets retrieval-augmented generation pipelines, demonstrates that five malicious documents are sufficient to achieve a 90% success rate in corrupting an agent’s knowledge base — establishing persistent, query-time manipulation that activates whenever the agent retrieves relevant information [17]. Because the poisoned content is stored in the same retrieval system as legitimate organizational knowledge, it is difficult to distinguish from authorized memory through inspection alone.
Model-level backdoor techniques represent an even deeper form of persistence, though they require access to the training or fine-tuning process. Research has identified several techniques with high operational characteristics. The absence of detection rate data for most of these techniques is itself significant: it suggests they have not been rigorously evaluated against deployed enterprise detection systems, which limits generalizability of the success rate claims but also indicates defenders lack validated detection methods.
| Technique | Success Rate | Detection Rate | Activation Mechanism |
|---|---|---|---|
| DemonAgent | ~100% | 0% (encrypted payloads) | Keyword or environmental trigger |
| Composite Backdoor Attack (CBA) | 100% | Not reported | Requires only 3% training data poisoning |
| BadAgent | High | Not reported | Dormant until trigger tokens appear |
| PoisonedRAG | 90% | Not reported | Five malicious texts sufficient |
Source: arxiv 2506.23260 [17]
Beyond model-level techniques, a compromised LLM can be configured through prompt injection to retrieve fresh attacker instructions from a command-and-control server on each invocation — effectively creating a dynamically updatable backdoor that allows the adversary to modify agent behavior in real time, triggered by a unique keyword in user input or by a URL fetch instruction embedded in the agent’s context [17].
The combination of memory persistence with autonomous action capability creates what the CSA Desktop Agent Policy memo characterized as “time-shifted prompt injection”: fragmented malicious payloads distributed across multiple interactions, configured to remain dormant until specific conditions are met, at which point they detonate as a coherent attack sequence [3]. This technique defeats detection approaches that analyze individual agent sessions in isolation, because no single interaction contains a complete attack signature.
MCP and Tool Abuse
The Model Context Protocol (MCP), developed to provide a standardized interface through which AI agents access enterprise tools, data sources, and APIs, has rapidly become one of the most significant attack surfaces in agentic AI deployments. Its adoption accelerated substantially in 2025, and with that adoption have come systematic vulnerability disclosures that reveal structural security weaknesses in both the protocol and its implementation ecosystem.
Independent security assessments by eSentire and others found widespread misconfigurations across MCP server deployments, including OAuth authentication flow vulnerabilities and command injection vulnerabilities enabling remote code execution [12]. CVE-2025-6514, assigned a CVSS score of 9.6, affects mcp-remote versions 0.0.5 through 0.1.15 and allows arbitrary OS command execution when an MCP client connects to an untrusted server — a condition readily achieved through a man-in-the-middle attack on insecure HTTP connections [12]. This vulnerability was discovered by JFrog Security Research and confirmed that the transport-layer exposure of MCP servers created a class of pre-authentication RCE vulnerabilities that required no prior compromise of the target environment.
The Anthropic mcp-server-git implementation received multiple CVE disclosures in the same period, covering path validation bypass, unrestricted git_init operations that allowed the .ssh directory to be converted into a git repository, and argument injection in the git_diff command [13]. These disclosures demonstrate that even reference implementations from the protocol’s author were not immune to the class of argument injection vulnerabilities that Trail of Bits had documented across AI coding agents more broadly.
Beyond individual CVEs, MCP introduces three attack patterns that are architectural rather than implementation-specific. Rug Pull attacks exploit the fact that MCP servers are permitted to modify their tool definitions between sessions: an MCP server can present a benign capability during the approval process, then silently switch to a malicious capability definition in subsequent sessions without triggering re-authorization [13]. Tool Poisoning embeds adversarial instructions directly in tool description metadata — text that the agent reads to understand what a tool does, but which a human approving the tool’s installation typically does not see in full [13]. The Confused Deputy pattern allows a compromised MCP server to acquire and replay OAuth tokens that the agent has legitimately obtained for enterprise services, converting the agent’s authorized access into a token theft mechanism [13].
Semantic Privilege Escalation
Traditional privilege escalation exploits technical vulnerabilities to acquire permissions the attacker is not authorized to hold. Semantic privilege escalation operates on a different axis: the agent acts entirely within its authorized technical permissions, but performs actions that fall outside the semantic scope of the user’s original request. Because every individual action passes authorization checks, the attack leaves no conventional security signal.
Acuvity documented the canonical form of this attack: a user asks an agent to summarize a shared document; hidden instructions in the document — invisible to human readers but processed by the agent — direct it to search the organization’s file systems for API keys and credential files, then transmit the found credentials to an attacker-controlled email address [16]. No permission boundary is crossed at any point. The agent has been authorized to read documents, access file systems, and send email on the user’s behalf. It does each of these things, in sequence, as directed by injected text the user never saw [16].
The Hacker News analysis of this threat class noted that AI agents function as authorization bypass paths: rather than directly compromising the agent, attackers compromise or prompt-inject it to perform actions the attacker themselves lacks permission to execute [6]. This represents a structural inversion of the traditional access model. The agent’s elevated trust becomes the vulnerability, because any instruction that reaches the agent — regardless of its origin — executes with the agent’s full authorization scope.
Semantic escalation also manifests through emergent behavior drift, in which agents may interpret instructions broadly in ways that gradually expand the scope of actions taken — an emergent pattern arising from optimization toward task completion rather than adversarial intent — and through context loss in multi-agent handoffs, where the original intent of the human sponsor is misinterpreted or stripped out as tasks pass between agents in a pipeline. Unlike explicit injection attacks, these forms of semantic escalation may occur in the absence of any adversarial actor, driven purely by the agent’s optimization toward task completion.
CVE-2025-10725, a privilege escalation vulnerability in Red Hat OpenShift AI, provides a concrete infrastructure-level example [28]. While technical in nature, it illustrates that privilege escalation vulnerabilities in AI platforms extend beyond the model layer to the orchestration and runtime infrastructure that hosts and manages agents — expanding the attack surface that defenders must account for.
Supply Chain Compromise
The supply chain of an autonomous AI agent comprises the model itself, the plugins and skills it can invoke, the MCP servers it connects to, the training data it was built on, and the third-party integrations that extend its capabilities. Each layer of this supply chain represents a point of potential compromise that can be exploited to modify agent behavior at scale, affecting every deployment that consumes the compromised component.
The Cisco State of AI Security 2026 Report identified malware hidden in public model and code repositories as a primary source of AI-related breaches [24]. The same report noted that 83% of organizations planned to deploy agentic AI, while only 29% felt they had the security capabilities to do so safely [24]. This readiness gap is particularly pronounced in the supply chain domain, where organizations frequently have no mechanism to verify the integrity of models they download, tools they install, or plugins they activate.
The CSA analysis of the OpenClaw desktop agent ecosystem documented the practical reality of this risk. Researchers identified 341 malicious skills in the ClawHub marketplace, the agent’s primary plugin distribution channel, distributing Atomic Stealer malware that targeted credentials, API keys, and cryptographic wallet files across macOS and Windows systems [3]. The Moltbook incident — in which a database misconfiguration at an AI agent social networking platform exposed 1.5 million API tokens — demonstrated that the supporting infrastructure around agent ecosystems can itself become a credential theft vector [3].
ReversingLabs documented the distinct risks introduced when agents have write access to code repositories and CI/CD pipelines [23]. An agent operating in a software development context with repository write permissions can introduce vulnerabilities or backdoors into source code at scale, targeting not just the immediate organization but every user of software packages the organization publishes. This represents a convergence of the agent insider threat with traditional software supply chain attacks, with the agent’s automation capabilities amplifying the potential impact beyond what a human insider could achieve.
MCP’s distributed server ecosystem has introduced supply chain risks specific to the protocol itself. Because agents dynamically resolve and connect to MCP servers, a compromised server in the MCP registry can affect all agents that subscribe to it. The Rug Pull attack pattern is particularly relevant here: an apparently legitimate MCP server that accumulates a base of installations can subsequently modify its tool definitions to introduce malicious capabilities, activating compromise across all connected agents simultaneously without requiring any additional action by the agent operators [13].
The Identity Problem: Non-Human Identities at Scale
The governance of AI agent identities — the credentials, tokens, and authorization grants that define what agents are permitted to do — is the foundational control through which all other agent security measures are anchored. Without accurate, comprehensive, and continuously maintained agent identity governance, the threat techniques described in the preceding sections cannot be effectively detected, contained, or attributed. The current state of agent identity governance in most enterprises falls well short of this requirement.
The 2025 CSA Agentic Identity Survey found that the most common method of authenticating agent identities is the static API key, used by 44% of organizations [1]. Static API keys are long-lived credentials that do not expire automatically, are not bound to specific sessions or contexts, and are frequently stored in plaintext in local configuration files, environment variables, or code repositories where they are accessible to any process — or any agent — running on the same system. Forty-three percent of organizations use username and password combinations, and 35% use shared service accounts, both of which compound the attribution problem by making it impossible to distinguish between legitimate and illegitimate agent actions at the credential level [1].
The deployment of genuinely agent-appropriate authentication methods remains limited. Only 34% of organizations use OpenID Connect, 27% use OAuth PKCE, and 19% have deployed SPIFFE/SVID workload identities — all of which provide short-lived, cryptographically verifiable credentials that can be tied to specific workloads and sessions [1]. The difference matters at investigation time: when an incident occurs, static API keys do not carry the session context, timing information, or workload binding needed to reconstruct what an agent did, when, and on whose authorization. The 72% of organizations that cannot trace agent actions to a human sponsor are, in practice, blind to agent-level post-compromise activity [1].
LLMjacking — the theft and misuse of credentials used to access large language model APIs at the victim’s expense — has emerged as a distinct attack category with a growing criminal ecosystem. Compromised credentials for services such as Amazon Bedrock, OpenAI, and Anthropic are sold on underground markets or used to run unauthorized inference at the victim’s expense. The first infostealers specifically engineered to target AI agent credential storage were identified in February 2026, indicating that the criminal ecosystem has recognized the value of agent credentials and begun developing purpose-built tools to harvest them at scale [10].
Detection Gaps and Why Traditional Controls Fail
The threat techniques described throughout this paper share a structural characteristic that makes them difficult to detect with security tooling designed around human user behavior: they produce activity that is, from an authorization and telemetry perspective, indistinguishable from normal agent operation. An agent that has been prompt-injected to exfiltrate documents exercises the same permissions, calls the same APIs, and generates the same log entries as an agent performing a legitimate document summary. An agent whose memory has been poisoned behaves consistently with its operational profile across sessions, because its poisoned memory is, to the system, part of its legitimate state.
The volume and velocity of agent activity compounds this problem. Research across multiple sources indicates that AI agents move substantially more data across enterprise systems than human users, often by orders of magnitude depending on the tasks the agent is performing [10]. At this scale, anomaly detection approaches that might identify unusual access patterns in human-generated telemetry are overwhelmed by the sheer volume of agent-generated activity, which itself spans wide ranges of behavior depending on the tasks the agent is performing. Eighty percent of organizations surveyed reported observing risky agent behaviors including unauthorized system access and improper data exposure, yet only 21% reported having complete visibility into agent permissions, tool usage, or data access patterns [7]. The gap between observed risk and available visibility indicates that most organizations are detecting the most obvious agent misbehavior while missing more sophisticated techniques.
Human-in-the-loop (HITL) oversight, while valuable, does not fully address this gap. The 2025 CSA Agentic Identity Survey found that 68% of respondents consider HITL oversight essential or very important for agentic systems [1]. But HITL mechanisms are typically applied at decision points that the agent surfaces for human review — high-risk actions, financial transactions, sensitive data access. An adversary who understands the HITL architecture will design attacks that operate entirely within the action space the agent executes autonomously, without triggering a review. Time-shifted injection attacks and semantic escalation attacks possess structural properties that allow them to pass through HITL checkpoints — each individual action falls within the agent’s autonomous operating envelope, and no single interaction contains a complete attack signature that would surface a review trigger.
Conclusions and Recommendations
Framing for Security Architecture
The threat model documented in this paper argues for treating deployed AI agents as a class of privileged insider that requires the same — or greater — scrutiny as a human employee with equivalent system access. This reframing has practical implications for how organizations scope their agent governance programs: if an agent has permission to read sensitive files, access internal APIs, and communicate externally, the security controls governing that agent should match the controls governing a human employee with equivalent access, extended to account for the agent’s unique characteristics (persistent operation, inability to exercise moral judgment, susceptibility to injection attacks).
Immediate Actions
Organizations should prioritize the following actions in the near term, beginning before additional agent deployments expand the ungoverned attack surface.
First, establish a complete inventory of deployed agents. Without a real-time agent registry that captures agent identities, authorization grants, tool access, and the human sponsors responsible for each agent, governance is impossible. The 40% of organizations that lack even a non-real-time registry have no foundation for the remaining controls [1].
Second, rotate all static, long-lived agent credentials to short-lived, cryptographically bound workload identities using OIDC, OAuth PKCE, or SPIFFE/SVID. Static API keys should be treated as a legacy pattern incompatible with agentic deployments at production scale.
Third, implement content boundary controls that prevent agent-processed content from being treated as instructions without explicit delimiters and validation. Prompt injection cannot be fully prevented by any single control, but systematic separation of instruction and data channels — through structured prompting, system prompt protections, and output validation — provides meaningful defense-in-depth, raising the cost and complexity of successful attacks even if it cannot prevent all injection attempts in open environments.
Architecture-Level Controls
At the architectural level, organizations should apply the principle of least privilege to agent authorization with greater rigor than is typically applied to human users. Agents should receive access only to the specific tools, data sources, and APIs required for their current task, with access revoked or scoped down when tasks complete. The 88% of organizations that have not fully implemented fine-grained access scopes for their agents are exposing their entire agent access footprint to any injection attack that succeeds against any single agent in their environment [1].
Multi-agent architectures require explicit trust establishment between agent tiers. Agent orchestrators should not automatically extend their full authorization scope to sub-agents; each agent-to-agent interaction should carry only the permissions needed for the delegated subtask. The absence of inter-agent authentication in the current implementations of MCP, A2A, and related protocols — a gap documented in arxiv research [17] — means that organizations must implement compensating controls at the orchestration layer until these protocols mature.
Memory and RAG systems require integrity controls equivalent to those applied to any other data store the agent can read. Retrieval-augmented knowledge bases should be populated only from authenticated, validated sources, with cryptographic provenance for stored content. Agents should not be permitted to write directly to their own persistent memory from untrusted content inputs, as this is the mechanism through which memory poisoning attacks establish persistence.
Governance and Policy
The governance gap documented throughout this paper is not solely a technical problem. Only 23% of organizations have formal, organization-wide agent governance strategies [1]; the remaining 77% are managing a growing population of privileged autonomous actors without the policy framework needed to hold them to consistent security standards. Security teams should work with AI, IT, and legal stakeholders to develop agent governance policies that cover at minimum: the authorization process for deploying new agents, the credential standards all agents must meet, the logging and traceability requirements for agent activity, and the incident response procedures for agent compromise.
Shadow AI adoption — the deployment of unauthorized agents by individual employees or teams — represents a particularly difficult governance challenge. The OpenClaw ecosystem documented 135,000 exposed gateway instances and 22% employee adoption of unauthorized agents [3], indicating that prohibition alone is insufficient when sanctioned alternatives do not adequately meet employee productivity needs. Organizations should invest in approved, enterprise-grade agent platforms that provide the security controls — sandboxing, audit logging, administrative oversight, scoped file access — that unmanaged alternatives lack, while creating accessible paths for employees to request access to emerging tools through a controlled evaluation process.
CSA Resource Alignment
This whitepaper connects directly to several active workstreams and publications from the Cloud Security Alliance AI Safety Initiative.
MAESTRO (Multi-Agent Environment, Security, Threat, Risk, and Outcome), published in February 2025, provides the seven-layer threat modeling framework most directly applicable to the attack categories documented here [25]. MAESTRO’s threat categories — goal misalignment, data poisoning, evasion attacks, agent collusion, and supply chain vulnerabilities — align directly with the threat families described in this paper, providing a structured modeling framework for each of the five attack classes. Organizations using MAESTRO’s open-source TITO tool can generate MITRE ATT&CK-mapped attack path analyses for their specific agent architectures, providing a structured starting point for red team exercises and defensive prioritization.
The AI Control Matrix (AICM), CSA’s superset of the Cloud Controls Matrix extended for AI systems, provides the control framework for operationalizing the recommendations in this paper. Relevant AICM domains include Orchestrated Service Provider (OSP) controls governing agent orchestration security, Model Provider (MP) controls addressing training data integrity and model supply chain, and AI Customer (AIC) controls governing runtime authorization and data access governance. Organizations building agent governance programs should map their controls to AICM rather than CCM alone, as AICM captures the AI-specific control requirements that CCM does not address.
The 2025 CSA Agentic Identity Survey [1] and the RSAC 2025 presentation on Securing Non-Human Identities [2] provide the empirical foundation for the identity governance recommendations in this paper and should be read alongside it by teams developing agent identity programs.
The CSA Desktop Agent Policy Template [3] provides a directly deployable policy artifact for organizations needing to respond immediately to unauthorized personal AI agent deployments in their environment, addressing the shadow AI adoption challenge discussed in the governance section.
References
[1] Hillary Baron et al., “2025 CSA Agentic Identity Survey,” Cloud Security Alliance, 2026. Sponsored by Strata Identity. Survey conducted September–October 2025, n=285.
[2] Albert Attias and Alon Jackson, “Securing Non-Human Identities in the Age of AI Agents,” Cloud Security Alliance / Workday / Astrix, presented at RSA Conference 2025.
[3] Cloud Security Alliance AI Safety Initiative, “Policy on Personal AI Desktop Agents (OpenClaw and Similar Tools),” 2026. References research from Palo Alto Networks Unit 42, SecurityScorecard, Koi Security.
[4] Palo Alto Networks, “2026 Predictions for Autonomous AI,” November 2025. https://www.paloaltonetworks.com/blog/2025/11/2026-predictions-for-autonomous-ai/
[6] The Hacker News, “AI Agents Are Becoming Authorization Bypass Paths,” January 2026. https://thehackernews.com/2026/01/ai-agents-are-becoming-privilege.html
[7] Help Net Security, “Enterprise AI Agent Security 2026,” March 2026. https://www.helpnetsecurity.com/2026/03/03/enterprise-ai-agent-security-2026/
[8] Zenity, “2026 Threat Landscape Report,” 2026. https://zenity.io/resources/white-papers/2026-threat-landscape-report
[9] OWASP, “OWASP Top 10 for LLM Applications 2025,” Open Web Application Security Project, 2025. https://owasp.org/www-project-top-10-for-large-language-model-applications/
[10] Palo Alto Networks Unit 42, “Agentic AI Threats,” 2026. https://unit42.paloaltonetworks.com/agentic-ai-threats/
[11] Trail of Bits, “Prompt Injection to RCE in AI Agents,” October 2025. https://blog.trailofbits.com/2025/10/22/prompt-injection-to-rce-in-ai-agents/
[12] eSentire, “Model Context Protocol Security: Critical Vulnerabilities Every CISO Should Address in 2025,” 2025. https://www.esentire.com/blog/model-context-protocol-security-critical-vulnerabilities-every-ciso-should-address-in-2025
[13] authzed, “Timeline of MCP Security Breaches,” 2025–2026. https://authzed.com/blog/timeline-mcp-breaches
[16] Acuvity, “Semantic Privilege Escalation: The Agent Security Threat Hiding in Plain Sight,” 2025. https://acuvity.ai/semantic-privilege-escalation-the-agent-security-threat-hiding-in-plain-sight/
[17] Multiple authors, “From Prompt Injections to Protocol Exploits: A Survey of Agentic AI Security,” arxiv preprint 2506.23260, 2025. https://arxiv.org/html/2506.23260v1
[18] Vectra AI, “EchoLeak: Zero-Click Prompt Injection in Microsoft 365 Copilot (CVE-2025-32711),” June 2025. https://www.vectra.ai/topics/prompt-injection
[20] Obsidian Security, “Agentic AI Security and LLMjacking,” 2025. https://www.obsidiansecurity.com/blog/agentic-ai-security
[21] Cybersecurity Dive, “Research Shows AI Agents Are Highly Vulnerable to Hijacking Attacks,” 2025. https://www.cybersecuritydive.com/news/research-shows-ai-agents-are-highly-vulnerable-to-hijacking-attacks/757319/
[23] ReversingLabs, “How AI Agents Upend Software Supply Chain Security,” 2025. https://www.reversinglabs.com/blog/how-ai-agents-upend-sscs
[24] Cisco, “State of AI Security 2026 Report,” 2026. https://blogs.cisco.com/ai/cisco-state-of-ai-security-2026-report
[25] Cloud Security Alliance, “Agentic AI Threat Modeling Framework — MAESTRO,” February 2025. https://cloudsecurityalliance.org/blog/2025/02/06/agentic-ai-threat-modeling-framework-maestro
[28] ZeroPath, “CVE-2025-10725: Red Hat OpenShift AI Privilege Escalation,” 2025. https://zeropath.com/blog/cve-2025-10725-redhat-openshift-ai-privilege-escalation