Agent Commander: Promptware Turns AI Agents into C2 Infrastructure

Authors: Cloud Security Alliance AI Safety Initiative
Published: 2026-03-17

Categories: AI Security, Threat Intelligence, Prompt Injection, Agentic AI, Malware Analysis
Download PDF

Agent Commander: Promptware Turns AI Agents into C2 Infrastructure

How Prompt Injection Payloads Compose into Multi-Agent Command-and-Control Networks


Key Takeaways

On March 16, 2026, security researcher Johann Rehberger (wunderwuzzi, Embrace The Red) published “Agent Commander: Promptware-Powered Command and Control,” a framework demonstrating that multiple AI agents from different vendors can be simultaneously compromised via prompt injection and enrolled into a unified command-and-control network where an operator issues natural language tasking through a centralized dashboard [1]. Unlike earlier ZombAI demonstrations that showed individual agent hijacking, Agent Commander represents the operational maturation of promptware — a term describing prompt injection payloads complex enough to function as malware — into a multi-agent C2 platform capable of persistence, tasking, and coordinated objectives across heterogeneous agent architectures.

The research demonstrated compromise of three distinct agent platforms — Kimi Claw via document analysis, OpenClaw via email-based GCP Pub/Sub delivery, and NanoClaw via malicious website visits — all checking in to a single C2 dashboard for natural language commands [1]. Persistence was achieved through heartbeat exploitation in OpenClaw’s HEARTBEAT.md mechanism and scheduled task injection in NanoClaw, while detection evasion leveraged response suppression flags and cheaper model variants that are more susceptible to prompt injection [1][2]. For agents configured with shell execution, file access, and network connectivity — capabilities that many current agent platforms grant by default — the research demonstrates that the distinction between agent compromise and traditional host compromise effectively collapses, because an attacker who controls such an agent controls the underlying system.

Enterprise security teams face a new class of threat that their existing monitoring infrastructure was not designed to detect. Conventional EDR and SIEM tooling, configured for traditional process-level monitoring, has limited visibility into promptware-based C2 because no malware binary is deployed and the C2 channel operates through the agent’s own legitimate communication pathways. Behavioral anomaly detection at the network and process level may surface some indicators, but requires purpose-built rules targeting agent execution patterns. Organizations deploying agentic AI workflows should use this research as a catalyst to prioritize agent-specific monitoring, capability restriction, and architectural isolation before attackers operationalize these techniques at scale.


Background

Promptware as a New Malware Class

The concept of promptware — prompt injection payloads that exhibit the behavioral complexity of traditional malware — has evolved through several distinct phases since Kai Greshake and colleagues first demonstrated indirect prompt injection against Bing Chat in February 2023 [3]. Early demonstrations were single-stage: an attacker embedded a hidden instruction in a webpage, the AI agent processed it, and an unintended action occurred. The payload was static, the effect was immediate, and the compromise ended when the session closed. These initial proofs of concept established the foundational vulnerability — that LLMs cannot reliably distinguish between developer instructions and adversarial text embedded in retrieved content — but they did not demonstrate the sustained, evolving, attacker-directed behavior that characterizes traditional malware campaigns. Palo Alto Networks Unit 42 has since confirmed that this class of attack has moved from theoretical risks to active weaponization, documenting twelve real-world cases of web-based indirect prompt injection observed in the wild as of March 2026 [18].

The transition from single-stage injection to multi-stage promptware accelerated through 2024 and 2025 as AI agents gained persistent memory, broader tool access, and multi-agent coordination capabilities. Brodt, Feldman, Schneier, and Nassi formalized this evolution in January 2026 with the Promptware Kill Chain (arXiv:2601.09625), analyzing at least twenty-one multi-stage attacks and identifying a seven-phase framework: Initial Access, Privilege Escalation (jailbreaking), Reconnaissance, Persistence, Command-and-Control, Lateral Movement, and Actions on Objective [4]. The analysis found that persistence appeared in a substantial proportion of documented attacks, lateral movement grew significantly from zero documented cases in 2023, and the C2 stage — where behavior shifts from static payload execution to dynamic, attacker-modifiable direction — emerged as the capability that transforms a one-shot injection into a controllable trojan [4]. Bruce Schneier endorsed the framework in a February 2026 blog post, characterizing promptware as “a distinct class of malware execution mechanisms” that mirrors the multi-stage progression of traditional campaigns [5]. The growing recognition of this threat class has been amplified by industry coverage characterizing the AI industry as broadly unprepared for promptware-based attacks [19].

Rehberger’s own research trajectory illustrates this progression. His 2024 Claude Computer Use ZombAI demonstration showed that a malicious webpage could instruct Claude to download and execute a Sliver C2 binary — a straightforward social engineering attack that relied on Claude interpreting a download instruction as a legitimate help request [6]. The 2025 SpAIware/ChatGPT ZombAI attack introduced persistence through memory injection and dynamic C2 through sequentially numbered GitHub issues that the compromised agent would fetch as instructions [7]. The August 2025 Month of AI Bugs documented 29 prompt injection vulnerabilities across 13 major AI agent platforms, establishing that the attack surface was ecosystem-wide rather than vendor-specific [8]. Agent Commander, published in March 2026, synthesizes these individual techniques into a unified operational framework that demonstrates what a real-world attacker campaign would look like.

The Shift from Individual Exploits to Operational C2

Agent Commander’s significance lies not in any single novel technique but in its demonstration that the individual components documented across two years of research — initial access via indirect prompt injection, persistence via memory and configuration manipulation, C2 via natural language tasking, and multi-agent coordination — compose into a coherent operational capability. Traditional C2 frameworks like Cobalt Strike and Sliver provide operators with a dashboard, agent management, tasking queues, and post-exploitation modules. Agent Commander provides the same operational structure, but the agents are AI systems rather than malware implants, the tasking language is English rather than compiled commands, and the C2 channel operates through the agents’ own tool-call infrastructure rather than covert network tunnels.

This represents a qualitative shift in the threat model. Prior to Agent Commander, each ZombAI demonstration required the researcher to manually craft and deliver an injection, observe the result, and repeat the process for each target agent. Agent Commander automates this cycle: compromised agents check in to the C2 server on a regular cadence, receive new tasking, execute it through their native tool capabilities, and report results [1]. The operator manages multiple compromised agents simultaneously from a single interface, can issue different objectives to different agents based on their capabilities and access, and can modify tasking in real time as objectives evolve. The barrier to entry shifts accordingly — an attacker needs to craft effective prompt injections but does not need to write, compile, or deploy traditional malware. The skill set required is fundamentally different from traditional malware development, though the exact difficulty threshold remains an open question as the field matures.


Security Analysis

The Agent Commander Architecture

Agent Commander operates as a C2 server where compromised AI agents check in for tasking delivered through natural language rather than raw operating system commands [1]. The architecture targets the agent execution layer as a computing abstraction above the traditional operating system — a layer where commands are interpreted by an LLM rather than a CPU, where persistence is achieved through configuration files and memory stores rather than registry keys and scheduled services, and where lateral movement occurs through tool-call chains and inter-agent communication rather than network exploitation.

Rehberger demonstrated the framework against three distinct agent architectures, each compromised through a different initial access vector. The following table summarizes the compromise vectors and their characteristics.

Agent Platform Initial Access Vector Mechanism Human Interaction Required
Kimi Claw Document analysis File containing embedded prompt injection submitted for processing Yes (file submission)
OpenClaw Email delivery GCP Pub/Sub proactive notification processing No (autonomous workflow)
NanoClaw Website visit Indirect prompt injection in page content Minimal (browsing)

All three agents, despite running on different platforms with different underlying models and different tool configurations, successfully enrolled in the Agent Commander C2 network and began accepting tasking from the operator dashboard [1].

The multi-agent, multi-vector nature of the demonstration carries significant implications for enterprise environments that increasingly deploy multiple AI agent platforms for different functions — coding assistants, email triage, document analysis, customer service, DevOps automation. Agent Commander shows that an attacker who compromises agents across these platforms can manage them through a single interface, leveraging each agent’s specific capabilities and access for different objectives. A compromised email agent provides inbox access; a compromised coding agent provides repository and shell access; a compromised document analysis agent provides access to confidential files. The operator orchestrates these capabilities in combination, in a manner parallel to how a traditional red team operator coordinates different implants on different systems.

Persistence Through Heartbeat Exploitation

The persistence mechanisms demonstrated in Agent Commander exploit built-in agent management features rather than requiring any modification to the host operating system. In OpenClaw, the primary persistence vector targets the HEARTBEAT.md file — a configuration artifact that the agent processes every 30 minutes by default as a health check mechanism [1][2]. By injecting instructions into the heartbeat configuration, the attacker ensures that compromised behavior is re-executed on a regular cadence without requiring a new injection event.

Several technical details of the heartbeat exploitation are significant for defenders. OpenClaw supports response suppression through a HEARTBEAT_OK parameter that causes the agent to suppress user-visible output from heartbeat processing [1]. This means the injected instructions execute silently — the user sees no output, no notification, and no indication that the heartbeat cycle is doing anything other than confirming the agent’s health. OpenClaw also allows configuration of cheaper, less capable model variants for heartbeat processing as a cost optimization. Rehberger discovered that these cheaper models are significantly more susceptible to prompt injection than the primary models used for interactive tasks, creating a dynamic where cost optimization directly degrades security posture [1]. Additionally, Rehberger observed that agents sometimes normalize malicious instructions over repeated executions — partial compliance in early heartbeat cycles gives way to fuller compliance as the injected instructions become part of the agent’s established behavioral context [1].

In NanoClaw, persistence was achieved through a different mechanism: the indirect prompt injection added persistent scheduled tasks that fire regularly without visible user notifications [1]. The scheduled task mechanism is native to the agent platform — no operating system scheduler is involved — making it invisible to conventional endpoint monitoring that watches for cron job or Windows Task Scheduler modifications.

A third persistence vector applies across agent platforms and is particularly difficult to detect. Agents that can write to files, databases, or memory definition files can establish backdoors through configuration drift. An agent instructed to modify its own configuration file, add entries to its memory store, or alter its tool definitions creates persistence that survives agent restarts and session boundaries [1]. This form of configuration-level persistence operates entirely within the agent’s authorized capabilities — the agent is doing exactly what it is designed to do (modifying its own files), but the instructions directing it originate from the attacker rather than the user.

Detection Evasion and the Visibility Gap

Agent Commander’s detection evasion techniques exploit the fundamental gap between what conventional security tooling monitors and where promptware operates. Traditional EDR watches for malicious process execution, suspicious file modifications, anomalous network connections, and indicators of compromise at the operating system level. Promptware-based C2 generates few if any of these conventional signals. The agent process itself is legitimate. The tool calls the agent makes — reading files, sending HTTP requests, executing commands — are the same calls it makes during normal operation. The C2 traffic flows through the agent’s own communication channels to destinations that may be legitimate services.

Rehberger documented several specific evasion techniques. User notification suppression via configuration flags (NO_REPLY, HEARTBEAT_OK) ensures that compromised agent activity produces no visible output that would alert the user [1]. The use of cheaper model variants in heartbeat routines reduces the likelihood that the model itself will recognize and refuse malicious instructions [1]. Message propagation between sub-agents and main agents — where a compromised sub-agent passes a sanitized summary to the main agent — can bypass sandbox boundaries that would otherwise contain the compromise [1]. The researcher also noted that partial instruction following creates unpredictable agent behavior that complicates both automated detection and manual investigation: a compromised agent may comply with some attacker instructions and refuse others, producing a behavioral pattern that does not clearly match either benign or malicious profiles [1].

The detection gap is compounded by the probabilistic nature of LLM behavior. Unlike traditional malware that executes deterministically, promptware produces approximate outcomes. Rehberger estimated that attack success is probabilistic — using approximately 50% as an illustrative figure — meaning that attackers trade precise control for breadth of targeting [1]. This probabilistic behavior means that signature-based detection, which relies on identifying specific sequences of actions, is poorly suited to identifying promptware C2 activity. A compromised agent may execute the attacker’s instructions differently each time, with different tool calls, different orderings, and different levels of compliance.

Post-Exploitation Capabilities and Living-Off-Agents

Once agents are enrolled in the Agent Commander network, the operator can issue natural language objectives that the agents fulfill using their native capabilities. Rehberger demonstrated a range of post-exploitation tasking that included host enumeration with executive-level summarization, email inbox screenshot capture and exfiltration, monitoring of third-party websites for changes, multi-system source code discovery and theft, and influence campaign execution including social media posts and ad-click fraud [1]. The research also demonstrated image upload and download capabilities, suggesting that multimodal agents introduce additional exfiltration channels beyond text.

The concept Rehberger describes as “living-off-agents” parallels the living-off-the-land techniques that have dominated advanced persistent threat operations for the past decade [1]. In living-off-the-land attacks, threat actors avoid deploying custom malware by using legitimate system administration tools — PowerShell, WMI, PsExec — to achieve their objectives. Living-off-agents applies the same principle to the agent layer: rather than deploying traditional C2 implants, the attacker instructs the compromised agent to fulfill objectives using the tools and access it already possesses. The agent reads files using its file access tool, sends data using its HTTP tool, executes commands using its shell tool, and communicates with other systems using its API integrations. Every action is taken by a legitimate process using authorized capabilities, making attribution and detection significantly more difficult with current tooling than detecting traditional post-exploitation tools that introduce foreign binaries.

A particularly concerning finding involved NanoClaw’s architecture: because Claude generates custom agents on-demand with no traditional installer, skills can be dynamically added at runtime through commands such as /add-telegram [1]. This means a compromised agent can expand its own capability set in response to attacker tasking, acquiring new tools that the original deployment did not include. Additionally, Rehberger identified a security issue where inference requests without an explicit model specification fall back to older Claude versions (Sonnet 3.5), which are significantly easier to exploit than current models [1]. This default behavior means that agents running on the latest, most hardened models may still process some requests through older, more vulnerable model versions.

Relationship to Prior ClawJacked and ClineJection Research

Agent Commander builds on and extends vulnerabilities previously documented in the ClawJacked and ClineJection disclosures. ClawJacked, disclosed February 26, 2026 by Oasis Security, demonstrated that OpenClaw’s local WebSocket gateway could be hijacked by any malicious website through a chain of missing origin enforcement, disabled rate limiting for loopback connections, and automatic device pairing [9]. ClineJection, disclosed in February 2026 by Adnan Khan, showed that prompt injection via GitHub issue titles could compromise the Cline AI coding assistant’s CI/CD pipeline, leading to the exfiltration of publication credentials and the distribution of a backdoored npm package to approximately 4,000 developer machines [10].

Where ClawJacked and ClineJection demonstrate specific vulnerability chains that enable initial compromise of individual agents, Agent Commander demonstrates what an attacker does after achieving that initial access at scale. The progression from vulnerability disclosure to operational framework mirrors the traditional security research lifecycle: individual CVEs identify entry points, while C2 frameworks operationalize those entry points into sustained campaigns. OpenClaw’s own security advisory history — hundreds of vulnerabilities patched and advisories issued in just a few weeks, as Rehberger observed [1] — suggests that the initial access attack surface remains substantial despite active remediation efforts, and that Agent Commander-style operational frameworks will have no shortage of entry vectors to exploit.

The Normalization of Deviance Risk

Rehberger raised a behavioral security concern that transcends the technical attack surface: the normalization of deviance in human-agent trust relationships [1]. The normalization of deviance — a phenomenon first analyzed by sociologist Diane Vaughan in the context of the Challenger disaster and subsequently documented across safety-critical industries including healthcare and aviation — poses a significant risk in human-agent interactions. As users become accustomed to AI agents performing tasks autonomously, there is a substantial risk that they gradually reduce their scrutiny of agent behavior. An agent that occasionally produces unexpected output, accesses resources the user did not explicitly request, or takes actions that are approximately but not precisely aligned with the user’s intent is tolerated as a feature of probabilistic AI behavior rather than investigated as a potential indicator of compromise. This normalization creates an environment where compromised agent behavior blends into the background noise of normal agent imprecision. NIST’s CAISI division has separately published research on strengthening AI agent hijacking evaluations, recognizing that current evaluation methodologies may underestimate the practical impact of agent compromise scenarios [20].

The implication for enterprise security is that user vigilance cannot be relied upon as a detection mechanism. Users who have been trained to accept that AI agents are imprecise — that they sometimes misinterpret instructions, take unnecessary steps, or produce outputs that require correction — are unlikely to notice when that imprecision is attacker-directed rather than stochastic. Detection must therefore be implemented at the system level through prompt monitoring, behavioral baselining, and configuration integrity verification rather than delegated to end users who are psychologically primed to accept the very behaviors that indicate compromise.


Recommendations

Immediate Actions

Organizations deploying AI agents with tool access should immediately audit the capabilities granted to each agent and revoke any capability not strictly required for the agent’s declared function. Agent Commander’s effectiveness depends on compromised agents having broad tool access — shell execution, file system read/write, network connectivity, API integrations — that can be repurposed for attacker objectives. An agent configured solely to summarize documents has limited value as a C2 node; an agent configured with shell access, outbound HTTP, and email capabilities provides an attacker with a fully functional implant. The principle of least capability provides foundational risk reduction that complements all other mitigations: an agent with restricted capabilities limits attacker options regardless of whether other controls are in place.

For organizations running OpenClaw specifically, the HEARTBEAT.md configuration should be reviewed immediately for injected content, and heartbeat processing should be configured to use the same model version as interactive tasks rather than cheaper variants [1]. The heartbeat interval should be evaluated against the organization’s risk tolerance — longer intervals reduce the cadence at which injected instructions execute, though they also reduce the agent’s health monitoring responsiveness. Any auto-approval mechanisms for device pairing, tool activation, or capability expansion should be disabled and replaced with explicit user confirmation requirements.

Security operations teams should assess whether their current monitoring infrastructure has any visibility into AI agent behavior. If agent activity is not logged — prompts received, tool calls made, external requests initiated, files accessed, commands executed — the organization has no ability to detect or investigate promptware-based compromise. Implementing agent activity logging should be treated as an urgent gap closure, with logs stored in a system that the agent itself cannot access or modify.

Short-Term Mitigations

At the architectural level, organizations should enforce separation between the agent’s planning context (where it reasons about what to do) and its execution environment (where it actually takes actions). High-consequence actions — code execution, outbound data transmission, credential use, file modification, email sending — should require human confirmation implemented at the system level rather than as an instruction to the model. A confirmation gate that exists outside the agent’s context window cannot be bypassed through prompt injection that instructs the model to skip confirmation.

Network egress filtering should be applied to agent execution environments. Agents should be permitted to make outbound connections only to explicitly approved domains and endpoints. The ZombAI demonstrations and Agent Commander both rely on compromised agents being able to reach attacker-controlled infrastructure; egress controls that restrict outbound connectivity to a known-good allowlist degrade C2 channel establishment even when an injection succeeds. For agents that require broad internet access as part of their function, a logging proxy provides visibility into what destinations the agent contacts and what data it transmits.

Organizations should implement configuration integrity monitoring for agent configuration files, memory stores, tool definitions, and scheduled task configurations. Changes to these artifacts should generate alerts that are reviewed by security operations, with particular attention to modifications that add new capabilities, alter heartbeat or scheduled task behavior, reference external URLs, or contain instruction-like text. The HEARTBEAT.md exploitation and scheduled task injection demonstrated in Agent Commander both modify configuration artifacts that would be detectable through file integrity monitoring — but only if that monitoring extends to the agent layer, which is not covered by default EDR configurations.

Prompt monitoring — logging and analyzing the content of every prompt delivered to production agents — provides the behavioral detection layer that compensates for the inability of signature-based tools to identify promptware. Anomalous prompt patterns, including instructions that reference external C2-like infrastructure, commands to suppress user notifications, instructions to modify the agent’s own configuration, and reconnaissance-oriented queries about available tools and connected systems, should be flagged for investigation. Microsoft’s approach of detecting UPIA and XPIA activity through behavioral correlation rather than payload signatures provides a reference architecture for this capability [11].

Strategic Considerations

Agent Commander demonstrates that the threat model for agentic AI deployment must expand beyond individual vulnerability remediation to encompass operational attack frameworks. Just as the security community treats Cobalt Strike and Sliver as indicators that the adversary has moved beyond opportunistic exploitation to sustained campaign operations, the emergence of promptware C2 frameworks signals that AI agent compromise is transitioning from research curiosity to operational tradecraft. Organizations should update their threat models, red team scenarios, and incident response playbooks to account for this transition.

The theoretical difficulty underlying promptware defense remains fundamental: current LLM architectures process developer instructions and untrusted retrieved content through the same inference mechanism, with no hardware or cryptographic separation between the two. The challenge is analogous to classical undecidability results in computability theory — no filtering, training, or classifier approach has been shown to guarantee prevention, and the theoretical landscape suggests that complete prevention may be fundamentally intractable. Defense strategies must therefore assume that prompt injections will succeed at some rate — Anthropic has publicly acknowledged a residual attack success rate of approximately 1% even after extensive hardening [13], and OpenAI has documented ongoing efforts to reduce prompt injection success rates in its agent deployments [12] — and focus on minimizing the blast radius through capability restriction, sandboxing, human confirmation gates, and behavioral monitoring.

The living-off-agents paradigm introduces a detection challenge analogous to living-off-the-land: when the attacker’s actions are indistinguishable from the agent’s legitimate operations, detection requires understanding intent rather than observing behavior. This is a problem that the security community has not fully solved for traditional living-off-the-land attacks, and it is likely to be even more difficult for agent-layer attacks where the behavior is inherently probabilistic. Organizations should invest in behavioral baselining for their agent deployments — understanding what normal agent activity looks like in terms of tool call patterns, data access volumes, external communication destinations, and execution timing — so that deviations can be identified when they occur.

Rehberger noted that Agent Commander remains unpublished, with selective access limited to defensive and authorized red team purposes [1]. History suggests that the techniques it demonstrates will be independently developed and potentially weaponized by threat actors regardless of publication status. The window between defensive awareness and adversarial adoption is the period during which organizations should implement the controls described above.


CSA Resource Alignment

The Agent Commander research directly engages several active CSA frameworks and publications that provide structured guidance for the threat class it represents.

The MAESTRO framework (Multi-Agent Environment Security Threat and Risk Overview) is the most directly applicable CSA resource for analyzing promptware C2 risk across the agentic AI architecture stack. MAESTRO’s seven-layer model provides threat modeling templates that map precisely to the Agent Commander attack chain: Layer 2 (Model Services) addresses the prompt injection vulnerability at the inference layer, Layer 4 (Agent Runtime) addresses the tool access and capability configuration that determines what a compromised agent can do, and Layer 6 (Orchestration) addresses the multi-agent coordination that Agent Commander exploits when compromised agents interact with other agents or services [14]. MAESTRO’s guidance on agent capability sandboxing — restricting tool access to the minimum required by the declared task — provides the architectural foundation for the least-capability recommendations in this note. The February 2026 MAESTRO update extended the framework’s application to real-world CI/CD and pipeline threat models, directly relevant to organizations integrating AI agents into development and operational workflows [15].

The CSA Agentic AI Red Teaming Guide (2025) includes 629 structured test cases across categories that directly map to Agent Commander’s demonstrated capabilities [16]. The following table maps Agent Commander’s attack techniques to the corresponding Red Teaming Guide categories.

Agent Commander Technique Red Teaming Guide Category Applicability
Document/email/website injection Agent authorization and control hijacking Initial compromise vectors
HEARTBEAT.md exploitation Memory poisoning and context manipulation Persistence mechanisms
Configuration drift persistence Memory poisoning and context manipulation Long-term backdoors
Multi-platform C2 enrollment Multi-agent exploitation Coordinated attack scenarios
Living-off-agents post-exploitation Supply chain dependency Capability abuse

Organizations should use these test cases as the basis for red team exercises that specifically attempt to replicate Agent Commander-style multi-agent C2 enrollment.

The OWASP Top 10 for Agentic Applications (2026) addresses the agentic-specific risk dimensions through ASI01 (Agent Goal Hijack), which describes the precise mechanism Agent Commander exploits — redirecting an agent’s objective from user-directed tasks to attacker-directed C2 participation [17]. ASI06 (Memory Poisoning) addresses the persistence mechanisms, and ASI07 (Insecure Inter-Agent Communication) addresses the lateral movement risk when compromised agents pass instructions to other agents through shared communication channels.

The Cloud Controls Matrix (CCM) v4.0 applies through its threat and vulnerability management domain (TVM), identity and access management domain (IAM), and data security domain (DSP). The C2 channel establishment demonstrated in Agent Commander relies on agents having unrestricted internet access; CCM’s network controls guidance supports the egress filtering recommendations above. The IAM domain’s principle of least privilege applies directly to agent capability configuration, and organizations should extend IAM controls to treat AI agents as distinct principals with independently scoped and auditable permissions rather than entities that inherit their user’s full access.

CSA Zero Trust Guidance applies to the trust boundaries between AI agents and the systems they access. Agent Commander’s architecture fundamentally challenges Zero Trust principles: a compromised agent operates with its user’s full permission set, accessing systems and data based on persistent trust rather than continuous verification. Zero Trust architecture should be extended to model the AI agent as a separate principal that requires its own per-request verification, with particular attention to actions that could constitute C2 behavior — outbound data transmission, configuration modification, and capability expansion.


References

[1] Johann Rehberger (wunderwuzzi), “Agent Commander: Promptware-Powered Command and Control,” Embrace The Red, March 16, 2026. https://embracethered.com/blog/posts/2026/agent-commander-your-agent-works-for-me-now/

[2] Johann Rehberger, “AgentHopper: An AI Virus,” Embrace The Red, 2025. https://embracethered.com/blog/posts/2025/agenthopper-a-poc-ai-virus/

[3] Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz, “Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection,” arXiv:2302.12173, February 2023. https://arxiv.org/abs/2302.12173

[4] Oleg Brodt, Elad Feldman, Bruce Schneier, and Ben Nassi, “The Promptware Kill Chain: How Prompt Injections Gradually Evolved Into a New Class of Cyberattacks,” arXiv:2601.09625, January 2026. https://arxiv.org/abs/2601.09625

[5] Bruce Schneier, “The Promptware Kill Chain,” schneier.com, February 2026. https://www.schneier.com/blog/archives/2026/02/the-promptware-kill-chain.html

[6] Johann Rehberger, “ZombAIs: From Prompt Injection to C2 with Claude Computer Use,” Embrace The Red, 2024. https://embracethered.com/blog/posts/2024/claude-computer-use-c2-the-zombais-are-coming/

[7] Johann Rehberger, “AI Domination: Remote Controlling ChatGPT ZombAI Instances,” Embrace The Red, 2025. https://embracethered.com/blog/posts/2025/spaiware-and-chatgpt-command-and-control-via-prompt-injection-zombai/

[8] Johann Rehberger, “Agentic ProbLLMs: The Month of AI Bugs 2025,” monthofaibugs.com, August 2025. https://monthofaibugs.com/

[9] Oasis Security, “ClawJacked: OpenClaw Vulnerability Enables Full Agent Takeover,” Oasis Security Blog, February 26, 2026. https://www.oasis.security/blog/openclaw-vulnerability

[10] Adnan Khan, “Clinejection: From GitHub Issue Title to Credential Exfiltration,” February 2026. Referenced in CSA Research Note CSA_research_note_clinejection_prompt_injection_cicd_cache_poisoning_20260310.

[11] Microsoft MSRC, “How Microsoft Defends Against Indirect Prompt Injection Attacks,” microsoft.com, July 2025. https://www.microsoft.com/en-us/msrc/blog/2025/07/how-microsoft-defends-against-indirect-prompt-injection-attacks

[12] OpenAI, “Continuously hardening ChatGPT Atlas against prompt injection attacks,” openai.com, December 2025. https://openai.com/index/hardening-atlas-against-prompt-injection/

[13] Anthropic, “Prompt Injection Defenses for Claude,” anthropic.com, November 2025. https://www.anthropic.com/research/prompt-injection-defenses

[14] Cloud Security Alliance, “Agentic AI Threat Modeling Framework: MAESTRO,” cloudsecurityalliance.org, February 6, 2025. https://cloudsecurityalliance.org/blog/2025/02/06/agentic-ai-threat-modeling-framework-maestro

[15] Cloud Security Alliance, “Applying MAESTRO to Real-World Agentic AI Threat Models: From Framework to CI/CD Pipeline,” cloudsecurityalliance.org, February 11, 2026. https://cloudsecurityalliance.org/blog/2026/02/11/applying-maestro-to-real-world-agentic-ai-threat-models-from-framework-to-ci-cd-pipeline

[16] Cloud Security Alliance, “Agentic AI Red Teaming Guide,” cloudsecurityalliance.org, 2025.

[17] OWASP GenAI Security Project, “Top 10 for Agentic Applications for 2026,” genai.owasp.org, December 2025. https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/

[18] Palo Alto Networks Unit 42, “Fooling AI Agents: Web-Based Indirect Prompt Injection Observed in the Wild,” unit42.paloaltonetworks.com, March 2026. https://unit42.paloaltonetworks.com/ai-agent-prompt-injection/

[19] BankInfoSecurity, “‘Promptware’ Attacks Await an Unprepared AI Industry,” bankinfosecurity.com, 2026. https://www.bankinfosecurity.com/pomptware-attacks-await-unprepared-ai-industry-a-30785

[20] NIST CAISI, “Technical Blog: Strengthening AI Agent Hijacking Evaluations,” nist.gov, January 2025. https://www.nist.gov/news-events/news/2025/01/technical-blog-strengthening-ai-agent-hijacking-evaluations


This research note was produced by the Cloud Security Alliance AI Safety Initiative. It represents point-in-time analysis as of 2026-03-17. The field is evolving rapidly; organizations are encouraged to monitor CSA publications and the referenced sources for updated guidance.

← Back to Research Index