Authors: Cloud Security Alliance AI Safety Initiative
Published: 2026-03-19

Categories: AI Security, Threat Intelligence, Agentic AI, Multi-Agent Security, Supply Chain Security

ClawWorm: Self-Propagating Worm Attacks on LLM Agent Ecosystems

Key Takeaways

Researchers from Peking University, Sun Yat-sen University, Wuhan University, Tsinghua University, and Singapore Management University have demonstrated what the authors characterize as the first academically documented self-propagating worm designed to exploit production-scale LLM agent infrastructure [1], targeting OpenClaw’s more than 40,000 internet-exposed instances. Named ClawWorm, the attack achieves an 85% aggregate success rate across experimental trials and spreads autonomously from agent to agent through a broadcast mechanism that simultaneously infects all co-resident agents in a shared group context—a fundamental departure from traditional worm propagation that moves host to host sequentially. The research establishes that the architectural trust assumptions baked into current agentic frameworks are systematically exploitable and that a single adversarial payload can trigger a cascade of infection without any further attacker interaction.

ClawWorm targets OpenClaw, an open-source local AI agent platform with over 40,000 internet-exposed instances and 150,000 GitHub stars as of early 2026 [1]. OpenClaw integrates with WhatsApp, Telegram, Slack, and Discord, and provides agents with persistent configuration storage, tool-execution privileges, and the ability to install third-party skill packages from the ClawHub marketplace. These features—precisely the capabilities that make OpenClaw useful—are the attack surface that ClawWorm exploits. The worm’s three-phase cycle (persistence, execution, propagation) is triggered by a single message delivered to any agent in the ecosystem, after which it replicates autonomously across connected peers without requiring further attacker input.

The ClawWorm paper arrives in a context of documented real-world incidents that corroborate the threat classes it formalizes. The ClawHavoc supply chain campaign distributed 1,184 malicious skill packages across ClawHub between January 27 and February 5, 2026 [2]; independent research confirmed at least 341 of these packages as malicious [3]. CVE-2026-25253, a zero-click remote code execution vulnerability in OpenClaw’s Control UI, left 63% of exposed instances vulnerable before patching [4][5][15]. Security researchers from Simula Research Laboratory, as reported by CyberSecureFox [6], identified hidden prompt injection payloads in approximately 2.6% of sampled Moltbook posts—an agent-oriented social network—demonstrating that worm-like prompt propagation is already occurring in production environments at scale. ClawWorm provides the formal academic analysis of how these attack classes converge into a fully autonomous, self-replicating threat.

Background

The Architecture That Enables Agent Worms

To understand why ClawWorm is possible, it is necessary to understand the structural condition that all current LLM agent platforms share. When a user or developer configures an AI agent, the platform combines the developer’s system instructions, the user’s requests, content retrieved from external sources, and outputs from tool invocations into a single context window. The underlying language model processes this entire context without any architectural boundary that enforces a distinction between authoritative developer instructions and untrusted external content. If a piece of content that an agent retrieves contains text that resembles an instruction, the model evaluates it using the same mechanism it uses to evaluate the developer’s system prompt.

This condition—indirect prompt injection—was formally analyzed by Kai Greshake and colleagues in February 2023, who demonstrated that hidden instructions embedded in webpages would cause Bing Chat to adopt covert objectives and exfiltrate user data through manipulated links [7]. Researchers have argued that no reliable filter can eliminate this condition because determining whether an arbitrary string constitutes a malicious instruction is semantically—rather than syntactically—a problem equivalent to halting, and thus intractable in the general case. While practical mitigations can substantially reduce exploitation rates, they cannot provide a formal elimination guarantee. Most agentic platforms built on transformer-based LLMs share this architectural property to varying degrees, though the specific exploitability depends on the capabilities the platform grants its agents and the protections it applies to sensitive resources such as configuration files and skill execution privileges.

OpenClaw’s design makes several choices that are individually reasonable but collectively create a compoundable attack surface. Configuration files are loaded unconditionally at startup without integrity verification. Skill packages installed from ClawHub execute with full agent privileges without sandboxing. Tool invocations—including shell execution and outbound HTTP requests—are authorized solely by the LLM’s own reasoning rather than by an independent policy engine. Group chat events trigger passive ingestion by all co-resident agents simultaneously through an event-listener architecture. Each of these design properties is exploited by at least one phase of the ClawWorm attack cycle.

The Lineage of Self-Replicating AI Attacks

ClawWorm sits at the end of an accelerating two-year research trajectory on self-propagating AI attacks. Morris II, published in March 2024 by Stav Cohen, Ron Bitton, and Ben Nassi, was the first worm designed explicitly for GenAI ecosystems [8]. It operated by embedding adversarial self-replicating prompts in email content that would contaminate a RAG database; subsequent retrievals would re-surface the poisoned data in new queries, sustaining propagation through passive contamination of shared memory. Morris II also demonstrated image-based propagation, where adversarial perturbations embedded in image attachments would steer multimodal agents to forward emails to additional recipients without user interaction. The research was disclosed to OpenAI and Google through coordinated vulnerability disclosure.

Prompt Infection, published in October 2024 by Donghyun Lee and Mo Tiwari, extended the model to direct LLM-to-LLM propagation in multi-agent architectures [9]. A compromised agent would replicate the malicious prompt into messages sent to adjacent agents, coordinating infected agents to exchange data and issue instructions to agents with specific tool access—enabling data theft, misinformation injection, and financial scams while propagating covertly even when agents do not share their full communications. Lee and Tiwari proposed “LLM Tagging” as a mitigation, finding that it significantly reduced but did not eliminate propagation rates when combined with existing safeguards. The broader trajectory of this research line—from passive RAG contamination to direct agent-to-agent replication—was characterized in January 2026 by Brodt, Feldman, Schneier, and Nassi, whose Promptware Kill Chain analysis documented how prompt injection attacks have evolved into multi-stage malware delivery mechanisms capable of chaining initial access, persistence, and payload delivery across LLM-mediated environments [13].

ClawWorm differs from both predecessors in two structural dimensions that expand the threat’s practical scope. First, it targets a production-scale platform with tens of thousands of real-world deployments rather than a constructed experimental environment. Second, it replaces the linear (point-to-point or RAG-mediated) propagation of Morris II and Prompt Infection with hypergraph broadcast propagation, where a single payload delivered to a group context is simultaneously ingested by all co-resident agents through OpenClaw’s event listener architecture—compressing the per-hop infection from a sequential process into a parallel broadcast event.

Security Analysis

The Three-Phase Attack Cycle

ClawWorm executes through three phases that operate in sequence once triggered by a single initial message. Understanding the mechanics of each phase is necessary to appreciate why the attack is difficult to interrupt at any single point in the chain.

The persistence phase exploits OpenClaw’s startup configuration mechanism using what the researchers term a “dual-anchor” technique. One anchor writes the worm payload into the Session Startup section of the core configuration file, ensuring re-execution on every reboot. A second anchor injects a join-event trigger into the Group Chats configuration section, which activates the worm during active sessions whenever a new agent joins the group. The core configuration file is loaded unconditionally at platform startup without any integrity verification, meaning that once the worm has modified it, the modification persists indefinitely across restarts. The researchers measured a 92% persistence rate across their experimental trials [1].

The execution phase delivers an arbitrary payload on each subsequent startup. The ClawWorm paper tested three payload categories: P1 performs reconnaissance by executing shell commands to collect host identity information; P2 performs resource exhaustion by spawning a process that consumes CPU, memory, and LLM token budget; and P3 implements command-and-control by fetching attacker-supplied instructions from a remote URL at each activation [1]. The P3 payload is particularly significant because it bypasses shell-execution defenses by routing the outbound request through the agent’s HTTP retrieval capability rather than a shell command—illustrating that any agent capable of making web requests is capable of receiving dynamic attacker instructions even if direct shell access is restricted. This capability equivalence means that restricting tool categories individually, without an overarching policy engine, leaves a functionally equivalent C2 channel intact. The overall execution rate was 85% across experimental conditions [1].

The propagation phase is where ClawWorm diverges most sharply from traditional worm behavior. Three infection vectors were tested. Vector A delivers the worm by instructing the infected agent to share an attacker-controlled URL; when peers retrieve it, they receive a configuration containing the malicious payload. Vector B—the highest-performing vector at a 95% attack success rate—exploits the ClawHub skill marketplace: the infected agent recommends installation of an attacker-published malicious skill package to group peers, who install it with full platform privileges [1]. Vector C enables direct instruction replication, where the infected agent extracts its own payload and sends it as a code block accompanied by a verification mechanism that requests the recipient paste back an updated configuration—achieving social-engineering-based propagation through the agent’s conversational interface.

The hypergraph propagation architecture underlying all three vectors distinguishes ClawWorm from its predecessors. Traditional worm propagation scans for and infects individual hosts sequentially, with infection depth growing one hop per transmission cycle. ClawWorm’s exploitation of OpenClaw’s group event-listener architecture means that a single payload delivered to a group chat is passively received by all co-resident agents simultaneously. A single infected agent in a group of ten agents does not infect one agent per cycle—it broadcasts infection to all nine peers in a single event. This compresses the time to ecosystem-wide infection in proportion to group size—a group of ten agents is infected in one broadcast event rather than ten sequential hops. The researchers observed a mean propagation depth of 4.0 hops with a maximum of 5 hops. When configuration modification succeeded, propagation proceeded at a 100% rate—meaning that the primary constraint on ecosystem-wide infection was the persistence phase, not the propagation phase itself [1].

Trust Boundaries and Why They Fail

The ClawWorm researchers identified five structural trust boundaries that current agent architectures rely upon, and documented how each is defeated [1]. The context boundary fails because the LLM cannot distinguish token provenance—messages from group participants carry the same weight as system instructions, so an adversarial message originating from another user in the group cannot be rejected as untrusted. The configuration boundary fails because the core configuration file is loaded unconditionally at startup without integrity verification, making it a persistent and privileged target for modification. The skill boundary fails because skill definitions can modify core configuration, enabling an attack to escalate from medium trust (skill execution) to the highest trust level (startup configuration) through a single step. The tool boundary fails because tool invocations—including shell commands and outbound HTTP requests—are authorized by the LLM’s own reasoning rather than by an independent policy engine with explicit permission requirements. The supply chain boundary fails because ClawHub packages execute with full agent privileges without sandboxing or cryptographic publisher verification.

This analysis reveals that ClawWorm does not exploit a single vulnerability—it exploits five distinct trust failures in sequence, and interrupting the chain at any one point is likely to leave the other four intact. A platform that validates configuration integrity but does not sandbox skill execution remains vulnerable to Vector B. A platform that sandboxes skill execution but does not partition context trust remains vulnerable to Vectors A and C. The researchers’ finding that ClawWorm succeeds at 85% aggregate across 180 experimental trials—covering all combinations of vectors and payloads—reflects the depth of the underlying architectural exposure rather than any single exploitable flaw.

The Real-World Incident Context

The ClawWorm research does not emerge in a vacuum. Three concurrent real-world incidents in the OpenClaw ecosystem validate its threat model against production conditions. The ClawHavoc supply chain campaign, active from January 27 through February 5, 2026, distributed 1,184 malicious skill packages across ClawHub through 12 publisher accounts [2]; Koi Security researchers independently confirmed at least 341 packages as malicious across 2,857 audited packages [3]. Antiy Labs researchers, whose analysis is detailed in [2], identified staged payload downloads, Python-based reverse shells, and direct credential-harvesting as attack techniques, with the primary macOS payload being an upgraded Atomic macOS Stealer variant targeting browser credentials, keychains, Telegram session data, SSH keys, cryptocurrency wallets, and .env files containing bot configuration secrets. This campaign represents exactly the kind of supply chain poisoning that ClawWorm’s Vector B automates and propagates autonomously.

CVE-2026-25253, disclosed February 3, 2026, demonstrated a complementary attack path: OpenClaw’s Control UI trusted a gatewayURL query parameter without origin validation and auto-connected on load, sending the stored gateway authentication token in the WebSocket connection payload [4][5][15]. A malicious web page could retrieve this token, establish an authenticated WebSocket session, disable confirmation dialogs through the operator.admin scope, escape the Docker container sandbox, and execute arbitrary commands on the host machine. At the time of disclosure, 63% of the more than 40,000 internet-exposed OpenClaw instances were unpatched and vulnerable [4][5]. A patched version was available in OpenClaw 2026.1.29, but patch adoption at that scale is neither immediate nor uniform.

Security researchers from Simula Research Laboratory, as reported by CyberSecureFox [6], identified hidden prompt injection payloads in approximately 2.6% of sampled Moltbook posts—a social network designed for AI agents. These payloads propagate when AI agents process compromised content as part of normal social network activity, representing early in-the-wild evidence that worm-like prompt propagation is already occurring through shared content channels at production scale, not solely in isolated experimental conditions.

Recommendations

Immediate Actions

Organizations deploying LLM agent platforms should treat startup configuration files as cryptographically sensitive assets. Configuration integrity verification should be implemented as a two-stage pipeline: a lightweight rule-based scanner that flags self-replicating payload signatures at load time, followed by cryptographic integrity tags verified before any startup-phase execution proceeds [1]. Any configuration modification that cannot be traced to an authenticated, authorized source should be quarantined and logged for human review rather than applied silently. This control directly interrupts the persistence phase of the ClawWorm attack cycle.

Skill and plugin installation should require human approval and minimal-privilege execution. No third-party skill package should execute with the ability to modify core configuration files, access system directories, or make outbound network connections without explicit, capability-specific authorization granted at install time. Organizations should audit existing skill installations against the ClawHub malicious package indicators published by Antiy Labs and the MITRE ATLAS OpenClaw investigation report [2][10] and remove any packages matching known-malicious publisher accounts or payload signatures.

All agents with access to group communication channels should be treated as potential propagation vectors in the same way that a shared network segment is treated in traditional network security. Group-context messages should not be capable of triggering configuration modifications or skill installations without explicit out-of-band authorization. Event listener architectures that deliver group messages to all co-resident agents simultaneously—the hypergraph property that ClawWorm exploits—should be reviewed for whether passive ingestion of group content carries execution privileges.

Short-Term Mitigations

Context privilege isolation should be implemented to partition the agent’s context window into a privileged zone (developer system prompt, authenticated operator instructions) and an untrusted zone (retrieved web content, group messages, user-provided content, skill outputs) [1]. Pre-screening of tokens entering the untrusted zone should be applied before those tokens can influence high-risk tool invocations such as shell execution, file writes to configuration directories, or outbound HTTP requests. While this partitioning cannot provide a hard security guarantee—the underlying LLM processes both zones with the same architecture—it substantially raises the cost of successful prompt injection by requiring that the payload successfully subvert classification in addition to the model’s own judgment.

Tool execution should be governed by an independent policy engine rather than delegated entirely to the LLM’s in-context reasoning. A policy engine external to the model can enforce explicit rules—such as prohibiting shell execution without an authenticated operator approval, or requiring human confirmation before any outbound request to a URL not on an approved allowlist—without relying on the model’s reasoning to correctly classify the request as high-risk. The ClawWorm P3 payload’s ability to bypass shell-execution defenses by routing through HTTP retrieval illustrates that capability-by-capability restrictions enforced only by the model are insufficient; each tool must be governed independently, and any outbound communication capability must be treated as a potential C2 channel regardless of the specific mechanism.

Platform operators should examine their deployment exposure against the CVE-2026-25253 patch status and confirm that all instances are running OpenClaw 2026.1.29 or later. Instances that cannot be immediately updated should have their Control UI network access restricted to loopback or authenticated VPN segments to eliminate the WebSocket token exfiltration attack path.

Strategic Considerations

The ClawWorm research reveals that the current generation of LLM agent platforms carries significant architectural security gaps: the five trust boundary failures it documents are structural design properties, not implementation bugs, and eliminating them requires platform-level architectural changes rather than patches to individual components. Fixing them requires architectural changes to platform design, not patches to individual components.

Organizations developing or procuring LLM agent platforms should require that platform vendors document their trust model explicitly: how are developer instructions distinguished from user content? How is user content distinguished from retrieved external data? What independent controls govern tool execution? How are skill packages verified before installation? A platform that cannot answer these questions has not modeled its threat surface. Security teams evaluating agentic AI deployments should treat the absence of a documented trust model as a high-severity finding that warrants either remediation before deployment or documented risk acceptance at an appropriate level of organizational authority. The companion analysis by Xinhao Deng and colleagues [14] provides expanded mitigation guidance specifically addressing OpenClaw’s attack surface, and organizations managing OpenClaw deployments should review its platform-specific recommendations in conjunction with the controls described here.

The supply chain dimension of the ClawWorm threat warrants sustained attention. ClawWorm’s highest-performing propagation vector exploits an agent marketplace with no cryptographic publisher verification, no sandboxed execution, and no mandatory static analysis before package publication. The same structural conditions apply to MCP server registries, where security researchers have documented proof-of-concept attack vectors including prompt injection through MCP-mediated context poisoning [11]. Any platform that allows agents to install and execute third-party code with full platform privileges is implicitly operating a software supply chain risk that the industry has two decades of experience demonstrating is exploitable at scale.

CSA Resource Alignment

The ClawWorm threat maps directly to CSA’s MAESTRO framework for agentic AI threat modeling. MAESTRO Layer 1 (Model), Layer 2 (Agent Frameworks), and Layer 4 (Deployment and Infrastructure) are each implicated: the context-trust failure is a Layer 1 architectural property; the configuration, skill, and tool boundary failures are Layer 2 platform design issues; and the supply chain and exposure at scale are Layer 4 deployment concerns. Organizations applying MAESTRO to OpenClaw-class deployments should classify multi-agent group communication channels as an explicit trust boundary requiring control validation, and should document skill marketplace integration as a supply chain risk node in their MAESTRO threat model.

CSA’s AI Organizational Responsibilities guidance is directly applicable to the governance dimension of the ClawWorm findings. The requirement that organizations document trust models for deployed AI systems, maintain inventories of connected tools and integrations, and apply vendor risk management practices to AI platform providers maps to the structural gaps ClawWorm exploits. Platform procurement decisions should include evaluation of the vendor’s documented approach to context isolation, configuration integrity, and skill package vetting.

The CSA Cloud Controls Matrix (CCM) and its superset, the AI Controls Matrix (AICM), provide control mappings relevant to three dimensions of the ClawWorm threat. Supply chain integrity controls (STA domain) apply to ClawHub-class marketplace risk; threat and vulnerability management controls (TVM domain) apply to CVE-2026-25253 patch management; and cryptography and key management controls (CRY domain) apply to the configuration integrity verification mechanism that the ClawWorm researchers propose as defense D2. Organizations mapping their control environments against the AICM should ensure that agentic AI deployments are included in scope for supply chain, vulnerability management, and configuration integrity controls—not treated as a category separate from cloud infrastructure.

OWASP’s Top 10 for LLM Applications 2025 identifies Prompt Injection (LLM01) as the leading vulnerability, and the OWASP Top 10 for Agentic Applications explicitly addresses inter-agent trust exploitation and “Agent Session Smuggling” attacks that align with ClawWorm’s lateral movement phase [12]. Organizations using OWASP’s LLM security guidance should ensure they have reviewed the agentic applications supplement and applied its recommendations to any deployment involving multi-agent coordination, group communication channels, or agent marketplace integrations.

References

[1] Yihao Zhang, Zeming Wei, Xiaokun Luan, Chengcan Wu, Zhixin Zhang, Jiangrong Wu, Haolin Wu, Huanran Chen, Jun Sun, Meng Sun. “ClawWorm: Self-Propagating Attacks Across LLM Agent Ecosystems.” arXiv:2603.15727. March 16, 2026. https://arxiv.org/abs/2603.15727

[2] CyberPress. “ClawHavoc Poisons OpenClaw’s ClawHub with 1,184 Malicious Skills.” February 2026. https://cyberpress.org/clawhavoc-poisons-openclaws-clawhub-with-1184-malicious-skills/

[3] The Hacker News. “Researchers Find 341 Malicious ClawHub Skills.” February 2026. https://thehackernews.com/2026/02/researchers-find-341-malicious-clawhub.html

[4] SOCRadar. “CVE-2026-25253: RCE in OpenClaw via Auth Token Exfiltration.” February 2026. https://socradar.io/blog/cve-2026-25253-rce-openclaw-auth-token/

[5] ProArch. “OpenClaw RCE Vulnerability CVE-2026-25253.” February 2026. https://www.proarch.com/blog/threats-vulnerabilities/openclaw-rce-vulnerability-cve-2026-25253

[6] CyberSecureFox. “OpenClaw AI Agent Security: Malicious Skills and Prompt Worms.” February 2026. https://cybersecurefox.com/en/openclaw-ai-agent-security-malicious-skills-prompt-worms/

[7] Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, Mario Fritz. “Not What You’ve Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection.” arXiv:2302.12173. February 2023. https://arxiv.org/abs/2302.12173

[8] Stav Cohen, Ron Bitton, Ben Nassi. “Here Comes The AI Worm: Unleashing Zero-click Worms that Target GenAI-Powered Applications.” arXiv:2403.02817. March 2024. https://arxiv.org/abs/2403.02817

[9] Donghyun Lee, Mo Tiwari. “Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems.” arXiv:2410.07283. October 2024. https://arxiv.org/abs/2410.07283

[10] MITRE ATLAS. “OpenClaw Security Investigation.” PR-26-00176-1. February 2026. https://www.mitre.org/sites/default/files/2026-02/PR-26-00176-1-MITRE-ATLAS-OpenClaw-Investigation.pdf

[11] Palo Alto Networks Unit 42. “Model Context Protocol (MCP) Attack Vectors.” 2025–2026. https://unit42.paloaltonetworks.com/model-context-protocol-attack-vectors/

[12] OWASP. “Top 10 for Large Language Model Applications 2025.” https://owasp.org/www-project-top-10-for-large-language-model-applications/

[13] Oleg Brodt, Elad Feldman, Bruce Schneier, Ben Nassi. “The Promptware Kill Chain: How Prompt Injections Gradually Evolved Into a Multistep Malware Delivery Mechanism.” arXiv:2601.09625. January 2026. https://arxiv.org/abs/2601.09625

[14] Xinhao Deng et al. “Taming OpenClaw: Security Analysis and Mitigation of Autonomous LLM Agent Threats.” arXiv:2603.11619. March 2026. https://arxiv.org/abs/2603.11619

[15] CyberSecurityNews. “OpenClaw 0-Click Vulnerability.” February 2026. https://cybersecuritynews.com/openclaw-0-click-vulnerability/

← Back to Research Index