LLM Agents as Active Post-Exploitation Tools

Authors: Cloud Security Alliance AI Safety Initiative
Published: 2026-06-02

Categories: Artificial Intelligence, Threat Intelligence
Download PDF

LLM Agents as Active Post-Exploitation Tools

Key Takeaways

  • On May 10, 2026, Sysdig researchers documented the first confirmed wild intrusion in which an LLM agent autonomously drove the entire post-exploitation phase — from an unauthenticated shell via CVE-2026-39987 to full internal database exfiltration — in under one hour across four distinct pivots.
  • The attacker’s agent demonstrated adaptive, inference-driven decision-making rather than scripted playbook execution: it inferred non-standard database table names, dynamically chained outputs from prior commands into subsequent actions, and distributed API calls across eleven Cloudflare Workers egress IPs to defeat source-IP detection.
  • The Marimo case suggests LLM agents may substantially reduce post-exploitation from a skill-intensive manual activity toward an inference-budget problem — a shift of operational complexity from human expertise to compute cost. If this pattern generalizes, sophisticated multi-stage intrusions become replicable at scale with reduced adversary expertise, though one forensically documented case does not establish this as a settled pattern and warrants treating it as a credible near-term threat.
  • The AI toolchain itself — developer notebooks, orchestration frameworks, coding assistants — has become primary attack surface; AI infrastructure is not incidental to these intrusions but the initial access vector.
  • Defenders must apply runtime behavioral monitoring specifically designed to detect machine-speed, output-chained command sequences that no human operator would produce, supplementing traditional signature- and rule-based detection.

Background

Large language model agents — LLMs equipped with tool-calling capabilities that allow them to execute code, issue API requests, and interact with external systems — represent a qualitative change in the adversarial landscape rather than a quantitative one. Traditional post-exploitation activity relies on either a skilled human operator working interactively or a pre-built script that executes a fixed playbook. Both models have inherent limitations: human operators are slow, expensive, and cannot parallelize across many targets simultaneously; static scripts are brittle and fail when they encounter unexpected configurations or naming conventions. LLM agents dissolve both constraints.

An agent operating in post-exploitation mode receives a goal — enumerate the environment, harvest credentials, pivot to internal systems — and dynamically selects and sequences tools based on the output of each prior step. It can read an environment file, extract credentials it finds there, construct an API call using those credentials, interpret the response, and decide which endpoint to query next, all without human intervention and at machine speed. The agent does not need to know in advance what credentials it will find, what AWS account it will access, or what tables a downstream database contains. It infers these details from context, exactly as a skilled human penetration tester would, but without the time constraints or cognitive limitations of a human operator.

This capability has been theorized and demonstrated in controlled research environments for some time. University of Illinois researchers demonstrated that GPT-4-class models could autonomously exploit disclosed one-day vulnerabilities at an 87% success rate across tested CVEs [1][2]. Honeypot research by academic teams collected millions of automated probe attempts and detected what appeared to be AI-driven hacking agents in the wild [3]. CrowdStrike’s 2026 Global Threat Report documented an 89% year-over-year increase in AI-enabled adversary operations and noted that the average attacker breakout time — the interval between initial access and first lateral movement — had fallen to 29 minutes, a 65% improvement in speed from the prior year [4]. The theoretical and the empirical were converging. What remained missing was a forensically documented, confirmed case of an LLM agent driving post-exploitation in a real intrusion against a real target. That case arrived in May 2026.

Security Analysis

The Marimo Vulnerability and the Speed of Weaponization

Marimo is an open-source reactive Python notebook with approximately 20,000 GitHub stars, used as a modern alternative to Jupyter for interactive data science and AI development work. On April 8, 2026, security researchers disclosed CVE-2026-39987, a pre-authentication remote code execution vulnerability in Marimo rated CVSS 9.3. The flaw’s mechanics were straightforward and damning: Marimo’s /terminal/ws WebSocket endpoint, which provides an interactive PTY-backed shell to the host system, completely omitted the validate_auth() call that all other WebSocket endpoints in the application correctly invoked. Any unauthenticated actor who could reach the server’s port needed only to open a WebSocket connection to receive a full interactive shell running with the privileges of the Marimo process — no credentials, no payload crafting, no exploit tooling beyond what could be constructed directly from reading the advisory [5][6].

The vulnerability was added to CISA’s Known Exploited Vulnerabilities catalog on April 23, 2026, with a Federal Civilian Executive Branch patch deadline of May 7, 2026. That timeline understates the urgency: the first confirmed exploitation attempt occurred just 9 hours and 41 minutes after the advisory was published, before any public proof-of-concept code existed [7]. This initial session, traced to a source IP in India (likely a proxy or VPN), followed a pattern consistent with manual credential harvesting — marker strings delineating the proof-of-concept boundaries, standard reconnaissance commands, and targeted extraction of .env files containing AWS access keys. Methodical, but consistent in its sequencing and pace with human-operated credential harvesting.

The second session, observed by Sysdig on May 10, 2026, was something categorically different [8].

The First Confirmed LLM-Agent-Driven Intrusion

Sysdig’s Threat Research Team documented an attack originating from IP address 157.66.54.26 (AS141892, Indonesia) that proceeded in four pivots over the course of approximately one hour. The first pivot used CVE-2026-39987 in the same manner as the April session — a WebSocket connection to /terminal/ws, followed by id and whoami commands confirming shell access. What followed, however, bore behavioral signatures consistent with automated, inference-driven operation rather than human control — signatures Sysdig researchers characterized as diagnostic of LLM-agent activity.

The second pivot targeted credential storage locations in a comprehensive, systematically ordered sweep: /app/.env*, /etc/environment, /proc/*/environ, ~/.aws/credentials, and ~/.pgpass. This enumeration was more thorough than a typical human would execute under time pressure and followed a consistent structural pattern throughout — echo '---' delimiters between each probe, | head -N output limits of 30 to 40 lines, universal 2>/dev/null stderr suppression, and HEREDOC bundling of logically related commands. These formatting conventions — delimiter injection, bounded head limits, universal stderr suppression, HEREDOC batching — are rarely if ever seen in human-operated attacker sessions; they are consistent with an agent that requires machine-parseable outputs at each step to feed subsequent inference.

Following the credential sweep, Sysdig observed a 48-minute gap — most likely the time required for the agent to process harvested outputs, plan subsequent actions, and receive inference results. The third pivot resumed at 19:26:31 UTC. The agent began with an sts:GetCallerIdentity call to validate the scope of the extracted AWS credentials, then issued secretsmanager:ListSecrets before making 12 secretsmanager:GetSecretValue calls in 22 seconds. These calls were distributed across 11 distinct Cloudflare Workers egress IPs in the 104.28.0.0/16 range — a deliberate technique to defeat source-IP-based detection that would have flagged 12 rapid-fire Secrets Manager queries from a single address. The agent had not only planned this distribution; it had implemented it by routing requests through Cloudflare Workers as a per-request egress pool. The retrieved secret was an SSH private key.

The fourth pivot, beginning at 19:30:30 UTC, opened eight parallel SSH sessions to an internal bastion server, again distributed across six distinct Cloudflare Workers IPs over 113 seconds. Once connected to the bastion, the agent issued three sequential PostgreSQL operations: a schema enumeration query against pg_tables, a targeted dump of a table called credential, and finally a HEREDOC-bundled multi-table dump of api_key, credential, user, variable, flow, and message. The full database schema and contents were exfiltrated in under two minutes. From initial WebSocket connection to complete database dump, the total elapsed time was under one hour [8][9].

Evidence of LLM Agency

Sysdig researchers identified four behavioral signatures that distinguish this attack from scripted automation. First, the agent targeted a table named credential — a name that does not exist in any standard LangFlow or common database schema. The agent inferred the likely name of a high-value table from context and queried it successfully, demonstrating the adaptive reasoning that differentiates LLM agents from static playbooks. Second, a Chinese-language comment — “看还能做什么” (“See what else we can do”) — appeared mid-command-stream across commands dispatched from six IPs at sub-second intervals, a language and timing pattern inconsistent with any human operator but entirely consistent with an LLM’s planning trace leaking into its tool output. Third, the consistent machine-readable output formatting described above — delimiter injection, bounded output, pager suppression, HEREDOC batching — represents a coherent set of conventions applied uniformly throughout the session rather than ad hoc operator habits. Fourth, and most diagnostically, the agent consumed the output of each prior command as input to the next: the PGPASSWORD used in the database connection was lifted from the .pgpass content found in pivot two; the SecretId values passed to GetSecretValue were lifted from the ListSecrets response. This output-dependent chaining is the operational signature of an agent that is reading its own prior results and reasoning over them, not executing a pre-scripted sequence [8].

The Broader Threat Landscape

The Marimo/Sysdig case is the most forensically documented instance of LLM-agent-driven post-exploitation, but it exists within a threat landscape that had been developing for at least eighteen months. CrowdStrike’s 2026 Global Threat Report documented an 89% year-over-year increase in AI-enabled adversary operations, noting that average attacker breakout time had fallen to 29 minutes — a 65% improvement in speed from the prior year [4]. Security researchers deploying honeypot infrastructure had already documented what appeared to be AI-driven hacking agents conducting automated probe attempts against exposed systems in late 2024 [3]. Multiple vendor threat intelligence programs tracked LLM-assisted operational activity across state-sponsored and criminal actor groups throughout the preceding quarters, with AI tooling being applied to reconnaissance, exploit development, and lateral movement phases of the attack lifecycle. What remained absent before May 2026 was a forensically complete, single-actor case demonstrating autonomous LLM operation through an entire post-exploitation chain — with forensic evidence at each stage capable of supporting attribution to LLM agency rather than human or script-based operation.

Orca Security has formalized a parallel threat vector they call AI-Induced Lateral Movement (AILM), which describes scenarios where an organization’s own deployed AI agents become the pivot mechanism. The technique involves embedding malicious instructions in metadata fields — EC2 tags, order comments, log entries — that AI agents routinely ingest. Because current LLM architectures often struggle to reliably distinguish data from instructions, injected content can coerce a deployed agent into extracting tool inventories, issuing database queries, triggering terminal commands, or modifying cloud resources entirely outside its intended operational scope [10]. Orca demonstrated this concretely against Prowler’s Lighthouse AI assistant, a security platform, escalating from novelty prompt injection to execution of prowler_hub_list_providers through a series of injections that progressively widened the agent’s actions.

The OWASP Top 10 for LLM Applications v2.0 (2025) designates prompt injection as LLM01 — the highest-priority vulnerability in the taxonomy — and the December 2025 OWASP Top 10 for Agentic Applications introduced Agent Goal Hijack (AA01) as the lead category for agentic-specific risk, reflecting that autonomous action amplifies the consequences of instruction manipulation far beyond what is possible with a passive chatbot [11]. Unit 42’s 2026 Incident Response Report found AI fueling the majority of significant breaches, with the 2025 incident dataset showing a median time to exfiltration of two days and the fastest 25% of intrusions reaching exfiltration in under 1.2 hours [12].

Survey data from CSA’s own research programs confirms the governance gap that enables these incidents. The 2026 CSA State of AI Agents Security Survey found that 47 percent of organizations experienced a security incident involving an AI agent in the past 12 months, only 8 percent report that their agents never exceed intended permissions, and 58 percent of incident detection and response takes five or more hours — a window that, as the Marimo case demonstrates, is more than sufficient for a complete intrusion cycle [13]. The 2026 CSA AI Agent Governance Survey found that 65 percent of organizations had experienced an agent security incident in the past year, with 61 percent reporting data exposure or mishandling as the primary consequence [14].

Why the AI Toolchain Is the Attack Surface

A detail worth emphasizing about CVE-2026-39987 is that the vulnerable application is an AI development tool. Marimo is not a general web application that happens to use AI; it is infrastructure specifically designed to support AI workloads — data science notebooks, interactive Python environments, notebook-as-web-app deployments. The same is true of the MCP (Model Context Protocol) ecosystem, where OX Security identified a systemic architectural flaw in April 2026 affecting all official MCP SDK implementations and estimated 200,000 vulnerable instances and 7,000-plus publicly accessible servers [15]. Check Point Research disclosed critical vulnerabilities in Claude Code itself (CVE-2025-59536 and CVE-2026-21852) in February 2026, where malicious repository-level configuration files could trigger RCE and exfiltrate API tokens when a developer cloned and opened the repository [16]. IDEsaster security research (Ari Marzouk) identified 30-plus vulnerabilities across ten major AI coding assistants, including RCE chains in GitHub Copilot, Cursor, Windsurf, and Roo Code [17].

The pattern is consistent: adversaries are targeting the infrastructure that AI developers use to build and deploy AI systems, because that infrastructure runs with elevated privileges, holds valuable credentials, and is often less rigorously secured than production application code. LLM agents then use that initial foothold to move laterally, harvest secrets, and reach downstream systems. The AI toolchain is among the highest-value targets for adversaries in AI-intensive environments and has received comparatively less sustained security investment than production application code — a gap that the Marimo case and its counterparts across the AI developer tooling ecosystem make concrete.

Recommendations

Immediate Actions

Organizations running Marimo should verify they have patched to version 0.23.0 or later, which closes the unauthenticated WebSocket terminal endpoint, and should treat any Marimo instance that was network-accessible prior to patching as potentially compromised. Credential rotation should encompass AWS access keys, PostgreSQL passwords, SSH private keys, and any other secrets that could plausibly have been stored in environment files or passed through the application environment. AWS CloudTrail logs should be reviewed for secretsmanager:ListSecrets and secretsmanager:GetSecretValue calls, paying particular attention to bursts of API calls distributed across multiple source IPs in short intervals — this is the specific detection signature the Sysdig case provides. Additionally, organizations should review all AI developer tooling (notebooks, coding assistants, MCP servers, orchestration frameworks) against recent CVE disclosures; the attack surface is broad, patch currency is low, and the adversarial focus on this category is intensifying.

Short-Term Mitigations

Network segmentation is among the most direct structural controls against the lateral movement pattern this case demonstrates. AI development environments — notebooks, experimentation servers, model serving infrastructure — should be isolated from production databases, secret stores, and cloud management APIs. If a Marimo instance can reach an AWS Secrets Manager endpoint directly, the blast radius of its compromise is the entire cloud environment. If it cannot, the lateral movement chain breaks. Zero Trust network architecture applied to AI infrastructure means treating each component as untrusted even within the development perimeter, requiring explicit authorization for every cross-system connection.

Credential scoping and dynamic issuance substantially limit what an attacker can do with harvested secrets. AWS IAM roles should be scoped to the minimum necessary permissions for each application component; a Marimo notebook server should not hold credentials capable of calling secretsmanager:GetSecretValue against production secrets. Where static credentials are necessary, AWS Secrets Manager automatic rotation, HashiCorp Vault’s dynamic secrets engine, or equivalent platform capabilities should be used to limit the validity window of any harvested key. SSH key management should be centralized rather than distributed through Secrets Manager in ways that create a single API call path to private key material.

Behavioral detection tuned for agentic activity patterns should be deployed in parallel with traditional SIEM rules. The Marimo case provides a specific behavioral signature: machine-readable output formatting conventions (delimiter injection, bounded output, HEREDOC batching), distributed API calls originating from Cloudflare Workers IP ranges within narrow time windows, and rapid sequential database queries against tables whose names do not appear in any prior session context. Cloud-native detection services (AWS GuardDuty’s anomalous API activity findings, for example) can flag the Secrets Manager abuse pattern, but only if thresholds are calibrated for agentic-speed operation rather than human-interactive rates.

Strategic Considerations

The most important strategic reorientation this case requires is treating AI infrastructure security as a first-class discipline rather than a subset of application security. AI development tooling, orchestration frameworks, model serving infrastructure, and MCP server deployments collectively represent an attack surface that did not exist in meaningful form three years ago and is now actively being exploited. Security architecture reviews, threat modeling exercises, and penetration testing programs should explicitly scope to include this layer.

Threat modeling for AI deployments should incorporate MITRE ATLAS techniques that specifically address LLM agent abuse, including the agent context poisoning, memory manipulation, and tool poisoning techniques introduced through Zenity Labs’ contribution to ATLAS in October 2025 and formalized in the v5.4.0 release (February 2026) [18]. The Marimo attack chain maps to multiple ATLAS techniques and provides a validated real-world scenario for evaluating detection and response capabilities. Red team exercises that use this case as a template — initial access via exposed AI tooling, credential harvesting, AWS API abuse, lateral movement via SSH, database exfiltration — will surface gaps that generic penetration tests are unlikely to identify.

At the governance level, the “access times autonomy” risk model that CSA’s AI agent governance research formalizes applies directly to post-exploitation scenarios: an agent’s risk is not just a function of its permissions but of the degree to which it can act without human review [14]. Organizations deploying AI agents in any capacity should establish explicit human-in-the-loop requirements for high-impact action categories — secret retrieval, network connection initiation, database modification — and implement runtime authorization checkpoints rather than relying on behavioral guardrails built into the model itself. The Sysdig case demonstrates that adversarial agents do not respect model-level constraints; only architectural controls that operate outside the agent’s execution context can be relied upon.

CSA Resource Alignment

The CSA MAESTRO framework (Multi-Agent Environment, Security, Threat, Risk, and Outcome), developed by Ken Huang and published by CSA in February 2025, provides the most directly applicable threat modeling structure for analyzing the Marimo attack chain. MAESTRO’s seven-layer architecture — Foundation Models, Data Operations, Agent Frameworks, Deployment and Infrastructure, Evaluation and Observability, Security and Compliance, and Agent Ecosystem — maps the Marimo intrusion to Layer 4 (Deployment Infrastructure) as the initial access vector and Layers 3 and 7 (Agent Frameworks and Agent Ecosystem) as the subsequent post-exploitation surface. MAESTRO explicitly addresses cross-layer attack paths where a compromise at one layer propagates through adjacent layers, which reflects the structural pattern of the four-pivot chain Sysdig documented [19].

CSA’s Agentic AI Red Teaming Guide (2025) defines twelve threat categories for agentic AI systems, among them Critical System Interaction, Impact Chain and Blast Radius, and Agent Untraceability — all three of which are directly implicated by the Marimo case. The Guide provides attack scenario templates and benchmark methodologies (including AgentDojo’s 629 test cases) that security teams can use to evaluate whether their AI infrastructure would resist the specific techniques this intrusion employed [20].

The CSA AI Controls Matrix (AICM) extends the Cloud Controls Matrix with AI-specific control families covering agent identity and access management, supply chain integrity for AI tooling, and runtime behavioral monitoring. AICM is CSA’s recommended framework for organizations with significant AI infrastructure exposure, providing AI-specific control families that complement and extend CCM-alone approaches. Organizations using AICM as their governance baseline should prioritize controls in the Deployment Infrastructure and Agent Ecosystem domains identified by MAESTRO as the critical control gaps this case exposes.

CSA’s Zero Trust for LLM Environments guidance (2026), developed under the ZT6 and ZT7 pillars, provides implementation architecture for the network segmentation, credential partitioning, and micro-segmentation controls that would have limited the Marimo attack chain’s lateral movement potential. The guidance explicitly addresses MCP and agentic AI deployments as emerging attack surfaces requiring Zero Trust controls beyond those applied to conventional cloud workloads [21].

The 2026 CSA Agentic Identity Survey found that only 18 percent of organizations are highly confident their IAM infrastructure can manage agent identities, and only 17 percent enforce runtime access control consistently across all environments [22]. These statistics frame the governance gap that adversaries are exploiting: agents are being deployed with permissions inherited from human user accounts or broad service roles, without the scoped, ephemeral credential patterns that would constrain what an attacker can do with a harvested credential. Closing this gap is not a technology problem — the tooling for dynamic credential issuance, scoped IAM roles, and runtime authorization exists — it is an adoption and prioritization problem that CSA’s agentic governance research directly addresses.

References

[1] Fang, R., et al. “LLM Agents can Autonomously Hack Websites.” arXiv:2402.06664, February 2024.

[2] Fang, R., et al. “LLM Agents can Autonomously Exploit One-day Vulnerabilities.” arXiv:2404.08144, April 2024.

[3] Piet, J., et al. “LLM Agent Honeypot: Monitoring AI Hacking Agents in the Wild.” arXiv:2410.13919, October 2024.

[4] CrowdStrike. “2026 Global Threat Report.” CrowdStrike, 2026.

[5] Sysdig Threat Research Team. “Marimo OSS Python Notebook RCE: From Disclosure to Exploitation in Under 10 Hours.” Sysdig, 2026.

[6] Cloud Security Alliance Labs. “Marimo Pre-Auth RCE: AI Development Toolchain Under Attack — CVE-2026-39987.” CSA Labs, 2026.

[7] CSO Online. “Critical Flaw in Marimo Python Notebook Exploited Within 10 Hours of Disclosure.” CSO Online, 2026.

[8] Sysdig Threat Research Team. “AI Agent at the Wheel: How an Attacker Used LLMs to Move from a CVE to an Internal Database in 4 Pivots.” Sysdig, May 2026.

[9] The Hacker News. “Attackers Use LLM Agent for Post-Exploitation After Marimo CVE-2026-39987 Exploit.” The Hacker News, May 2026.

[10] Orca Security. “AI-Induced Lateral Movement (AILM).” Orca Security, 2026.

[11] OWASP GenAI Security Project. “OWASP Top 10 for LLM Applications v2.0 and Agentic Applications.” OWASP, 2025–2026.

[12] Palo Alto Networks. “Unit 42 2026 Global Incident Response Report.” Palo Alto Networks, 2026.

[13] Cloud Security Alliance. “Enterprise AI Security Starts with AI Agents: State of AI Agents Security Survey.” CSA AI Safety Initiative, 2026.

[14] Cloud Security Alliance. “Autonomous But Not Controlled: AI Agent Governance Survey.” CSA AI Safety Initiative, 2026.

[15] OX Security. “The Mother of All AI Supply Chains: Critical Systemic Vulnerability at the Core of the MCP.” OX Security, April 2026.

[16] Check Point Research. “RCE and API Token Exfiltration Through Claude Code Project Files — CVE-2025-59536.” Check Point, February 2026.

[17] The Hacker News. “Researchers Uncover 30 Flaws in AI Coding Tools.” The Hacker News, December 2025.

[18] Vectra AI. “MITRE ATLAS: 16 Tactics, 84 Techniques for Adversarial AI.” Vectra AI, 2026.

[19] Cloud Security Alliance. “MAESTRO: Agentic AI Threat Modeling Framework.” CSA AI Safety Initiative, February 2025.

[20] Cloud Security Alliance. “Agentic AI Red Teaming Guide.” CSA AI Organizational Responsibilities Working Group, 2025.

[21] Cloud Security Alliance. “Using Zero Trust to Secure Enterprise Information in LLM Environments.” CSA Zero Trust Working Group, 2026.

[22] Cloud Security Alliance. “Securing Autonomous AI Agents: 2025 Agentic Identity Survey.” CSA AI Safety Initiative, 2026.

← Back to Research Index