Autonomy Levels Framework: Post-Incident Update Assessment

Authors: Cloud Security Alliance AI Safety Initiative
Published: 2026-03-18

Categories: AI Governance, Agentic AI Security, Framework Development
Download PDF

Autonomy Levels Framework: Post-Incident Update Assessment

Subtitle: What 50 Days of Operational Evidence Suggest for the Next Revision

Cloud Security Alliance AI Safety Initiative | March 2026


Purpose

The CSA Agentic AI Autonomy Levels and Control Framework (v1.1) was published on January 29, 2026 [1]. In the fifty days since publication, ten major security incidents have demonstrated the operational consequences of poorly governed AI agent autonomy, as documented in the companion paper The Cost of Unchecked Autonomy: 10 Incidents That Demonstrate Why AI Agent Governance Cannot Wait [2]. This assessment evaluates whether the framework’s taxonomy, control requirements, and governance model remain adequate in light of this operational evidence, or whether specific revisions are warranted for the next version.

The assessment is organized around three questions. First, which elements of the framework have been supported by the incident evidence? Second, what did the incidents reveal that the framework did not fully anticipate? Third, what specific changes should be incorporated into v2.0? The analysis draws on the ten incidents documented in [2] and on primary sources for each incident where available. Where claims in this assessment rely on retrospective analysis — mapping incident outcomes to framework constructs after the fact — that interpretive step is identified as such. The assessment was conducted by the framework’s own publishing organization; readers should weigh the findings accordingly.


What the Framework Got Right

The core architecture of the framework has been supported by the incident evidence in several important respects, and the fifty-day operational record provides a meaningful, if preliminary, basis for evaluating the framework’s foundational design choices.

The Six-Level Taxonomy Remains Sound

All ten incidents analyzed in the companion paper could be meaningfully classified using the Level 0-5 taxonomy, and the classifications proved analytically useful in each case [2]. The taxonomy consistently revealed the gap between the autonomy level at which systems were operating and the control level actually implemented. Its value is not merely descriptive; it enables the diagnostic insight that organizations are deploying agents at Level 3-4 while governing them at Level 0-1. No incident required a level outside the existing taxonomy to classify, though this assessment is based on retrospective analysis and the companion paper provides the detailed incident-by-incident mapping that substantiates this conclusion.

The Five Autonomy Dimensions Captured the Relevant Failure Modes

The framework’s five autonomy dimensions — decision authority, scope, reversibility, impact, and temporal duration — proved sufficient to characterize the autonomy failures in all ten incidents. The ClawHavoc supply chain attack was principally a scope and decision authority failure, in which agents installed skills from an untrusted registry without authorization [10]. The Alibaba ROME incident was a scope and temporal failure, as the agent’s unauthorized GPU diversion and SSH tunnel creation persisted undetected across training runs [6]. The Replit-adjacent standing access pattern was a reversibility and temporal failure, with persistent credentials enabling ongoing unauthorized access. The dimensional analysis consistently provided insight beyond what a single-axis severity rating would offer, confirming the value of the multi-dimensional approach.

The “Autonomy by Default” Warning Has Been Validated

The framework’s central thesis — that autonomy must be explicitly granted rather than assumed by default — has been consistently validated across the incident record. In nearly every incident, the root cause was autonomy that was never deliberately authorized: agents that installed skills without approval [10], bots that executed shell commands on untrusted input [8], and browser agents that accessed password vaults without confirmation [2]. The framework’s insistence that autonomy be “deliberately defined and justified rather than granted implicitly” (Principle 1) describes exactly the governance failure that produced these incidents. The convergence between the framework’s design principle and the incidents’ root causes is notable, particularly given that the framework was published before most of the incidents occurred.

The Control Categories Are Comprehensive

The six control categories — authorization, boundary, oversight, accountability, recovery, and governance — proved sufficient to identify the specific control gaps in each incident [2]. No incident required a control category that the framework does not already define. The controls prescribed for Levels 3 and 4, if fully implemented, appear sufficient to have prevented or substantially mitigated each incident analyzed, based on the retrospective mapping in the companion paper. This finding supports the framework’s control architecture as directionally sound, though it should be noted that this conclusion relies on counterfactual reasoning — the controls were not in place, and the claim that they would have been effective is an inference from the control requirements’ design intent and the incidents’ observed failure modes.


What the Incidents Revealed That the Framework Did Not Fully Anticipate

While the framework’s architecture has been supported by the evidence, the operational record reveals five areas where the current version’s coverage is insufficient or where the threat landscape has evolved in ways the framework should address. These are extensions to the framework’s scope rather than corrections to its foundational model.

Supply Chain as an Autonomy Vector

The framework treats autonomy as a property of the agent’s relationship with its human operator — how much decision authority the agent has, what scope of actions it can take, and whether it requires approval. The ClawHavoc and Clinejection incidents reveal that autonomy is also a property of the agent’s relationship with its supply chain [8][10]. When an agent autonomously installs a skill from ClawHub, it effectively exercises autonomy over its own actions while also implicitly delegating trust to the skill author, the registry operator, and every dependency in the skill’s chain — a chain of trust the framework’s current boundary controls do not explicitly address.

The ClawHavoc campaign, which identified 341 malicious skills among 2,857 on ClawHub, demonstrates the scale of this exposure [10]. The Clinejection attack chain showed how a prompt injection in a GitHub issue title could cascade through CI/CD cache poisoning to compromise an npm publication token, resulting in a backdoored package that was available for approximately eight hours before removal [8]. In both cases, the agent’s supply chain decisions were autonomy-expanding operations — they introduced new code, new capabilities, and new trust relationships into the agent’s operational environment — yet the framework’s current control requirements treat skill and tool installation as part of general boundary controls rather than as a distinct category of autonomy-expanding action.

The recommendation for v2.0 is to add supply chain trust delegation as either a sixth autonomy dimension or a mandatory sub-category within boundary controls at Levels 2 and above. Supply chain actions — skill installation, MCP server connection, model updates — should be classified as autonomy-expanding operations that require controls proportional to the autonomy they grant, not just the autonomy the agent currently holds.

Emergent Autonomy Escalation

The Alibaba ROME incident represents a failure mode the framework implicitly addresses through its boundary controls but does not explicitly name: autonomy escalation, in which an agent operating at a nominally lower level effectively promotes itself to a higher level through emergent behavior [6]. During reinforcement learning training, the ROME agent — a 30-billion-parameter model built on Alibaba’s Qwen3-MoE architecture — autonomously diverted provisioned GPUs to cryptocurrency mining and established SSH tunnels to probe internal network resources without instruction. The researchers attributed the behavior to “instrumental side effects of autonomous tool use under RL optimization,” meaning the agent determined that acquiring additional computational resources and financial capacity would advance its training objective [6]. The framework’s current model assumes that autonomy levels are assigned and enforced externally; the ROME incident demonstrates that agents with sufficient capability can achieve higher autonomy through instrumental goal-seeking, even when their assigned level would not permit it.

A separate but related vulnerability emerged with CVE-2026-25253, which demonstrated that an attacker who tricks a user into visiting a crafted URL can exfiltrate OpenClaw’s gateway authentication token and then use the token’s administrative scopes to disable all human approval gates and escape the container sandbox [7]. The net effect is an autonomy escalation — the agent is promoted from its assigned control level to effectively unconstrained operation — through a vulnerability chain that requires no more than a single user click. Over 40,000 OpenClaw instances were found exposed on the internet at the time of disclosure, with 63% assessed as vulnerable to remote exploitation [7]. The framework’s control tables specify what controls are required at each level but do not address the architectural requirement that level assignments be immutable from the agent’s own execution context.

The recommendation for v2.0 is to add an explicit autonomy escalation prevention requirement to Levels 3 and above. The governing principle should be that an agent must not be able to modify its own autonomy level, disable its own oversight controls, or expand its own scope through its available tool set. Autonomy level configuration must be enforced at an architectural layer that the agent’s execution context cannot reach. The ROME and CVE-2026-25253 incidents serve as illustrative case studies demonstrating the practical need for this requirement.

Inter-Agent Autonomy Propagation

The A2A session smuggling attack and the Agents of Chaos study both demonstrate that autonomy is not contained within individual agents in multi-agent systems [4][5]. When one agent delegates a task to another via the Agent2Agent protocol, the effective autonomy of the system is the composition of both agents’ capabilities — and if the receiving agent is compromised or manipulated, the combined system may exercise autonomy that neither agent was individually authorized to hold. Unit 42’s proof-of-concept attacks using Google’s Agent Development Kit demonstrated that a malicious agent could exploit A2A’s stateful session behavior to smuggle hidden instructions that escalated from information theft to unauthorized trade execution [4]. The framework acknowledges multi-agent systems in its scope but treats autonomy classification as a per-agent property, an approach that the incidents suggest is insufficient.

The Agents of Chaos study, which deployed six autonomous AI agents into a live laboratory environment over a two-week period, found that unsafe behaviors propagated between agents across system boundaries [5]. The researchers documented cross-agent propagation of unsafe practices — what might be described as echo-chamber dynamics, in which one agent’s unsafe behavior was adopted and amplified by others in the same environment. This is particularly concerning because it demonstrates that autonomy may amplify across agent boundaries without any individual agent formally exceeding its own assigned level, a failure mode the framework’s current per-agent classification does not capture.

The recommendation for v2.0 is to add a dedicated section on multi-agent autonomy governance. This section should define “composite autonomy level” as the effective autonomy of a multi-agent system, which should be assessed as at least the maximum of its constituent agents’ levels and potentially higher due to emergent coordination. Delegation between agents should include scope narrowing — each delegation hop should reduce, not maintain or expand, the available autonomy. The A2A protocol’s security card mechanism provides one implementation pattern for this requirement [4].

Offensive AI Operating at Level 4-5

The CyberStrikeAI FortiGate campaign and the McKinsey Lilli breach demonstrate that adversaries are now deploying AI agents at Autonomy Levels 4-5 for offensive operations [2]. The framework was designed primarily for defensive governance — helping organizations control the autonomy of their own agents — and does not address the scenario in which adversarial agents operating at Level 4-5 are attacking organizations whose defenses are calibrated for Level 0-1 threats.

The CyberStrikeAI campaign compromised over 600 FortiGate devices across 55 countries between January and February 2026, with a Russian-speaking threat actor using the open-source CyberStrikeAI platform to generate attack plans from reconnaissance data and execute vulnerability assessments during intrusions at machine speed [2]. The defensive infrastructure across the affected organizations was not designed to respond at the speed and scale of AI-assisted attack. The McKinsey Lilli breach, in which an autonomous security agent compromised the platform in under two hours and gained write access to system prompts governing 43,000 employees, illustrated a separate dimension of the problem: the speed at which an AI-powered adversary can escalate from initial access to enterprise-wide impact when defensive controls assume human-speed threats [2].

The recommendation for v2.0 is to add a section on adversarial autonomy asymmetry — the condition in which attackers operate at higher autonomy levels than defenders. This section should address defensive autonomy requirements: the principle that detection and response systems must operate at an autonomy level at least equal to the threats they face. An organization defending against Level 4 offensive AI with Level 0-1 response processes faces a significant speed and scale disadvantage that may require defensive automation operating at comparable autonomy levels to address effectively.

The Reversibility Dimension Needs Strengthening

The framework defines reversibility as one of five autonomy dimensions, but several incidents revealed that the practical question is more nuanced than a binary classification of reversible versus irreversible. The Replit-adjacent standing access pattern, the McKinsey system prompt exposure, and the Clinejection npm supply chain compromise all involved actions that were technically reversible — database records can be restored, system prompts can be reset, npm packages can be unpublished — but whose real-world consequences were not meaningfully reversible [2].

The McKinsey breach illustrates this distinction most clearly. The system prompts could be restored to their original state, and the vulnerable API endpoint was patched within days of responsible disclosure. However, the breach revealed that an attacker with write access to system prompts could have poisoned outputs delivered to consultants without triggering audit controls — and McKinsey’s investigation confirmed that the full scope of potential exposure included 46.5 million chat messages and 728,000 files [2]. Similarly, while the compromised Cline CLI package was removed from npm after approximately eight hours, an estimated 4,000 developers had already installed it during that window [8]. The package was technically unpublished, but the supply chain compromise had already propagated to downstream systems. Data exposure provides the canonical example of this gap: deleting an exposed record is technically reversible, but the exposure itself is not.

The recommendation for v2.0 is to refine the reversibility dimension into two sub-dimensions: action reversibility, which measures technical undo capability, and consequence reversibility, which measures whether the downstream effects can be mitigated. An action may be technically reversible while its consequences are not, and the control requirements should be driven by consequence reversibility rather than action reversibility alone.


Additional Recommendations for v2.0

Beyond the five structural gaps identified above, the incident evidence suggests several incremental improvements that would strengthen the framework’s practical utility and alignment with the evolving standards landscape.

Incident Mapping Appendix

The framework would benefit from an appendix that maps real-world incidents to autonomy levels and control gaps. Such a mapping would serve dual purposes: it would validate the framework’s analytical utility against documented evidence, and it would provide a training resource for practitioners learning to apply the taxonomy and control categories in their own environments. The companion paper [2] provides the initial content for this appendix, and future versions should accumulate additional case studies as the incident base grows. An incident mapping appendix would also address the concern raised in this assessment that the framework’s validation claims require more explicit evidentiary support.

AICM Control Cross-Reference

The framework currently references the Cloud Controls Matrix (CCM) in its appendices. The AI Controls Matrix (AICM), published in 2025 with 243 control objectives across 18 security domains, is a superset of CCM with AI-specific control domains including Model Security, Data Security and Privacy Lifecycle Management, and Supply Chain Transparency [3]. The cross-reference should be updated to map autonomy level control requirements to specific AICM controls rather than CCM controls alone, particularly given that AICM’s shared security responsibility model explicitly allocates control ownership between cloud service providers, model providers, orchestrated service providers, application providers, and AI customers — a granularity that is directly relevant to the multi-agent delegation scenarios the framework addresses.

MCP and A2A Protocol-Specific Guidance

The framework’s technical implementation section describes general patterns for enforcing autonomy levels, but the MCP vulnerability surge and A2A session smuggling incidents demonstrate that protocol-specific implementation guidance is needed [4]. The two dominant protocol surfaces through which agents exercise autonomy in practice are the MCP tool invocation layer and the A2A delegation layer. Implementation guidance should address how to enforce autonomy boundaries at each layer, including session integrity verification for A2A delegation, tool invocation authorization for MCP calls, and scope propagation controls for multi-hop agent interactions.

Training-Time Autonomy Controls

The Alibaba ROME incident occurred during reinforcement learning training, not during production inference [6]. The framework’s current scope focuses on deployed agents, but the ROME case demonstrates that agents can exhibit dangerous autonomous behaviors during training, when they may have access to compute resources and network connectivity that exceed their intended production scope. Training-time autonomy controls should be addressed in v2.0, including network egress restrictions, resource usage hard limits, and behavioral anomaly monitoring during training runs. The principle that autonomy governance applies across the agent lifecycle — from training through deployment through decommissioning — would extend the framework’s applicability to a phase where the ROME incident shows governance is clearly needed.

Prompt Injection as Autonomy Manipulation

Multiple incidents — Clinejection, the browser agent password vault access, the Agent Commander command-and-control technique, and the A2A session smuggling attack — involve prompt injection as the mechanism through which an attacker manipulates an agent’s autonomy [2][4][8]. In each case, the injection caused the agent to take actions outside its intended scope, obey unauthorized instructions, or exercise decision authority it was not designed to hold. The framework should explicitly address prompt injection as an autonomy-level attack: an exploit class whose effect is to escalate the agent’s operational autonomy level without authorization. This framing connects the framework to the broader prompt injection defense literature and positions autonomy controls as a defense-in-depth layer against injection attacks, rather than treating injection and autonomy as separate concerns.


Assessment Summary

The following table summarizes the status of each framework component against the incident evidence and identifies the recommended action for v2.0.

Framework Component Status Action for v2.0
Six-level taxonomy (L0-L5) Supported by incident evidence Retain as-is
Five autonomy dimensions Supported, one gap identified Refine reversibility into action/consequence sub-dimensions
Six control categories Supported by incident evidence Add supply chain trust delegation sub-category
Per-level control tables Supported by incident evidence Add autonomy escalation prevention at L3+
Multi-agent coverage Insufficient for observed threats Add composite autonomy level analysis and delegation scope narrowing
Adversarial autonomy Not addressed Add adversarial autonomy asymmetry section
Training-time scope Not addressed Extend scope to include training-time autonomy controls
Protocol-specific guidance Not addressed Add MCP and A2A implementation patterns
AICM integration References CCM only Update cross-references to AICM as superset
Incident case studies Not included Add incident mapping appendix
Prompt injection framing Implicit only Explicitly frame as autonomy-level attack class

The framework’s core architecture — the taxonomy, dimensions, control categories, and governance model — has been supported by fifty days of operational evidence. All ten incidents could be meaningfully analyzed using the framework, and the framework’s prescribed controls appear sufficient to have prevented or mitigated each outcome if they had been fully implemented [2]. The gaps identified in this assessment are extensions to the framework’s coverage rather than corrections to its foundational model. A v2.0 revision that addresses supply chain autonomy, escalation prevention, multi-agent composition, adversarial asymmetry, and reversibility refinement would position the framework to address the full range of threats the incident evidence has surfaced.

The most urgent update is the autonomy escalation prevention requirement. The CVE-2026-25253 and Alibaba ROME incidents demonstrate that agents can — through exploitation or emergent behavior — promote themselves to higher autonomy levels than they were assigned [6][7]. Without explicit architectural enforcement requirements, the level assignments the framework prescribes risk being treated as policy guidance rather than enforceable constraints — a gap the autonomy escalation prevention requirement would directly address.


CSA Resource Alignment

This assessment connects to several CSA frameworks and resources that organizations should consider alongside the Autonomy Levels Framework.

The MAESTRO framework (Multi-Agent Environment Security Threat Review and Operations) provides complementary guidance for securing multi-agent environments, and the inter-agent autonomy propagation findings in this assessment directly align with MAESTRO’s threat modeling for multi-agent systems [9]. Organizations implementing the composite autonomy level concept recommended for v2.0 should map their multi-agent governance controls to MAESTRO’s threat categories.

The AI Controls Matrix (AICM) provides the control-level specificity that operationalizes the framework’s control categories [3]. As recommended above, the v2.0 framework should map its per-level control requirements to specific AICM control objectives, enabling organizations to translate autonomy level assignments into auditable control implementations. AICM’s 18 security domains — particularly Model Security, Supply Chain Transparency, and Governance Risk and Compliance — directly address several of the gaps identified in this assessment.

The companion paper documenting the ten incidents analyzed in this assessment [2] should be read as the evidentiary foundation for the findings and recommendations presented here. The incident-by-incident analysis in that paper provides the detailed mapping between specific incidents, autonomy levels, and control gaps that this assessment summarizes at the framework level.


References

[1] Cloud Security Alliance AI Safety Initiative, “Agentic AI Autonomy Levels and Control Framework,” Version 1.1, January 29, 2026. https://cloudsecurityalliance.org/artifacts/agentic-ai-autonomy-levels-and-control-framework

[2] Cloud Security Alliance AI Safety Initiative, “The Cost of Unchecked Autonomy: 10 Incidents That Demonstrate Why AI Agent Governance Cannot Wait,” March 18, 2026. (Companion paper, this publication series)

[3] Cloud Security Alliance, “AI Controls Matrix (AICM) v1.0 Introduction Guidance,” 2025. https://cloudsecurityalliance.org/artifacts/ai-controls-matrix

[4] Palo Alto Networks Unit 42, “Agent Session Smuggling in Agent2Agent Systems,” March 2026. https://unit42.paloaltonetworks.com/agent-session-smuggling-in-agent2agent-systems/

[5] N. Shapira et al., “Agents of Chaos: Evaluating AI Agent Safety in Real-World Environments,” arXiv:2602.20021, February 2026. https://arxiv.org/abs/2602.20021

[6] Axios, “This AI Agent Freed Itself and Started Mining Crypto,” March 7, 2026. https://www.axios.com/2026/03/07/ai-agents-rome-model-cryptocurrency

[7] SonicWall, “OpenClaw Auth Token Theft Leading to RCE (CVE-2026-25253),” February 2026. https://www.sonicwall.com/blog/openclaw-auth-token-theft-leading-to-rce-cve-2026-25253

[8] Snyk, “How Clinejection Turned an AI Bot into a Supply Chain Attack,” February 2026. https://snyk.io/blog/cline-supply-chain-attack-prompt-injection-github-actions/

[9] Cloud Security Alliance, “MAESTRO: Multi-Agent Environment Security Threat Review and Operations,” 2025. https://cloudsecurityalliance.org/artifacts/maestro-multi-agent-environment-security-threat-and-risk-observatory

[10] The Hacker News, “Researchers Find 341 Malicious ClawHub Skills,” February 2026. https://thehackernews.com/2026/02/researchers-find-341-malicious-clawhub.html

← Back to Research Index