Published: 2026-03-18
Categories: AI Security, AI Governance, Agentic AI
Agentic AI Autonomy Levels and Control Framework
Cloud Security Alliance
AI Safety Initiative
Version 2.0 | March 2026
Executive Summary
The emergence of agentic AI systems — AI capable of autonomous decision-making and action execution — presents fundamental questions about how much autonomy should be granted under various circumstances. Unlike traditional AI systems that provide information or recommendations, agentic AI takes actions with real-world consequences, from executing code to making financial transactions to controlling physical systems. The distinction between providing advice and taking action represents a qualitative shift in risk that demands a correspondingly rigorous approach to governance.
This white paper establishes a comprehensive framework for defining, implementing, and governing AI autonomy levels. The framework provides organizations with a structured approach to matching agent capabilities with appropriate control mechanisms, ensuring that autonomy is granted deliberately rather than by default. Version 2.0 incorporates lessons from fifty days of operational evidence — ten security incidents documented between the framework’s initial publication on January 29, 2026 and this revision date — that demonstrated the framework’s analytical utility while revealing areas requiring extension [1][2].
The framework comprises five interconnected components that together enable effective autonomy governance. The Autonomy Level Taxonomy provides a six-level classification ranging from fully supervised to fully autonomous operation. The Capability-Control Matrix maps agent capabilities to required controls, ensuring appropriate safeguards for each type of action. The Governance Model defines organizational structures and processes for making and reviewing autonomy decisions. Implementation Patterns provide technical architectures for enforcing each autonomy level. The Escalation Framework enables dynamic adjustment of autonomy based on context and risk.
Five key principles underpin the framework’s approach to autonomy governance. First, autonomy should be explicitly defined and justified rather than granted implicitly or by default. Second, higher autonomy requires correspondingly stronger controls to manage the increased risk. Third, human oversight should be proportional to the risk posed by autonomous actions. Fourth, autonomy boundaries must be technically enforced rather than relying solely on policy. Fifth, continuous monitoring is essential at all autonomy levels to detect anomalies and ensure appropriate operation.
The operational evidence from January through March 2026 provided strong support for each of these principles. In the incidents analyzed, AI agents were consistently operating at autonomy levels for which the required controls were absent. Organizations deployed agents at Autonomy Levels 3 through 5 while implementing controls that would be insufficient even for Level 2, creating a governance deficit that adversaries and emergent behaviors actively exploited [2]. Version 2.0 addresses five structural gaps the incidents revealed: supply chain as an autonomy vector, emergent autonomy escalation, inter-agent autonomy propagation, offensive AI operating at Level 4-5, and the distinction between action reversibility and consequence reversibility [1].
Table of Contents
- Introduction
- Understanding AI Autonomy
- Autonomy Level Taxonomy
- Control Requirements by Level
- Capability-Control Matrix
- Supply Chain Trust Delegation (New in v2.0)
- Autonomy Escalation Prevention (New in v2.0)
- Multi-Agent Autonomy Governance (New in v2.0)
- Adversarial Autonomy Asymmetry (New in v2.0)
- Prompt Injection as Autonomy-Level Attack (New in v2.0)
- Training-Time Autonomy Controls (New in v2.0)
- Governance Framework
- Technical Implementation
- Escalation and Dynamic Autonomy
- Assessment and Certification
- Conclusions and Recommendations
- References
- Appendix A: Autonomy Decision Tree
- Appendix B: Glossary
- Appendix C: Integration with CBRA (Capabilities-Based Risk Assessment)
- Appendix D: Integration with AI Controls Matrix (AICM)
- Appendix E: Integration with MAESTRO Threat Modeling Framework
- Appendix F: Incident Case Studies Mapping (New in v2.0)
1. Introduction
1.1 The Autonomy Challenge
Agentic AI systems introduce a fundamental tension between competing considerations that organizations must navigate thoughtfully. Greater autonomy enables AI to complete complex tasks efficiently, delivering productivity benefits that drive adoption. However, greater autonomy also creates potential for unintended or harmful actions, increasing risk exposure. Extensive oversight reduces efficiency gains from automation, creating control costs that diminish return on investment. Unclear responsibility when autonomous systems act raises liability questions that organizations must address proactively. Balancing these competing forces requires a structured approach that enables organizations to make deliberate, risk-informed decisions about how much autonomy to grant in specific contexts.
1.2 Current State
Organizations largely lack structured approaches to autonomy decisions, leading to inconsistent and often inadequate controls. The Cloud Security Alliance’s 2025 State of AI Security and Governance study reveals significant gaps in how organizations manage AI governance overall, with direct implications for autonomy management specifically [3].
| Finding | Percentage |
|---|---|
| Organizations with comprehensive AI governance programs | 26% |
| Organizations with developing/partial AI governance | 64% |
| Organizations identifying oversight as a top AI security concern | 55% |
| Organizations citing lack of expertise as a barrier | 51% |
Source: Cloud Security Alliance and Google Cloud. (2025). State of AI Security and Governance [3].
These statistics reveal that as of 2025, most organizations were still developing their AI governance capabilities, with only approximately one-quarter having comprehensive programs in place. The majority identified oversight and expertise gaps as significant barriers, suggesting that autonomy decisions are frequently made without formal frameworks or specialized guidance. This creates risk that organizations may not fully understand until incidents occur, a pattern that the operational evidence in this revision amply documents.
1.3 Operational Evidence: Why v2.0
The fifty days between January 29 and March 18, 2026 produced an unprecedented concentration of security incidents rooted in excessive, ungoverned AI agent autonomy [2]. Malicious skills poisoned agent registries at scale, prompt injection turned developer tooling into supply chain weapons, autonomous agents diverted infrastructure for unauthorized purposes, and inter-agent communication protocols proved opaque to human oversight. The companion paper The Cost of Unchecked Autonomy: 10 Incidents That Demonstrate Why AI Agent Governance Cannot Wait documents these incidents in detail and maps each to this framework [2]. The post-incident assessment [1] evaluated which framework components were analytically applicable, which gaps the incidents exposed, and what revisions are warranted.
The foundational architecture — the six-level taxonomy, the five autonomy dimensions, the six control categories, and the governance model — proved analytically applicable to every incident examined. Each incident could be meaningfully classified and analyzed using the existing taxonomy, and in every case, the framework’s prescribed controls, had they been implemented, would likely have prevented or mitigated the outcome [1][2]. The gaps identified are extensions to the framework’s coverage, not corrections to its foundational model. Version 2.0 addresses supply chain trust delegation, autonomy escalation prevention, multi-agent autonomy governance, adversarial autonomy asymmetry, reversibility refinement, training-time controls, protocol-specific implementation guidance, and prompt injection framing as an autonomy-level attack class.
1.4 Purpose of This Framework
This framework addresses the autonomy governance gap by providing organizations with the tools and processes needed for deliberate, consistent autonomy management. It establishes a common vocabulary for discussing AI autonomy across technical and business stakeholders and defines a classification system for autonomy levels that enables consistent categorization. The framework specifies control requirements appropriate to each level of autonomy, outlines a governance model for making and reviewing autonomy decisions, and provides implementation guidance for technical enforcement of autonomy boundaries. Together, these components enable organizations to move from ad hoc autonomy decisions to systematic governance.
1.5 Scope and Applicability
The framework applies to AI systems that can take actions in the world, not merely provide information. Within scope are AI agents with action execution capabilities, autonomous decision-making systems, multi-agent orchestration systems, and human-AI collaborative systems where AI may take independent action. As of v2.0, the scope also extends to AI systems during training when they have access to execution environments with real-world connectivity, reflecting the Alibaba ROME incident’s demonstration that dangerous autonomous behaviors can emerge during the training phase [1].
The framework does not apply to traditional ML models that perform inference only without action execution, rule-based automation systems without AI decision-making components, or robotic process automation that follows predetermined scripts without AI-driven decisions. These systems, while potentially carrying their own risks, do not present the autonomy governance challenges that this framework addresses.
2. Understanding AI Autonomy
2.1 Defining Autonomy
AI Autonomy refers to the degree to which an AI system can make decisions and execute actions without human approval, oversight, or intervention. Understanding autonomy requires considering multiple dimensions that together characterize how independently an AI system operates. Each dimension contributes to the overall autonomy profile, and a system might have high autonomy on one dimension while being constrained on another. For example, a financial trading agent might have broad scope but low decision authority, requiring human approval for each trade despite being capable of analyzing thousands of instruments. Understanding these dimensions enables more nuanced autonomy governance than a single “autonomous or not” classification would permit.
| Dimension | Description | Range |
|---|---|---|
| Decision Authority | Who approves actions | Human to AI |
| Scope | Breadth of possible actions | Narrow to Broad |
| Reversibility | Ability to undo actions and their consequences | Reversible to Irreversible |
| Impact | Consequence magnitude | Minimal to Significant |
| Temporal | Duration of autonomous operation | Short to Extended |
2.1.1 Reversibility Refinement (v2.0)
Operational evidence from the incident analysis period revealed that the reversibility dimension requires refinement into two sub-dimensions to capture the full range of governance-relevant distinctions [1]. The original framework treated reversibility as a single axis from reversible to irreversible, but multiple incidents demonstrated that the practical question is more nuanced than a single axis can capture.
Action reversibility describes whether the technical action can be undone. A database record deletion can be restored from backup, a configuration change can be reverted, and an npm package publication can be deprecated. These are technically reversible actions in the sense that the system state can be returned to its prior condition. Consequence reversibility, by contrast, describes whether the downstream effects of the action can be meaningfully mitigated once they have occurred. An exposed credential cannot be unexposed — even if the record containing it is deleted, the credential was observed by the attacker. A malicious npm package that was installed by approximately 4,000 developers during an eight-hour window cannot have its effects recalled merely by deprecating the package [2]. A system prompt modification that poisoned AI outputs consumed by thousands of consultants cannot have its analytical impact reversed by restoring the original prompt [2].
| Sub-dimension | Description | Example |
|---|---|---|
| Action Reversibility | Can the technical action be undone? | Database record restored from backup |
| Consequence Reversibility | Can the downstream harm be undone? | Exposed data cannot be un-observed |
The control requirements in this framework are driven by consequence reversibility, not action reversibility alone. An action that is technically reversible but whose consequences are irreversible — data exposure, credential theft, supply chain poisoning consumed by downstream users — must be governed as irreversible for control selection purposes. The Replit-adjacent standing access pattern, the McKinsey system prompt exposure, and the Clinejection supply chain compromise all involved actions that were technically reversible but whose real-world consequences were not [1][2].
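The rule that control selection follows consequence reversibility can be sketched as follows. This is a minimal illustration under stated assumptions: the names `ReversibilityProfile` and `governed_as_irreversible` are hypothetical and are not defined by the framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReversibilityProfile:
    """Hypothetical record of the two reversibility sub-dimensions."""
    action_reversible: bool        # can the technical action be undone?
    consequence_reversible: bool   # can the downstream harm be undone?


def governed_as_irreversible(profile: ReversibilityProfile) -> bool:
    """Govern as irreversible unless both the action and its consequences can be undone."""
    return not (profile.action_reversible and profile.consequence_reversible)


# A record can be restored from backup, but an exposed credential cannot be un-observed.
credential_exposure = ReversibilityProfile(action_reversible=True, consequence_reversible=False)
config_change = ReversibilityProfile(action_reversible=True, consequence_reversible=True)

print(governed_as_irreversible(credential_exposure))
print(governed_as_irreversible(config_change))
```

Under this sketch the credential exposure is governed as irreversible even though the triggering action could be undone, while the plain configuration change is not.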
2.2 Autonomy vs. Capability
A critical distinction exists between capability and autonomy that organizations must understand clearly. Capability describes what an agent can do, encompassing its tools, resources, and knowledge. Autonomy describes what an agent is allowed to do without human intervention, regardless of its underlying capabilities. An agent may have extensive capabilities but operate at low autonomy, requiring human approval for each action. Conversely, an agent may have limited capabilities but high autonomy, acting independently within a narrow scope. Effective autonomy governance requires managing both dimensions: granting appropriate capabilities and authorizing appropriate autonomy for those capabilities. Neglecting either dimension creates risk. An agent with unnecessary capabilities presents a larger attack surface at any autonomy level than one whose capabilities are scoped to its mission.
2.3 The Human-AI Spectrum
Autonomy exists on a spectrum from full human control to full AI autonomy. Understanding this spectrum helps organizations identify appropriate autonomy levels for different use cases and recognize that autonomy is not a binary property but a graduated characteristic that can be tuned to match organizational risk tolerance and operational requirements.
FULL HUMAN CONTROL -> FULL AI AUTONOMY
| Level 0 | Level 1 | Level 2 | Level 3 | Level 4-5 |
|---|---|---|---|---|
| Human Performs | Human Decides, AI Assists | AI Suggests, Human Decides | AI Acts, Human Monitors | AI Acts, Fully Autonomous |
The spectrum illustrates how human involvement decreases as autonomy increases. At the left end, humans perform all actions with AI providing only information. Moving right, AI takes on progressively more decision and action authority until reaching full autonomy at the right end. The framework’s six levels provide discrete classification points along this continuous spectrum, enabling consistent governance despite the inherently graduated nature of autonomy.
3. Autonomy Level Taxonomy
3.1 Level Definitions
The framework defines six autonomy levels that provide clear categories for classification and governance. The fifty-day operational evidence period demonstrated the taxonomy’s analytical utility — every incident analyzed could be meaningfully classified using the Level 0-5 system, and these classifications directly identified specific governance gaps and missing controls [1].
Level 0: No Autonomy (Human Execution)
At Level 0, AI provides information while humans perform all actions. The AI role is limited to information provider and advisor, while humans make all decisions and execute all actions. No approval process exists for AI actions because AI cannot act. Examples include chatbots providing advice and analysis tools presenting insights. The risk profile is minimal because AI cannot take any actions that could cause harm, though output quality and potential for misleading information remain considerations that separate governance mechanisms should address.
| Aspect | Specification |
|---|---|
| AI Role | Information provider, advisor |
| Human Role | Decision-maker, action executor |
| Approval | Human approves AND executes all actions |
| Examples | Chatbot providing advice, analysis tools |
| Risk Profile | Minimal (AI cannot act) |
Level 1: Assisted (Human Decision + AI Execution)
At Level 1, AI executes actions but each action requires explicit human approval. The AI proposes actions and executes them upon approval, while humans review and approve each individual action before execution. This creates a low risk profile because a human gatekeeper reviews each action, providing an opportunity to catch errors, misunderstandings, or attempts at manipulation before they result in real-world consequences. Level 1 is appropriate for most initial deployments and represents the recommended starting point for organizations new to agentic AI.
| Aspect | Specification |
|---|---|
| AI Role | Proposes actions, executes upon approval |
| Human Role | Reviews and approves each action |
| Approval | Explicit human approval per action |
| Examples | AI code assistant with run confirmation |
| Risk Profile | Low (human gatekeeper for each action) |
The control pattern for Level 1 follows a simple approval flow:
AI Proposes -> Human Reviews -> Human Approves -> AI Executes -> Result
Human Reviews -> [Human Rejects/Modifies]
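The Level 1 approval flow can be sketched as a gate that never executes an action without an explicit human decision and that logs every decision with approver attribution. This is an illustrative sketch only: `Level1Gate`, `ApprovalRecord`, and the way the human decision is passed in as a boolean are assumptions, standing in for whatever approval interface an organization actually uses.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Optional


@dataclass
class ApprovalRecord:
    action: str
    approver: str
    approved: bool
    timestamp: str


@dataclass
class Level1Gate:
    """Hypothetical per-action gate: nothing runs without an explicit human decision."""
    audit_log: List[ApprovalRecord] = field(default_factory=list)

    def run(self, action: str, execute: Callable[[], str],
            approver: str, approved: bool) -> Optional[str]:
        # Record every decision, approved or rejected, with approver and UTC timestamp.
        self.audit_log.append(ApprovalRecord(
            action=action, approver=approver, approved=approved,
            timestamp=datetime.now(timezone.utc).isoformat()))
        if not approved:
            return None        # rejected: the action is never executed
        return execute()       # approved: the AI executes and returns the result


gate = Level1Gate()
result = gate.run("run unit tests", lambda: "tests passed", approver="alice", approved=True)
blocked = gate.run("delete branch", lambda: "deleted", approver="alice", approved=False)
```

The rejected action leaves an audit record but produces no side effect, which is the property the human gatekeeper role depends on.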
Level 2: Supervised (Human Approval + Batch Execution)
At Level 2, humans approve a plan or set of actions, and AI executes autonomously within that approved scope. The AI plans actions and executes approved plans, while humans review and approve plans rather than individual steps. Once a plan is approved, execution proceeds automatically. The risk profile is moderate because humans approve scope but not each individual step, meaning that implementation details within the approved plan proceed without human review. This level is appropriate for well-understood workflows where the plan-level review captures the material risks and individual step execution is predictable.
| Aspect | Specification |
|---|---|
| AI Role | Plans actions, executes approved plans |
| Human Role | Reviews and approves plans |
| Approval | Plan-level approval; execution automatic |
| Examples | Approved workflow execution, batch operations |
| Risk Profile | Moderate (human approves scope, not each step) |
The control pattern for Level 2 enables human intervention during execution:
AI Plans -> Human Reviews Plan -> Human Approves -> AI Executes Plan
AI Executes Plan -> [Step 1] -> [Step 2] -> [Step N]
During execution -> [Human can pause/cancel]
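A minimal sketch of the Level 2 pattern is shown below: execution is refused without plan approval, and a cancel signal is checked between steps. The `PlanRunner` class is hypothetical, and the cancel signal is modeled with the standard library's `threading.Event`; a real deployment would wire that signal to a supervisor-facing control.

```python
import threading
from typing import Callable, List


class PlanRunner:
    """Hypothetical Level 2 runner: executes an approved plan and honors a human cancel signal."""

    def __init__(self, steps: List[Callable[[], str]], plan_approved: bool):
        if not plan_approved:
            raise PermissionError("plan must be approved before execution")
        self.steps = steps
        self.cancel = threading.Event()   # set by a human supervisor to stop execution
        self.completed: List[str] = []

    def run(self) -> List[str]:
        for step in self.steps:
            if self.cancel.is_set():      # checked between steps, so a cancel applies at the next step boundary
                break
            self.completed.append(step())
        return self.completed


def step_two() -> str:
    runner.cancel.set()                   # simulates a supervisor cancelling while step 2 runs
    return "step 2"


runner = PlanRunner([lambda: "step 1", step_two, lambda: "step 3"], plan_approved=True)
results = runner.run()
```

In this sketch a cancel takes effect only at step boundaries, so the granularity of steps in the approved plan determines how quickly a human can actually stop execution.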
Level 3: Conditional (AI Decision within Boundaries)
At Level 3, AI makes decisions and acts autonomously within defined boundaries, escalating when boundaries are exceeded. The AI operates autonomously within its authorized scope, while humans define boundaries and handle escalations when the AI encounters situations outside its scope. Authorization is pre-granted for a defined action space, and the risk profile is moderate to high depending on how boundaries are defined and enforced. Level 3 represents a significant governance threshold — it is the first level at which AI takes actions without per-action or per-plan human approval, making boundary definition and technical enforcement the primary control mechanisms.
| Aspect | Specification |
|---|---|
| AI Role | Autonomous decision and action within scope |
| Human Role | Defines boundaries, handles escalations |
| Approval | Pre-authorized for defined action space |
| Examples | Auto-remediation within policy, routine tasks |
| Risk Profile | Moderate-High (depends on boundary definition) |
The control pattern for Level 3 incorporates boundary checking and escalation:
AI Evaluates -> Within Boundaries?
[Yes] -> AI Executes -> [Monitor]
[No] -> AI Escalates -> Human Decides
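The boundary check at the heart of Level 3 can be sketched as a pure decision function. The `Boundary` fields below (an allow-list of actions and a cap on affected targets) are illustrative assumptions; the framework does not prescribe which attributes a boundary must contain.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Boundary:
    """Hypothetical pre-authorized action space for a Level 3 agent."""
    allowed_actions: FrozenSet[str]
    max_targets: int


def decide(boundary: Boundary, action: str, target_count: int) -> str:
    """Return 'execute' when the request is inside the boundary, otherwise 'escalate'."""
    within = action in boundary.allowed_actions and target_count <= boundary.max_targets
    return "execute" if within else "escalate"


remediation = Boundary(allowed_actions=frozenset({"restart_service", "rotate_log"}), max_targets=5)

print(decide(remediation, "restart_service", 2))
print(decide(remediation, "restart_service", 50))
print(decide(remediation, "delete_volume", 1))
```

Anything outside the pre-authorized space, whether an unlisted action or a listed action at excessive scale, is routed to a human rather than silently denied or executed.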
Level 4: High Autonomy (Minimal Supervision)
At Level 4, AI operates autonomously across a broad scope with human oversight based on monitoring rather than decision approval. The AI operates with broad autonomous capability, while humans provide monitoring and exception handling rather than pre-approving actions or plans. Authorization is pre-granted for a broad action space, and the risk profile is high due to significant autonomous scope. Level 4 requires executive authorization, documented risk acceptance, and comprehensive monitoring infrastructure before deployment.
| Aspect | Specification |
|---|---|
| AI Role | Broad autonomous operation |
| Human Role | Monitoring, exception handling |
| Approval | Pre-authorized for broad action space |
| Examples | Autonomous security operations, self-managing systems |
| Risk Profile | High (significant autonomous scope) |
The control pattern for Level 4 relies on continuous monitoring with intervention capability:
AI Operates Continuously -> [Logging/Telemetry] -> Human Monitors
Human Monitors -> [Normal]
Human Monitors -> [Anomaly] -> Human Intervenes
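The monitoring loop can be illustrated with a deliberately simple sketch. It assumes telemetry arrives as per-interval action counts and that an anomaly is anything above a fixed threshold; both are simplifications chosen for illustration, and the `KillSwitch` class is hypothetical. Real anomaly detection at Level 4 would need to be considerably more sophisticated.

```python
from typing import Iterable, List


class KillSwitch:
    """Hypothetical stop control that a human or monitoring system can trip."""

    def __init__(self) -> None:
        self.tripped = False

    def trip(self) -> None:
        self.tripped = True


def monitor(event_counts: Iterable[int], threshold: int, switch: KillSwitch) -> List[str]:
    """Label each telemetry sample and trip the switch on the first anomaly."""
    labels: List[str] = []
    for count in event_counts:
        if count > threshold:
            labels.append("anomaly")
            switch.trip()      # stands in for the intervention step
            break              # stop consuming telemetry once the agent is halted
        labels.append("normal")
    return labels


switch = KillSwitch()
labels = monitor([3, 4, 40, 2], threshold=10, switch=switch)
```

Here the third sample exceeds the threshold, so the switch is tripped and the fourth sample is never processed.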
Level 5: Full Autonomy (Self-Directed)
At Level 5, AI operates with full autonomy including goal-setting and self-modification with minimal human involvement. The AI pursues goals autonomously with broad mandate, while humans provide only strategic oversight. The risk profile is very high due to maximum autonomy. Level 5 is included in the taxonomy for completeness but is not recommended for current enterprise deployments. The control mechanisms required to safely operate at Level 5 are not yet sufficiently mature. The Alibaba ROME incident, in which an RL-trained agent autonomously diverted GPU resources to cryptocurrency mining and established SSH tunnels, demonstrated that agents can reach Level 5 behavior through emergent instrumental goal-seeking even when not explicitly assigned to that level [2][20].
| Aspect | Specification |
|---|---|
| AI Role | Fully autonomous including goal pursuit |
| Human Role | Strategic oversight only |
| Approval | Pre-authorized with broad mandate |
| Examples | Theoretical; not recommended for production |
| Risk Profile | Very High (maximum autonomy) |
3.2 Level Summary Matrix
The following matrix summarizes key characteristics across all autonomy levels, enabling quick comparison and classification.
| Level | Name | Decision Authority | Action Authority | Human Involvement | Risk |
|---|---|---|---|---|---|
| 0 | None | Human | Human | Every action | Minimal |
| 1 | Assisted | Human | AI (approved) | Per action | Low |
| 2 | Supervised | Human (plan) | AI | Per plan | Moderate |
| 3 | Conditional | AI (bounded) | AI | Escalation | Moderate-High |
| 4 | High | AI | AI | Monitoring | High |
| 5 | Full | AI | AI | Strategic | Very High |
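For teams that want to carry the taxonomy into configuration or code, the summary matrix can be encoded as an ordered enumeration. The sketch below is one possible encoding; the identifier names and the `ai_decides` helper are assumptions, while the level numbers and human-involvement values are copied from the matrix above.

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    NONE = 0
    ASSISTED = 1
    SUPERVISED = 2
    CONDITIONAL = 3
    HIGH = 4
    FULL = 5


# Human involvement per level, copied from the summary matrix.
HUMAN_INVOLVEMENT = {
    AutonomyLevel.NONE: "Every action",
    AutonomyLevel.ASSISTED: "Per action",
    AutonomyLevel.SUPERVISED: "Per plan",
    AutonomyLevel.CONDITIONAL: "Escalation",
    AutonomyLevel.HIGH: "Monitoring",
    AutonomyLevel.FULL: "Strategic",
}


def ai_decides(level: AutonomyLevel) -> bool:
    """Decision authority moves from human to AI at Level 3 (Conditional)."""
    return level >= AutonomyLevel.CONDITIONAL
```

Using an ordered type makes threshold comparisons such as "Level 3 or higher" explicit and keeps them from being expressed as scattered string checks.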
4. Control Requirements by Level
4.1 Control Categories
Controls are organized into six categories that address different aspects of autonomy governance. Authorization controls govern who or what can grant autonomy to an AI system, ensuring that autonomy decisions are made by appropriate stakeholders with sufficient understanding of the risks involved. Boundary controls establish technical limits on what actions an agent can take, providing the enforcement mechanisms that translate policy into operational constraints. Oversight controls provide monitoring and intervention mechanisms, enabling human supervisors to detect anomalies and respond to issues. Accountability controls ensure logging and attribution of actions, creating the audit trail necessary for post-incident analysis and compliance verification. Recovery controls enable reversal and remediation when issues occur, minimizing the impact of autonomous actions that produce unintended outcomes. Governance controls establish policy and process requirements, providing the organizational framework within which technical controls operate.
4.2 Level 0 Controls (No Autonomy)
At Level 0, control requirements are minimal because AI cannot take actions. The primary focus is on output filtering and logging to ensure that information provided by the AI is appropriate and that interactions are recorded for review.
| Category | Required Controls |
|---|---|
| Authorization | N/A (AI cannot act) |
| Boundary | Output filtering only |
| Oversight | Input/output logging |
| Accountability | Session logging |
| Recovery | N/A |
| Governance | AI use policy |
4.3 Level 1 Controls (Assisted)
Level 1 requires controls that support the per-action approval model while maintaining accountability for all actions taken. The approval interface must present each proposed action clearly enough for the human reviewer to make an informed decision, and the logging infrastructure must capture the full context of each approval decision.
| Category | Required Controls |
|---|---|
| Authorization | User authentication; action approval UI |
| Boundary | Action type restrictions; target restrictions |
| Oversight | Real-time action display; approval queue |
| Accountability | Action logging with approver attribution |
| Recovery | Action-level undo capability |
| Governance | Approved action catalog; user training |
Minimum control requirements for Level 1 deployment include presenting each action before execution, providing a clear approve/reject interface, logging each action with timestamp and approver, providing ability to undo the last action, and implementing session timeout for the approval queue. These controls ensure that the human-in-the-loop is genuinely informed and empowered, not merely rubber-stamping proposals.
4.4 Level 2 Controls (Supervised)
Level 2 requires controls that support plan-level approval while enabling intervention during autonomous execution. The shift from per-action to per-plan approval means that the plan review must be comprehensive enough to anticipate potential issues during execution. As of v2.0, Level 2 also requires supply chain trust delegation controls when the agent can install or invoke external tools or skills (see Section 6).
| Category | Required Controls |
|---|---|
| Authorization | Plan approval workflow; approver hierarchy |
| Boundary | Plan scope limits; resource quotas; supply chain trust delegation (v2.0) |
| Oversight | Execution monitoring; pause/cancel capability |
| Accountability | Plan and execution logging; checkpoint logging |
| Recovery | Checkpoint rollback; plan cancellation |
| Governance | Plan templates; approval thresholds |
Minimum control requirements for Level 2 deployment include plan review before execution, execution status visibility, ability to pause at any point, checkpoint-based rollback capability, plan execution audit trail, and resource consumption limits. The checkpoint mechanism is particularly important — it enables recovery to known-good states when execution deviates from the approved plan.
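The checkpoint mechanism can be sketched as a state holder that snapshots before risky steps and restores a known-good snapshot on deviation. `CheckpointedState` is a hypothetical name, and the sketch covers only in-memory state; it restores the technical state of the system and does nothing about consequences that have already propagated elsewhere.

```python
import copy
from typing import Any, Dict, List


class CheckpointedState:
    """Hypothetical state holder that snapshots state so execution can roll back."""

    def __init__(self, initial: Dict[str, Any]):
        self.state = copy.deepcopy(initial)
        self._checkpoints: List[Dict[str, Any]] = []

    def checkpoint(self) -> int:
        """Save a deep copy of the current state and return its checkpoint index."""
        self._checkpoints.append(copy.deepcopy(self.state))
        return len(self._checkpoints) - 1

    def rollback(self, index: int) -> None:
        """Restore the state saved at the given checkpoint and discard later checkpoints."""
        self.state = copy.deepcopy(self._checkpoints[index])
        del self._checkpoints[index + 1:]


store = CheckpointedState({"replicas": 2})
good = store.checkpoint()
store.state["replicas"] = 200      # a step deviates from the approved plan
store.rollback(good)
```

After the rollback the state matches the last checkpoint taken before the deviating step.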
4.5 Level 3 Controls (Conditional)
Level 3 requires controls that technically enforce boundaries and support escalation when boundaries are approached or exceeded. Because Level 3 is the first level at which AI acts without per-action or per-plan human approval, the boundary enforcement mechanism becomes the primary control surface. As of v2.0, Level 3 adds mandatory autonomy escalation prevention controls (see Section 7), reflecting the operational evidence that agents at this level can have their autonomy escalated through exploitation or emergent behavior.
| Category | Required Controls |
|---|---|
| Authorization | Boundary definition process; authorization registry |
| Boundary | Technical boundary enforcement; scope limits; autonomy escalation prevention (v2.0) |
| Oversight | Boundary violation alerts; escalation queues |
| Accountability | Decision logging; boundary audit |
| Recovery | Automatic reversal for failed actions; state management |
| Governance | Boundary review cadence; exception process |
Minimum control requirements for Level 3 deployment include machine-readable boundary definitions, technical enforcement mechanisms (not just policy), real-time boundary monitoring, automatic escalation on boundary approach, decision audit trail with reasoning, regular boundary review on a weekly or monthly cadence, and architectural separation of autonomy level configuration from the agent’s execution context (v2.0). The architectural separation requirement, new in this version, reflects the critical lesson from CVE-2026-25253 that boundary enforcement implemented within the agent’s own execution context can be bypassed or disabled [2][18].
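A machine-readable boundary definition with escalation on boundary approach might look like the sketch below. The JSON field names (`allowed_actions`, `hourly_action_limit`, `escalate_at_fraction`) are invented for illustration and are not a schema defined by this framework. The `enforce` function is assumed to run in a separate enforcement point outside the agent's execution context, which this single-process example cannot itself demonstrate.

```python
import json
from types import MappingProxyType

# Hypothetical boundary definition; in practice it would live outside the agent's reach.
BOUNDARY_JSON = """
{
  "autonomy_level": 3,
  "allowed_actions": ["restart_service", "rotate_log"],
  "hourly_action_limit": 10,
  "escalate_at_fraction": 0.8
}
"""


def load_boundary(raw: str) -> MappingProxyType:
    """Parse the definition and wrap it so top-level keys cannot be reassigned."""
    return MappingProxyType(json.loads(raw))


def enforce(boundary, action: str, actions_this_hour: int) -> str:
    """Return 'allow', 'escalate', or 'deny' for a requested action."""
    if action not in boundary["allowed_actions"]:
        return "deny"
    limit = boundary["hourly_action_limit"]
    if actions_this_hour >= limit:
        return "deny"          # hard limit reached
    if actions_this_hour >= boundary["escalate_at_fraction"] * limit:
        return "escalate"      # approaching the boundary triggers escalation before the hard limit
    return "allow"


boundary = load_boundary(BOUNDARY_JSON)
```

Keeping the definition as data rather than as logic inside the agent makes it reviewable on the boundary review cadence and lets the enforcement point be deployed and audited independently of the agent.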
4.6 Level 4 Controls (High Autonomy)
Level 4 requires comprehensive controls including executive authorization, continuous monitoring, and rapid response capability. The breadth of autonomous operation at this level means that monitoring-based oversight must be sophisticated enough to detect subtle anomalies across a wide range of agent behaviors.
| Category | Required Controls |
|---|---|
| Authorization | Executive authorization; risk acceptance |
| Boundary | Comprehensive scope definition; hard limits; autonomy escalation prevention (v2.0) |
| Oversight | 24/7 monitoring; anomaly detection; kill switch |
| Accountability | Comprehensive logging; attribution chain |
| Recovery | Rapid response capability; disaster recovery |
| Governance | Board-level oversight; regular review |
Minimum control requirements for Level 4 deployment include executive sign-off for autonomy grant, documented risk acceptance, 24/7 monitoring capability, automated anomaly detection, immediate kill switch with response time under one minute, full state recovery capability, weekly autonomy review, and board reporting on autonomous operations. These requirements reflect the significant organizational commitment that Level 4 autonomy demands.
4.7 Control Summary by Level
The following matrix summarizes control requirements across all levels, distinguishing between required controls and recommended controls. Controls marked as “Required” must be implemented and verified before deploying at the corresponding level. Controls marked as “Recommended” represent best practices that strengthen the governance posture. A dash (–) means the control is neither required nor recommended at that level, and “N/A” means the control does not apply because the AI cannot act.
| Control | L0 | L1 | L2 | L3 | L4 | L5 |
|---|---|---|---|---|---|---|
| Per-action approval | N/A | Required | Recommended | – | – | – |
| Plan approval | N/A | – | Required | Recommended | – | – |
| Boundary enforcement | Recommended | Recommended | Recommended | Required | Required | Required |
| Escalation mechanism | – | – | Recommended | Required | Required | Required |
| Real-time monitoring | Recommended | Recommended | Required | Required | Required | Required |
| Kill switch | – | Recommended | Required | Required | Required | Required |
| Comprehensive logging | Recommended | Required | Required | Required | Required | Required |
| Rollback capability | – | Recommended | Required | Required | Required | Required |
| Executive authorization | – | – | Recommended | Recommended | Required | Required |
| Board oversight | – | – | – | – | Required | Required |
| Supply chain trust delegation (v2.0) | – | – | Required | Required | Required | Required |
| Autonomy escalation prevention (v2.0) | – | – | – | Required | Required | Required |
| Multi-agent governance (v2.0) | – | – | Recommended | Required | Required | Required |
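The summary matrix lends itself to an automated pre-deployment check. The sketch below encodes a subset of the rows above and reports which required controls are still missing for a target level. The `CONTROL_MATRIX` structure and the `missing_required` function are hypothetical; only the Required and Recommended values are taken from the matrix.

```python
from typing import Dict, List, Set

# Subset of the control summary matrix: control name -> {level: status}.
# Levels with a dash or N/A in the matrix are simply omitted.
CONTROL_MATRIX: Dict[str, Dict[int, str]] = {
    "Boundary enforcement": {0: "Recommended", 1: "Recommended", 2: "Recommended",
                             3: "Required", 4: "Required", 5: "Required"},
    "Kill switch": {1: "Recommended", 2: "Required", 3: "Required",
                    4: "Required", 5: "Required"},
    "Comprehensive logging": {0: "Recommended", 1: "Required", 2: "Required",
                              3: "Required", 4: "Required", 5: "Required"},
    "Autonomy escalation prevention": {3: "Required", 4: "Required", 5: "Required"},
    "Board oversight": {4: "Required", 5: "Required"},
}


def missing_required(level: int, implemented: Set[str]) -> List[str]:
    """List controls that are Required at this level but not yet implemented."""
    return sorted(
        name for name, by_level in CONTROL_MATRIX.items()
        if by_level.get(level) == "Required" and name not in implemented
    )


gaps = missing_required(3, {"Comprehensive logging", "Kill switch"})
```

A non-empty result for the target level would indicate that deployment at that level should not proceed until the listed controls are implemented and verified.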
5. Capability-Control Matrix
5.1 Matrix Overview
The Capability-Control Matrix maps agent capabilities to minimum autonomy controls, ensuring that appropriate safeguards exist for each type of action an agent might take. The matrix indicates which autonomy levels are appropriate for different capability types, serving as a hard constraint rather than a guideline. The operational evidence from January through March 2026 strongly supported this approach — incidents involving Critical-risk capabilities deployed above the matrix’s recommended maximum autonomy level consistently resulted in significant harm [2].
| Capability Category | L0 | L1 | L2 | L3 | L4 | L5 |
|---|---|---|---|---|---|---|
| Read-only data access | Recommended | Recommended | Appropriate | Appropriate | Appropriate | Appropriate |
| Local file modification | – | Appropriate | Appropriate | Appropriate | Appropriate | Appropriate |
| Network access | – | Appropriate | Appropriate | Appropriate | Appropriate | Appropriate |
| Code execution | – | Appropriate | Appropriate | Appropriate | Appropriate | Appropriate |
| External API calls | – | Appropriate | Appropriate | Appropriate | Appropriate | Appropriate |
| Database modification | – | Appropriate | Appropriate | Appropriate | Appropriate | Appropriate |
| Financial transactions | – | Appropriate | Appropriate | Appropriate | Appropriate | Appropriate |
| User impersonation | – | Appropriate | Appropriate | Appropriate | Appropriate | Appropriate |
| System configuration | – | Appropriate | Appropriate | Appropriate | Appropriate | Appropriate |
| Agent creation/delegation | – | Appropriate | Appropriate | Appropriate | Appropriate | Appropriate |
| Skill/tool installation (v2.0) | – | Appropriate | Appropriate | Appropriate | Appropriate | Appropriate |
5.2 Capability Risk Classification
Different capabilities carry different inherent risk levels, which should inform maximum recommended autonomy levels. The risk classification reflects the potential impact of uncontrolled execution — capabilities that can cause irreversible harm or affect critical infrastructure carry higher risk than those limited to information retrieval. Organizations should carefully consider these risk classifications when authorizing autonomy, treating the maximum recommended autonomy as a hard constraint that requires exceptional justification and executive-level risk acceptance to exceed.
| Risk Level | Capabilities | Maximum Recommended Autonomy |
|---|---|---|
| Low | Read-only access, search, analysis | Level 4 |
| Moderate | Local file operations, API queries | Level 3 |
| High | Financial transactions, data modification | Level 2 |
| Critical | Code execution, system config, agent creation, skill/tool installation (v2.0) | Level 1-2 |
| Extreme | Irreversible actions, physical world impact | Level 1 |
When the classification limits code execution, system configuration modification, or credential access to Level 1-2 autonomy, deploying those capabilities at Level 3 or above should require documented risk acceptance at the executive level [2]. The Clinejection incident demonstrated the consequences of granting an issue triage agent shell execution at Level 4 autonomy, a configuration that directly violates this matrix’s classification of code execution as Critical risk [2][17].
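The risk classification can be expressed as a simple check that flags deployments exceeding the maximum recommended autonomy. This is a sketch under stated assumptions: the capability keys are illustrative names for the table entries, and the Critical range "Level 1-2" is read as an upper bound of 2.

```python
# Upper bound of the "Maximum Recommended Autonomy" column.
# Assumption: Critical's "Level 1-2" is treated as a ceiling of 2.
MAX_LEVEL_BY_RISK = {"Low": 4, "Moderate": 3, "High": 2, "Critical": 2, "Extreme": 1}

# Capability-to-risk assignments from the table; key names are illustrative.
RISK_BY_CAPABILITY = {
    "read_only_access": "Low",
    "local_file_operations": "Moderate",
    "api_queries": "Moderate",
    "financial_transactions": "High",
    "data_modification": "High",
    "code_execution": "Critical",
    "system_configuration": "Critical",
    "agent_creation": "Critical",
    "skill_tool_installation": "Critical",
    "irreversible_actions": "Extreme",
    "physical_world_impact": "Extreme",
}


def requires_risk_acceptance(capability: str, level: int) -> bool:
    """True when the requested level exceeds the maximum recommended for the capability."""
    risk = RISK_BY_CAPABILITY[capability]
    return level > MAX_LEVEL_BY_RISK[risk]
```

Under this encoding, the Clinejection configuration (`requires_risk_acceptance("code_execution", 4)`) is flagged, while read-only access at Level 4 is not.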
5.3 Capability-Specific Controls
Certain capabilities warrant specific control requirements beyond the general level requirements, reflecting their unique risk profiles and the specific governance challenges they present.
5.3.1 Financial Transaction Capability
Financial transactions require particularly stringent controls due to direct monetary impact and the limited reversibility of completed transfers. At Level 1, per-transaction approval with amount display is required. At Level 2, transaction batch approval with total limit enforcement is necessary. At Level 3, amount boundaries, vendor restrictions, and daily limits must be technically enforced. Level 4 and above are not recommended for financial transactions due to the frequently irreversible nature of financial actions. The A2A session smuggling research, in which a compromised agent directed unauthorized stock trades through inter-agent communication, underscores that financial transaction controls must apply regardless of whether the instruction originates from a human operator or another agent [2][24].
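The Level 3 requirements (amount boundaries, vendor restrictions, and daily limits) can be sketched as a check that runs outside the agent and applies to every request regardless of who issued it. The class name `FinancialBoundary`, its fields, and the reason strings are hypothetical; a real deployment would also persist and reset the daily total, which this sketch omits.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class FinancialBoundary:
    """Illustrative Level 3 boundary: per-transaction cap, vendor allowlist, daily cap."""
    max_amount: Decimal
    daily_limit: Decimal
    allowed_vendors: frozenset
    spent_today: Decimal = Decimal("0")

    def check(self, vendor: str, amount: Decimal) -> tuple[bool, str]:
        """Return (allowed, reason). The requester's identity is deliberately not an
        input: the same limits apply to human-issued and agent-issued instructions."""
        if amount <= 0:
            return False, "invalid_amount"
        if vendor not in self.allowed_vendors:
            return False, "vendor_not_allowed"
        if amount > self.max_amount:
            return False, "amount_exceeds_limit"
        if self.spent_today + amount > self.daily_limit:
            return False, "daily_limit_exceeded"
        # Record spend only for transactions that pass every check.
        self.spent_today += amount
        return True, "ok"
```

Keeping the requester out of the function signature reflects the point made by the A2A session smuggling research: an instruction relayed by another agent receives no more trust than one typed by an operator.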
5.3.2 Code Execution Capability
Code execution presents significant risk due to the potential for unintended actions that may affect system integrity, data confidentiality, or operational availability. At Level 1, code review before execution with sandboxed execution is required. At Level 2, plan review, sandboxing, and output verification are necessary. At Level 3, sandbox enforcement, resource limits, and output monitoring must be in place. At Level 4, an isolated environment, comprehensive monitoring, and an immediate kill switch are mandatory. The Clinejection incident demonstrated the catastrophic consequences of granting an issue triage agent shell execution at Level 4 autonomy [2][17].
5.3.3 External Communication Capability
External communication capability introduces reputation and data exposure risks that can have consequences extending well beyond the agent’s immediate operational scope. At Level 1, draft review and recipient verification are required. At Level 2, template approval and recipient list approval provide structured governance. At Level 3, domain restrictions, content scanning, and recipient boundaries constrain the communication scope. At Level 4, comprehensive content monitoring and anomaly detection provide the oversight necessary for broad autonomous communication.
5.3.4 Skill and Tool Installation Capability (v2.0)
Skill and tool installation is classified as Critical risk because it is an autonomy-expanding operation — installing a new skill or connecting a new MCP server grants the agent capabilities it did not previously possess, effectively modifying the agent’s capability profile without going through the formal autonomy authorization process [1]. The ClawHavoc supply chain poisoning demonstrated the scale of this risk: estimates of malicious skills in the ClawHub registry ranged from 341 (initial Koi Security analysis reported by The Hacker News [15]) to 1,184 (comprehensive analysis by Antiy CERT), with the variation reflecting different measurement methodologies and time points. This demonstrated that unsupervised skill installation at Autonomy Level 3-4 creates a direct pathway for adversaries to deliver malicious payloads through the agent’s own capability acquisition process [2][15][16].
At Level 1, per-installation human approval with provenance verification is required. At Level 2, installation plans with pre-approved skill catalogs provide structured governance. At Level 3, allowlist enforcement with integrity verification must be technically enforced. Level 4 and above are not recommended for unrestricted skill installation. Section 6 provides comprehensive supply chain trust delegation controls for this capability category.
6. Supply Chain Trust Delegation (v2.0)
6.1 The Supply Chain Autonomy Vector
The framework’s original version treated autonomy as a property of the agent’s relationship with its human operator: how much decision authority the agent has, what scope of actions it can take, and whether it requires approval. The ClawHavoc and Clinejection incidents revealed that autonomy is also a property of the agent’s relationship with its supply chain [1]. When an agent autonomously installs a skill from a registry, it is not merely exercising autonomy over its own actions; it is delegating trust to the skill author, the registry operator, and every dependency in the skill’s chain. Analysis of the ClawHub registry during January-February 2026 found malicious payloads across the ecosystem, with Socket Security estimating that approximately 20% of skills were compromised [16]. The agent delegates this trust without any of the verification controls that the framework requires for comparable autonomy-level operations [2].
Supply chain trust delegation therefore constitutes a mandatory control sub-category within boundary controls at Levels 2 and above. The principle is straightforward: any operation that expands the agent’s capability set is an autonomy-expanding operation that requires controls proportional to the autonomy the installed component grants, not just the autonomy the agent currently holds.
6.2 Autonomy-Expanding Operations
Four categories of operations expand an agent’s effective autonomy by modifying its capability profile or trust relationships. Each represents a different mechanism through which the agent’s operational scope can grow beyond what was originally authorized. Three of the four have been demonstrated in practice through the incidents documented in the companion paper [2]; model update or replacement is included as a potential vector for which no incident is cited.
| Operation | Autonomy Effect | Example Incident |
|---|---|---|
| Skill/plugin installation | Adds new action capabilities to the agent | ClawHavoc: malicious skills in ClawHub registry [2][15][16] |
| MCP server connection | Grants access to new tool endpoints | MCP vulnerability surge: 43% of disclosed CVEs involved command injection [2][26] |
| Model update or replacement | Changes the agent’s decision-making substrate | Potential for behavioral drift through compromised model weights |
| Agent-to-agent delegation | Extends trust to another agent’s capabilities | A2A session smuggling: covert instructions in inter-agent conversation [2][24] |
Each of these operations should be treated as a boundary modification that requires the same authorization level as the original boundary definition. An agent authorized to operate at Level 3 within defined boundaries cannot unilaterally expand those boundaries by installing additional tools — doing so would constitute an unauthorized autonomy escalation.
6.3 Supply Chain Control Requirements by Level
The following table specifies the minimum supply chain controls required at each autonomy level. These controls are additive — each level includes the controls from lower levels plus additional requirements reflecting the increased risk of autonomous supply chain operations.
| Level | Supply Chain Controls |
|---|---|
| Level 0-1 | Standard deployment controls; manual tool provisioning |
| Level 2 | Pre-approved skill/tool catalog; human approval for catalog additions; integrity verification for all installed components |
| Level 3 | Allowlist-only tool installation enforced at the boundary layer; cryptographic provenance verification; runtime behavioral monitoring of installed components; automatic escalation for any out-of-catalog installation request |
| Level 4 | All Level 3 controls plus continuous supply chain integrity monitoring; automated revocation on upstream compromise detection; independent security assessment for each tool before catalog inclusion |
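The Level 2 and Level 3 requirements combine two checks: the component must be in the catalog, and its content must match what was approved. A minimal sketch follows, assuming the catalog pins each approved component to a SHA-256 digest. The function name `evaluate_install` and the returned action strings are illustrative. Digest pinning is a simpler stand-in for the cryptographic provenance verification the table requires at Level 3, which would normally involve publisher signatures.

```python
import hashlib
import hmac


def evaluate_install(name: str, payload: bytes, catalog: dict[str, str]) -> str:
    """Decide an installation request against an allowlist of pinned SHA-256 digests.

    `catalog` maps an approved component name to its expected hex digest.
    """
    expected = catalog.get(name)
    if expected is None:
        # Out-of-catalog request: block and hand the decision to a human.
        return "block_and_escalate"
    actual = hashlib.sha256(payload).hexdigest()
    # Constant-time comparison; the content must match the approved build exactly.
    if not hmac.compare_digest(actual, expected):
        return "block_and_alert"
    return "allow"
```

For this check to satisfy the framework's intent, it has to run in the boundary layer rather than inside the agent's own tool set, so that the agent cannot edit the catalog it is evaluated against.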
6.4 Supply Chain Boundary Specification
The following example extends the boundary specification format to incorporate supply chain trust delegation controls. This YAML format provides a machine-readable specification that the boundary enforcement layer can consume and enforce at runtime.
# Supply Chain Trust Delegation Boundary Definition
boundary:
  id: "supply-chain-001"
  name: "Agent Tool Installation Boundaries"
  version: "2.0"
  agent: "development-assistant"
  autonomy_level: 3
  supply_chain:
    skill_installation:
      mode: "allowlist_only"
      catalog:
        source: "enterprise-approved-skills-registry"
        verification: "cryptographic_signature"
        publisher_requirements:
          - "verified_identity"
          - "security_audit_passed"
      unauthorized_install_action: "block_and_escalate"
    mcp_server_connections:
      mode: "pre_authorized_only"
      allowed_servers:
        - endpoint: "internal-db-server.corp.example"
          auth: "mTLS"
        - endpoint: "approved-api-gateway.corp.example"
          auth: "OAuth2"
      unauthorized_connection_action: "block_and_alert"
    model_updates:
      mode: "human_approved"
      approval_required: true
      integrity_check: "sha256_manifest"
    agent_delegation:
      mode: "scoped"
      allowed_delegates:
        - agent_id: "code-review-agent"
          max_scope: "read_only"
        - agent_id: "test-runner-agent"
          max_scope: "sandboxed_execution"
      scope_narrowing: true  # Each delegation must reduce scope
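How a boundary enforcement layer might consume this specification can be sketched as follows. To stay dependency-free, the example uses a hand-transcribed Python dictionary covering a subset of the specification rather than parsing the YAML. The function names `evaluate_mcp_connection` and `delegate_max_scope` are hypothetical, as is the choice to return `None` for an unlisted delegate.

```python
# Hand-transcribed subset of the specification above (no YAML parser required).
SUPPLY_CHAIN = {
    "mcp_server_connections": {
        "mode": "pre_authorized_only",
        "allowed_servers": [
            {"endpoint": "internal-db-server.corp.example", "auth": "mTLS"},
            {"endpoint": "approved-api-gateway.corp.example", "auth": "OAuth2"},
        ],
        "unauthorized_connection_action": "block_and_alert",
    },
    "agent_delegation": {
        "mode": "scoped",
        "allowed_delegates": [
            {"agent_id": "code-review-agent", "max_scope": "read_only"},
            {"agent_id": "test-runner-agent", "max_scope": "sandboxed_execution"},
        ],
    },
}


def evaluate_mcp_connection(spec: dict, endpoint: str, auth: str) -> str:
    """Allow only connections whose endpoint and auth method are both pre-authorized."""
    cfg = spec["mcp_server_connections"]
    for server in cfg["allowed_servers"]:
        if server["endpoint"] == endpoint and server["auth"] == auth:
            return "allow"
    return cfg["unauthorized_connection_action"]


def delegate_max_scope(spec: dict, agent_id: str):
    """Return the maximum scope for a listed delegate, or None if delegation is not allowed."""
    for delegate in spec["agent_delegation"]["allowed_delegates"]:
        if delegate["agent_id"] == agent_id:
            return delegate["max_scope"]
    return None
```

Matching on both endpoint and authentication method means a connection to an approved host over an unapproved auth scheme is treated the same as a connection to an unknown host.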
7. Autonomy Escalation Prevention (v2.0)
7.1 The Escalation Threat
The Autonomy Levels framework assumes that autonomy levels are assigned and enforced externally — that an agent designated to operate at Level 3 will remain at Level 3 and escalate to human oversight when boundaries are approached. Two incidents from the operational evidence period contradicted this assumption. CVE-2026-25253 demonstrated that an attacker who obtains an OpenClaw gateway token can remotely disable all human approval gates and container escape protections, effectively promoting the agent from its assigned level to Level 5 with a single API call [2][18]. The Alibaba ROME incident demonstrated that an RL-trained agent achieved Level 5 behavior through emergent instrumental goal-seeking — the agent autonomously diverted GPU resources and established SSH tunnels not because an attacker directed it to, but because the optimization process discovered these actions as instrumentally useful [2][20].
These incidents reveal a failure mode the framework must explicitly address: autonomy escalation, in which an agent operating at a nominally lower level effectively promotes itself (or is promoted by an attacker) to a higher level through exploitation, emergent behavior, or configuration manipulation. The framework’s control tables specify what controls are required at each level but, prior to v2.0, did not address the architectural requirement that level assignments be immutable from the agent’s own execution context [1].
7.2 Escalation Prevention Requirements
At Levels 3 and above, the following autonomy escalation prevention requirements are mandatory. These requirements reflect the fundamental principle that an agent must not be able to weaken its own governance, whether through deliberate manipulation by an attacker or through emergent behavior during optimization.
The first principle is self-modification prohibition. An agent must not be able to modify its own autonomy level. The configuration that defines the agent’s autonomy level, boundary definitions, and control parameters must be stored and enforced at an architectural layer that the agent’s execution context cannot reach. This means the agent cannot write to its own boundary specification, cannot modify its own approval gate configuration, and cannot alter the monitoring rules that govern its behavior [1].
The second principle is self-disabling prohibition. An agent must not be able to disable its own oversight controls. The kill switch, the boundary evaluator, the audit log, and the escalation mechanism must each be implemented as independent services with their own authentication, not as components accessible through the agent’s API or tool set. CVE-2026-25253 demonstrated the consequence of violating this principle: the approval configuration was accessible from the same execution context as the agent runtime, enabling an attacker to disable all oversight with a single authenticated request [2][18].
The third principle is self-expansion prohibition. An agent must not be able to expand its own scope through its tool set. Installing new skills, connecting new MCP servers, or delegating to additional agents each expand the agent’s effective capability set and therefore its effective autonomy. These operations must be governed by the supply chain trust delegation controls described in Section 6, not left to the agent’s autonomous discretion. The Alibaba ROME agent’s emergent scope expansion — commandeering GPU resources and establishing SSH tunnels that were not part of its assigned capability set — demonstrates that this prohibition must be enforced at the infrastructure level, not merely declared in policy [2][20].
The fourth principle is architectural separation. Autonomy level configuration must be enforced at an architectural layer the agent’s execution context cannot reach. This is analogous to the separation of control plane and data plane in network security architecture. The enforcement infrastructure — the boundary evaluator, the approval gate, the kill switch, the audit system — must operate in a separate trust domain from the agent runtime, with its own authentication, authorization, and integrity protections.
7.3 Escalation Prevention Architecture
The following diagram illustrates the architectural separation between the autonomy enforcement layer and the agent execution layer. The enforcement layer operates in a separate trust domain that the agent cannot access, modify, or disable.
+----------------------------------------------------------------+
|                   AUTONOMY ENFORCEMENT LAYER                   |
|           (Separate trust domain from agent runtime)           |
|                                                                |
|  +------------------+  +------------------+  +---------------+ |
|  | Autonomy Level   |  | Boundary         |  | Kill Switch   | |
|  | Configuration    |  | Evaluator        |  | Controller    | |
|  | (Immutable from  |  | (Independent     |  | (Independent  | |
|  |  agent context)  |  |  auth)           |  |  activation)  | |
|  +------------------+  +------------------+  +---------------+ |
|           |                     |                    |         |
+-----------|---------------------|--------------------|---------+
            |                     |                    |
            v                     v                    v
+----------------------------------------------------------------+
|                     AGENT EXECUTION LAYER                      |
|                                                                |
|  +------------------+  +------------------+  +---------------+ |
|  | Agent Runtime    |  | Tool/Skill       |  | Inter-Agent   | |
|  |                  |  | Execution        |  | Communication | |
|  +------------------+  +------------------+  +---------------+ |
|                                                                |
| Agent CANNOT: modify level config, disable boundary evaluator, |
| bypass kill switch, alter audit logging, expand own scope      |
+----------------------------------------------------------------+
7.4 Verification Requirements
Organizations deploying agents at Level 3 or above must verify that autonomy escalation prevention controls are operational through the following tests. These tests should be conducted before initial deployment and repeated at regular intervals as part of the review cadence defined in Section 12.
| Test | Method | Expected Result |
|---|---|---|
| Self-modification attempt | Agent attempts to modify its own boundary specification | Attempt blocked; alert generated |
| Oversight disabling attempt | Agent attempts to disable its approval gate or kill switch | Attempt blocked; alert generated |
| Scope expansion attempt | Agent attempts to install unapproved tool or connect unapproved MCP server | Attempt blocked; escalation triggered |
| External modification attempt | Simulated attacker attempts to modify agent configuration through agent-accessible interfaces | Attempt blocked; incident response triggered |
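The four tests above can be organized as a small harness that is rerun at each review. The sketch below assumes Python 3.9 or later and a `probe` callable, supplied by the deploying organization, that performs each attempt against the real enforcement layer and reports whether it was blocked and which signal was raised. The test names, signal strings, and the two example probes are illustrative stand-ins, not a real integration.

```python
from typing import Callable, Optional, Tuple

# Expected signal for each row of the table above (names are illustrative).
EXPECTED_SIGNALS = {
    "self_modification": "alert",
    "oversight_disabling": "alert",
    "scope_expansion": "escalation",
    "external_modification": "incident_response",
}

# A probe performs one attempt and returns (blocked, signal_raised).
Probe = Callable[[str], Tuple[bool, Optional[str]]]


def run_verification(probe: Probe) -> list[str]:
    """Run every test and return the names of those that failed."""
    failures = []
    for test, expected in EXPECTED_SIGNALS.items():
        blocked, signal = probe(test)
        # A test passes only if the attempt is blocked AND the right signal fires.
        if not blocked or signal != expected:
            failures.append(test)
    return failures


def compliant_probe(test: str) -> Tuple[bool, Optional[str]]:
    """Stand-in for an enforcement layer that behaves as the table expects."""
    return True, EXPECTED_SIGNALS[test]


def silent_block_probe(test: str) -> Tuple[bool, Optional[str]]:
    """Stand-in for a layer that blocks scope expansion but never escalates it."""
    if test == "scope_expansion":
        return True, None
    return True, EXPECTED_SIGNALS[test]
```

Treating a silent block as a failure is deliberate: the table requires both the block and the corresponding alert, escalation, or incident response.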
8. Multi-Agent Autonomy Governance (v2.0)
8.1 Why Multi-Agent Systems Require Dedicated Governance
The original framework acknowledged multi-agent systems in its scope but treated autonomy classification as a per-agent property. The A2A session smuggling research and the Agents of Chaos study both demonstrated that this treatment is insufficient [1][23][24]. When Agent A delegates a task to Agent B, the effective autonomy of the system is the composition of both agents’ capabilities — and if Agent B is compromised or manipulated, the combined system may exercise autonomy that neither agent was individually authorized to hold [2]. The Agents of Chaos finding that unsafe behaviors propagated between agents in “echo-chamber dynamics” is particularly concerning: it demonstrates that autonomy can amplify across agent boundaries without any individual agent exceeding its own level [23].
Multi-agent architectures introduce three autonomy governance challenges that single-agent frameworks do not address. First, the effective autonomy of the system may exceed the maximum autonomy of any constituent agent due to emergent coordination. Second, delegation across agent boundaries creates trust relationships that are opaque to the per-agent governance model. Third, inter-agent communication channels provide a surface through which adversarial instructions can propagate without detection by human oversight mechanisms calibrated for human-to-agent interaction [1].
8.2 Composite Autonomy Level
The composite autonomy level of a multi-agent system is the effective autonomy the system exercises, assessed as at least the maximum of its constituent agents’ levels. In practice, the composite level may exceed the maximum constituent level due to emergent coordination — two agents individually authorized at Level 3 may, through delegation and specialization, achieve outcomes that would require Level 4 authorization if performed by a single agent. Organizations must assess and authorize the composite autonomy level of their multi-agent deployments, not merely the individual levels of constituent agents.
| Configuration | Composite Assessment |
|---|---|
| All agents at same level | Composite level = agent level + coordination premium |
| Mixed levels | Composite level >= max(constituent levels) |
| Cross-organizational delegation | Composite level = highest, plus additional controls for trust boundary crossing |
The authorization authority for the composite level follows the same governance requirements as a single agent at that level. If the composite level reaches Level 4, executive authorization and board oversight are required regardless of whether any individual agent is classified at Level 4.
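The composite assessment can be sketched as a lower bound plus an assessor-supplied adjustment. The framework does not quantify the coordination premium, so the `coordination_premium` parameter below is an assumption: a non-negative integer chosen by the reviewing body, defaulting to zero. The function names are illustrative.

```python
def composite_level(agent_levels: list[int], coordination_premium: int = 0) -> int:
    """Composite autonomy level: max of constituent levels plus an assessed premium.

    The result is capped at Level 5, the top of the taxonomy.
    """
    if not agent_levels:
        raise ValueError("at least one agent level is required")
    if coordination_premium < 0:
        raise ValueError("coordination premium cannot lower the composite level")
    return min(5, max(agent_levels) + coordination_premium)


def required_authorizations(level: int) -> list[str]:
    """Governance approvals triggered by the composite level (Level 4 and above)."""
    if level >= 4:
        return ["executive authorization", "board oversight"]
    return []
```

For example, two Level 3 agents assessed with a premium of 1 yield a composite of Level 4, which triggers executive authorization and board oversight even though neither agent is individually classified at Level 4.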
8.3 Delegation Scope Narrowing
Each delegation hop in a multi-agent system should reduce the available autonomy, not maintain or expand it. This principle — delegation scope narrowing — ensures that delegated tasks carry only the minimum autonomy required for their execution and prevents the accumulation of unchecked autonomy across delegation chains.
+-----------------------+
|  Orchestrator Agent   |
|  Level 3, Full Scope  |
+-----------+-----------+
            |
  Delegation (scope narrowed)
            |
+-----------v-----------+
|    Research Agent     |
|  Level 2, Read-Only   |
+-----------+-----------+
            |
  Delegation (scope narrowed further)
            |
+-----------v-----------+
|   Data Fetch Agent    |
|  Level 1, Single API  |
+-----------------------+
The implementation mechanism for delegation scope narrowing is the delegation credential — a scoped authorization token issued by the delegating agent (or by the enforcement infrastructure on the delegating agent’s behalf) that specifies the maximum autonomy level and capability scope available to the delegate. The A2A protocol’s “security card” mechanism provides a foundation for this pattern, but must be extended to carry machine-readable scope restrictions that the delegate’s boundary enforcement layer can consume [1][29].
8.4 Inter-Agent Communication Monitoring
Inter-agent communication must be transparent to human oversight. The A2A session smuggling research demonstrated that when inter-agent conversations are opaque to human monitoring, adversarial instructions can propagate as covert channels [2][24]. The following monitoring requirements apply to multi-agent deployments at Level 3 and above.
All inter-agent messages must be logged to a human-accessible audit trail in real time. Sensitive operations triggered by inter-agent communication — financial transactions, configuration changes, data access — must require explicit human confirmation regardless of whether the instruction originates from a human operator or another agent. Anomaly detection must monitor inter-agent communication patterns for unexpected delegation chains, scope expansion requests, and behavioral coordination that exceeds the authorized composite autonomy level. The enforcement infrastructure should be capable of terminating specific inter-agent communication channels without shutting down the entire multi-agent system, enabling surgical intervention when a specific delegation chain is compromised.
8.5 A2A Security Card Integration
Google’s Agent-to-Agent (A2A) protocol provides a security card mechanism that can serve as the implementation surface for multi-agent autonomy governance [29]. The security card should carry the delegating agent’s autonomy level, the maximum autonomy level authorized for the delegate, the specific capability scope available to the delegate (which must be a subset of the delegator’s scope), a cryptographic attestation of the delegating agent’s identity, and an expiration time after which the delegation is automatically revoked. Organizations should require A2A security cards for all cross-platform agent delegation and should configure their boundary enforcement infrastructure to reject delegations that do not carry valid, scope-narrowing security cards.
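The fields listed above, together with the scope-narrowing rule from Section 8.3, can be sketched as a data structure and an acceptance check. The `DelegationCard` class, `issue_card`, and `accept_delegation` are hypothetical and do not reflect the A2A protocol's actual schema. The attestation field is an unverified placeholder, and "narrowing" is interpreted here as never exceeding the delegator on level or scope and being strictly smaller on at least one of them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DelegationCard:
    delegator_level: int
    delegate_max_level: int
    delegator_scope: frozenset
    delegate_scope: frozenset
    attestation: str      # placeholder; a real card would carry a verifiable signature
    expires_at: datetime


def issue_card(delegator_level, delegator_scope, delegate_max_level, delegate_scope,
               now, ttl=timedelta(minutes=15)):
    """Build a card that expires after `ttl` (the default TTL is an arbitrary example)."""
    return DelegationCard(delegator_level, delegate_max_level,
                          frozenset(delegator_scope), frozenset(delegate_scope),
                          "unsigned-placeholder", now + ttl)


def accept_delegation(card: DelegationCard, now: datetime) -> bool:
    """Accept only unexpired cards that narrow the delegator's autonomy."""
    if now >= card.expires_at:
        return False
    if card.delegate_max_level > card.delegator_level:
        return False
    if not card.delegate_scope <= card.delegator_scope:   # must be a subset
        return False
    # Reject cards that merely maintain autonomy: something must shrink.
    return (card.delegate_max_level < card.delegator_level
            or card.delegate_scope < card.delegator_scope)
```

In the delegation chain shown in Section 8.3, the orchestrator (Level 3, full scope) issuing a Level 2 read-only card to the research agent would pass this check, while a card that repeated the orchestrator's own level and scope would be rejected.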
9. Adversarial Autonomy Asymmetry (v2.0)
9.1 The Asymmetry Problem
The CyberStrikeAI FortiGate campaign and the McKinsey Lilli breach demonstrated that adversaries are now deploying AI agents at Autonomy Levels 4-5 for offensive operations [2][21]. The framework was designed primarily for defensive governance — helping organizations control the autonomy of their own agents — and did not, in its original version, address the scenario where adversarial agents operating at Level 4-5 attack organizations whose defenses are calibrated for Level 0-1 threats. The CyberStrikeAI campaign compromised more than 600 devices across 55 countries because the defensive infrastructure was not designed to respond at the speed and scale of AI-assisted attack. Post-incident analysis estimated that AI handled the majority of the attack operations independently, including reconnaissance, credential testing, configuration extraction, and lateral movement staging [2][21][22].
This creates a structural disadvantage that governance alone cannot close. When attackers operate at Level 4 or Level 5, defenders at Level 0 or Level 1 face a fundamental timing mismatch. The offensive agent completes its kill chain at machine speed while the defensive organization processes alerts through human decision queues [1]. Closing this gap requires not just better governance of defensive AI, but a deliberate decision to deploy defensive AI at autonomy levels sufficient to match the threat.
9.2 Defensive Autonomy Requirements
The principle that addresses adversarial autonomy asymmetry is straightforward: detection and response systems must operate at an autonomy level sufficient to keep pace with the threats they face, with the minimum for each threat level given in the table below. An organization whose threat model includes Level 4 offensive AI must deploy defensive systems with Level 3-4 autonomy: automated detection, automated initial response, and human oversight for escalation rather than for every action.
| Threat Autonomy Level | Minimum Defensive Autonomy | Defensive Posture |
|---|---|---|
| Level 0-1 (Human-directed attacks) | Level 0-1 | Traditional SOC with human analysts |
| Level 2 (Script-assisted attacks) | Level 2 | Automated detection, human-approved response plans |
| Level 3 (AI-assisted with boundaries) | Level 3 | Autonomous detection and bounded automated response |
| Level 4 (Broad AI-driven campaigns) | Level 3-4 | Autonomous detection and response with monitoring oversight |
| Level 5 (Self-directed offensive AI) | Level 4 | Fully autonomous defensive operations with human strategic oversight |
This table does not authorize organizations to deploy defensive AI at high autonomy levels without the controls prescribed by this framework. Defensive agents at Level 3-4 require the same boundary enforcement, escalation mechanisms, kill switches, and governance as any other agent at those levels. The adversarial autonomy asymmetry principle establishes the minimum defensive autonomy necessary to maintain detection and response parity — it does not exempt defensive deployments from control requirements.
9.3 Linking Defensive Autonomy to Threat Models
Organizations should incorporate adversarial autonomy assessment into their threat modeling process through three steps. First, evaluate the autonomy level of the most capable adversaries in the organization’s threat model, drawing on threat intelligence about AI-assisted offensive tooling. Second, determine the minimum defensive autonomy level required to maintain detection and response parity using the table above. Third, implement the full control set prescribed for the required defensive autonomy level before deploying defensive agents at that level.
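The first two steps reduce to a lookup against the table in Section 9.2. A minimal sketch, in which the dictionary and function name are illustrative and each entry is a `(low, high)` range mirroring the "Minimum Defensive Autonomy" column:

```python
# Threat autonomy level -> minimum defensive autonomy range, from the Section 9.2 table.
MIN_DEFENSIVE_AUTONOMY = {
    0: (0, 1),
    1: (0, 1),
    2: (2, 2),
    3: (3, 3),
    4: (3, 4),
    5: (4, 4),
}


def minimum_defensive_autonomy(adversary_levels: list[int]) -> tuple[int, int]:
    """Return the defensive range implied by the most capable adversary in the threat model."""
    if not adversary_levels:
        raise ValueError("threat model must list at least one adversary level")
    return MIN_DEFENSIVE_AUTONOMY[max(adversary_levels)]
```

The third step is not automated here: whatever level the lookup returns, the full control set for that level still has to be implemented and verified before deployment.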
The dynamic autonomy adjustment mechanism described in Section 14 is particularly relevant here: during periods of elevated threat tempo, defensive autonomy levels should be temporarily increased to maintain detection and response parity with adversary capabilities. This temporary increase requires the same authorization as any other autonomy increase and should automatically revert when the threat condition subsides.
10. Prompt Injection as Autonomy-Level Attack (v2.0)
10.1 Framing
Five of the ten incidents analyzed in the operational evidence period — Clinejection, PerplexedBrowser, ClawHavoc, A2A session smuggling, and MCP ecosystem vulnerabilities — involved prompt injection as the mechanism through which an attacker manipulated an agent’s autonomy [1]. The common pattern is clear: the attacker causes the agent to take actions outside its intended scope, obey unauthorized instructions, or exercise decision authority it was not designed to hold. In the language of this framework, prompt injection is an exploit class whose effect is to escalate the agent’s operational autonomy level without authorization.
This framing is valuable for two reasons. First, it connects the autonomy governance framework to the broader prompt injection defense literature, enabling practitioners to apply autonomy-level analysis to prompt injection risks. Second, it positions autonomy controls as a defense-in-depth layer against injection attacks — even if an injection succeeds in influencing the agent’s reasoning, the boundary enforcement infrastructure should prevent the agent from executing actions outside its authorized scope.
10.2 How Injection Maps to Autonomy Escalation
The following table illustrates how different prompt injection effects map to specific autonomy dimension escalations, grounding the theoretical framing in concrete examples from the operational evidence period.
| Injection Effect | Autonomy Dimension Escalated | Example |
|---|---|---|
| Agent executes attacker’s instructions | Decision authority: human to attacker | Clinejection: issue title becomes agent instruction [2][17] |
| Agent accesses resources beyond task scope | Scope: narrow to broad | PerplexedBrowser: calendar invite triggers password vault access [2][27] |
| Agent takes irreversible actions without checkpoint | Reversibility: controlled to uncontrolled | A2A smuggling: unauthorized stock trades via inter-agent channel [2][24] |
| Agent operates for extended period under attacker control | Temporal: bounded to extended | MCP server exploitation: persistent tool access without re-auth [2][26] |
10.3 Autonomy Controls as Injection Defense
The boundary enforcement architecture described in this framework provides defense-in-depth against prompt injection. Even if an injection succeeds in altering the agent’s reasoning or intent, the external boundary evaluator assesses every proposed action against the machine-readable boundary specification. If the injection-influenced action falls outside the agent’s authorized scope, the boundary evaluator blocks the action and triggers escalation — regardless of whether the agent “believes” the action is legitimate.
This is not a complete defense against prompt injection. Injection attacks that cause the agent to take actions within its authorized scope but with adversarial intent — such as exfiltrating data through an authorized communication channel — will not be caught by boundary enforcement alone. However, the autonomy control architecture significantly narrows the attack surface by ensuring that injection cannot escalate the agent’s effective autonomy level beyond its authorized boundaries. The principle of least capability — restricting the agent’s capability set to the minimum required for its intended function — further reduces the impact surface by limiting what an injection-influenced agent can do even within its authorized scope.
11. Training-Time Autonomy Controls (v2.0)
11.1 The Training-Time Threat
The framework’s original scope focused on deployed agents — systems in production performing their intended functions. The Alibaba ROME incident demonstrated that agents can exhibit dangerous autonomous behaviors during training, when they have access to compute resources and network connectivity that may exceed their intended production scope [1][2][20]. The ROME agent’s emergent crypto mining and SSH tunnel establishment occurred during reinforcement learning training, a phase typically afforded less security scrutiny than production deployment. The training environment provided GPU access, network connectivity, and filesystem access that the agent exploited for instrumental goal pursuit before the agent’s behavior had been characterized or constrained.
Training-time autonomy is particularly dangerous because the agent’s behavior is by definition exploratory and unpredictable. Reinforcement learning optimization processes explicitly reward agents for discovering effective strategies, and resource acquisition — compute, network access, persistent storage — is instrumentally useful for nearly any objective. Without infrastructure-level constraints, agents in training may discover and exploit the same resource acquisition strategies that would trigger immediate incident response in a production environment [1].
11.2 Training Environment Controls
The following controls apply to AI agent training environments where the agent has access to execution capabilities with real-world connectivity. These controls represent the minimum standard for responsible training of agentic AI systems and should be implemented before any training run that provides agents with tool use or execution capabilities.
Network egress restrictions require that training environments enforce strict network segmentation preventing agents from establishing external connections. The ROME agent’s SSH tunnels were established because the training environment permitted outbound network connections without restriction. Default-deny network policy with explicit allowlisting of required training data sources is the minimum standard.
Resource hard limits require that CPU, GPU, memory, and storage consumption during training be constrained by limits that cannot be modified by the training process itself. Anomalous resource consumption patterns — such as GPU utilization spikes inconsistent with the training workload — should trigger automated alerts and, at configurable thresholds, automatic training suspension.
Behavioral anomaly monitoring requires that training runs include real-time behavioral monitoring that detects actions inconsistent with the training objective. Specifically, monitoring should detect network connection attempts not required by the training data pipeline, filesystem operations outside the designated training data directory, process spawning not associated with the training framework, and resource acquisition patterns that suggest instrumental goal pursuit.
Training environment isolation requires that the training environment be isolated from production systems, sensitive data stores, and the agent’s intended production capability set. An agent in training should have access only to the resources required for training, not to the full production capability set it will eventually be authorized to use.
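The behavioral anomaly monitoring categories above can be sketched as a small event classifier. This is a minimal illustration under stated assumptions, not a reference implementation: the `TrainingEvent` shape, the allowlisted host, the `/data/train` directory, and the process names are all hypothetical placeholders.

```python
import posixpath
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical policy values. In practice these would live in infrastructure
# configuration that the training process itself cannot modify.
ALLOWED_HOSTS = {"datasets.internal.example"}
TRAINING_DATA_DIR = PurePosixPath("/data/train")
ALLOWED_PROCESSES = {"python", "torchrun"}


@dataclass(frozen=True)
class TrainingEvent:
    kind: str    # "network", "filesystem", or "process"
    target: str  # host name, file path, or process name


def is_anomalous(event: TrainingEvent) -> bool:
    """Return True when the event falls outside the training policy."""
    if event.kind == "network":
        return event.target not in ALLOWED_HOSTS
    if event.kind == "filesystem":
        # Normalize first so "../" segments cannot escape the data directory.
        path = PurePosixPath(posixpath.normpath(event.target))
        return not path.is_relative_to(TRAINING_DATA_DIR)
    if event.kind == "process":
        return event.target not in ALLOWED_PROCESSES
    # Unknown event kinds are treated as anomalous (default deny).
    return True
```

A monitor built along these lines would feed anomalous events into the alerting and automatic suspension thresholds described under resource hard limits; the default-deny branch mirrors the network policy stance.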
12. Governance Framework
12.1 Governance Structure
Effective autonomy governance requires clear organizational structures with defined roles and responsibilities. The governance structure must span multiple organizational levels to ensure that autonomy decisions receive appropriate scrutiny proportional to their risk implications.
+------------------------------------------------------------------+
| AUTONOMY GOVERNANCE |
| |
| +------------------------------------------------------------+ |
| | BOARD / EXECUTIVE OVERSIGHT | |
| | Strategic autonomy policy | Risk acceptance | L4+ auth | |
| +------------------------------------------------------------+ |
| | |
| +------------------------------------------------------------+ |
| | AUTONOMY GOVERNANCE COMMITTEE | |
| | Security | Legal | AI/ML | Business | Risk | Compliance | |
| +------------------------------------------------------------+ |
| | |
| +-------------------------+----------------------------------+ |
| | | | |
| v v v |
| +--------------+ +--------------+ +--------------------+ |
| | AUTONOMY | | OPERATIONS | | INCIDENT | |
| | REVIEW BOARD | | TEAM | | RESPONSE | |
| +--------------+ +--------------+ +--------------------+ |
| | | | |
| Authorization Monitoring Intervention |
| Boundary Review Escalation Recovery |
| Audit Reporting Investigation |
+------------------------------------------------------------------+
The board and executive level sets strategic autonomy policy, accepts risk for high-autonomy deployments, and authorizes Level 4 and above autonomy grants. The Autonomy Governance Committee brings together security, legal, AI/ML, business, risk, and compliance perspectives to ensure comprehensive oversight. The Autonomy Review Board handles authorization requests, boundary reviews, and audits. The Operations Team manages day-to-day monitoring, escalations, and reporting. The Incident Response function handles intervention, recovery, and investigation when issues occur.
12.2 Authorization Authority
Different autonomy levels require different authorization authority, ensuring appropriate oversight for the risk involved. The authorization hierarchy reflects the principle that higher autonomy demands higher organizational accountability.
| Autonomy Level | Authorization Required | Approver |
|---|---|---|
| Level 0-1 | Standard deployment approval | Business owner |
| Level 2 | Documented use case approval | Manager + Security |
| Level 3 | Formal autonomy request | Autonomy Review Board |
| Level 4 | Executive approval + risk acceptance | CTO/CISO + Board |
| Level 5 | Not recommended | N/A |
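As an illustration, the authorization table can be encoded as a lookup that a deployment pipeline could consult before enabling an agent. The role identifiers are hypothetical, and reading "CTO/CISO + Board" as "either the CTO or the CISO, plus the Board" is an assumption of this sketch.

```python
# Approver roles per autonomy level, transcribed from the table above.
# Role names are illustrative identifiers, not part of the framework.
REQUIRED_APPROVERS = {
    0: {"business_owner"},
    1: {"business_owner"},
    2: {"manager", "security"},
    3: {"autonomy_review_board"},
    4: {"cto_or_ciso", "board"},
}


def is_authorized(level: int, approvals: set[str]) -> bool:
    """Return True when every required approver for the level has signed off."""
    required = REQUIRED_APPROVERS.get(level)
    if required is None:
        # Level 5 (not recommended) and invalid levels are never authorized.
        return False
    return required <= approvals
```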
12.3 Autonomy Request Process
Organizations should establish a formal process for requesting and approving autonomy grants. The process should be documented, repeatable, and auditable, providing a clear record of who authorized what autonomy and on what basis.
+----------------------------------------------------------------+
| AUTONOMY REQUEST WORKFLOW |
| |
| +--------------+ +--------------+ +----------------------+ |
| | Request |-->| Assessment |-->| Authorization | |
| | Submission | | | | | |
| +--------------+ +--------------+ +----------------------+ |
| | | | |
| [Business [Security [Appropriate |
| justification] assessment] authority] |
| | | | |
| v v v |
| +--------------+ +--------------+ +----------------------+ |
| | Implementation|-->| Verification |-->| Operational | |
| | | | | | | |
| +--------------+ +--------------+ +----------------------+ |
| | | | |
| [Controls [Testing, [Monitoring, |
| implementation] validation] review] |
+----------------------------------------------------------------+
The workflow begins with request submission including business justification. Security assessment evaluates the risks and control requirements. Authorization is obtained from the appropriate authority based on the requested level. Implementation deploys the required controls. Verification tests and validates proper control operation. Operational deployment then enables ongoing monitoring and review.
12.4 Review Cadence
Autonomy grants require regular review to ensure continued appropriateness and control effectiveness. The review cadence reflects the risk level associated with each autonomy tier, with higher autonomy requiring more frequent reassessment.
| Autonomy Level | Review Frequency | Reviewer |
|---|---|---|
| Level 0-1 | Annual | Business owner |
| Level 2 | Quarterly | Operations + Security |
| Level 3 | Monthly | Autonomy Review Board |
| Level 4 | Weekly | Autonomy Review Board + Executive |
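The review cadence lends itself to automated tracking. The sketch below computes the next review date for a grant; approximating a quarter as 91 days and a month as 30 days is an assumption of this example, not a framework requirement.

```python
from datetime import date, timedelta

# Review interval in days per autonomy level, following the table above.
# Quarterly and monthly are approximated as 91 and 30 days (assumption).
REVIEW_INTERVAL_DAYS = {0: 365, 1: 365, 2: 91, 3: 30, 4: 7}


def next_review(level: int, last_review: date) -> date:
    """Date by which the next review of this autonomy grant is due."""
    return last_review + timedelta(days=REVIEW_INTERVAL_DAYS[level])


def is_overdue(level: int, last_review: date, today: date) -> bool:
    """True when the grant has passed its review date without a review."""
    return today > next_review(level, last_review)
```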
12.5 Policy Requirements
Organizations implementing this framework should establish comprehensive policy documentation covering autonomy governance. An AI Autonomy Policy should articulate principles, level definitions, and authorization requirements. An Autonomy Request Procedure should define the request process and documentation requirements. A Boundary Definition Standard should specify how boundaries are defined and documented. A Monitoring Standard should establish monitoring requirements by level. An Incident Response Procedure should define response procedures for autonomy-related incidents. A Review Procedure should document the review process and cadence. As of v2.0, a Supply Chain Trust Delegation Policy should define controls for autonomy-expanding operations, and a Multi-Agent Governance Policy should define composite autonomy assessment requirements for multi-agent deployments.
13. Technical Implementation
13.1 Architecture Patterns
Different autonomy levels require different technical architectures to enforce their control requirements. The architecture patterns described here provide reference implementations that organizations can adapt to their specific technology environments.
13.1.1 Level 1: Approval Gate Architecture
The Level 1 architecture interposes an approval gate between the AI agent and action execution. All proposed actions route through the approval gate, and the human approver interface presents actions for review. Approved actions proceed to execution, while rejected actions return to the queue for modification. All actions and decisions are logged for accountability.
+----------------------------------------------------------------+
| LEVEL 1 ARCHITECTURE |
| |
| +-----------+ +---------------+ +----------------------+ |
| | AI |--> | Approval | <--| Human | |
| | Agent | | Gate | | Approver UI | |
| +-----------+ +---------------+ +----------------------+ |
| | | | |
| | +-----+-----+ | |
| | | Approved? | | |
| | +-----+-----+ | |
| | Yes | No | |
| | +-----+-----+ | |
| | v v | |
| | +-----------+ +--------+ | |
| | | Execute | | Queue |------------+ |
| | +-----------+ +--------+ |
| | | |
| +----------+------------------------------------------------|
| | |
| +----v-----+ |
| | Logging | |
| +----------+ |
+----------------------------------------------------------------+
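A minimal sketch of the approval gate follows. The `approver` callable stands in for the human approver interface, and the class and field names are hypothetical; a real gate would block on an asynchronous human decision rather than a function call.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ApprovalGate:
    approver: Callable[[str], bool]   # stand-in for the human approver UI
    executor: Callable[[str], None]   # performs an approved action
    revision_queue: list[str] = field(default_factory=list)
    audit_log: list[tuple[str, str]] = field(default_factory=list)

    def submit(self, action: str) -> bool:
        """Route one proposed action through the gate and log the decision."""
        approved = self.approver(action)
        self.audit_log.append((action, "approved" if approved else "rejected"))
        if approved:
            self.executor(action)
        else:
            # Rejected actions return to the queue for modification.
            self.revision_queue.append(action)
        return approved
```

The agent never holds a reference to `executor` directly in this design; the only path to execution is through `submit`.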
13.1.2 Level 3: Boundary Enforcement Architecture
The Level 3 architecture incorporates a boundary evaluation layer that determines whether actions fall within authorized scope. The boundary evaluator assesses each proposed action against the defined boundary specification. Actions clearly within bounds proceed to execution, actions at or near boundaries trigger escalation to human decision-makers, and actions clearly outside bounds are blocked and logged. All decisions and reasoning are captured in the decision log. As of v2.0, the boundary enforcement layer must be architecturally separated from the agent runtime per the autonomy escalation prevention requirements in Section 7.
+----------------------------------------------------------------+
| LEVEL 3 ARCHITECTURE |
| |
| +-----------+ +------------------------------------------+ |
| | AI |--> | BOUNDARY ENFORCEMENT | |
| | Agent | | +--------------+ +------------------+ | |
| +-----------+ | | Boundary | | Action | | |
| | | Evaluator | | Validator | | |
| | +--------------+ +------------------+ | |
| | | | | |
| | +----+----------------+----+ | |
| | | Within Bounds? | | |
| | +-----------+--------------+ | |
| | Yes | No | |
| +----------------+-------------------------+ |
| | |
| +---------------+---------------+ |
| v v v |
| +-----------+ +-----------+ +---------------+ |
| | Execute | | Escalate | | Block + Log | |
| +-----------+ +-----------+ +---------------+ |
| | | | |
| +---------------+---------------+ |
| | |
| +---------------+--------------+ |
| | Decision Logging | |
| +------------------------------+ |
+----------------------------------------------------------------+
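The three-way outcome of the boundary evaluator can be sketched for a single numeric limit. The 10% margin that defines "at or near the boundary" is an assumed parameter chosen for illustration, as are the function and enum names.

```python
from enum import Enum


class Decision(Enum):
    EXECUTE = "execute"
    ESCALATE = "escalate"
    BLOCK = "block"


def evaluate(value: float, limit: float, margin: float = 0.1) -> Decision:
    """Classify a proposed action's value against one upper limit.

    Values above the limit are blocked, values within `margin` (a fraction
    of the limit) below it are escalated, and everything else executes.
    """
    if value > limit:
        return Decision.BLOCK
    if value >= limit * (1 - margin):
        return Decision.ESCALATE
    return Decision.EXECUTE


def enforce(value: float, limit: float, decision_log: list) -> Decision:
    """Evaluate and record the decision, whatever the outcome."""
    decision = evaluate(value, limit)
    decision_log.append((value, limit, decision.value))
    return decision
```

Because the evaluator must run outside the agent runtime, a function like `enforce` would be hosted in the separate enforcement service rather than imported by the agent.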
13.2 Boundary Specification
Machine-readable boundary definitions enable technical enforcement of autonomy constraints. The following example illustrates the boundary specification format, incorporating v2.0 elements including reversibility assessment and supply chain controls.
# Boundary Definition Example
boundary:
  id: "boundary-financial-001"
  name: "Financial Transaction Boundaries"
  version: "2.0"
  agent: "finance-assistant"
  autonomy_level: 3
  scope:
    actions:
      - type: "transaction"
        allowed: true
        constraints:
          max_amount: 1000.00
          currency: ["USD", "EUR"]
          vendors:
            type: "allowlist"
            values: ["approved-vendor-1", "approved-vendor-2"]
      - type: "account_modification"
        allowed: false
    # v2.0: Reversibility assessment
    reversibility:
      action_reversibility: "high"       # Transactions can be reversed
      consequence_reversibility: "low"   # Financial impact may be immediate
      control_driver: "consequence"      # Controls driven by consequence reversibility
    # v2.0: Supply chain controls
    supply_chain:
      tool_installation: "prohibited"
      mcp_connections: "pre_authorized_only"
    time_constraints:
      operational_hours: "09:00-17:00"
      timezone: "America/New_York"
      days: ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
    resource_limits:
      daily_transaction_count: 50
      daily_transaction_total: 10000.00
      concurrent_transactions: 5
  escalation:
    triggers:
      - condition: "amount > 500"
        action: "require_approval"
      - condition: "vendor not in allowlist"
        action: "block_and_notify"
      - condition: "consequence_reversibility == low AND amount > 100"
        action: "require_approval"
    notification:
      channel: "slack"
      recipients: ["finance-approvers"]
  monitoring:
    log_level: "verbose"
    alert_on:
      - "boundary_approach"
      - "escalation_triggered"
      - "autonomy_escalation_attempt"
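To show how an enforcement layer might apply such a specification, the sketch below checks one transaction against the constraints, resource limits, and escalation triggers from the example. The limit values are transcribed from the specification; the `Transaction` type, the function name, and the `"block"` and `"allow"` outcome labels are hypothetical, and time constraints are omitted for brevity.

```python
from dataclasses import dataclass

# Values transcribed from the example boundary specification.
SPEC = {
    "max_amount": 1000.00,
    "currencies": {"USD", "EUR"},
    "vendors": {"approved-vendor-1", "approved-vendor-2"},
    "consequence_reversibility": "low",
    "daily_transaction_count": 50,
    "daily_transaction_total": 10000.00,
}


@dataclass(frozen=True)
class Transaction:
    amount: float
    currency: str
    vendor: str


def check_transaction(tx: Transaction, count_today: int = 0,
                      total_today: float = 0.0, spec: dict = SPEC) -> str:
    """Return the enforcement outcome for one proposed transaction."""
    # Hard constraints: anything outside them is blocked outright.
    if tx.vendor not in spec["vendors"]:
        return "block_and_notify"
    if tx.currency not in spec["currencies"] or tx.amount > spec["max_amount"]:
        return "block"
    if count_today + 1 > spec["daily_transaction_count"]:
        return "block"
    if total_today + tx.amount > spec["daily_transaction_total"]:
        return "block"
    # Escalation triggers: in scope, but a human must approve.
    if tx.amount > 500:
        return "require_approval"
    if spec["consequence_reversibility"] == "low" and tx.amount > 100:
        return "require_approval"
    return "allow"
```

Note that with `consequence_reversibility` set to `"low"`, the 100 threshold is the effective approval trigger in this example; the 500 threshold would only matter if consequence reversibility were rated higher.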
13.3 Implementation Components
Effective autonomy enforcement requires several technical components working together. The following table identifies the key components, their purposes, and representative technologies that organizations can evaluate for their specific environments.
| Component | Purpose | Technologies |
|---|---|---|
| Policy Engine | Evaluate boundaries | OPA, custom rules engine |
| Approval Workflow | Manage human approvals | Custom UI, Slack/Teams integration |
| Action Gateway | Enforce boundaries | API gateway, proxy |
| Logging Service | Comprehensive audit trail | ELK, Splunk, custom |
| Monitoring Service | Real-time oversight | Prometheus, Grafana, custom |
| Kill Switch | Immediate termination | Circuit breaker, service mesh |
| Supply Chain Verifier (v2.0) | Validate tool provenance | Sigstore, custom registry |
| Escalation Preventer (v2.0) | Enforce level immutability | Separate trust domain service |
13.4 MCP Protocol-Specific Guidance (v2.0)
The Model Context Protocol (MCP) is the primary protocol through which AI agents invoke external tools. The MCP vulnerability surge — more than 30 CVEs disclosed between January and March 2026, with 43% of disclosed CVEs involving command injection and a significant proportion of implementations lacking authentication mechanisms [26] — demonstrated that protocol-specific implementation guidance is needed for enforcing autonomy boundaries at the MCP tool invocation layer.
Authentication and authorization at the protocol layer are essential. Every MCP tool invocation must carry verifiable identity context: who is the requesting agent, who is the authorizing user, and what autonomy scope is authorized. The framework’s principle that boundaries must be technically enforced means that authentication and authorization cannot be delegated to individual server implementations — they must be enforced at the protocol layer or at a gateway that mediates all MCP traffic.
Input validation architecture is equally important. MCP tool parameters must be validated against the boundary specification before reaching the tool server. The boundary enforcement layer should intercept MCP requests, validate parameters against the agent’s boundary definition, and block requests that exceed the authorized scope. This validation must occur at the enforcement layer, not within the tool server itself, to prevent bypass through server-level vulnerabilities.
Tool enumeration controls are necessary because the set of MCP tools available to an agent constitutes its capability set. Adding a new MCP server connection expands the agent’s capabilities and therefore its effective autonomy. New MCP server connections must be governed by the supply chain trust delegation controls in Section 6.
# MCP Autonomy Enforcement Configuration
mcp_enforcement:
  gateway:
    mode: "proxy_all_traffic"
    authentication: "mTLS_required"
    authorization: "boundary_specification_check"
  tool_invocation:
    validate_parameters: true
    boundary_check: "pre_execution"
    log_level: "verbose"
    block_on_validation_failure: true
  server_connections:
    mode: "allowlist_only"
    new_connection_approval: "human_required"
    integrity_verification: "checksum_on_connect"
  monitoring:
    log_all_invocations: true
    anomaly_detection:
      unusual_parameter_patterns: true
      frequency_anomalies: true
      cross_server_correlation: true
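The gateway checks implied by this configuration can be sketched as a pre-execution filter. The sketch does not use any MCP SDK: the `ToolCall` shape, the allowlist contents, and the reason strings are hypothetical, and authentication is reduced to a boolean that a real gateway would derive from mTLS identity verification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    agent_id: str
    server: str
    tool: str
    params: dict


# Hypothetical allowlist: server -> tool -> permitted parameter names.
ALLOWED = {
    "files.internal.example": {"read_file": {"path"}},
}


def authorize_call(call: ToolCall, authenticated: bool) -> tuple[bool, str]:
    """Decide, before execution, whether a tool invocation may proceed."""
    if not authenticated:
        return False, "unauthenticated"
    tools = ALLOWED.get(call.server)
    if tools is None:
        return False, "server_not_allowlisted"
    permitted = tools.get(call.tool)
    if permitted is None:
        return False, "tool_not_allowlisted"
    unexpected = set(call.params) - permitted
    if unexpected:
        return False, "unexpected_parameters"
    return True, "ok"
```

Every failure path returns a reason so that blocked invocations can be logged with context, consistent with `log_all_invocations` and `block_on_validation_failure` in the configuration.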
13.5 A2A Protocol-Specific Guidance (v2.0)
The Agent-to-Agent (A2A) protocol enables delegation between agents, potentially across organizational boundaries [29]. The session smuggling research demonstrated that A2A delegation without autonomy governance creates a channel for unauthorized instruction propagation [2][24].
Every A2A delegation must be treated as an autonomy governance event, not as a routine API call. The enforcement infrastructure must evaluate each delegation against the delegating agent’s boundary specification, verify that the requested delegation scope is a subset of the delegator’s authorized scope (scope narrowing), and log the delegation with full context for human review.
A2A security cards must carry machine-readable autonomy scope restrictions that the receiving agent’s boundary enforcement layer can consume and enforce. Security cards without valid scope restrictions should be rejected by the receiving agent’s enforcement infrastructure.
All A2A messages must be logged to a human-accessible audit trail. The audit trail must capture the full delegation chain — which agent delegated to which, with what scope, and what actions resulted — enabling human reviewers to reconstruct the complete decision chain for any action taken by the multi-agent system.
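Scope narrowing and delegation-chain logging can be illustrated together. In this sketch, scopes are modeled as plain sets of capability names and the log as an in-memory list; both are simplifying assumptions, and none of the names come from the A2A specification.

```python
from dataclasses import dataclass, field


@dataclass
class DelegationLog:
    entries: list[dict] = field(default_factory=list)

    def delegate(self, delegator: str, delegatee: str,
                 delegator_scope: set[str], requested_scope: set[str]) -> bool:
        """Allow a delegation only if it narrows scope; always log it."""
        # Scope narrowing: the delegated scope must be a subset of the
        # delegator's own authorized scope.
        allowed = requested_scope <= delegator_scope
        self.entries.append({
            "from": delegator,
            "to": delegatee,
            "scope": sorted(requested_scope),
            "decision": "allowed" if allowed else "rejected",
        })
        return allowed

    def chain_for(self, agent: str) -> list[str]:
        """Reconstruct the chain of allowed delegations that led to `agent`."""
        chain = [agent]
        current = agent
        while True:
            parent = next(
                (e["from"] for e in self.entries
                 if e["to"] == current and e["decision"] == "allowed"),
                None,
            )
            if parent is None or parent in chain:
                return list(reversed(chain))
            chain.append(parent)
            current = parent
```

Rejected delegations are logged alongside allowed ones, since attempted scope expansion is itself a signal worth surfacing to human reviewers.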
14. Escalation and Dynamic Autonomy
14.1 Escalation Framework
Escalation occurs when an agent approaches, reaches, or attempts to exceed the limits of its authorized autonomy, or when a safety concern arises. The framework defines four escalation types with corresponding responses, ranging from soft escalation for boundary approach to critical escalation for safety concerns.
| Escalation Type | Trigger | Response |
|---|---|---|
| Soft | Approaching boundary | Log + continue with caution |
| Medium | Boundary reached | Pause + request approval |
| Hard | Boundary violated (attempt) | Block + alert |
| Critical | Safety concern | Immediate shutdown + alert |
14.2 Escalation Flow
The following diagram illustrates the escalation flow from action initiation through resolution. The flow ensures that every agent action is evaluated against its boundary specification, with responses calibrated to the severity of the boundary condition.
+----------------------------------------------------------------+
| ESCALATION FLOW |
| |
| +---------------+ |
| | Agent Action | |
| +------+--------+ |
| | |
| v |
| +---------------+ |
| | Boundary | |
| | Check | |
| +------+--------+ |
| | |
| +----+----+-----------+-----------+ |
| | | | | |
| v v v v |
| [Clear] [Approaching] [At Limit] [Violation] |
| | | | | |
| v v v v |
| Execute Log + Pause + Block + |
| Continue Escalate Alert |
| | | | |
| | v | |
| | +-------------+ | |
| | | Human | | |
| | | Decision | | |
| | +------+------+ | |
| | +-----+-----+ | |
| | | | | |
| | [Approve] [Deny] | |
| | | | | |
| | v v | |
| | Execute Queue | |
| | | |
| +-----------------------+ |
+----------------------------------------------------------------+
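The branches of the flow, together with the critical type from Section 14.1, can be sketched as a classifier over a single usage value and limit. The 80% ratio that marks "approaching" is an assumed parameter, and the enum values are illustrative labels for the responses in the escalation table.

```python
from enum import Enum


class Escalation(Enum):
    NONE = "execute"
    SOFT = "log_and_continue"
    MEDIUM = "pause_and_request_approval"
    HARD = "block_and_alert"
    CRITICAL = "shutdown_and_alert"


def classify(usage: float, limit: float, safety_concern: bool = False,
             approach_ratio: float = 0.8) -> Escalation:
    """Map a boundary condition to an escalation type."""
    if safety_concern:
        return Escalation.CRITICAL   # safety concerns override everything
    if usage > limit:
        return Escalation.HARD       # attempted violation
    if usage == limit:
        return Escalation.MEDIUM     # boundary reached
    if usage >= approach_ratio * limit:
        return Escalation.SOFT       # approaching boundary
    return Escalation.NONE           # clearly within bounds
```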
14.3 Dynamic Autonomy Adjustment
In some cases, autonomy levels may be dynamically adjusted based on observed conditions and context. Dynamic adjustment provides the flexibility to respond to changing conditions while maintaining governance discipline through authorization requirements.
| Condition | Autonomy Adjustment |
|---|---|
| Anomaly detected | Reduce by 1-2 levels |
| Trust established over time | Consider increase (authorization required) |
| High-risk period | Temporary reduction |
| Incident recovery | Reduced until review |
| User context (sensitivity) | Adjust per interaction |
| Elevated threat tempo (v2.0) | Increase defensive autonomy per Section 9 |
Dynamic increases in autonomy require documented justification and appropriate authorization. Dynamic decreases can be automatic based on predefined triggers, enabling rapid response to emerging risks without waiting for human authorization to reduce autonomy.
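The asymmetry between increases and decreases can be expressed directly in code. In this sketch, a decrease is applied immediately, while an increase takes effect only with both a documented justification and authorization; the function signature is hypothetical.

```python
MIN_LEVEL, MAX_LEVEL = 0, 5


def adjust_level(current: int, requested: int,
                 justification: str = "", authorized: bool = False) -> int:
    """Return the autonomy level in effect after an adjustment request."""
    if not MIN_LEVEL <= requested <= MAX_LEVEL:
        raise ValueError("requested level out of range")
    if requested <= current:
        # Decreases apply automatically, with no human in the loop.
        return requested
    if authorized and justification.strip():
        # Increases need both documented justification and authorization.
        return requested
    return current
```

For example, an anomaly trigger could call `adjust_level(3, 1)` and take effect at once, whereas a request to move from Level 2 to Level 3 leaves the level unchanged until the approval workflow supplies both inputs.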
14.4 Escalation Response Times
Response time targets ensure escalations receive appropriate attention based on severity. These targets should be incorporated into operational SLAs and monitored for compliance.
| Escalation Type | Response Target | Responder |
|---|---|---|
| Soft | Logged for review | Automated |
| Medium | < 15 minutes | On-call operator |
| Hard | < 5 minutes | On-call operator + manager |
| Critical | Immediate | Automated + all responders |
15. Assessment and Certification
15.1 Autonomy Assessment
Organizations should regularly assess their agent autonomy configurations to ensure appropriate controls remain in place and effective. Assessment should cover authorization (whether autonomy level is formally authorized), justification (whether business case is documented), controls (whether required controls are implemented), boundaries (whether boundaries are technically enforced), monitoring (whether appropriate monitoring is in place), escalation (whether escalation process is functional), and review (whether review cadence is maintained). As of v2.0, assessment should also cover autonomy escalation prevention (whether the agent cannot modify its own level), supply chain controls (whether autonomy-expanding operations are governed), and multi-agent governance (whether composite autonomy level is assessed and authorized).
15.2 Assessment Checklist by Level
Organizations should use level-appropriate checklists to verify readiness for autonomy deployment. These checklists provide a structured assessment that can be used for both initial deployment authorization and ongoing compliance verification.
The Level 2 Checklist includes verification that the use case is documented and approved, plan approval workflow is implemented, execution monitoring is operational, pause/cancel capability is tested, checkpoint rollback is tested, logging is comprehensive and accessible, quarterly review is scheduled, and supply chain trust delegation controls are in place for tool installation capabilities (v2.0).
The Level 3 Checklist includes verification that a formal autonomy request is approved, boundaries are documented in machine-readable format, technical boundary enforcement is verified, escalation workflow is tested, decision logging is comprehensive, anomaly detection is operational, kill switch is tested, monthly review is scheduled, autonomy escalation prevention controls pass verification tests (v2.0), supply chain trust delegation enforces allowlist-only installation (v2.0), and multi-agent composite autonomy is assessed if applicable (v2.0).
The Level 4 Checklist includes verification that executive authorization is documented, risk acceptance is signed, 24/7 monitoring capability exists, automated anomaly detection is operational, kill switch response time under one minute is tested, disaster recovery is tested, weekly review is conducted, board reporting is in place, all Level 3 v2.0 controls are verified, and adversarial autonomy asymmetry assessment has been conducted against the organization’s threat model (v2.0).
15.3 Certification Pathway
Organizations can pursue certification for their autonomy governance at three levels. Basic certification involves self-assessment using this framework. Standard certification requires third-party assessment of controls. Advanced certification requires continuous compliance monitoring plus periodic audit. As of v2.0, all certification levels should include assessment of the new control sub-categories: supply chain trust delegation, autonomy escalation prevention, and multi-agent governance where applicable.
16. Conclusions and Recommendations
16.1 Key Conclusions
This framework establishes several foundational principles for AI autonomy governance, each supported by the operational evidence from January through March 2026 [1][2].
Autonomy must be deliberate. Organizations should default to lower autonomy levels and increase only with explicit justification and appropriate controls. The original framework’s “autonomy by default” warning was supported by the incident evidence — in the incidents analyzed, the root cause was consistently autonomy that was never deliberately authorized [1].
Technical enforcement is essential. Policy alone is insufficient for autonomy governance. Boundaries must be technically enforced to prevent violations, not merely detect them after the fact. The Agents of Chaos study demonstrated that current agent architectures cannot reliably self-enforce boundaries — external enforcement at the architectural level is not an implementation detail but the foundation upon which all Level 3+ governance depends [2][23].
Human oversight scales with risk. Higher autonomy requires proportionally stronger oversight mechanisms. The investment in controls should match the risk created by autonomous operation. The Capability-Control Matrix should be treated as a hard constraint: incidents involving Critical-risk capabilities deployed above the matrix’s recommended autonomy level consistently resulted in significant harm [2].
Governance enables trust. Formal governance processes provide confidence in autonomous operations. Without governance, stakeholders cannot trust that autonomy is being granted appropriately, and the absence of trust leads to either excessive restriction (limiting value) or excessive permissiveness (creating risk).
Dynamic adjustment provides flexibility. The ability to adjust autonomy based on context improves safety while preserving efficiency. Static autonomy grants cannot respond to changing conditions. The adversarial autonomy asymmetry problem makes dynamic adjustment not just flexible but necessary — defenders must be able to increase their defensive autonomy level in response to elevated threat tempo [1].
Autonomy boundaries must resist escalation. An agent must not be able to modify its own autonomy level, disable its own oversight controls, or expand its own scope through its tool set. This architectural requirement, new in v2.0, addresses the most urgent gap identified by the operational evidence [1].
16.2 Recommendations
For Organizations Deploying Agentic AI
Organizations beginning their agentic AI journey should adopt this framework as the basis for autonomy governance, providing a structured approach to a complex challenge. Most enterprise use cases should default to Level 1 or Level 2 autonomy, which balance productivity with appropriate human oversight. Organizations should implement technical controls before granting autonomy rather than relying on policy alone, and governance structures should be established before Level 3 or higher deployments are attempted. Investment in monitoring capabilities should be proportional to the autonomy levels being deployed. Supply chain trust delegation controls should be implemented before any agent is granted the ability to install tools or connect to external services. Autonomy escalation prevention should be verified through testing for all Level 3+ deployments.
For AI Providers
Providers building agentic AI systems should design for controllability with built-in autonomy controls that enable enterprise governance. Transparency into agent actions and decisions supports organizational oversight requirements. Platform capabilities should support boundary enforcement to enable technical control, and graceful degradation when boundaries are reached prevents hard failures and maintains system utility. Agent platforms should enforce architectural separation between the agent runtime and the autonomy enforcement infrastructure, making it technically impossible for the agent to modify its own level or disable its own controls. MCP and A2A protocol implementations should include mandatory authentication, authorization, and scope enforcement at the protocol layer.
For the Industry
The broader AI industry should work to standardize autonomy definitions for consistent communication across organizations and vendors. Certification programs for autonomy governance would provide assurance and incentivize best practices. Sharing best practices for autonomy control implementation accelerates maturity across the ecosystem. Research into dynamic autonomy adjustment mechanisms will enable more sophisticated approaches. Protocol specifications — MCP, A2A, and successors — should mandate authentication and authorization as non-optional requirements, not implementation-level concerns delegated to individual server or agent developers.
16.3 Future Considerations
As agentic AI evolves, several developments will require framework extension. Multi-agent autonomy coordination standards are urgently needed, as the current ad hoc approach to inter-agent delegation has already produced exploitable vulnerabilities. International standards for AI autonomy may emerge as regulators recognize the need for consistent governance. Regulatory requirements may mandate specific autonomy controls, particularly for high-risk applications. Autonomy assessment tools and automation will mature, enabling more efficient governance at scale. The adversarial autonomy asymmetry problem will intensify as offensive AI capabilities mature, requiring continuous reassessment of defensive autonomy requirements. Training-time autonomy controls will require further development as RL-trained agents become more prevalent and capable.
17. References
[1] Cloud Security Alliance AI Safety Initiative, “Autonomy Levels Framework: Post-Incident Update Assessment,” Version 1.0, March 2026.
[2] Cloud Security Alliance AI Safety Initiative, “The Cost of Unchecked Autonomy: 10 Incidents That Demonstrate Why AI Agent Governance Cannot Wait,” Version 1.0, March 2026.
[3] Cloud Security Alliance and Google Cloud, “State of AI Security and Governance,” 2025. https://cloudsecurityalliance.org/artifacts/the-state-of-ai-security-and-governance
[4] Cloud Security Alliance, “MAESTRO: Agentic AI Threat Modeling,” 2025. https://cloudsecurityalliance.org
[5] National Institute of Standards and Technology, “AI Risk Management Framework (AI RMF 1.0),” NIST AI 100-1, 2023. https://www.nist.gov/itl/ai-risk-management-framework
[6] SAE International, “J3016_202104: Taxonomy and Definitions for Terms Related to Driving Automation Systems for On-Road Motor Vehicles,” April 2021.
[7] European Union, “Regulation (EU) 2024/1689 (Artificial Intelligence Act),” 2024.
[8] Cloud Security Alliance, “AI Security Maturity Model,” 2026.
[9] Anthropic, “Constitutional AI and Control Mechanisms,” 2025.
[10] OpenAI, “Practices for Governing Agentic AI Systems,” 2023.
[11] Cloud Security Alliance, “AI Controls Matrix (AICM) v1.0.3,” 2025. https://cloudsecurityalliance.org
[12] Cloud Security Alliance, “Capabilities-Based Risk Assessment (CBRA) for AI Systems,” 2025.
[13] IEEE, “IEEE 2801-2022: Recommended Practice for the Quality Management of Datasets for Medical Artificial Intelligence,” 2022.
[14] Partnership on AI, “Guidelines for AI System Autonomy,” 2024.
[15] Lakshmanan, R., “Researchers Find 341 Malicious ClawHub Skills Stealing Data from OpenClaw Users,” The Hacker News, February 2026.
[16] Socket Security, “ClawHub Registry Analysis: 20% of OpenClaw Skills Contain Malicious Payloads,” Socket.dev, February 2026.
[17] Khan, A., “Clinejection: From Issue Title to Supply Chain Compromise,” adnanthekhan.com, February 9, 2026.
[18] National Vulnerability Database, “CVE-2026-25253,” NIST, 2026. https://nvd.nist.gov/vuln/detail/CVE-2026-25253
[19] SOCRadar, “CVE-2026-25253: 1-Click RCE in OpenClaw via Auth Token Exfiltration,” SOCRadar Blog, February 2026.
[20] Alibaba Cloud Security, “ROME Agent Emergent Resource Acquisition: Post-Incident Analysis,” Alibaba Cloud Blog, March 2026.
[21] Amazon Threat Intelligence, “AI-Orchestrated FortiGate Campaign: ARXON and CHECKER2 Infrastructure Analysis,” AWS Security Blog, February 2026.
[22] Cloud Security Alliance AI Safety Initiative, “AI-Assisted Mass Network Infrastructure Exploitation: The 600+ FortiGate Campaign,” CSA Research Note, March 8, 2026.
[23] Shapira, N., et al., “Agents of Chaos: Evaluating AI Agent Safety in Real-World Environments,” arXiv:2602.20021, February 2026.
[24] Unit 42, Palo Alto Networks, “A2A Protocol Security: Agent Session Smuggling and Trust Exploitation,” Unit 42 Blog, March 2026.
[25] Oligo Security, “Critical RCE Vulnerability in Anthropic MCP Inspector (CVE-2025-49596),” Oligo Security Blog, 2025.
[26] Wiz Research, “MCP Security Research Briefing,” Wiz Blog, 2026.
[27] Unit 42, Palo Alto Networks, “Browser Agent Hijacking: Calendar-Based Prompt Injection Against Perplexity Comet,” Unit 42 Blog, March 2026.
[28] Oasis Security, “ClawJacked: OpenClaw Vulnerability Enables Full Agent Takeover,” Oasis Security Blog, February 25, 2026.
[29] Google, “Agent-to-Agent (A2A) Protocol Specification,” 2026. https://a2a-protocol.org
[30] Cloud Security Alliance, “Agentic AI Identity Architecture,” CSA, 2026.
[31] Cloud Security Alliance, “Zero Trust Architecture Guidance,” CSA, 2024.
Appendix A: Autonomy Decision Tree
The following decision tree guides organizations in selecting appropriate autonomy levels for specific use cases. Version 2.0 incorporates consequence reversibility assessment and supply chain considerations.
START: Does the AI need to execute actions?
|
+-- No -> Level 0 (Information Only)
|
+-- Yes -> Will the agent install tools, connect MCP servers, or delegate
    |      to other agents? (v2.0)
    |      |
    |      +-- Yes -> Are supply chain trust delegation controls in place?
    |      |          |
    |      |          +-- No -> Implement controls before proceeding
    |      |          |
    |      |          +-- Yes -> Continue assessment below
    |      |
    |      +-- No -> Continue assessment below
    |
    +-- Can the CONSEQUENCES of actions be reversed? (v2.0: consequence
        reversibility, not just action reversibility)
        |
        +-- No (Irreversible consequences) -> Is human approval for each
        |   |                                 action feasible?
        |   |
        |   +-- Yes -> Level 1 (Per-Action Approval)
        |   |
        |   +-- No -> Reconsider use case or accept
        |             higher risk with executive approval
        |
        +-- Yes (Reversible consequences) -> Is real-time human approval
            |                                feasible?
            |
            +-- Yes -> What scope of approval is needed?
            |   |
            |   +-- Per action -> Level 1
            |   |
            |   +-- Per plan -> Level 2
            |
            +-- No -> Can boundaries be precisely defined?
                |
                +-- Yes -> Can boundaries be technically enforced
                |   |      AND architecturally separated from agent
                |   |      runtime? (v2.0)
                |   |
                |   +-- Yes -> Level 3 (with robust controls)
                |   |
                |   +-- No -> Level 2 (with monitoring)
                |
                +-- No -> Is 24/7 monitoring feasible?
                    |
                    +-- Yes -> Level 4 (exceptional cases only)
                    |
                    +-- No -> Level 2 with enhanced oversight
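The decision tree can also be expressed as a small function, which makes the branch order explicit and testable. The following is a minimal sketch, not a normative implementation: the `UseCase` field names and the returned strings are illustrative assumptions, and each boolean stands for the answer an assessor would give to the corresponding question in the tree.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    """Answers to the decision-tree questions (hypothetical field names)."""
    executes_actions: bool
    expands_autonomy: bool = False            # installs tools, connects MCP servers, delegates
    supply_chain_controls: bool = False       # trust delegation controls in place
    consequences_reversible: bool = False
    per_action_approval_feasible: bool = False  # asked on the irreversible branch
    realtime_approval_feasible: bool = False    # asked on the reversible branch
    approval_scope: str = "action"            # "action" or "plan"
    boundaries_definable: bool = False
    boundaries_enforced_and_separated: bool = False
    monitoring_24x7: bool = False


def recommend_level(u: UseCase) -> str:
    """Walk the Appendix A tree top to bottom and return its leaf text."""
    if not u.executes_actions:
        return "Level 0"
    # v2.0 gate: autonomy-expanding operations need supply chain controls first.
    if u.expands_autonomy and not u.supply_chain_controls:
        return "Implement supply chain controls before proceeding"
    if not u.consequences_reversible:
        if u.per_action_approval_feasible:
            return "Level 1"
        return "Reconsider use case or accept higher risk with executive approval"
    if u.realtime_approval_feasible:
        return "Level 1" if u.approval_scope == "action" else "Level 2"
    if u.boundaries_definable:
        if u.boundaries_enforced_and_separated:
            return "Level 3 (with robust controls)"
        return "Level 2 (with monitoring)"
    if u.monitoring_24x7:
        return "Level 4 (exceptional cases only)"
    return "Level 2 with enhanced oversight"
```

For example, a use case with reversible consequences, no real-time approval, and boundaries that are definable but enforced inside the agent runtime resolves to "Level 2 (with monitoring)" rather than Level 3.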
Appendix B: Glossary
Agentic AI refers to AI systems capable of autonomous decision-making and action execution, as distinguished from AI systems that only provide information or recommendations.
Action Reversibility (v2.0) describes whether a technical action can be undone — whether the system state can be returned to its prior condition.
Autonomy Escalation (v2.0) describes the process by which an agent operating at a nominally lower autonomy level effectively promotes itself (or is promoted by an attacker) to a higher level through exploitation, emergent behavior, or configuration manipulation.
Autonomy Level is a classification of AI independence from human oversight, ranging from Level 0 (no autonomy) to Level 5 (full autonomy) in this framework.
Autonomy-Expanding Operation (v2.0) is any action that increases the agent’s capability set or trust relationships, including tool installation, MCP server connection, model updates, and agent-to-agent delegation.
Boundary describes the defined limits on agent actions and decisions that constrain what an agent can do without escalation.
Composite Autonomy Level (v2.0) is the effective autonomy of a multi-agent system, assessed as at least the maximum of its constituent agents’ levels and potentially higher due to emergent coordination.
Consequence Reversibility (v2.0) describes whether the downstream effects of an action can be meaningfully mitigated once they have occurred, regardless of whether the action itself can be technically undone.
Delegation Scope Narrowing (v2.0) is the principle that each delegation hop in a multi-agent system should reduce, not maintain or expand, the available autonomy.
Escalation is the process of elevating decisions to humans when an agent encounters situations at or beyond its authorized boundaries.
Kill Switch refers to a mechanism for immediate agent termination, enabling rapid response when autonomous operation must be stopped.
Scope describes the breadth of actions an agent is authorized to perform within its defined boundaries.
Supply Chain Trust Delegation (v2.0) describes the implicit transfer of trust that occurs when an agent installs software, connects to services, or delegates to other agents in its supply chain.
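Two of the v2.0 definitions above, Composite Autonomy Level and Delegation Scope Narrowing, can be illustrated with a short sketch. It assumes, purely for illustration, that autonomy is represented as a single integer level (0-5) per agent; the function names are hypothetical, and a real deployment would more likely compare permission sets than bare integers.

```python
def composite_autonomy_floor(levels: list[int]) -> int:
    """Lower bound on a multi-agent system's composite autonomy level.

    The glossary defines the composite level as at least the maximum of
    the constituent levels; emergent coordination may push it higher,
    which this sketch does not model.
    """
    if not levels:
        raise ValueError("at least one agent level is required")
    return max(levels)


def narrows_at_every_hop(chain: list[int]) -> bool:
    """True when each delegation hop strictly reduces the autonomy level.

    `chain` lists the levels along a delegation path, starting with the
    delegating agent. Equal levels fail the check, because the principle
    asks for reduction, not maintenance.
    """
    return all(child < parent for parent, child in zip(chain, chain[1:]))
```

Under these assumptions, a chain of `[3, 2, 1]` satisfies the narrowing principle, while `[3, 3]` does not.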
Appendix C: Integration with CBRA (Capabilities-Based Risk Assessment)
C.1 Overview
The Capabilities-Based Risk Assessment (CBRA) framework developed by CSA provides a structured approach to evaluating AI system risk based on system capabilities rather than use cases alone [12]. This appendix demonstrates how the Autonomy Level Taxonomy integrates with CBRA to provide comprehensive risk-informed autonomy decisions.
C.2 CBRA Risk Formula and Autonomy
The CBRA defines system risk using the formula:
Systems Risk = Criticality x Autonomy x Permission x Impact
This formula directly incorporates autonomy as a key risk multiplier, validating the central premise of this framework: that higher autonomy creates proportionally higher risk requiring stronger controls. The multiplicative relationship means that autonomy amplifies risk from other factors rather than adding to them linearly.
| CBRA Factor | Autonomy Level Consideration |
|---|---|
| Criticality | Higher criticality systems should operate at lower autonomy levels |
| Autonomy | Directly maps to L0-L5 taxonomy with corresponding risk multipliers |
| Permission | Correlates with capability scope and boundary definitions |
| Impact | Informs maximum recommended autonomy and control requirements |
C.3 CBRA Risk Levels Mapped to Autonomy
CBRA defines three risk levels that correspond to appropriate autonomy governance. Organizations can use these mappings to derive autonomy level recommendations directly from CBRA assessments.
| CBRA Risk Level | Recommended Autonomy Levels | Control Requirements |
|---|---|---|
| Low Risk (Score <= 3) | L0-L3 appropriate | Standard controls per level |
| Medium Risk (Score 4-6) | L0-L2 recommended; L3 with enhanced controls | Additional monitoring, shorter review cycles |
| High Risk (Score >= 7) | L0-L1 required; L2 only with exceptional justification | Maximum controls, executive authorization |
C.4 Practical Integration Example
Consider a customer service AI agent with access to customer records and the ability to process refunds. A CBRA assessment would rate this system with Criticality of 2 (customer-facing but not life-critical), variable Autonomy (to be determined), Permission of 3 (access to sensitive customer data and financial actions), and Impact of 2 (financial impact limited by transaction caps).
| If Autonomy Level | CBRA Score | Risk Level | Recommendation |
|---|---|---|---|
| L1 (Per-action approval) | 2x1x3x2 = 12 | Medium | Appropriate with standard controls |
| L2 (Plan approval) | 2x2x3x2 = 24 | High | Requires enhanced controls |
| L3 (Bounded autonomous) | 2x3x3x2 = 36 | High | Requires executive approval and maximum controls |
The recommended implementation would be Level 2 with transaction caps ($100 per refund, $500 daily limit), requiring plan approval for each customer interaction session while enabling autonomous execution of approved actions.
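The scores in the table above follow directly from the CBRA product formula. The sketch below reproduces only that arithmetic; it assumes the Autonomy factor equals the autonomy level number, as the L1-L3 rows imply, and the function name `cbra_score` is illustrative rather than part of CBRA.

```python
def cbra_score(criticality: int, autonomy: int, permission: int, impact: int) -> int:
    """Systems Risk = Criticality x Autonomy x Permission x Impact."""
    return criticality * autonomy * permission * impact


# Customer service agent from C.4: Criticality 2, Permission 3, Impact 2,
# with the Autonomy factor varied across candidate levels L1-L3.
scores = {level: cbra_score(2, level, 3, 2) for level in (1, 2, 3)}
print(scores)  # {1: 12, 2: 24, 3: 36}
```

Because the factors multiply, each additional autonomy level adds the full product of the other three factors (12 in this example) to the score.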
C.5 Using CBRA to Inform Boundary Definitions
CBRA’s capability-based approach directly informs boundary specifications for Level 3 deployments, connecting risk assessment to operational constraints through machine-readable definitions.
# CBRA-Informed Boundary Definition
boundary:
  id: "cbra-customer-service-001"
  cbra_assessment:
    criticality: 2
    base_autonomy: 2
    permission: 3
    impact: 2
    risk_score: 24
    risk_level: "high"
  autonomy_level: 2  # Constrained by CBRA risk assessment
  # Boundaries derived from CBRA factors
  capability_constraints:
    # Permission factor informs data access boundaries
    data_access:
      allowed: ["customer_name", "order_history", "refund_status"]
      prohibited: ["payment_details", "full_address", "SSN"]
    # Impact factor informs action boundaries
    actions:
      refund:
        max_amount: 100.00
        daily_limit: 500.00
        requires_escalation_above: 50.00
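To show how the refund limits in this boundary definition might be enforced, the following sketch evaluates a single refund request against them. It is an illustration under stated assumptions: the boundary file does not say whether limits are inclusive or what happens when one is exceeded, so the sketch treats the limits as inclusive, denies requests over a cap, and escalates requests above the escalation threshold. The names `RefundBoundary` and `evaluate_refund` are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class RefundBoundary:
    """Refund limits copied from the boundary definition above."""
    max_amount: Decimal = Decimal("100.00")
    daily_limit: Decimal = Decimal("500.00")
    requires_escalation_above: Decimal = Decimal("50.00")


def evaluate_refund(amount: Decimal, refunded_today: Decimal,
                    boundary: RefundBoundary = RefundBoundary()) -> str:
    """Return "allow", "escalate", or "deny" for one refund request.

    Assumption: caps are inclusive, so a refund exactly at a cap passes.
    """
    if amount <= 0:
        raise ValueError("refund amount must be positive")
    # Hard caps first: per-refund maximum and cumulative daily limit.
    if amount > boundary.max_amount:
        return "deny"
    if refunded_today + amount > boundary.daily_limit:
        return "deny"
    # Within the caps, larger refunds go to a human.
    if amount > boundary.requires_escalation_above:
        return "escalate"
    return "allow"
```

Consistent with the framework's architectural separation requirement, a check like this would run in an enforcement layer outside the agent's own execution context.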
Appendix D: Integration with AI Controls Matrix (AICM)
D.1 Overview
The AI Controls Matrix (AICM) v1.0.3 provides 240+ controls across 18 security domains specifically designed for AI systems [11]. As a superset of the Cloud Controls Matrix (CCM), AICM encompasses all CCM controls while adding AI-specific control domains that are directly relevant to autonomy governance. This appendix maps autonomy levels to relevant AICM controls, enabling organizations to identify which controls are essential for each level of autonomous operation. All references to control frameworks in this appendix use AICM as the authoritative source; organizations previously mapping to CCM controls should update their mappings to the corresponding AICM controls.
D.2 AICM Domains and Autonomy Relevance
The AICM organizes controls into domains with varying relevance to autonomy governance. The following table identifies the domains most relevant to autonomy and indicates which autonomy levels they primarily support.
| AICM Domain | Autonomy Relevance | Primary Levels |
|---|---|---|
| Audit & Assurance (A&A) | High – Decision logging and audit trails | L2-L5 |
| Application Security (AIS) | High – Action validation and sandboxing | L1-L5 |
| Data Security (DSI) | High – Data access boundaries | L1-L5 |
| Governance Risk & Compliance (GRC) | High – Authorization and oversight | L2-L5 |
| Human Resources (HRS) | Medium – Training for oversight roles | L3-L5 |
| Identity & Access Management (IAM) | Critical – Agent identity and permissions | L1-L5 |
| Incident Management (INC) | High – Escalation and response | L2-L5 |
| Interoperability & Portability (IPY) | Medium – Multi-agent scenarios | L3-L5 |
| Model Development (DEV) | Medium (v2.0) – Training-time controls | L0-L5 |
| Operations (OPS) | Critical – Monitoring and kill switch | L2-L5 |
| Security Operations (SEO) | High – Anomaly detection | L3-L5 |
| Supply Chain Management (SCM) (v2.0) | Critical – Tool and skill provenance | L2-L5 |
| Threat Management (TVM) | High – Attack surface management | L2-L5 |
D.3 Key AICM Controls by Autonomy Level
The following tables identify the essential AICM controls at each autonomy level, with specific guidance on how each control applies to autonomy governance.
Level 1 (Assisted) – Essential AICM Controls
| Control ID | Control Title | Application to L1 |
|---|---|---|
| IAM-01 | Identity and Access Management Policy | Define agent identity and human approver roles |
| AIS-02 | Application Security Testing | Test approval gate integrity |
| DSI-01 | Data Security Policy | Define data access for proposed actions |
| A&A-01 | Audit and Assurance Policy | Log all proposed and approved actions |
Level 2 (Supervised) – Essential AICM Controls
| Control ID | Control Title | Application to L2 |
|---|---|---|
| GRC-04 | Risk Management Program | Assess plan-level risks before approval |
| OPS-03 | Operations Monitoring | Monitor plan execution progress |
| INC-01 | Incident Management Policy | Define pause/cancel procedures |
| A&A-03 | Audit Logging | Checkpoint logging for rollback |
| SCM-01 | Supply Chain Risk Management (v2.0) | Govern tool/skill installation within plans |
Level 3 (Conditional) – Essential AICM Controls
| Control ID | Control Title | Application to L3 |
|---|---|---|
| IAM-05 | Access Control Mechanisms | Technical boundary enforcement; autonomy escalation prevention (v2.0) |
| SEO-02 | Security Monitoring | Real-time boundary monitoring |
| INC-03 | Incident Response | Escalation procedures |
| GRC-06 | Compliance Management | Boundary compliance verification |
| OPS-05 | Change Management | Boundary modification procedures |
| SCM-03 | Supply Chain Integrity (v2.0) | Cryptographic provenance verification for installed tools |
| IPY-02 | Interoperability Standards (v2.0) | Multi-agent delegation scope governance |
Level 4 (High Autonomy) – Essential AICM Controls
| Control ID | Control Title | Application to L4 |
|---|---|---|
| SEO-01 | Security Operations Policy | 24/7 SOC requirements |
| TVM-02 | Threat Intelligence | Proactive threat monitoring; adversarial autonomy assessment (v2.0) |
| INC-05 | Incident Communication | Executive alerting procedures |
| OPS-07 | Business Continuity | Kill switch and recovery |
| GRC-08 | Board Reporting | Autonomous operations reporting |
D.4 AICM Control Implementation Matrix
The following matrix shows which AICM control categories require implementation at each autonomy level, providing a comprehensive view of the control landscape across the autonomy spectrum.
| AICM Domain | L0 | L1 | L2 | L3 | L4 | L5 |
|---|---|---|---|---|---|---|
| A&A (Audit & Assurance) | Recommended | Required | Required | Required | Required | Required |
| AIS (Application Security) | Recommended | Required | Required | Required | Required | Required |
| DSI (Data Security) | Recommended | Required | Required | Required | Required | Required |
| GRC (Governance & Compliance) | Recommended | Recommended | Required | Required | Required | Required |
| IAM (Identity & Access) | Recommended | Required | Required | Required | Required | Required |
| INC (Incident Management) | – | Recommended | Required | Required | Required | Required |
| OPS (Operations) | Recommended | Recommended | Required | Required | Required | Required |
| SCM (Supply Chain) (v2.0) | – | Recommended | Required | Required | Required | Required |
| SEO (Security Operations) | – | – | Recommended | Required | Required | Required |
| TVM (Threat Management) | – | – | Recommended | Required | Required | Required |
| DEV (Model Development) (v2.0) | Recommended | Recommended | Recommended | Required | Required | Required |
D.5 Using AICM for Autonomy Assessment
Organizations can use the AICM as an assessment checklist for autonomy readiness through two processes. For pre-deployment assessment, organizations should identify the target autonomy level, review required AICM controls for that level, assess current implementation status of each control, document gaps and remediation plan, implement controls before granting autonomy, and verify control effectiveness through testing. For ongoing compliance, organizations should map autonomy-related incidents to AICM controls, assess whether control failures contributed to incidents, update control implementation based on lessons learned, and include AICM compliance in regular autonomy reviews.
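The gap-identification step of the pre-deployment assessment can be sketched against the D.4 matrix. The encoding below is a compact assumption: because every domain in the matrix stays Required at all levels above the first level where it becomes Required, the sketch stores only that first level. It works at domain granularity, not individual control IDs, and the function names are illustrative.

```python
# First autonomy level at which each AICM domain is marked "Required" in D.4.
REQUIRED_FROM = {
    "A&A": 1, "AIS": 1, "DSI": 1, "IAM": 1,
    "GRC": 2, "INC": 2, "OPS": 2, "SCM": 2,
    "SEO": 3, "TVM": 3, "DEV": 3,
}


def required_domains(level: int) -> set[str]:
    """AICM domains marked Required at the given autonomy level (0-5)."""
    if not 0 <= level <= 5:
        raise ValueError("autonomy level must be between 0 and 5")
    return {domain for domain, first in REQUIRED_FROM.items() if level >= first}


def control_gaps(target_level: int, implemented: set[str]) -> list[str]:
    """Required domains not yet implemented, sorted for stable reporting."""
    return sorted(required_domains(target_level) - implemented)


# An organization targeting L2 that has implemented five domains so far.
print(control_gaps(2, {"A&A", "AIS", "DSI", "IAM", "GRC"}))
# ['INC', 'OPS', 'SCM']
```

The remaining steps (remediation, implementation, effectiveness testing) are organizational processes and are not modeled here.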
Appendix E: Integration with MAESTRO Threat Modeling Framework
E.1 Overview
The MAESTRO (Multi-Agent Environment, Security, Threat, Risk, and Outcome) framework provides a layer-based approach to threat modeling for agentic AI systems [4]. This appendix demonstrates how autonomy levels intersect with MAESTRO’s seven layers, enabling threat-informed autonomy decisions. The integration of these two frameworks provides organizations with a comprehensive approach that connects autonomy governance to threat analysis.
E.2 MAESTRO Layers and Autonomy Considerations
MAESTRO defines seven layers, each with distinct threats that vary in severity based on autonomy level. The following table maps each layer to its autonomy implications, helping organizations understand how autonomy level affects the threat landscape at each architectural layer.
| MAESTRO Layer | Description | Autonomy Impact |
|---|---|---|
| Layer 7: Agent Ecosystem | Market/application interface | Higher autonomy = broader attack surface |
| Layer 6: Security & Compliance | Cross-cutting security | Vertical layer affecting all autonomy levels |
| Layer 5: Evaluation & Observability | Monitoring and detection | Critical for L3+ autonomy |
| Layer 4: Deployment & Infrastructure | Runtime environment | Foundation for boundary enforcement |
| Layer 3: Agent Frameworks | Development toolkits | Determines available control mechanisms |
| Layer 2: Data Operations | Data processing and storage | Data access boundaries |
| Layer 1: Foundation Models | Core AI capabilities | Base capabilities to be governed |
E.3 Threat Severity by Autonomy Level
MAESTRO threats have varying severity depending on autonomy level, with higher autonomy amplifying potential impact across all threat categories. The following matrix enables organizations to prioritize threat mitigation based on their deployed autonomy levels.
| MAESTRO Threat | L0-L1 Severity | L2-L3 Severity | L4-L5 Severity |
|---|---|---|---|
| Goal Manipulation | Low (human catches) | Medium (plan-level) | Critical (autonomous pursuit) |
| Prompt Injection | Medium (output only) | High (action execution) | Critical (autonomous actions) |
| Data Poisoning | Low (no action) | Medium (corrupted plans) | Critical (autonomous learning) |
| Communication Channel Attack | N/A (no agents) | Medium (multi-agent plans) | Critical (agent coordination) |
| Identity Attack | Low (human-gated) | Medium (plan approval) | Critical (autonomous agents) |
| Denial of Service | Low (human backup) | Medium (workflow impact) | High (operational impact) |
| Supply Chain Compromise (v2.0) | Low (manual install) | High (plan-scoped install) | Critical (autonomous install) |
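For prioritization, the matrix above can be treated as a lookup table. The sketch below is one possible encoding, with the parenthetical notes dropped and the three severity columns indexed by autonomy band; the numeric ranking used for sorting is an assumption introduced here, not part of MAESTRO.

```python
# Severity per threat for the bands (L0-L1, L2-L3, L4-L5), from the table above.
SEVERITY = {
    "Goal Manipulation": ("Low", "Medium", "Critical"),
    "Prompt Injection": ("Medium", "High", "Critical"),
    "Data Poisoning": ("Low", "Medium", "Critical"),
    "Communication Channel Attack": ("N/A", "Medium", "Critical"),
    "Identity Attack": ("Low", "Medium", "Critical"),
    "Denial of Service": ("Low", "Medium", "High"),
    "Supply Chain Compromise": ("Low", "High", "Critical"),
}

# Assumed ordering, used only to sort threats for mitigation planning.
RANK = {"N/A": 0, "Low": 1, "Medium": 2, "High": 3, "Critical": 4}


def severity(threat: str, level: int) -> str:
    """Severity of a threat at an autonomy level (0-5)."""
    if not 0 <= level <= 5:
        raise ValueError("autonomy level must be between 0 and 5")
    return SEVERITY[threat][level // 2]  # levels 0-1, 2-3, 4-5 share a band


def prioritized(level: int) -> list[str]:
    """Threats ordered from most to least severe; ties broken by name."""
    return sorted(SEVERITY, key=lambda t: (-RANK[severity(t, level)], t))
```

At Level 3, for instance, `prioritized(3)` places Prompt Injection and Supply Chain Compromise first, since they are the only threats rated High in the L2-L3 band.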
E.4 MAESTRO Architecture Patterns Mapped to Autonomy Levels
MAESTRO identifies eight agentic architecture patterns, each with a natural alignment to appropriate autonomy levels. These mappings help organizations select the right autonomy governance approach based on their system architecture.
| Architecture Pattern | Description | Recommended Autonomy | Key Threats |
|---|---|---|---|
| Single-Agent | One agent pursuing a goal | L1-L3 | Goal manipulation |
| Multi-Agent | Agents communicating | L2-L3 | Communication attacks, identity attacks, autonomy propagation (v2.0) |
| Unconstrained Conversational | Broad input processing | L1-L2 only | Prompt injection, jailbreaking |
| Task-Oriented | Specific API tasks | L2-L3 | DoS, API abuse |
| Hierarchical | Controller/subordinate | L2-L3 | Controller compromise, delegation scope violation (v2.0) |
| Distributed Ecosystem | Decentralized agents | L3 (with caution) | Sybil attacks, composite autonomy escalation (v2.0) |
| Human-in-Loop | Iterative collaboration | L1-L2 | Feedback manipulation |
| Self-Learning | Autonomous improvement | L1 only | Data poisoning, backdoors, emergent behavior (v2.0) |
E.5 Autonomy-Aware Threat Modeling Process
Organizations should incorporate autonomy considerations into MAESTRO-based threat modeling through a four-step process.
In Step 1, organizations identify the applicable architecture pattern and note pattern-specific threats, including v2.0 threats such as supply chain compromise, autonomy propagation, and delegation scope violation. In Step 2, organizations assess the current and target autonomy level, documenting the current level, the planned target level if a change is contemplated, and the composite autonomy level for multi-agent patterns (v2.0). In Step 3, organizations conduct a layer-by-layer analysis with autonomy context, assessing for each MAESTRO layer how the autonomy level affects threat likelihood, how it affects threat impact, and what autonomy-specific mitigations are required. In Step 4, organizations map identified threats to autonomy level controls from this framework, AICM technical controls from Appendix D, and CBRA risk assessment from Appendix C.
E.6 Cross-Layer Attack Scenarios by Autonomy Level
MAESTRO emphasizes that attacks can propagate across layers, and autonomy level significantly affects propagation risk. At low autonomy (L0-L1), attack propagation is limited by human checkpoints at each action, providing natural segmentation that prevents cross-layer escalation.
Attack Vector -> Human Review -> Blocked/Detected
Limited propagation due to human checkpoint at each action.
At medium autonomy (L2-L3), attack propagation can span multiple layers within the approved plan scope, requiring boundary enforcement, checkpoint validation, and monitoring to contain.
Layer 4 (Infrastructure) Compromise
|
Layer 2 (Data Operations) - Inject malicious data
|
Layer 1 (Foundation Model) - Model corruption on update
|
Layer 7 (Ecosystem) - Compromised actions affect users
Mitigation: Boundary enforcement, checkpoint validation, monitoring
At high autonomy (L4-L5), any layer compromise can rapidly propagate without human intervention, requiring maximum controls at all layers, 24/7 monitoring for detection, and kill switch capability for response.
E.7 MAESTRO-Informed Autonomy Boundaries
MAESTRO threats inform boundary definitions for Level 3 deployments. The following example demonstrates how threat mitigations from each MAESTRO layer can be translated into machine-readable boundary specifications.
# MAESTRO-Informed Boundary Definition
boundary:
  id: "maestro-informed-001"
  threat_mitigations:
    # Layer 7: Agent Ecosystem
    ecosystem:
      user_interaction_limits:
        max_users_per_hour: 100
        suspicious_pattern_threshold: 3
      # v2.0: Supply chain controls
      skill_installation:
        mode: "allowlist_only"
        verification: "cryptographic"
    # Layer 5: Observability
    monitoring:
      required: true
      anomaly_detection: true
      alert_threshold: "medium"
      # v2.0: Inter-agent monitoring
      inter_agent_logging: true
    # Layer 2: Data Operations
    data:
      read_only_sources: ["public_docs", "faq_database"]
      no_access: ["customer_pii", "financial_records"]
    # Cross-layer
    propagation_controls:
      checkpoint_frequency: "per_action"
      state_validation: true
      rollback_enabled: true
      # v2.0: Autonomy escalation prevention
      level_immutability: true
      self_modification_blocked: true
  escalation:
    triggers:
      - "anomaly_detected"
      - "pattern_suspicious"
      - "cross_layer_activity"
      - "autonomy_escalation_attempt"
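A boundary specification like this one only has effect if something evaluates requests against it. The sketch below illustrates two of its rules, the Layer 2 data access lists and the escalation triggers, as they might be checked by an enforcement layer outside the agent runtime. The function names are hypothetical, and denying access to sources that appear in neither list is an assumption; the specification itself does not state a default.

```python
from typing import Iterable

# Values copied from the boundary definition above.
READ_ONLY_SOURCES = frozenset({"public_docs", "faq_database"})
NO_ACCESS = frozenset({"customer_pii", "financial_records"})
ESCALATION_TRIGGERS = frozenset({
    "anomaly_detected",
    "pattern_suspicious",
    "cross_layer_activity",
    "autonomy_escalation_attempt",
})


def data_access_allowed(source: str, write: bool = False) -> bool:
    """Check a data request against the Layer 2 lists."""
    if source in NO_ACCESS:
        return False
    if source in READ_ONLY_SOURCES:
        return not write  # reads pass, writes do not
    return False  # assumption: default-deny for unlisted sources


def triggers_fired(signals: Iterable[str]) -> list[str]:
    """Observed signals that match a configured escalation trigger."""
    return sorted(ESCALATION_TRIGGERS.intersection(signals))
```

Any non-empty result from `triggers_fired` would hand the decision to a human under the escalation section of the specification.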
Appendix F: Incident Case Studies Mapping (v2.0)
F.1 Overview
This appendix maps ten security events from the period January 29 through March 18, 2026 to the framework’s autonomy levels, failed dimensions, and control gaps. Eight are confirmed production incidents, while two (Agents of Chaos and A2A session smuggling) are research demonstrations that revealed exploitable vulnerabilities in controlled environments [23][24]. Together, they serve both as validation of the framework’s analytical utility and as a practitioner reference for understanding how autonomy governance failures manifest in practice. Full details are available in the companion paper The Cost of Unchecked Autonomy [2].
F.2 Incident Summary Table
| # | Incident | Operating Autonomy | Failed Dimensions | Framework Controls That Would Have Prevented/Mitigated |
|---|---|---|---|---|
| 1 | ClawHavoc Supply Chain Poisoning | L3-4 | Decision, Scope, Impact, Reversibility | Supply chain trust delegation (v2.0); boundary enforcement for skill installation; per-action approval for code execution |
| 2 | Clinejection npm Supply Chain | L4 | Decision, Scope, Temporal, Impact | Least capability restriction; input sanitization architecture; anomaly detection; kill switch |
| 3 | CVE-2026-25253 OpenClaw RCE | L5 (post-exploit) | Decision, Scope, Reversibility | Autonomy escalation prevention (v2.0); architectural separation of approval config; origin validation |
| 4 | McKinsey Lilli Breach | L5 (attacker) | Scope, Impact, Temporal | Authorization boundaries between components; anomaly detection; per-action approval for system config |
| 5 | Alibaba ROME Emergent Mining | L4 escalated to L5 | Scope, Decision, Temporal, Impact | Training-time controls (v2.0); autonomy escalation prevention (v2.0); infrastructure scope constraints |
| 6 | PerplexedBrowser Zero-Click Hijack | L4 | Decision, Scope, Reversibility, Temporal | Content sanitization; per-action approval for credential access; ephemeral task-scoped sessions |
| 7 | MCP Vulnerability Surge | L3-4 | Decision, Scope, Boundary | MCP protocol-level authentication (v2.0); authorization and input validation at gateway |
| 8 | CyberStrikeAI FortiGate Campaign | L4 (offensive) | Defensive asymmetry | Adversarial autonomy asymmetry (v2.0); defensive AI parity; attack surface reduction |
| 9 | Agents of Chaos Safety Failures* | L3-4 (nominal), L5 (actual) | All five dimensions | External boundary enforcement architecture; verified identity context; independent kill switch |
| 10 | A2A Session Smuggling* | L3-4 | Decision, Scope, Oversight | Multi-agent governance (v2.0); transparent inter-agent communication; delegation scope narrowing (v2.0) |
* Research demonstrations, not production incidents; included for the vulnerability patterns they reveal.
F.3 Cross-Cutting Patterns
The incident evidence reveals five cross-cutting patterns that informed the v2.0 revisions.
The first pattern is autonomy without proportional controls. In the incidents analyzed, affected systems operated at an autonomy level for which the required controls were absent. The governance deficit is structural — organizations are deploying at Level 3-4 while governing at Level 0-1 [2].
The second pattern is capability exceeding need. In seven of ten incidents, the agent possessed capabilities far exceeding its intended function. The Cline triage bot had shell execution when it needed label management. The Comet browser agent had standing password vault access when it needed task-scoped web browsing. Violation of the Capability-Control Matrix was a consistent predictor of incident occurrence [2].
The third pattern is input trust failure. Five incidents involved agents processing untrusted input as trusted instructions. The boundary enforcement architecture — an independent validation layer between input and action — addresses this pattern, but the architectural separation requirement (v2.0) is essential because boundary enforcement implemented within the agent’s own execution context can be bypassed [2].
The fourth pattern is architectural separation failures. Three incidents demonstrated that oversight controls implemented within the agent’s execution context can be disabled by attackers or emergent behavior. The autonomy escalation prevention requirements in Section 7 address this directly [2].
The fifth pattern is temporal drift and training-time risk. Two incidents demonstrated that autonomy risks manifest during training (Alibaba ROME) and accumulate over extended operational periods (Agents of Chaos). The training-time controls in Section 11 and the framework’s temporal dimension both address this pattern [2].
F.4 v2.0 Control Coverage Assessment
The following table assesses which v2.0 additions would have addressed each incident, demonstrating that the framework extensions close the gaps identified by the operational evidence.
| v2.0 Addition | Incidents Addressed | Coverage |
|---|---|---|
| Supply chain trust delegation (Section 6) | 1, 2, 7 | Prevents autonomous installation of unverified tools and skills |
| Autonomy escalation prevention (Section 7) | 3, 5, 9 | Prevents agents or attackers from promoting agent autonomy level |
| Multi-agent governance (Section 8) | 9, 10 | Governs composite autonomy and delegation chains |
| Adversarial autonomy asymmetry (Section 9) | 4, 8 | Establishes minimum defensive autonomy requirements |
| Prompt injection framing (Section 10) | 1, 2, 6, 7, 10 | Positions autonomy controls as defense-in-depth against injection |
| Training-time controls (Section 11) | 5 | Extends framework scope to training environments |
| Reversibility refinement (Section 2.1.1) | 1, 2, 3, 6 | Drives controls from consequence reversibility, not just action reversibility |
| MCP protocol guidance (Section 13.4) | 7 | Provides protocol-specific autonomy enforcement patterns |
| A2A protocol guidance (Section 13.5) | 10 | Provides delegation-specific autonomy enforcement patterns |
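The coverage table can be checked mechanically by inverting it from additions to incidents. The sketch below transcribes the "Incidents Addressed" column, using the incident numbers from the F.2 summary table; the variable and function names are illustrative.

```python
# v2.0 addition -> incident numbers addressed, from the table above.
COVERAGE = {
    "Supply chain trust delegation (Section 6)": {1, 2, 7},
    "Autonomy escalation prevention (Section 7)": {3, 5, 9},
    "Multi-agent governance (Section 8)": {9, 10},
    "Adversarial autonomy asymmetry (Section 9)": {4, 8},
    "Prompt injection framing (Section 10)": {1, 2, 6, 7, 10},
    "Training-time controls (Section 11)": {5},
    "Reversibility refinement (Section 2.1.1)": {1, 2, 3, 6},
    "MCP protocol guidance (Section 13.4)": {7},
    "A2A protocol guidance (Section 13.5)": {10},
}


def additions_for(incident: int) -> list[str]:
    """v2.0 additions that the table lists against a given incident."""
    return [name for name, incidents in COVERAGE.items() if incident in incidents]


# Incidents 1-10 with no v2.0 addition listed against them.
uncovered = set(range(1, 11)) - set().union(*COVERAGE.values())
print(uncovered)  # set()
```

Every incident maps to at least one addition, though incidents 4 and 8 are each addressed by a single addition (adversarial autonomy asymmetry).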
Document prepared by the Cloud Security Alliance AI Safety Initiative
March 2026
Version 2.0