Agentic Secure Development Lifecycle (ASDL)

White Paper | 2026-03-27 | Status: draft

Agentic Secure Development Lifecycle (ASDL)

Executive Summary

The security discipline that organizations apply to traditional software development has matured over two decades into a well-understood body of practice. Microsoft’s Security Development Lifecycle, NIST’s secure software development guidance, and the OWASP family of application security frameworks together form a baseline that most mature engineering organizations have internalized. Those practices, however, were designed for a world of deterministic programs: software that executes a defined sequence of instructions and produces predictable outputs from defined inputs. The emergence of autonomous AI agents — systems that reason over unstructured input, plan multi-step action sequences, invoke external tools, coordinate with other agents, and make decisions at runtime without human review — breaks every foundational assumption underlying traditional SDL.

The consequences of this mismatch are no longer theoretical. The CVE-2025-54795 vulnerability, a command injection flaw in Anthropic’s Claude Code that allowed attackers to bypass human approval gates and achieve remote code execution through prompt injection [1], demonstrated that the attack surface of an AI agent extends far beyond the code written by its developers. Trail of Bits researchers showed that the same vulnerability class affected multiple agent platforms simultaneously [2], illustrating how the combination of autonomous tool invocation, natural-language instruction following, and external content ingestion creates an entirely new category of exploitable behavior. Academic research published in June 2025 on Compositional Instruction Attacks documented that jailbreak success rates exceeding 95% are achievable against major LLMs when adversarial instructions are embedded within seemingly innocuous content — exactly the scenario that agentic tools encounter in normal operation [3].

This whitepaper introduces the Agentic Secure Development Lifecycle (ASDL), a five-phase framework that extends proven SDL methodology with security controls designed specifically for the risks introduced by autonomous agent systems. The ASDL addresses the full lifecycle from initial design through ongoing operations, providing teams with concrete, implementable guidance at each phase. The framework is informed by the MAESTRO threat modeling architecture developed by the Cloud Security Alliance [4], the December 2025 OWASP Top 10 for Agentic Applications [5], NIST AI Risk Management Framework guidance [6], the CSA AI Controls Matrix (AICM) [7], and operational security research from enterprise deployments through early 2026.

The ASDL does not replace traditional SDL practices. It extends them. Organizations that have invested in threat modeling, secure coding standards, and penetration testing will find that many of their existing skills and processes transfer directly — but require augmentation to address the unique behaviors that emerge when large language models acquire autonomy, tools, and the ability to act on behalf of users in complex, partially-trusted environments.


1. Introduction: Why Traditional SDL Is Insufficient for Agents

The Security Development Lifecycle as it exists today was architected around a central premise: that software behavior is a function of code. Threat modeling in the STRIDE tradition asks engineers to reason about how an attacker might exploit implementation choices, network boundaries, and data flows. Static analysis tools scan code for vulnerability patterns. Penetration testing probes running applications for inputs that produce unexpected outputs. These methods work because, in a deterministic software system, the space of possible behaviors is bounded by the logic encoded in source files. An attacker who cannot influence that code cannot fundamentally alter what the system does.

AI agents violate this premise at the design level. An agent’s runtime behavior is not solely a function of its code — it is a function of its code, its underlying language model, the contents of its context window, the outputs of every tool it invokes, and the instructions embedded in any content it retrieves from external sources. This means that an attacker who can influence what an agent reads, what tools it calls, or what data flows through its context can reshape the agent’s behavior without touching a single line of source code. This is not a theoretical vulnerability class. The EchoLeak attack demonstrated that a prompt embedded in a web page read by a Microsoft Copilot agent could silently redirect the agent to exfiltrate internal communications [8]. The Gemini Memory Attack showed that adversarial instructions persisted in agent memory stores could reshape behavior across multiple sessions after the initial injection [9]. These incidents are not implementation bugs in the conventional sense; they are the expected consequence of deploying instruction-following systems in environments where instructions can arrive from untrusted sources.

Traditional SDL also struggles with the temporal characteristics of agent behavior. Software behavior is stable between releases: once tested, a function does the same thing every time it is called. Agent behavior is stochastic and context-dependent, and it evolves as the underlying model is updated, as the agent accumulates new memories or tool registrations, and as the external systems it interacts with change. A security test that passes today may not reflect tomorrow’s behavior. Microsoft’s February 2026 SDL evolution guidance explicitly acknowledges this challenge, noting that AI accelerates development cycles beyond SDL norms and that “model updates, new tools, and evolving agent behaviors outpace traditional review processes, leaving less time for testing and observing long-term effects” [10]. Their conclusion — that mitigation requires “iterative security controls, faster feedback loops, telemetry-driven detection, and continuous learning” — aligns directly with the design principles of the ASDL.

A further structural gap is the absence of agent-specific threat taxonomies in traditional SDL tooling. STRIDE models do not have categories for goal hijacking, tool misuse, or inter-agent communication exploitation. CVSS scoring does not adequately capture the blast radius of an agent that operates with broad IAM permissions or has access to enterprise production systems. The OWASP Web Application Security Top 10 addresses SQL injection and cross-site scripting, not emergent misbehavior in multi-agent pipelines or cascading failures triggered by a single poisoned tool response. The ASDL addresses these gaps by incorporating purpose-built agentic threat frameworks — MAESTRO and the OWASP Top 10 for Agentic Applications — as foundational references throughout the lifecycle.

Finally, the authorization model of AI agents creates risks that traditional SDL has no direct equivalent for. An agent typically operates with a set of tool permissions that were granted at deployment time, often scoped to what the agent might plausibly need rather than what any specific task actually requires. When an agent is hijacked through prompt injection or goal manipulation, those permissions do not disappear — the attacker now effectively holds them. This is the identity and privilege abuse scenario classified as ASI03 in the OWASP Agentic Top 10 [5], and it means that the security of every tool an agent can invoke is an integral part of the agent’s threat model. Traditional SDL addresses authorization, but it does so at the application level, not at the level of a reasoning system whose authorization footprint may expand at runtime through sub-agent delegation.


2. ASDL Overview: Five Phases

The ASDL organizes agent security activities into five phases corresponding to the natural stages of agent development and operation. These phases are Design, Development, Testing, Deployment, and Operations. The phases are sequential in a new project, but in practice they overlap and feed back into each other: operational monitoring findings feed back into the design phase for the next iteration, and red-team testing results may require returning to development for remediation before deployment gates are cleared. The ASDL is intended to be embedded within existing engineering workflows, not to run as a separate process alongside them.

Phase 1: Design

Security for an AI agent must be designed in from the beginning, not layered on after the fact. The Design phase of the ASDL has two primary security deliverables: a MAESTRO-based threat model and a formal trust architecture specification. Neither can be produced as an afterthought; both directly constrain architectural decisions made during Development.

MAESTRO-Based Threat Modeling. The MAESTRO framework, developed by the Cloud Security Alliance, provides a seven-layer reference architecture specifically designed to support threat modeling of agentic AI systems [4]. The layers address Foundation Models (L1), Data Operations (L2), Agent Frameworks (L3), Deployment and Infrastructure (L4), Evaluation and Observability (L5), Security and Compliance (L6), and Agent Ecosystem (L7). MAESTRO’s value lies in its ability to surface both traditional threats and the novel agentic threats that arise from non-determinism, autonomy, and the collapse of conventional trust boundaries. A threat model produced without MAESTRO’s layered structure will systematically miss threats that appear only in the interaction between layers — for example, a model-level susceptibility to adversarial inputs (L1) that becomes exploitable only when the agent is given file-system tool access (L3) and deployed in a multi-tenant cloud environment (L4).

The required elements of a MAESTRO threat model for an ASDL-compliant design include: identification of all external trust boundaries across each of the seven layers; enumeration of data flows between components, including the directionality and content type of each flow; classification of threats within each layer as either traditional (arising from the component’s technology) or agentic (arising from autonomy, non-determinism, or trust boundary collapse); assignment of risk ratings using a consistent scoring methodology; and a corresponding mitigations table linking each identified threat to a specific control. The threat model must be reviewed and approved by a security architect before Development begins, and it must be updated whenever the agent’s capabilities, tool registrations, or deployment architecture change materially.

Trust Boundary and Permission Specification. Every agent must have a formally documented trust architecture that defines which principals the agent trusts, to what degree, and under what conditions. This specification must enumerate the agent’s tool permissions at the granularity of individual tool operations — not just “this agent has access to the file system” but “this agent may read files matching /data/reports/*.csv and may not write or execute.” It must define the human-in-the-loop checkpoints: the classes of action that require human review before execution, the time budget for human responses, and the fallback behavior if a checkpoint times out. It must specify how the agent’s identity is represented to downstream systems and what scope that identity carries. Agents that delegate to sub-agents must specify the delegation protocol and the maximum permission set that any sub-agent may be granted — this is the foundational control against the privilege escalation attacks that OWASP ASI03 describes [5].

Human-in-the-loop checkpoints deserve particular attention at the Design phase because they are significantly harder to retrofit than to design in. The Design phase should categorize all agent-executable actions along two dimensions: consequence severity (measured as difficulty of reversal, data sensitivity, or blast radius) and confidence threshold (the minimum planning confidence required before autonomous execution). Actions that are both high-severity and below the confidence threshold must be gated by human approval. The checkpoint design must also specify what information is presented to the human reviewer: a bare tool call with opaque parameters provides insufficient information for meaningful human oversight.

Phase 2: Development

The Development phase translates the trust architecture and threat model into code. Three categories of secure development practice are specific to agentic systems and have no direct analogue in traditional SDL: input channel separation, safe tool integration, and safe delegation.

Input Channel Separation. The single most consequential secure coding practice for AI agents is the architectural separation of instruction channels from data channels. An agent that processes user instructions and external data through the same pipeline — allowing content retrieved from a web page, a database, or a tool response to influence the agent’s goal-level behavior — is vulnerable to every variant of indirect prompt injection. The arxiv 2506.23260 research on threats in LLM-powered agent workflows documented that the intermingling of instruction and data processing is the root condition enabling cascades from prompt injection through to full protocol exploits [3]. The defense is an instruction hierarchy enforced at the code level: system prompts occupy a higher privilege tier than user messages, which occupy a higher privilege tier than retrieved content, and retrieved content is never permitted to issue directives that override instructions from higher-privilege tiers. This hierarchy must be enforced in the agent framework’s message construction logic, not relied upon from the LLM’s own instruction-following fidelity — which, as the Trail of Bits research demonstrated, can be bypassed [2].

Concretely, developers must wrap all externally-sourced content — tool responses, retrieved documents, API outputs, web-fetched content — in explicit data framing that signals to the model that the enclosed text is data to be analyzed, not instructions to be followed. System-level prompts must include explicit guidance that instructions cannot appear in data channels, and the agent’s reasoning over tool outputs must be audited to confirm that tool responses are not being elevated to instruction status. Input validation for external content must apply both structural checks (format, size, encoding) and semantic checks that flag anomalous patterns consistent with embedded directives — unusual imperative language in factual content, instructions referencing the agent’s own capabilities, or attempts to override the system prompt.

Secure Tool Integration. Every tool an agent can invoke represents a potential attack vector in two directions: the tool’s output may carry adversarial content that reshapes agent behavior, and the agent’s tool invocation may carry attacker-influenced parameters that cause the tool to take unintended actions. Secure tool integration requires validation of both the invocation and the response. Before invoking a tool, the agent framework must verify that the requested parameters are consistent with the task context — parameters that contain file paths outside allowed directories, command arguments that differ in structure from the agent’s planned action, or network endpoints not on an approved list should raise exceptions rather than proceed. After receiving a tool response, the framework must validate the response format and size before incorporating it into the agent’s context, and it must sanitize any executable or structured content in the response before it reaches the LLM.

Tool permission management must enforce least privilege at the level of individual invocations. An agent with read access to a file system should not be able to invoke write operations simply because the underlying tool library permits them; the agent framework layer must enforce the permissions specified in the trust architecture. Tool integrations must also implement invocation rate limiting and circuit-breaker patterns to prevent runaway agent behavior — an agent caught in an action loop should be halted by the framework, not allowed to exhaust system resources or trigger external rate limits that affect other systems.

Safe Delegation Protocols. Agents that spawn sub-agents or delegate tasks to peer agents in a multi-agent pipeline must implement delegation protocols that prevent privilege escalation. The fundamental rule is that a delegating agent may never grant a sub-agent permissions that exceed its own current permission set — a constraint that must be enforced by the orchestration layer, not merely by convention. Delegation tokens must be scoped to the specific task being delegated, with a defined expiration time and a limited set of tools the sub-agent may invoke. The delegating agent must not pass its own authentication credentials to a sub-agent directly; instead, sub-agents must authenticate independently and receive scoped permissions from the orchestration framework. This prevents the chain of trust from being exploited: an attacker who compromises a leaf-node sub-agent should gain access only to that sub-agent’s narrow permission set, not to the full capability footprint of the orchestrator.

Developers building on current agent frameworks should note that most popular frameworks do not enforce these constraints by default. LangGraph, AutoGen, and similar orchestration tools provide delegation primitives but leave permission scoping to application code. ASDL compliance requires that delegation security controls be implemented explicitly at the application layer until framework-level enforcement becomes standard. Secure coding standards for agent frameworks should be codified as developer checklists, linting rules, and automated policy checks that run as part of the standard CI pipeline.

Phase 3: Testing

Testing AI agents requires a fundamentally different approach than testing deterministic software. Traditional test suites verify that a function produces correct output for a given input; agent testing must verify that the agent maintains correct behavior across a distribution of adversarial inputs, emergent multi-agent interactions, and edge cases that may only manifest after extended operation. The ASDL defines three categories of required testing: adversarial red-team testing, multi-agent interaction fuzzing, and acceptance gate evaluation against OWASP ASI criteria.

Red-Team Testing and Benchmark Evaluation. Red-team testing for agents must probe the full set of agentic attack vectors, not merely the input validation and authentication issues that traditional penetration testing covers. RiskRubric.ai, developed in partnership with the Cloud Security Alliance, provides a structured evaluation platform that assesses agent behavior across more than 1,000 reliability prompts and over 200 adversarial prompts designed specifically to test agentic security properties [11]. Red-team engagements should use the ASDL Testing Framework defined in Section 4 of this document, which maps test cases to each OWASP ASI category and specifies both pass criteria and failure escalation procedures.

Goal hijacking resistance testing deserves particular emphasis. A well-designed goal hijacking test does not merely embed an obvious adversarial instruction in external content; it embeds instructions that are plausibly consistent with the agent’s task, that appear in content the agent would legitimately read, and that are designed to redirect a small subset of the agent’s actions rather than completely override its behavior. The latter is far harder to detect and represents the realistic threat model for targeted attacks against enterprise agents. Testing must also verify that the agent’s behavior is consistent across repeated runs with the same input — agents that behave differently on different invocations of the same task create a testing surface that cannot be adequately covered by point-in-time evaluation.

Multi-Agent Interaction Fuzzing. Single-agent testing is necessary but not sufficient for systems that deploy agents in orchestrated pipelines. Emergent misbehavior in multi-agent systems arises from interactions between agents that each behave within their individual parameters. A fuzzing harness for multi-agent interactions must vary agent count, message routing, task decomposition, and the timing and content of inter-agent messages to identify behaviors that do not appear in single-agent evaluation. The OWASP ASI07 (Insecure Inter-Agent Communication) and ASI08 (Cascading Failures) categories [5] specifically address multi-agent failure modes that require pipeline-level testing to detect. Spoofed inter-agent messages, malformed delegation tokens, and out-of-order message delivery are all inputs that a production multi-agent system may encounter and that a robust pipeline must handle without escalating to harmful behavior.

Acceptance Criteria and Exit Gates. No agent should be promoted from testing to deployment without formal acceptance criteria evaluation. The ASDL defines three categories of exit gate requirements. First, security acceptance criteria must be met: the agent must pass all mandatory OWASP ASI test cases, demonstrate goal hijacking resistance across the test corpus, and have no critical or high findings from automated scanning tools unresolved. Second, behavioral consistency requirements must be met: variance in agent outputs across repeated runs of identical inputs must fall within specified bounds, and all defined human-in-the-loop checkpoints must be demonstrated to fire correctly under test conditions. Third, observability requirements must be met: the agent’s logging and telemetry instrumentation must be validated to produce the behavioral signals required for production monitoring before the deployment gate is cleared.

Phase 4: Deployment

Deployment security for AI agents spans three concerns that are each necessary but individually insufficient: runtime isolation, least-privilege configuration, and telemetry instrumentation. An agent deployed without runtime isolation can escape its intended operating boundaries; an agent deployed without least-privilege configuration creates an unnecessarily large blast radius when compromised; an agent deployed without telemetry instrumentation cannot be monitored or investigated effectively during incidents.

Runtime Sandboxing. NVIDIA’s NemoClaw and OpenShell runtime, released in March 2026 as the enterprise security layer for the OpenClaw agent platform, provides the most mature current implementation of kernel-level sandboxing for production AI agents [12]. OpenShell runs each agent in its own isolated sandbox, with security policies enforced at the system level rather than at the application level. Policies are declarative and default-deny: if the agent attempts any action not explicitly permitted in the policy manifest — accessing a network endpoint outside the approved list, reading a file outside its designated directories, or spawning a sub-process — OpenShell intercepts the request and surfaces it for human review rather than allowing silent execution. The privacy router component monitors all communication between the agent and external systems, enforcing data residency rules and preventing credential exfiltration even in the case of model-level compromise.

For organizations not using OpenShell, equivalent sandboxing capabilities can be achieved through a combination of open-source tools. Linux namespaces and seccomp-bpf profiles provide process-level isolation and system call filtering. OPA (Open Policy Agent) can enforce declarative access policies for tool invocations. Network policy enforcement through eBPF-based tools constrains which external endpoints the agent process may reach. Container security platforms such as Falco provide runtime anomaly detection that complements static policy enforcement. The key architectural requirement that distinguishes effective sandboxing from ineffective sandboxing is that policies must be enforced by the execution environment, not by the agent itself: any security control that can be overridden by agent code or model output is not a genuine security boundary.

Least-Privilege Configuration. At deployment time, every tool permission, API credential, data source access grant, and downstream system integration must be reviewed against the principle of least privilege. The ASDL requires a deployment-time permission audit that compares the agent’s actual permission set against the minimal permissions required for each task in its defined scope. Permissions should be granted by task type, not by agent identity as a whole — an agent that performs both read-only research tasks and write-enabled report generation tasks should ideally receive different credential scopes for each activity class, with scope escalation requiring a separate authorization step. Agent identities should be machine-managed service accounts with programmatic credential rotation, not long-lived static API keys or shared credentials.

The deployment checklist mandated by the ASDL includes confirmation that: all tool permissions have been reviewed and documented; credentials are stored in a secrets management system rather than in environment variables or configuration files; network egress is restricted to explicitly required endpoints; the agent’s logging configuration captures all tool invocations with their parameters; human-in-the-loop checkpoint configurations have been tested in the deployment environment; and a kill-switch mechanism exists that can immediately halt the agent’s operation without requiring a full redeployment.

Telemetry Instrumentation. Behavioral monitoring during operations requires telemetry infrastructure that is designed at deployment time, not improvised after incidents occur. The ASDL defines minimum telemetry requirements for production agents: structured logging of every tool invocation including tool name, invocation parameters, response status, and elapsed time; logging of each reasoning step in which the agent modifies its plan or changes the tool it intends to invoke next; logging of all human-in-the-loop checkpoint events including the checkpoint type, the action reviewed, the reviewer’s decision, and elapsed time to decision; and logging of all errors, exceptions, and unexpected model outputs. Logs must be structured in a machine-parseable format, forwarded to a SIEM or security analytics platform in near real-time, and retained for a minimum period consistent with the organization’s incident response requirements.

Phase 5: Operations

An agent that is deployed without ongoing security operations is not a secure agent — it is a periodically audited one. The operational characteristics of AI agents require continuous security attention in ways that static software deployments do not. Model behavior can drift as underlying models are updated. Tool integrations accumulate new permissions or expose new attack surface as they evolve. New attack techniques emerge that were not tested for during the pre-deployment security review. The ASDL Operations phase defines three continuous security activities: behavioral monitoring and drift detection, incident response, and systematic feedback into subsequent development cycles.

Behavioral Monitoring and Drift Detection. Behavioral monitoring for AI agents requires detecting deviations from established behavioral baselines, not merely detecting technical anomalies like network port scans or authentication failures. A baseline behavioral profile for an agent captures the statistical distribution of tool invocation frequency by tool type, the distribution of token counts in model outputs, the rate of human-in-the-loop checkpoint triggers, the frequency of plan modifications after tool responses, and the ratio of successful to failed tool invocations. Significant deviations from these baselines — an agent that begins invoking network tools far more frequently than its historical baseline, or one that starts producing unusually long reasoning chains before tool calls — may indicate goal hijacking, memory poisoning, or tool compromise. Detection thresholds must be calibrated to the specific agent’s behavioral profile; generic anomaly detection tuned for network traffic will miss agent-specific behavioral shifts.

Drift detection has a temporal dimension that single-point-in-time audits miss. An agent whose behavior shifts gradually over weeks — perhaps through incremental memory poisoning as described in the OWASP ASI06 category [5] — will not trigger threshold-based alerting for any individual session, but the cumulative drift may represent a significant compromise of the agent’s intended behavior. ASDL compliance requires periodic behavioral snapshots and trend analysis that can detect slow drift in addition to acute behavioral changes.

Agent-Specific Incident Response. When an agent is suspected of compromise, the incident response procedures applicable to conventional software are necessary but insufficient. A compromised web application is typically isolated and reimaged; the state relevant to the compromise is in logs and network captures. A compromised AI agent may have persisted adversarial instructions in its memory store, may have delegated tasks to sub-agents that are still executing, may have already exfiltrated data through tool calls that appear in logs but require interpretation to understand, and may have influenced other agents in a pipeline through manipulated inter-agent messages. Incident response for agent compromise requires procedures specific to each of these scenarios.

Immediate containment requires not only halting the compromised agent but also auditing all sub-agents that received delegations from it during the suspected compromise window, reviewing all memory entries written during that period for adversarial content, and tracing all tool invocations to reconstruct what data was accessed or modified. Memory stores must be treated as potentially compromised until audited: agent memory should be rolled back to a pre-compromise snapshot if one is available, or cleared and rebuilt from verified sources if it is not. Post-incident analysis must determine the injection point through which the compromise originated — whether from tool output, retrieved content, inter-agent message, or system prompt manipulation — and this analysis must feed directly into an update to the agent’s threat model.

Agent Decommissioning. The decommissioning of an AI agent presents a security surface that has no equivalent in traditional software. A decommissioned agent may leave behind persistent state in memory stores, cached tool credentials in secrets management systems, registered identities in downstream services, and scheduled tasks or webhooks that continue to operate after the agent itself is shut down. ASDL-compliant decommissioning requires a formal checklist that includes: revocation of all credentials and API keys issued to the agent; deletion or archival of the agent’s memory stores with documented chain of custody; removal of the agent’s identity from all downstream system ACLs; deletion of any scheduled tasks, webhooks, or persistent connections that the agent registered; and audit confirmation that no sub-agent delegations issued by the decommissioned agent remain active.


3. Threat Modeling with MAESTRO

MAESTRO — the Multi-Agent Environment, Security, Threat, Risk, and Outcome framework developed by Ken Huang and the Cloud Security Alliance — provides the foundational threat modeling methodology for the ASDL Design phase [4]. Its seven-layer architecture is not merely a taxonomy of components; it is a structured tool for identifying where in an agent’s architecture specific threat classes materialize, and for ensuring that threat models do not systematically miss the interaction effects between layers that produce the most dangerous emergent vulnerabilities.

The Foundation Models layer (L1) is the starting point for understanding an agent’s susceptibility to model-level attacks. Threats at this layer include adversarial input manipulation, model inversion attacks that expose training data, and the behavioral characteristics of the specific model being used — its tendency to follow instructions from data channels, its susceptibility to known jailbreak techniques, and its behavior under resource exhaustion. Threat modeling at L1 requires teams to understand and document the specific model version in use, any fine-tuning that has been applied, and the model’s known behavioral characteristics relative to security-relevant properties like instruction hierarchy enforcement and refusal behavior. Model updates — even minor version increments — must trigger a review of L1 threat model assumptions, since security-relevant behavior can change between versions.

The Data Operations layer (L2) governs all data that flows into and out of the agent’s context. At this layer, the primary threats are data poisoning attacks that corrupt the knowledge the agent draws on for its reasoning, retrieval-augmented generation (RAG) poisoning in which adversarial content is inserted into a vector store to be retrieved in response to targeted queries, and privacy violations arising from training data memorization or context window leakage. A rigorous L2 threat model must enumerate every data source the agent reads from, classify each source by trust level, and specify the validation and sanitization controls applied to content from each source before it reaches the agent’s context.

The Agent Frameworks layer (L3) covers the orchestration and decision-making infrastructure — the LangChain, AutoGen, or custom framework code that coordinates the agent’s planning, tool dispatch, and memory management. This is the layer at which most implementation-level vulnerabilities in current agent deployments are concentrated. Threats at L3 include unsafe deserialization of tool responses, inadequate permission enforcement in the delegation layer, and the absence of channel separation between instructions and data as discussed in Section 2. The Trail of Bits research that produced CVE-2025-54795 operated at L3: the vulnerability was in the command parsing logic of the agent framework layer, not in the underlying language model [1]. L3 threat modeling must trace every path through which external input can influence agent behavior and verify that each path has adequate validation and privilege enforcement.

The Deployment and Infrastructure layer (L4) addresses the execution environment in which the agent runs. Threats at L4 include container escape vulnerabilities that allow an agent process to access host system resources, inadequate secrets management that exposes credentials to the agent’s process environment, insecure inter-service communication between the agent and the tools it invokes, and multi-tenancy isolation failures in shared deployment environments. L4 threat modeling must produce a network diagram showing all communication paths between the agent process and external systems, annotated with the authentication mechanism and transport security properties of each path.

The Evaluation and Observability layer (L5) is concerned with the monitoring and integrity verification infrastructure for the agent. Threats at L5 include logging bypass — an agent or attacker that can suppress or manipulate log output — and observability gaps that leave security-relevant events unrecorded. L5 threat modeling must verify that the telemetry infrastructure defined in the Deployment phase is tamper-resistant: logs must be forwarded to a system that the agent process cannot write to or delete from, and the integrity of log streams must be monitored for gaps or anomalies.

The Security and Compliance layer (L6) addresses the governance controls that apply across the stack: access control policies, regulatory compliance requirements, and the formal documentation of security decisions. L6 threat modeling is less about specific attack vectors and more about ensuring that the security controls defined at other layers are coherent, documented, and auditable. The AICM control domains for Governance Risk and Compliance and Application and Interface Security are primarily addressed at this layer [7].

The Agent Ecosystem layer (L7) captures the broadest scope of the threat model — the agent’s interactions with external systems, other agents, and human users. Threats at L7 include supply chain attacks on the tools and services the agent depends on, multi-agent protocol exploitation (OWASP ASI07) [5], and human-agent trust exploitation in which the agent’s confident presentation of results misleads human reviewers into approving harmful actions (OWASP ASI09) [5]. L7 threat modeling must consider not only how the agent behaves in isolation but how its behavior contributes to or degrades the security posture of the broader ecosystem it participates in.


4. Security Testing Framework

The ASDL Security Testing Framework maps test case categories to each of the ten OWASP Top 10 for Agentic Applications (2026) risk categories and specifies acceptance criteria for each. This framework provides the structured test plan that security engineers should use during the Testing phase, and it serves as the basis for the deployment acceptance gates described in Section 2.

ASI01 — Agent Goal Hijack. Test cases must include direct prompt injection through each data channel the agent reads (tool outputs, RAG retrievals, web content, inter-agent messages), indirect prompt injection in which adversarial instructions are embedded in content that appears consistent with the agent’s normal task context, and long-context hijack attempts in which the adversarial instruction appears in a later portion of a large context window. Pass criteria require the agent to complete its assigned task without modifying its goals based on any adversarially-crafted input in the test corpus, and to flag or reject injected content that appears to issue instructions rather than provide data.

ASI02 — Tool Misuse. Test cases must verify that the agent cannot be induced to invoke tools for purposes outside their intended scope, particularly through adversarial content that constructs plausible-appearing task contexts requiring unusual tool use. Test cases must also verify that the agent’s tool invocation parameters are within expected value distributions for each tool, and that the agent raises appropriate exceptions when tool responses are structurally invalid or anomalous.

ASI03 — Identity and Privilege Abuse. Test cases must verify that the agent does not pass its authentication credentials to sub-agents or external services outside of the defined delegation protocol, that the agent’s effective permission set does not expand through chained tool invocations, and that the agent’s identity cannot be spoofed by an adversarial tool response claiming to be a different system or principal. Testing must also verify that credentials are not logged in plaintext and do not appear in any data that flows through the agent’s context.

ASI04 — Agentic Supply Chain Vulnerabilities. Test cases must verify the integrity of all tool packages and MCP server dependencies against known-good checksums, test the agent’s behavior when a tool server returns responses inconsistent with its registered description, and verify that tool registration workflows require documented provenance before a new tool is approved for agent use. Dynamic dependency resolution at runtime must be disabled or constrained to verified package registries.

ASI05 — Unexpected Code Execution. Test cases must verify that the agent cannot be induced to execute arbitrary code through natural-language invocations of code execution tools, that any code execution tools require parameter validation before execution, and that sandboxing controls prevent executed code from accessing resources outside the agent’s defined scope. CVE-2025-54795 and the broader Trail of Bits research specifically motivate this test category [1][2].

ASI06 — Memory and Context Poisoning. Test cases must simulate adversarial content reaching the agent’s persistent memory store and verify that the agent’s subsequent behavior is not altered by the poisoned memory entry. Tests must also verify that memory content is validated on retrieval and that anomalous memory entries trigger alerts rather than silent inclusion in the agent’s context.

ASI07 and ASI08 — Inter-Agent Communication and Cascading Failures. Pipeline-level test cases must verify that the agent correctly validates the identity of agents it receives messages from, rejects messages with invalid delegation tokens or spoofed source identities, and degrades gracefully rather than propagating errors when a tool or sub-agent in its pipeline returns anomalous results. Cascading failure tests must introduce failures at each node of a multi-agent pipeline and verify that the failure is contained at that node rather than propagating to affect the broader system.

ASI09 and ASI10 — Human-Agent Trust Exploitation and Rogue Agents. Testing for these categories requires behavioral evaluation rather than input-output testing. ASI09 tests present human reviewers at HITL checkpoints with agent-presented justifications for harmful actions that are framed confidently and coherently, and evaluate whether the checkpoint UI and process design provides reviewers with sufficient context to make accurate decisions. ASI10 tests evaluate whether the behavioral monitoring system would detect and escalate the behavioral profile of a rogue agent, using synthetic behavioral logs calibrated to known misalignment patterns.


5. Reference Architecture

The ASDL reference architecture describes a layered security topology in which controls at each lifecycle phase interact to provide defense in depth. It is not a single-vendor blueprint but a description of the control categories and their relationships that an ASDL-compliant deployment must implement.

At the foundation, the agent’s execution environment consists of an isolated runtime container or process — implemented using OpenShell, Linux namespaces with seccomp profiles, or an equivalent kernel-level isolation mechanism — within which the agent process runs with the minimal set of operating system privileges required for its function. Outside this sandbox, a policy enforcement layer receives all requests from the agent process to access external resources (network endpoints, file system paths, sub-processes) and evaluates them against the declarative policy manifest derived from the trust architecture specification. The policy layer is operationally independent from the agent process: it cannot be modified by agent code, and violations are logged to a tamper-resistant log stream that the agent process cannot access.

The tool integration layer sits between the agent framework and the external tools and services the agent invokes. This layer is responsible for enforcing tool-level least privilege, validating invocation parameters before dispatch, and validating and sanitizing tool responses before they are incorporated into the agent’s context. The channel separation logic that distinguishes instruction-privileged content from data-privileged content is implemented at this layer. Tool responses that fail validation are returned to the agent framework as structured error messages rather than being passed through, and all validation failures are logged to the security telemetry stream.

Above the tool integration layer, the agent framework implements the business logic of the agent — its planning, reasoning, memory management, and task decomposition. The framework is responsible for enforcing the instruction hierarchy, maintaining the human-in-the-loop checkpoint registry, and managing the delegation protocol when sub-agents are spawned. Security controls at this layer are primarily code-level: the secure coding standards, automated policy checks, and CI-integrated linting that the Development phase produces.

The observability plane runs horizontally across all layers, collecting structured telemetry from the policy enforcement layer, the tool integration layer, and the agent framework. This telemetry feeds into the security monitoring system that performs behavioral baseline comparison and drift detection. Alerts from the monitoring system flow to the security operations platform, which maintains the incident response runbooks specific to agent compromise scenarios. The kill-switch mechanism, which can immediately halt the agent’s operation at the policy enforcement layer without requiring a code change or redeployment, is accessible from the security operations platform.


6. ASDL Maturity Levels

The ASDL defines three maturity levels to enable organizations to adopt the framework incrementally. Each level builds on the previous, and organizations should assess their current practices honestly before claiming a maturity designation.

Basic (Level 1) represents the minimum security posture that any organization deploying an AI agent in a non-sandbox environment should maintain. At Basic maturity, the organization has produced a MAESTRO-based threat model for the agent, has documented the agent’s tool permissions and human-in-the-loop checkpoints, has implemented input validation that separates instruction channels from data channels, and has performed adversarial testing covering at minimum the ASI01 (Goal Hijack) and ASI03 (Identity and Privilege Abuse) categories. The agent runs in a containerized environment with defined network egress controls, its credentials are stored in a secrets management system, and its tool invocations are logged to a centralized platform. Incident response procedures for agent compromise have been documented and at least one tabletop exercise has been conducted.

Intermediate (Level 2) adds the full depth of security engineering that an agent handling sensitive data or performing consequential actions should have. At Intermediate maturity, the agent has passed testing against all ten OWASP ASI categories, runs in a kernel-level sandbox with declarative policy enforcement, and has telemetry instrumentation that supports behavioral baseline comparison. Memory store contents are validated on retrieval and anomalous entries generate alerts. Delegation protocols enforce maximum permission scoping for sub-agents. Supply chain validation is in place for all tool dependencies. The incident response plan has been tested through a live simulation exercise and refined based on the results. Decommissioning procedures have been documented and are included in the agent’s operational runbook.

Advanced (Level 3) represents the state of practice for agents deployed in high-stakes environments — agents with access to production systems, financial systems, personnel data, or critical infrastructure. At Advanced maturity, all controls from Level 2 are in place and additionally: behavioral monitoring supports drift detection with trend analysis across extended time windows, not just threshold-based alerting; red-team testing is conducted on a recurring schedule (at minimum quarterly) rather than only at initial deployment; memory stores implement cryptographic integrity protection that detects unauthorized modification; formal verification or property-based testing is applied to the agent’s planning and delegation logic; and human-in-the-loop checkpoint design has been evaluated through a formal usability study to confirm that reviewers can make accurate oversight decisions with the information provided.


7. Framework Alignment

The ASDL is designed to integrate with the primary frameworks that enterprise security and AI governance teams use. The following table maps each ASDL phase to the corresponding AICM control domains, OWASP ASI categories, NIST AI RMF functions, and MAESTRO layers.

ASDL Phase CSA AICM Domains OWASP ASI Categories NIST AI RMF Functions MAESTRO Layers
Phase 1: Design Governance Risk and Compliance; Supply Chain Transparency ASI04, ASI05, ASI09 Govern, Map L1–L7 (full stack threat model)
Phase 2: Development Application and Interface Security; Model Security ASI01, ASI02, ASI03, ASI05 Map, Measure L1, L2, L3
Phase 3: Testing Model Security; Data Security and Privacy ASI01–ASI10 (all) Measure L1–L7
Phase 4: Deployment Governance Risk and Compliance; Data Security and Privacy ASI02, ASI03, ASI04, ASI05 Manage L3, L4, L5
Phase 5: Operations Governance Risk and Compliance; Application and Interface Security ASI06, ASI07, ASI08, ASI10 Govern, Manage L4, L5, L6, L7

The AICM, released by CSA in July 2025, provides 243 control objectives across 18 security domains [7]. The ASDL’s five-phase structure covers control objectives in the AICM’s Model Security, Data Security and Privacy Lifecycle Management, Governance Risk and Compliance, Application and Interface Security, and Supply Chain Transparency domains. Organizations implementing the ASDL as part of a broader AICM compliance program should note that ASDL Basic maturity is sufficient to satisfy the AICM control objectives for lower-autonomy, lower-criticality agents, while ASDL Advanced maturity is required for agents in AICM’s highest-risk classification tiers — those with broad access permissions and high impact radius.

The NIST AI Risk Management Framework’s four functions — Govern, Map, Measure, Manage — map naturally to the ASDL’s structure [6]. The Govern function, which calls for establishing governance structures, policies, and roles, is addressed primarily in the Design phase (trust architecture specification, threat model approval requirements) and the Operations phase (behavioral monitoring governance, incident response authority). The Map function — identifying system context, data sources, and stakeholders — is addressed in the Design and Development phases. The Measure function — monitoring performance, trustworthiness, and risk — is addressed in the Testing phase and the telemetry instrumentation components of Deployment. The Manage function — prioritizing, mitigating, and continuously monitoring AI risks — is addressed in the Operations phase and in the feedback loops that connect Operations findings back to subsequent Design iterations.

Microsoft’s evolving SDL for AI, described in their February 2026 guidance [10], converges with the ASDL on several key principles: the need for iterative security controls that match the pace of model and tool evolution; the importance of AI system observability as a security primitive; and the recognition that agent identity and RBAC enforcement require formal design attention, not just policy documentation. Organizations that have already adopted Microsoft SDL practices will find that the ASDL extends rather than replaces that foundation, with the MAESTRO threat modeling methodology and OWASP ASI test framework providing the agentic-specific augmentation that Microsoft SDL identifies as necessary but does not yet fully specify.


References

[1] Miggo Security, “CVE-2025-54795: Claude Code Approval Bypass Remote Code Execution,” National Vulnerability Database, 2025. Available: https://nvd.nist.gov/vuln/detail/CVE-2025-54795

[2] Trail of Bits, “Prompt injection to RCE in AI agents,” Trail of Bits Blog, October 22, 2025. Available: https://blog.trailofbits.com/2025/10/22/prompt-injection-to-rce-in-ai-agents/

[3] Anonymous et al., “From Prompt Injections to Protocol Exploits: Threats in LLM-Powered AI Agents Workflows,” arXiv:2506.23260, June 2025. Available: https://arxiv.org/abs/2506.23260

[4] K. Huang et al., “Agentic AI Threat Modeling Framework: MAESTRO,” Cloud Security Alliance Blog, February 6, 2025. Available: https://cloudsecurityalliance.org/blog/2025/02/06/agentic-ai-threat-modeling-framework-maestro

[5] OWASP GenAI Security Project, “OWASP Top 10 for Agentic Applications 2026,” December 9, 2025. Available: https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/

[6] National Institute of Standards and Technology, “Artificial Intelligence Risk Management Framework (AI RMF 1.0),” NIST AI 100-1, January 2023. Available: https://www.nist.gov/itl/ai-risk-management-framework

[7] Cloud Security Alliance, “AI Controls Matrix (AICM) v1.0,” July 2025. Available: https://cloudsecurityalliance.org/artifacts/ai-controls-matrix

[8] HUMAN Security, “OWASP Top 10 Agentic AI Risks: ASI01 Agent Goal Hijack (EchoLeak),” 2026. Available: https://www.humansecurity.com/learn/blog/owasp-top-10-agentic-applications/

[9] Astrix Security, “The OWASP Agentic Top 10: ASI06 Memory and Context Poisoning (Gemini Memory Attack),” 2026. Available: https://astrix.security/learn/blog/the-owasp-agentic-top-10-just-dropped-heres-what-you-need-to-know/

[10] Microsoft Security, “Microsoft SDL: Evolving Security Practices for an AI-Powered World,” Microsoft Security Blog, February 3, 2026. Available: https://www.microsoft.com/en-us/security/blog/2026/02/03/microsoft-sdl-evolving-security-practices-for-an-ai-powered-world/

[11] Noma Security, “Introducing riskrubric.ai: the risk scorecard for every AI model,” Noma Security Blog, 2025. Available: https://noma.security/blog/introducing-riskrubric-ai-the-risk-scorecard-for-every-ai-model/

[12] NVIDIA, “Run Autonomous, Self-Evolving Agents More Safely with NVIDIA OpenShell,” NVIDIA Technical Blog, 2026. Available: https://developer.nvidia.com/blog/run-autonomous-self-evolving-agents-more-safely-with-nvidia-openshell/

[13] NVIDIA, “NemoClaw: NVIDIA’s Enterprise Security Stack for OpenClaw Agents,” NemoClaw.run, March 2026. Available: https://nemoclaw.run/

[14] Practical DevSecOps, “MAESTRO: An Agentic AI Threat Modeling Framework,” 2025. Available: https://www.practical-devsecops.com/maestro-agentic-ai-threat-modeling-framework/

[15] Cloud Security Alliance, “Agentic AI Red Teaming Guide,” CSA Artifacts, 2025. Available: https://cloudsecurityalliance.org/artifacts/agentic-ai-red-teaming-guide

[16] NIST, “Artificial Intelligence Risk Management Framework: Generative AI Profile,” NIST AI 600-1, July 2024. Available: https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf

[17] Cloud Security Alliance, “Threat Modeling OpenAI’s Responses API with MAESTRO,” CSA Blog, March 24, 2025. Available: https://cloudsecurityalliance.org/blog/2025/03/24/threat-modeling-openai-s-responses-api-with-the-maestro-framework