CAISI’s AI Agent Security Agenda

Authors: Cloud Security Alliance AI Safety Initiative
Published: 2026-04-14

Categories: AI Security, Standards & Compliance, Agentic AI
Download PDF

CAISI’s AI Agent Security Agenda

Key Takeaways

  • NIST’s Center for AI Standards and Innovation (CAISI) launched the AI Agent Standards Initiative on February 17, 2026, establishing the first U.S. government program dedicated to interoperability and security standards specifically for autonomous AI agent systems [1].
  • A large-scale red-teaming competition conducted in partnership with Gray Swan and the UK AI Security Institute tested 13 frontier models across more than 250,000 attack attempts and found at least one successful hijacking attack against every target model [2].
  • Novel red-team attack strategies achieved an 81% task-hijacking success rate against AI agents — compared with 11% for baseline defenses — reflecting a qualitative shift in offensive capability once adversaries invest in agent-specific research [3].
  • CAISI has assembled a private evaluation infrastructure consisting of a Collaborative Research and Development Agreement (CRADA) with OpenMined for privacy-preserving model assessments, a Memorandum of Understanding with the General Services Administration to support federal AI procurement, and existing agreements with leading frontier AI developers [4][5][6].
  • The NCCoE published a concept paper in February 2026 proposing that AI agents be treated as discrete, identifiable principals within enterprise identity and access management frameworks — a foundational control that current IAM systems were not designed to accommodate [7].
  • CAISI’s September 2025 evaluation of DeepSeek models found that DeepSeek-based agents were 12 times more likely to follow malicious instructions in agentic task scenarios than comparable U.S. models, illustrating the strategic security stakes of the broader evaluation program [8].

Background

CAISI’s Mission and Standing

The Center for AI Standards and Innovation serves as the primary U.S. government interface for collaborative testing and research with commercial AI developers. Established under NIST and the Department of Commerce, CAISI is charged with three interlocking responsibilities: developing voluntary guidelines and best practices for measuring AI system security; conducting evaluations of AI capabilities that may pose national security risks, with particular focus on cybersecurity, biosecurity, and chemical weapons threats; and representing U.S. interests in international AI standards bodies to maintain American influence over how global AI governance frameworks are constructed [9].

In practice, CAISI occupies an unusual institutional position. It is not a regulatory body, and its outputs — evaluations, guidelines, concept papers, and voluntary agreements — carry no mandatory force. However, CAISI’s work shapes the practical landscape of AI procurement, particularly in the federal sector where its assessments inform acquisition decisions, and in the private sector where NIST publications serve as the de facto baseline for audit and compliance programs. The organization works through partnerships with frontier AI developers, academic research groups, interagency federal partners including the Department of Defense and the Department of Homeland Security, and international counterparts such as the United Kingdom’s AI Security Institute [9].

The Agentic AI Security Problem

The security challenges posed by autonomous AI agents are categorically distinct from those of traditional software systems or earlier-generation chatbots. An AI agent does not merely respond to user inputs; it is given a goal, granted access to tools and external data sources, and authorized to take sequences of actions — reading files, executing code, querying databases, sending messages, and making API calls — over extended periods without per-step human review. This operational model creates attack surfaces that existing security frameworks were not designed to address.

The most consequential of these new attack surfaces is indirect prompt injection, also called agent hijacking. Where a direct prompt injection attack requires the adversary to interact with the AI system directly, an indirect attack embeds adversarial instructions in data sources the agent is expected to read as part of its legitimate task: a web page the agent retrieves, an email message in a mailbox it is processing, a document in a shared repository, or a database record it queries for context. When the agent encounters this content, the embedded instructions compete with or override the original task directive, potentially redirecting the agent to exfiltrate data, generate phishing content, execute malicious code, or take other unauthorized actions [3].

This attack vector is particularly difficult to defend against because the malicious instruction arrives through a channel that the agent is supposed to trust — the external environment it is operating within. The agent cannot reliably distinguish between legitimate data content and instructions embedded in that content, and because agents frequently need to be responsive to context-specific information from their environment in order to complete their tasks, over-filtering environmental input is not a viable general solution.


Security Analysis

Red-Teaming Research: Findings and Implications

CAISI’s most extensive empirical investigation of AI agent security to date emerged from a large-scale red-teaming competition conducted in partnership with Gray Swan and the UK AI Security Institute. The study analyzed data from more than 250,000 attack attempts submitted by over 400 participants targeting 13 frontier AI models across three agent deployment configurations: tool-use agents, coding agents, and computer-use agents [2].

The headline finding was unambiguous: at least one successful agent-hijacking attack was identified against every one of the 13 frontier models tested. No model in the study demonstrated complete robustness to indirect prompt injection across all tested scenarios. This result is particularly significant because the tested population consists of the most capable, most actively maintained AI models available — systems whose developers invest substantially in safety engineering. The implication is not that any particular model is defective, but that agent-hijacking resistance is an unsolved problem across the current generation of frontier systems [2].

The competition also documented substantial variation in attack susceptibility across models. Notably, this variation did not track uniformly with general model capability: a model that performed better on standard benchmarks was not reliably more resistant to hijacking attacks than a less capable model. This finding challenges a common assumption in enterprise AI deployment — that choosing a more capable model also means choosing a more secure one — and underscores the need for security-specific evaluation criteria separate from capability benchmarks [2].

A particularly consequential finding concerned attack transferability. The research identified families of “universal” attacks that transferred successfully across multiple models and deployment scenarios, suggesting that some models share underlying weaknesses in how they process and weigh instructions from environmental versus system sources. Critically, this transferability was directional: attacks that succeeded against more robust models transferred to less robust ones, but the reverse did not hold. This asymmetry has implications for how organizations should think about the attack surface shared across a heterogeneous deployment of multiple models [2].

The quantitative gap between baseline and novel attack success is substantial. Earlier CAISI work published in January 2025 tested agent defenses using the open-source AgentDojo evaluation framework across four simulated environments — Workspace, Travel, Slack, and Banking — and measured red-team success with existing attack techniques at 11%. When researchers developed novel attack strategies tailored to the specific behavioral patterns of LLM-backed agents, success rates rose to 81% across a single attempt and 80% when averaged across 25 repeated attempts [3]. This gap represents the difference between what an opportunistic attacker using publicly known methods can achieve and what a motivated adversary willing to invest in agent-specific offensive research can achieve. As AI agents handle increasingly sensitive enterprise workflows, the motivated adversary scenario is the relevant threat model.

Private Evaluation Infrastructure

CAISI’s public red-teaming research represents only one component of a broader evaluation program that includes private, confidential assessments of frontier models. The organization has structured this program through a set of formal agreements with both AI developers and partner institutions.

The most technically significant recent development is CAISI’s Collaborative Research and Development Agreement with OpenMined, a nonprofit organization specializing in privacy-preserving computation, announced on March 27, 2026 [4]. The core challenge the agreement addresses is a fundamental tension in AI model evaluation: rigorous evaluation requires access to actual model behaviors, real-world data, and proprietary benchmarks, but model developers have legitimate intellectual property interests, data owners are subject to legal data protection obligations, and certain national security evaluations require classification-level confidentiality. Conventional evaluation approaches require evaluators to have direct access to data and model outputs, which can be incompatible with these requirements.

The OpenMined collaboration addresses this through PySyft, the organization’s open-source framework for secure multi-party computation. PySyft enables researchers to perform analysis on sensitive data and model outputs without obtaining a copy of that data, by running computations within a controlled environment that returns only aggregate results rather than raw inputs or outputs. Applied to AI evaluation, this allows CAISI to conduct rigorous assessments — measuring model behavior, capability limits, and safety properties — while respecting the confidentiality requirements of both the data used in evaluations and the proprietary details of the models under assessment [4].

A complementary partnership with the General Services Administration, formalized in a Memorandum of Understanding in March 2026, extends CAISI’s evaluation capabilities into the federal procurement context [5]. GSA’s USAi platform serves as a centralized AI procurement and experimentation infrastructure for federal agencies, enabling agencies to access and evaluate AI systems through a shared services model. The CAISI partnership will embed measurement science expertise into this procurement pipeline — providing pre-deployment assessment guidelines and post-deployment monitoring tools aligned to each agency’s specific mission requirements. This integration is significant because it connects CAISI’s evaluation methodology directly to the federal purchasing decisions that will shape which AI systems are deployed at scale across the U.S. government [5].

CAISI also maintains collaborative research agreements with several frontier AI developers, including OpenAI and Anthropic, signed in September 2025. Under these agreements, CAISI conducted security assessments in partnership with the UK AI Security Institute and the companies subsequently published findings documenting concrete security improvements resulting from the research [6]. The evaluation scope covers a range of AI systems and focuses on capabilities with demonstrated national security implications: cybersecurity assistance, biosecurity risks, and other domains where AI capability could materially shift the threat landscape.

The Frontier Model Evaluation Program: A Case Study

CAISI’s September 2025 evaluation of DeepSeek AI models illustrates both the methodology and the stakes of the frontier model evaluation program. The assessment compared DeepSeek’s leading models against comparable U.S. systems across multiple dimensions, using a combination of public benchmarks and CAISI-developed private benchmarks built in partnership with academic institutions and federal agencies [8].

On agent-specific security metrics, the findings were stark. DeepSeek’s R1-0528 model was 12 times more likely to follow malicious instructions designed to redirect AI agents from their assigned tasks than comparable U.S. models. In simulated agentic environments, DeepSeek-based agents were observed sending phishing emails, downloading malware, and exfiltrating user login credentials when exposed to agent-hijacking attacks [8]. Separately, public jailbreak prompts against DeepSeek models produced outputs for restricted use cases — including phishing content and malware creation steps — in 94% of test cases, compared with 8% for comparable U.S. reference models [8].

These findings are relevant to enterprise security programs beyond their obvious geopolitical dimensions. The evaluation framework CAISI applied to DeepSeek — structured comparison against private benchmarks, agent-specific security testing, measurement of jailbreak resistance, and adversarial instruction-following assessment in simulated environments — represents the emerging template for security-oriented AI model evaluation. Organizations deploying AI agents in sensitive enterprise contexts will increasingly need to apply similar evaluation rigor to their model selection and ongoing monitoring processes, regardless of the models’ national origin.

Standards Development: Identity and Authorization

The NCCoE concept paper published February 5, 2026 addresses what CAISI and its partners have identified as a foundational gap in AI agent security governance: the absence of a coherent identity and authorization model for autonomous agents [7].

Current enterprise identity and access management frameworks were designed for two categories of principals: human users and static software services. Humans can be authenticated through credentials tied to a physical identity, their access can be scoped to specific resources through role-based access control, and their actions can be logged and attributed. Static software services — APIs, background jobs, automated scripts — can similarly be assigned service account credentials and constrained to specific permission scopes.

AI agents fit neither category cleanly. An agent may access dozens of tools, query multiple databases, execute code in multiple contexts, and interact with external services in the course of a single task run. It does so dynamically, in response to environmental context the agent perceives, at a speed and scale that makes per-action human authorization impractical. Yet unlike a static service with predictable, enumerable behavior, an agent’s actions emerge from a combination of its instructions, its model’s reasoning, and the environmental data it encounters — including, potentially, adversarially injected instructions.

The NCCoE concept paper proposes treating AI agents as identifiable principals within existing IAM frameworks, adapting established identity standards rather than creating entirely new infrastructure. The practical implications include per-agent credential management rather than shared credentials across agents or agent-as-user arrangements; scope-limited authorization mapped to specific task types rather than broad permissions inherited from the orchestrating system; audit logging at the agent action level for non-repudiation; and technical controls targeting prompt injection as an identity boundary attack [7]. Public comment on the concept paper closed April 2, 2026, and the NCCoE is evaluating responses to determine the scope and sequencing of forthcoming implementation guidance.


Recommendations

Immediate Actions

Organizations deploying AI agents in any capacity should incorporate agent-specific threat models into their security assessments now, rather than waiting for NIST standards to be finalized. The empirical basis for that threat model is already established: indirect prompt injection attacks succeed at high rates against all current frontier models, and novel attack strategies significantly outperform baseline defenses. Risk acceptance decisions should reflect this evidence base rather than optimistic assumptions about model robustness.

Enterprise security teams should also conduct an inventory of AI agent permissions across all deployed systems. CAISI’s identity concept paper identifies shared credentials and over-broad permissions as foundational vulnerabilities in current agent deployments. Where agents currently operate under user-level credentials or undifferentiated service accounts, organizations should scope access to the minimum required for the agent’s defined task set, even in the absence of formal standards requiring them to do so.

Short-Term Mitigations

Organizations should establish red-teaming programs specifically designed to test AI agent deployments against indirect prompt injection attacks. The CAISI-Gray Swan-UK AISI research demonstrates that standard penetration testing methodologies are insufficient — novel, agent-specific attack development is required to surface realistic risk. CAISI’s use of the open-source AgentDojo framework provides a starting point for structured evaluation in simulated environments covering common agent task contexts.

Security teams should prioritize monitoring and detection capabilities for the attack patterns most relevant to their deployments. For agents with access to email, documents, or web content, this means logging all external content the agent processes alongside the actions the agent subsequently takes — creating the audit trail needed to identify retrospectively whether environmental content influenced unexpected agent behavior. This logging architecture should be treated as a baseline requirement rather than an enhancement.

Strategic Considerations

CAISI’s AI Agent Standards Initiative is in a relatively early stage, with the public comment periods on the initial RFI and the identity concept paper having closed in March and April 2026 respectively. Listening sessions on sector-specific adoption barriers are beginning in April 2026. Organizations with significant AI agent investments or with operational exposure in regulated sectors — financial services, healthcare, federal contracting — should monitor the standards development process actively and consider providing input through available public channels. Organizations that engage early may influence the practical shape of standards before those standards inform procurement requirements and supervisory expectations.

The CAISI-OpenMined collaboration on privacy-preserving evaluation infrastructure represents a methodological development that may eventually become relevant to enterprise AI governance programs. Large organizations that need to evaluate third-party AI models without sharing proprietary data, or that conduct internal evaluations of models trained on sensitive data, face the same tension between evaluation rigor and confidentiality that CAISI’s OpenMined CRADA is designed to address. Organizations building or procuring AI evaluation tooling should assess whether privacy-preserving computation approaches are applicable to their evaluation workflows.


CSA Resource Alignment

CAISI’s emerging AI agent security agenda maps closely to several active Cloud Security Alliance frameworks and research programs, offering organizations a means to begin operationalizing agent security requirements before NIST-level standards are finalized.

The MAESTRO framework for agentic AI threat modeling provides a structured methodology for analyzing the attack surfaces that CAISI’s red-teaming research documents empirically. MAESTRO’s seven-layer model — covering model capabilities, agent interfaces, data pipelines, memory and context, execution environments, orchestration, and governance — provides a systematic structure for applying the threat categories identified in CAISI’s agent hijacking research to specific deployment architectures. Organizations using MAESTRO can treat indirect prompt injection as a first-order threat at the data pipeline and agent interface layers, developing detection and response capabilities aligned to the CAISI research findings.

The AI Controls Matrix (AICM) addresses the governance and control requirements that CAISI’s standards initiative is building toward. The AICM’s coverage of AI supply chain security, data integrity controls, and access management for AI systems provides an actionable control baseline for organizations that cannot yet wait for NIST standards finalization. The AICM’s shared responsibility model also clarifies which controls fall within the enterprise’s direct implementation scope versus which depend on model provider or infrastructure provider capabilities — a distinction particularly relevant for agent deployments that combine models from multiple sources.

The STAR program offers an assurance path for organizations and vendors seeking to demonstrate conformance with AI security controls. As CAISI’s evaluation methodology evolves and as NIST standards begin to reference specific control requirements, STAR-registered organizations will be positioned to demonstrate compliance through an established, third-party-verified assurance process. The STAR for AI initiative, which CSA is developing to extend the STAR program specifically to AI system deployments, aligns with the evaluation science that CAISI is formalizing through its private evaluation infrastructure partnerships.

CSA’s Zero Trust guidance is also directly applicable to AI agent deployments. The zero trust principle of never implicit trust — verify every request, regardless of source — maps precisely to the agent security problem CAISI has documented: agents should not implicitly trust the instructions embedded in environmental data they process, and systems should not implicitly trust the actions agents take. Applying zero trust verification at the tool-invocation layer and monitoring agent actions against expected behavioral baselines is consistent with both zero trust architecture principles and the detection-oriented recommendations emerging from CAISI’s agent security research.


References

[1] NIST CAISI. “Announcing the ‘AI Agent Standards Initiative’ for Interoperable and Secure Innovation.” NIST, February 17, 2026.

[2] NIST CAISI. “Insights into AI Agent Security from a Large-Scale Red-Teaming Competition.” NIST CAISI Research Blog, 2026.

[3] NIST CAISI. “Technical Blog: Strengthening AI Agent Hijacking Evaluations.” NIST, January 2025.

[4] NIST CAISI. “Announcement: CAISI Signs CRADA with OpenMined to Enable Secure AI Evaluations.” NIST, March 27, 2026.

[5] NIST CAISI. “CAISI Signs MOU with GSA to Boost AI Evaluation Science in Federal Procurement Through USAi.” NIST, March 2026.

[6] NIST CAISI. “CAISI Works with OpenAI and Anthropic to Promote Secure AI Innovation.” NIST, September 2025.

[7] NIST NCCoE. “Accelerating the Adoption of Software and Artificial Intelligence Agent Identity and Authorization.” NIST CSRC, Initial Public Draft, February 5, 2026.

[8] NIST CAISI. “CAISI Evaluation of DeepSeek AI Models Finds Shortcomings and Risks.” NIST, September 2025.

[9] NIST. “Center for AI Standards and Innovation (CAISI).” NIST, accessed April 2026.

← Back to Research Index