Authors: Cloud Security Alliance AI Safety Initiative
Published: 2026-06-05

Categories: Supply Chain Security, Agentic AI Security, CI/CD Pipeline Security

AI Agent Prompt Injection: The New CI/CD Supply Chain Threat

Key Takeaways

Anthropic’s Claude Code GitHub Action contained a critical permission bypass (CVSS 4.0: 7.8) in which the checkWritePermissions function unconditionally trusted any GitHub App actor, enabling unauthenticated external attackers to inject malicious prompts and trigger full repository compromise without write access [1][2].
A separate class of vulnerabilities, dubbed “Comment and Control” by researchers, demonstrated that Anthropic Claude Code, Google Gemini CLI Action, and GitHub Copilot Agent all process untrusted GitHub metadata — PR titles, issue bodies, and HTML comments — as authoritative prompt content, enabling theft of live API keys and repository credentials [3][4].
The Clinejection incident (February 17, 2026) showed this threat is not theoretical: a single malicious GitHub issue title triggered a chain of four vulnerabilities that resulted in an unauthorized supply chain compromise of the Cline AI coding tool’s npm package, reaching an undisclosed number of developer and CI/CD systems over approximately eight hours before the malicious package was removed [5][6].
Fixes exist — Anthropic patched the primary authorization bypass within four days of disclosure and released hardened versions — but the underlying architectural pattern of AI agents ingesting untrusted repository data while holding elevated pipeline credentials remains prevalent across the industry, with Aikido Security identifying at least five Fortune 500 companies with configurations consistent with this pattern as of mid-2026 [10].
Organizations integrating AI agents into CI/CD must treat all AI-processed GitHub metadata as untrusted input and immediately audit workflows for the structural antipatterns that enable this attack class.

Background

The proliferation of AI coding assistants integrated into software development pipelines has created a novel attack surface at the intersection of two previously distinct threat categories: prompt injection and CI/CD supply chain compromise. Tools such as Anthropic’s Claude Code, Google’s Gemini CLI, and GitHub Copilot Agent can now be configured to respond automatically to repository events — triaging issues, reviewing pull requests, summarizing changes, and executing commands — all without human approval of individual actions. This automation brings real efficiency gains, but it also introduces a structural vulnerability: AI agents operating inside CI/CD workflows combine the attack surface of an untrusted text interpreter with the privilege level of a trusted pipeline actor.

The supply chain dimensions of this risk were already on the industry’s radar before AI agents entered the picture. The March 2025 compromise of tj-actions/changed-files and reviewdog/action-setup, which affected approximately 23,000 repositories, demonstrated how attackers who gain write access to a widely-used GitHub Action can exfiltrate secrets from every downstream consumer [8][13]. What AI agents change is the attack prerequisite: an adversary who previously needed write access to a trusted repository now needs only the ability to create an issue or pull request — capabilities available to any GitHub user with a free account.

In January 2026, security researcher RyotaK of GMO Flatt Security disclosed a critical flaw in Anthropic’s claude-code-action that concretely demonstrated this threat class. The vulnerability chain RyotaK documented did not require any novel technique; it composed well-understood weaknesses — authorization bypass, indirect prompt injection, and environment variable exfiltration — into a complete attack that begins with opening a public GitHub issue and ends with malicious code pushed into the action’s source repository, propagating to every downstream dependent [1]. The subsequent public disclosure on June 1, 2026, arrived alongside independent findings from Aonan Guan and Aikido Security demonstrating that the same structural pattern exists across multiple AI agent platforms, establishing prompt injection via GitHub metadata as a class-level supply chain risk for the industry.

Security Analysis

The Permission Bypass and Exploit Chain

The core authorization flaw in claude-code-action resided in its checkWritePermissions function, which determined whether an actor could trigger the agent in automation mode. The function unconditionally permitted any actor whose identity string ended in [bot], on the assumption that bot actors would be legitimate GitHub Apps installed by repository owners [1][2]. This assumption was incorrect. GitHub Apps receive implicit read access to any public repository and can create issues and pull requests in those repositories using only a standard installation token — without the repository owner granting any additional permissions [1].

An attacker exploiting this flaw would create a malicious GitHub App, install it on a repository they control, and use its installation token to open a crafted issue in any target public repository running claude-code-action. Because the actor identifier ends in [bot], the permission check passes unconditionally. The issue body contains a prompt injection payload — typically content designed to mimic a legitimate error message or triage request — that causes Claude to interpret embedded instructions as authoritative [1][2]. Claude Code allows certain bash commands such as cat and head without requiring explicit human approval, so a successful injection can direct the agent to read /proc/self/environ, a Linux pseudo-file that exposes all environment variables present in the workflow process. Among those variables are ACTIONS_ID_TOKEN_REQUEST_TOKEN and ACTIONS_ID_TOKEN_REQUEST_URL, which together allow the attacker to request an OIDC token from GitHub’s identity service [1].

The OIDC token, once obtained, can be exchanged with Anthropic’s backend for a privileged GitHub App installation token scoped to the anthropics/claude-code-action repository. That token provides write access to the action’s source, enabling the attacker to push malicious commits directly. Any downstream repository pinned to a floating version tag — rather than a specific commit SHA — would then execute the poisoned action on its next workflow run, propagating the compromise to the attacker’s arbitrary code across the full dependency graph. This complete attack chain was assigned CVE-2025-66032 (GHSA-xq4m-mc3c-vvg3), classified under CWE-20 (Improper Input Validation) and CWE-77 (Command Injection), and received a CVSS 3.1 score of 8.7 (CVSS v4.0: 7.8; the two values reflect different versions of the scoring standard) [7].

A secondary escalation path, enabled by misconfigured example workflows, operated in two phases. In phase one, an unprivileged user triggers a triage workflow configured with allowed_non_write_users: "*" and issues: write permission, manipulating Claude into leaking a GITHUB_TOKEN to the publicly visible workflow summary. In phase two, the attacker uses that token to edit issues created by trusted users, injecting prompt payloads into content that subsequently triggers higher-privilege tag-mode workflows with id-token: write access — yielding the OIDC credentials needed for the full supply chain attack [1].

Anthropic received RyotaK’s initial report on January 12, 2026, and deployed a fix for the primary bypass within four days [1][2]. The remediation added a checkHumanActor validation step, disabled the workflow run summary section by default to eliminate one exfiltration channel, implemented a custom gh command wrapper that validates arguments against known exfiltration-capable URL patterns, and scrubbed environment variables from child processes spawned by Claude Code [2]. The hardened release is claude-code-action v1.0.94 and @anthropic-ai/claude-code v1.0.93. Anthropic paid a bug bounty of $3,800 plus a $1,000 bypass bonus, and rated the overall finding 7.8 under CVSS v4.0 [1][2].

Comment and Control: Prompt Injection via GitHub Metadata

Concurrent with the GMO Flatt Security disclosure, independent researcher Aonan Guan published findings under the name “Comment and Control” demonstrating that the same fundamental weakness — AI agents processing untrusted GitHub metadata as authoritative prompt content — affects multiple platforms simultaneously [3][4]. The research showed that Anthropic’s Claude Code Security Review action, Google’s Gemini CLI Action, and GitHub’s Copilot Agent all failed to sanitize user-controlled input before incorporating it into the prompt context presented to the underlying AI model.

In the Claude Code case, pull request titles were interpolated directly into the prompt without sanitization. Guan demonstrated that placing a bash command instruction in a PR title caused Claude to execute the command and embed its output — including live API keys and environment variables — into the response posted as a PR comment, publicly visible to anyone with repository read access [3]. The Google Gemini CLI Action was susceptible to a variant in which “Trusted Content Sections” planted in issue comments overrode the model’s safety guidelines, causing Gemini to post API keys in its reply. GitHub Copilot Agent presented the most sophisticated bypass: the attack used hidden HTML comments invisible to human reviewers but parsed by the model, combined with an environment variable filtering bypass targeting only child bash subprocesses (leaving the parent Node.js process readable via ps auxeww), and exfiltrated credentials through base64 encoding to defeat GitHub’s secret-scanning log masking [3].

Credentials stolen across these three platforms included ANTHROPIC_API_KEY, GEMINI_API_KEY, GITHUB_TOKEN with write permissions, GITHUB_COPILOT_API_TOKEN, GITHUB_PERSONAL_ACCESS_TOKEN, and COPILOT_JOB_NONCE [3]. VentureBeat reporting on the combined disclosures noted that Anthropic’s own system card documentation had previously flagged the prompt injection risk for Claude Code, suggesting a gap between a documented threat model and preventive architectural controls at the time of disclosure [9].

The Clinejection Precedent

The Clinejection incident, publicly disclosed on February 9, 2026, and subsequently exploited in the wild on February 17, 2026, demonstrated the full lifecycle of this attack class against a production target [5][6]. Security researcher Adnan Khan identified that the Cline AI coding tool’s automated issue triage workflow was vulnerable to prompt injection through issue titles. The attack chain Khan documented required only the submission of a single crafted GitHub issue and proceeded through four chained vulnerabilities: indirect prompt injection via the issue title, extraction of the npm publish token from the workflow environment, cache poisoning of the CI artifact store, and publication of the malicious artifact to the npm registry [5].

Eight days after Khan’s disclosure, an unknown actor exploited the unfixed vulnerability to publish [email protected] — an unauthorized release that installed the OpenClaw AI agent on every developer and CI/CD system that updated the Cline CLI [6]. The package remained live for approximately eight hours before the npm token was revoked and the malicious release removed [6]. The incident represents one of the earliest — and the most fully documented — confirmed supply chain compromises of an AI coding tool’s own distribution infrastructure executed through an AI agent that the project itself operated.

Systemic Vulnerability Patterns

Across the disclosed vulnerabilities, three structural antipatterns recur that serve as reliable indicators of exploitable configurations. The first is the absence of human actor validation: any workflow that permits non-human actors — GitHub Apps, bots, automated systems — to trigger AI agent execution without confirming that a human with write access was the initiating party creates a boundary that attackers can cross with a free GitHub account. The second is the direct interpolation of untrusted content into AI prompts, which treats GitHub metadata (issue titles, PR descriptions, code comments, commit messages) as equivalent to trusted system instructions. The third is excessive credential scope in workflow permissions, where agents granted id-token: write or broad contents: write access can become launchpads for OIDC token theft and direct repository write operations when prompt injection succeeds.

Research from Aikido Security, which coined the term “PromptPwnd” for this class of CI/CD vulnerabilities, identified at least five Fortune 500 companies with misconfigured AI agent workflows consistent with these patterns as of mid-2026 [10]. Separately, an August 2025 compromise of the Nx build system’s GitHub Action explicitly named Claude Code, Gemini CLI, and Amazon Q as credential targets, exploiting predictable configuration file paths to harvest stored API keys — a signal that threat actors are actively mapping the attack surface of AI tooling in development pipelines [8].

Recommendations

Immediate Actions

Organizations using Claude Code GitHub Actions or other AI agent workflows should update to claude-code-action v1.0.94 and @anthropic-ai/claude-code v1.0.93 or later, which contain Anthropic’s primary remediations [7]. All workflow files should be audited for the allowed_non_write_users: "*" configuration, which removes the human-actor requirement and should be treated as a critical misconfiguration. Pin every third-party action reference to a full commit SHA rather than a floating version tag such as @v1 or @latest; version tags are mutable and a supply chain attacker who gains write access to an action repository can silently repoint them. Review workflow permissions declarations to confirm that id-token: write and broad contents: write scopes are granted only to workflows where those privileges are operationally required, following the principle of least privilege.

Short-Term Mitigations

For teams that cannot immediately remove AI agents from automated pipelines, architectural isolation can substantially reduce blast radius. Run AI agents in dedicated workflows that hold no repository write credentials, and require a human approval step — via GitHub’s environment: production protection rules — before any agent-generated output is acted upon by a privileged workflow. Sanitize all AI agent inputs at the workflow level before they reach the model: strip or escape content from issue titles, PR descriptions, commit messages, and code comments rather than passing raw GitHub event payloads to the agent. Implement network egress filtering for CI runner environments so that agent processes cannot initiate outbound connections to attacker-controlled infrastructure; this eliminates one of the more reliable exfiltration paths even when prompt injection succeeds. Enable GitHub’s push protection and secret scanning on all repositories running AI agent workflows, and treat any alert of secrets in workflow logs as an immediate incident.

Strategic Considerations

The vulnerabilities disclosed in early 2026 expose a tension that will persist as AI agents become more capable: the value of agentic automation is proportional to the agent’s access to context and its ability to take action, but both of those properties expand the attack surface. The fundamental mitigation is an architectural separation of the agent’s reasoning layer from the credential-holding execution layer. An AI model that analyzes a repository event and produces a structured recommendation — but cannot itself execute the resulting action — cannot be weaponized through prompt injection into stealing credentials or pushing code. The execution layer, which does hold credentials, evaluates the structured recommendation against a policy and acts accordingly; it never processes untrusted free-text.

Organizations developing internal AI agent infrastructure should treat this separation as a design requirement analogous to the principle that web application front ends should not hold database credentials. Vendor procurement processes should evaluate AI agent products for documentation of their trust boundary model and evidence that prompt injection has been included in their threat model and tested adversarially. The broader industry trajectory — toward agentic AI workflows that autonomously read, plan, and act across development infrastructure — leads CSA to recommend treating systematic architectural standards for agent trust boundaries as an urgent standardization priority rather than a deferred best practice, a conclusion substantiated by the incidents documented above.

CSA Resource Alignment

The vulnerability class documented here maps directly to several layers of CSA’s MAESTRO agentic AI threat modeling framework. MAESTRO’s Layer 2 (Data Operations) addresses risks from untrusted content entering agent context windows, which addresses the root-cause category enabling prompt injection via GitHub metadata [11]. Layer 4 (Agent Trust Boundaries) captures the authorization bypass pattern: the checkWritePermissions flaw was fundamentally a failure to enforce the boundary between external, untrusted actors and the trusted execution environment the agent operates within. Layer 6 (Integration and Deployment) addresses risks in the deployment context — precisely the CI/CD integration patterns that created the supply chain attack surface. CSA’s earlier application of MAESTRO to real-world CI/CD pipeline threats, published in February 2026, provides a worked threat model that organizations can adapt for their own agent-integrated workflows [12].

The AI Controls Matrix (AICM) v1.0 supply chain security domain addresses controls for AI tool acquisition and integration, including validation of the provenance and integrity of AI components used in development pipelines. The AI supply chain controls within AICM apply directly to the third-party action pinning and vendor due diligence practices recommended above. CSA’s Software Transparency: Securing the Digital Supply Chain guidance, and the broader CI/CD security controls within the Cloud Controls Matrix (CCM), provide complementary frameworks for the non-AI-specific pipeline hardening measures — credential scoping, environment isolation, and human approval gates — that reduce the exploitability of AI agent vulnerabilities when they occur.

The convergence of agentic AI with software supply chain infrastructure represents one of the highest-priority emerging risk areas for CSA’s AI Safety Initiative. The incidents documented here are early-stage manifestations of a threat model that will grow in severity as AI agents gain broader access to development infrastructure. Organizations should treat this as an architectural risk category requiring dedicated governance, not a patching exercise to be resolved by updating action versions.

References

[1] RyotaK, GMO Flatt Security. “Poisoning Claude Code: One GitHub Issue to Break the Supply Chain.” GMO Flatt Security Research, June 2026.

[2] The Hacker News. “Claude Code GitHub Action Flaw Let One Malicious Issue Hijack Repositories.” The Hacker News, June 2026.

[3] Aonan Guan. “Comment and Control: Prompt Injection to Credential Theft in Claude Code, Gemini CLI, and GitHub Copilot Agent.” Personal Research Blog, 2026.

[4] SecurityWeek. “Claude Code, Gemini CLI, GitHub Copilot Agents Vulnerable to Prompt Injection via Comments.” SecurityWeek, 2026.

[5] Adnan Khan. “Clinejection — Compromising Cline’s Production Releases just by Prompting an Issue Triager.” Personal Research Blog, February 2026.

[6] Snyk. “How ‘Clinejection’ Turned an AI Bot into a Supply Chain Attack.” Snyk Security Research, 2026.

[7] GitHub Advisory Database. “Claude Code Command Validation Bypass Allows Arbitrary Code Execution — GHSA-xq4m-mc3c-vvg3 / CVE-2025-66032.” GitHub, 2026.

[8] CSA Labs. “Prompt Injection in AI-Powered GitHub Actions.” Cloud Security Alliance Lab Space, May 2026.

[9] VentureBeat. “Three AI coding agents leaked secrets through a single prompt injection. One vendor’s system card predicted it.” VentureBeat, 2026.

[10] Aikido Security. “Prompt Injection Inside GitHub Actions: The New Frontier of Supply Chain Attacks.” Aikido Security Research, 2026.

[11] Cloud Security Alliance. “Agentic AI Threat Modeling Framework: MAESTRO.” Cloud Security Alliance, February 2025.

[12] Cloud Security Alliance. “Applying MAESTRO to Real-World Agentic AI Threats: From Framework to CI/CD Pipeline.” Cloud Security Alliance, February 2026.

[13] GitHub Advisory Database. “tj-actions/changed-files GitHub Actions Supply Chain Attack — CVE-2025-30066.” GitHub, March 2025.

← Back to Research Index