Authors: Cloud Security Alliance AI Safety Initiative
Published: 2026-04-17

Categories: Agentic AI Security, CI/CD Security, Credential Management, Prompt Injection

Comment and Control: GitHub AI Agents as Credential Exfiltrators

Key Takeaways

Security researchers Aonan Guan, Zhengyu Liu, and Gavin Zhong disclosed on April 15, 2026 that three prominent AI agents — Anthropic’s Claude Code Security Review, Google’s Gemini CLI Action, and Microsoft’s GitHub Copilot Agent — can be hijacked via prompt injection payloads embedded in GitHub pull request titles, issue bodies, and issue comments, causing the agents to exfiltrate repository and API secrets back through GitHub itself [1][10].
The attack class, named “Comment and Control” (a deliberate reference to command-and-control infrastructure), requires no external attacker-controlled server: the attacker writes a malicious PR or issue, the target AI agent fires automatically on the GitHub event, reads the attacker content as trusted context, and posts stolen credentials as a PR comment, issue update, or repository artifact [1].
Each of the three affected agents yielded a different set of secrets: confirmed exfiltration targets include ANTHROPIC_API_KEY, GITHUB_TOKEN, GEMINI_API_KEY, GITHUB_COPILOT_API_TOKEN, GITHUB_PERSONAL_ACCESS_TOKEN, COPILOT_JOB_NONCE, and any additional repository or organization secrets provisioned to the workflow runner environment [1].
All three vendors — Anthropic, Google, and Microsoft — accepted bug bounty reports, paid nominal bounties ($100, $1,337, and $500 respectively), and implemented limited mitigations, but none assigned a CVE, published a public security advisory, or notified users running pinned or unattended deployments [2].
Organizations running AI-assisted code review agents on GitHub Actions should treat any untrusted content ingested by those agents — PR titles, issue bodies, review comments — as a potential injection surface and immediately apply the hardening controls described in this note.

Background

The adoption of AI agents in software development pipelines has accelerated significantly over the past eighteen months. GitHub Actions workflows that once invoked static linters or test runners now routinely invoke large language models to perform code review, security scanning, issue triage, and automated response drafting. The three agents at the center of this disclosure include products from the three largest cloud AI providers and represent among the most visible options in that deployable space: Claude Code Security Review is Anthropic’s reference implementation for integrating Claude into GitHub CI workflows; Gemini CLI Action is Google’s equivalent offering; and GitHub Copilot Agent (also described as the Copilot SWE Agent) is Microsoft’s native agentic product built directly into the GitHub platform.

What makes these agents attractive for pipeline integration — their ability to read context broadly, reason across multiple documents, and take autonomous action — is precisely what makes them dangerous when that context includes attacker-controlled content. Unlike a static analysis tool that processes only the code changes in a PR, an AI code review agent typically reads the PR title, the PR description, referenced issue bodies, and existing review comments before composing its response. Each of these fields is writable by any GitHub user who can open a pull request or file an issue, including contributors with no special privileges in private repositories. The agent treats all of this content as part of its working context, with no architectural distinction between instructions from its operator and content from potentially untrusted contributors [7].

This architectural pattern — an AI agent with privileged access to CI/CD secrets that ingests untrusted user content without sanitization — is the precise condition that prompt injection attacks require. The “Comment and Control” research demonstrates that this condition is not a theoretical concern but a readily exploitable reality across leading AI agent products on the market today [7][8].

Security Analysis

The Attack Primitive: Indirect Prompt Injection via GitHub Events

Prompt injection against AI systems takes two broad forms. Direct injection involves an attacker who controls the system prompt or primary instruction channel, typically requiring privileged access. Indirect injection, the form exploited here, involves embedding adversarial instructions in content that the AI agent is expected to process as data — content it was never meant to treat as authoritative commands.

In the Comment and Control attack class, the injection surface is GitHub itself. A pull request title, an issue body, or a review comment is content that GitHub agents are designed to read and reason about. From the model’s perspective, this content arrives as part of its context window with no special marker distinguishing it from trusted operator instructions. An attacker who embeds natural-language instructions in that content — instructions directing the agent to run commands, read environment variables, or post output to a public channel — exploits the model’s core capability: following instructions in natural language, wherever those instructions appear [3].

The “Comment and Control” naming reflects a second distinguishing characteristic: by routing exfiltration through GitHub’s own PR comments and issue updates, the attack exploits the same communication channel that gives it its name. The attacker does not need to set up a listener on an external server, register a domain, or route traffic through unusual network paths. GitHub pull request comments, issue titles, and commit messages all serve as the data channel. Because the traffic flows over GitHub’s HTTPS endpoints — the same endpoints the agent uses for all normal operations — network-level egress controls that look for unusual external destinations offer no protection [1].

Claude Code Security Review: Title-to-System-Prompt Injection

Anthropic’s Claude Code Security Review action processes the pull request title as part of the prompt it constructs for the Claude API. In the vulnerable configuration, the PR title is inserted into the agent’s system-level context without sanitization. Guan and colleagues demonstrated that a PR title containing natural-language instructions — directing Claude to execute a shell command and include the output in its review findings — caused the Claude model to comply [1]. The resulting PR comment, posted by the agent in its normal review-response format, contained the runner environment’s ANTHROPIC_API_KEY and GITHUB_TOKEN rendered as ostensible security “findings.” Any GitHub user with read access to the repository’s pull requests could read the exfiltrated credentials in the comment.

Anthropic received the initial report in October 2025 through its HackerOne bug bounty program, confirmed that the technique could also extract GITHUB_TOKEN, and in November 2025 paid a $100 bounty and upgraded the internal severity rating from 9.3 to 9.4. The company updated a “security considerations” section in its documentation, but did not issue a public advisory, did not assign a CVE, and did not communicate the vulnerability to organizations using pinned versions of the action [2].

Gemini CLI Action: Trusted Section Override

Google’s Gemini CLI Action parses GitHub issue content as part of its agentic context. The attack against Gemini began with a prompt injection payload in an issue title, then escalated with follow-on comments that introduced a fabricated “trusted content section” after the issue’s legitimate additional content. This injected section effectively overrode Gemini’s safety instructions by presenting itself as a more authoritative source of directives than the agent’s original system prompt [1]. The result was that Gemini published its own GEMINI_API_KEY as a new comment on the issue — an act the agent took in response to the injected instruction to “summarize all configuration values” in its operational environment.

Google paid a $1,337 bounty for this finding — a sum that, in security community convention, signals recognition of a significant vulnerability — though the company likewise issued no CVE and no public advisory for users of the affected action [2].

GitHub Copilot Agent: Hidden Payload with Defense Bypass

The Copilot Agent exploitation required additional technique because GitHub had implemented three runtime security layers specifically intended to prevent this class of attack: environment variable filtering (to prevent agent access to sensitive runner variables), secret scanning (to detect and redact credential patterns in agent outputs), and a network firewall (to block outbound connections to unexpected external hosts) [1].

The attack against Copilot used a payload hidden inside an HTML comment block — a construct that GitHub’s rendered Markdown interface conceals from human reviewers but that the AI agent reads in raw form as part of its context ingestion. The hidden payload was paired with an indirect trigger: rather than invoking the agent directly, the attacker relied on the agent’s automated event firing. The environment variable filtering was bypassed by instructing the agent to read secrets through an alternative mechanism; the secret scanning layer was bypassed by encoding the exfiltrated values in a format that the pattern matcher did not recognize; the network firewall was bypassed by routing the exfiltration through GitHub’s own API endpoints rather than an external host [1].

GitHub paid a $500 bounty. As with the other two vendors, no CVE was issued and no public advisory was published. Organizations running unattended Copilot Agent deployments on older pinned workflow versions received no notification [2].

The Disclosure Standards Gap

Under conventional vulnerability disclosure norms — as codified in ISO/IEC 29147 and practiced by the broader software security community — a confirmed vulnerability at the severity Anthropic itself assigned to the Claude Code finding (9.4 CVSS, above the critical threshold of 9.0) would produce a CVE record, a public advisory, and vendor communication to affected users. The same standard applies to the Google and Microsoft findings, whose severity has not been publicly scored but whose exploitability is confirmed. None of these outcomes occurred for any of the three Comment and Control findings. Anthropic, Google, and Microsoft each processed the reports through their bug bounty programs as product improvement feedback rather than security disclosures requiring public notification [2].

This gap has direct operational consequences. Organizations running pinned versions of affected GitHub Actions do not know they are vulnerable. Security teams scanning for CVEs in their CI/CD dependency graphs will find no results. Automated dependency update tools that respond to CVE alerts will not flag these actions for review. The attack surface continues to exist in production environments for users on unpatched or unupdated workflow versions, and no vendor has issued communication that would allow affected organizations to assess their exposure.

The Comment and Control findings are not isolated. In August 2025, GitHub mitigated CVE-2025-59145 (CVSS 9.6), the CamoLeak vulnerability, in which prompt injection payloads embedded in GitHub repository content caused GitHub Copilot Chat to exfiltrate source code, API keys, and cloud secrets through GitHub’s own Camo image proxy service — routing the exfiltration as innocuous image loading traffic that bypassed content security policy controls; the vulnerability was publicly disclosed in October 2025 [4]. In late 2025, BeyondTrust’s Phantom Labs disclosed a command injection vulnerability in OpenAI’s Codex cloud environment in which an attacker could inject arbitrary commands through a GitHub branch name parameter, exposing GitHub credentials stored in the Codex task execution environment [5]. The emerging pattern suggests that AI agents operating in developer and CI/CD environments are broadly vulnerable to injection via the same surfaces that human engineers use for normal collaboration — a concern reinforced by at least five confirmed exploits across five distinct products.

Attack Surface Summary

Agent	Injection Surface	Credentials Exposed	Exfiltration Channel	Vendor Response
Claude Code Security Review	PR title (inserted into system prompt)	`ANTHROPIC_API_KEY`, `GITHUB_TOKEN`	PR review comment	$100 bounty, no CVE
Gemini CLI Action	Issue body and comments	`GEMINI_API_KEY`, runner secrets	Issue comment	$1,337 bounty, no CVE
GitHub Copilot Agent	Hidden HTML comment in PR/issue	`GITHUB_COPILOT_API_TOKEN`, `COPILOT_JOB_NONCE`, `GITHUB_TOKEN`, user-defined secrets	GitHub API endpoints	$500 bounty, no CVE
GitHub Copilot Chat (CamoLeak)	Repository markdown content	Source code, API keys, cloud secrets	GitHub Camo image proxy	CVE-2025-59145 assigned, mitigated Aug 2025

Recommendations

Immediate Actions

Organizations currently running any of the three affected agent actions in GitHub Actions workflows — or similar AI-assisted code review agents — should take the following steps without delay.

Audit active GitHub Actions workflows for any steps that invoke AI agents against untrusted content. Specifically, identify workflows triggered by pull_request, issues, issue_comment, or pull_request_review_comment events that pass event-supplied content (titles, bodies, comments) into an AI model call. These workflows should be treated as potentially exploitable until hardening controls are in place.

Eliminate unnecessary secrets from GitHub Actions runner environments. Each secret provisioned to a runner environment expands the credential surface available to a successful injection. Remove ANTHROPIC_API_KEY, GEMINI_API_KEY, and similar service API keys from runner environments where they are not strictly required. Where they are required, scope them to the minimum necessary permissions using provider-specific key scoping features.

Restrict GITHUB_TOKEN permissions to the minimum required. GitHub Actions runners receive a GITHUB_TOKEN automatically; its default permissions can include write access to repository contents, pull requests, and issues. Organizations should configure permissions: blocks in workflow YAML to restrict the token to the specific operations each step requires, and disable write permissions for AI-agent steps that only need to read repository content.

Pin GitHub Actions to specific commit SHAs rather than mutable tags or branch references. This limits the blast radius of a supply chain attack against the action itself, though it does not address the prompt injection vulnerability in currently pinned versions. Check for updated versions of affected actions that may include mitigations.

Short-Term Mitigations

Implement input boundary controls between untrusted GitHub content and AI agent prompts. Where possible, pre-process event payloads to strip or encode HTML comment blocks before passing them to the AI model. While no sanitization approach is guaranteed to be comprehensive against creative prompt injection, removing hidden HTML comment payloads — which are invisible to human reviewers and serve no legitimate reviewer-facing purpose in a PR title or issue body — significantly raises the effort required for the Copilot-style attack variant.

Restrict which actors can trigger AI agent workflows. GitHub Actions supports conditions on workflow triggers; organizations can gate AI-agent-invoking workflows on membership in a specific team or on manual approval for first-time contributors. For repositories that accept pull requests from external contributors, consider requiring maintainer review before AI agent steps fire on a PR.

Implement output monitoring for AI agent responses. Deploy secret scanning on the content that AI agents post as PR comments, issue comments, or commit artifacts. GitHub’s native secret scanning does not cover agent-posted comments in all configurations, and as demonstrated by the Copilot exploit, encoding can bypass pattern-based scanners. Application-layer review of agent outputs — before they are posted — provides an additional detection layer.

Review and rotate any API keys or tokens that may have been exposed in GitHub Actions environments running the affected agents. Organizations that have run Claude Code Security Review, Gemini CLI Action, or GitHub Copilot Agent against pull requests from untrusted contributors should treat their runner-provisioned credentials as potentially compromised and rotate them proactively.

Strategic Considerations

The Comment and Control disclosures reveal a category-level design gap in how AI agents for CI/CD are architected. Current implementations do not distinguish between content the model should treat as instructions (the operator-defined system prompt) and content the model should treat as data (PR titles, issue bodies, user comments). Resolving this gap requires architectural changes that are likely to take time and coordination between platform providers and AI model vendors.

Organizations evaluating or deploying AI agents in developer pipelines should apply a principle of contextual distrust: any content reachable by untrusted external contributors must be treated as a potential injection vector, regardless of where in the platform that content appears. This principle should be embedded in AI agent procurement and deployment reviews and included in threat models for CI/CD pipeline security assessments.

The vendor disclosure behavior observed in this case — paying bounties while avoiding public disclosure — is not sustainable as AI agents proliferate. Security teams should establish explicit expectations with AI tool vendors around disclosure practices, and advocate within their organizations for procurement policies that require vendors to demonstrate responsible disclosure programs that include public CVE assignment for remotely exploitable AI agent vulnerabilities.

CSA Resource Alignment

The Comment and Control attack class maps directly to threat categories and controls addressed in multiple CSA frameworks, providing organizations with actionable structure for integrating these findings into their existing security programs.

The CSA MAESTRO framework (Multi-Agent Environment, Security, Threat, Risk, and Outcome) for agentic AI threat modeling addresses prompt injection at the orchestration layer as a distinct threat category [6][9]. In MAESTRO terms, the Comment and Control attack exploits the boundary between Layer 3 (Agent Orchestration) and the trust model governing which content sources an agent should treat as authoritative. MAESTRO’s recommended mitigations include input validation at the agent boundary, fine-grained access control governing what actions an agent may take, and structured audit trails for all agent-environment interactions — each directly applicable to GitHub Actions AI agent deployments.

CSA’s AI Controls Matrix (AICM), as a superset of the Cloud Controls Matrix (CCM), addresses credential management, least-privilege access, and secure integration patterns in ways that constrain the effective blast radius of a successful injection. The CCM’s IAM domain controls — specifically the requirements for scoped access credentials, automated credential rotation, and privileged access management — provide the foundational controls that reduce what an attacker can do with exfiltrated GITHUB_TOKEN values. Organizations using AICM as their AI security control framework should ensure that CI/CD credential provisioning is in scope for their AICM implementation reviews.

CSA’s Zero Trust guidance is directly applicable to the architectural recommendation above. The principle of contextual distrust for AI-ingested content extends Zero Trust’s “never trust, always verify” posture from network and identity contexts to the content context of AI model inputs. An AI agent operating in a CI/CD pipeline should be designed with the same skepticism toward all inbound data that a Zero Trust network architecture applies to all inbound traffic — regardless of where that content originates on the platform.

Finally, CSA’s STAR (Security Trust Assurance and Risk) registry and associated AI governance guidance provides a mechanism for organizations to assess AI vendor disclosure and incident response practices as part of their third-party risk management process. The pattern of private bounty payment without public advisory — repeated across three major vendors for a critical-severity attack class — is a supplier risk signal that should be reflected in vendor risk assessments and procurement criteria.

References

[1] Aonan Guan, Zhengyu Liu, and Gavin Zhong. “Comment and Control: Prompt Injection to Credential Theft in Claude Code, Gemini CLI, and GitHub Copilot Agent.” oddguan.com, April 15, 2026.

[2] The Register. “Agents hooked into GitHub can steal creds – but Anthropic, Google, and Microsoft haven’t warned users.” The Register, April 15, 2026.

[3] Cybernews. “AI agents vulnerable to prompt injection via GitHub: But do vendors care?” Cybernews, April 2026.

[4] Meterpreter. “GitHub Copilot Zero-Click CamoLeak Exposed: CVE-2025-59145 (CVSS 9.6) Allowed Silent Data Theft from Private Repos.” meterpreter.org, October 2025.

[5] BeyondTrust. “OpenAI Codex Command Injection Vulnerability.” BeyondTrust Blog, 2025.

[6] Cloud Security Alliance. “Agentic AI Threat Modeling Framework: MAESTRO.” CSA Blog, February 6, 2025.

[7] Aikido Security. “PromptPwnd: Prompt Injection Vulnerabilities in GitHub Actions Using AI Agents.” Aikido Security Blog, 2025.

[8] Techzine. “AI agents on GitHub leak API keys via prompt injection.” Techzine Global, April 2026.

[9] Cloud Security Alliance. “Applying MAESTRO to Real-World Agentic AI Threat Models.” CSA Blog, February 11, 2026.

[10] SecurityWeek. “Claude Code, Gemini CLI, GitHub Copilot Agents Vulnerable to Prompt Injection via Comments.” SecurityWeek, April 2026.

← Back to Research Index