Authors: Cloud Security Alliance AI Safety Initiative
Published: 2026-06-06

Categories: AI Security, Phishing and Social Engineering, Agentic AI Security

ChatGPhish: When AI Summaries Become Phishing Lures

Key Takeaways

ChatGPhish, disclosed by Permiso researcher Andi Ahmeti on May 29, 2026, demonstrates that a web page under an attacker’s control — or any platform permitting user-generated content — that ChatGPT is asked to summarize can embed hidden Markdown instructions that cause the assistant to render live phishing buttons, tracking pixels, and QR codes directly inside the OpenAI chat interface, which users treat as trusted [1][2].
The underlying class of vulnerability — Cross-Prompt Injection Attack (XPIA), a form of indirect prompt injection — shifts the phishing delivery point from the email or website the user suspects to the AI summary the user consults, in a way that may defeat conventional threat intuitions about recognizing malicious content, because users are conditioned to scrutinize links in emails rather than links rendered inside AI interfaces they consider trusted [3].
EchoLeak (CVE-2025-32711, CVSS 9.3), patched by Microsoft in June 2025, demonstrated the same indirect prompt injection pattern at enterprise scale: a single specially crafted email caused Microsoft 365 Copilot to silently exfiltrate sensitive documents from SharePoint, Teams, and OneDrive without requiring the target to open any attachment or click any link — the exfiltration triggered the next time the target asked Copilot to summarize their inbox [4][5].
As of the ChatGPhish public disclosure, OpenAI had not patched the rendering behavior despite receiving the initial Bugcrowd report on April 29, 2026 [1][2].
Security teams must treat AI-rendered output as an untrusted surface, apply the same scrutiny to links appearing inside AI chat windows as to links arriving by email, and establish organizational policies around which AI tools employees are permitted to use for processing external documents and URLs.

Background

Enterprise adoption of AI-powered summarization and research tools has grown substantially, with employees now regularly asking tools such as ChatGPT, Microsoft 365 Copilot, and Google Gemini to summarize web pages, digest inbound emails, and extract the key points from documents shared by external parties. The productivity benefits are real: AI summarization can compress an hour of reading into minutes. The security assumption baked into this behavior, however, is that the AI serves as a faithful intermediary — that its summary window is fundamentally different from clicking a link directly. ChatGPhish offers a concrete public demonstration that this assumption is wrong, illustrating the vulnerability through three distinct attack chains against a widely used consumer AI tool.

The broader vulnerability class behind ChatGPhish is known as indirect prompt injection, or XPIA in the context of large language model chat interfaces. Unlike direct prompt injection, where an attacker types malicious instructions into the prompt box visible to the user, indirect injection works by embedding adversarial instructions inside external content that the AI will subsequently read on the user’s behalf. The poisoned content — a web page, a document, a calendar invite, or an email body — appears entirely benign to a human reader. The embedded instruction is written not for human consumption but for the AI’s attention, formatted to look like a system directive, a style guide, or a role-assignment that the AI treats as authoritative context rather than inert text [6][7].

OWASP has categorized prompt injection as LLM01:2025, the single highest-priority security risk in large language model applications, covering both direct and indirect variants [8]. In April 2026, a research team at Zylos documented that indirect prompt injection was no longer primarily a research curiosity: the Palo Alto Networks Unit 42 team had documented 12 specific real-world indirect prompt injection cases active in the wild, including ad-review evasion and system prompt leakage on live commercial platforms [9][10]. The ChatGPhish vulnerability represents the next escalation in this trajectory — a weaponization of AI summarization features that re-routes phishing from the inbox to the chat window.

Security Analysis

How ChatGPhish Works

The ChatGPhish attack chain is architecturally simple, which is part of what makes it significant. An attacker creates or controls a web page and embeds adversarial Markdown instructions within that page’s content. The instructions can be hidden through techniques such as white-on-white text, zero-width characters, or HTML comment blocks — rendering them invisible to a human reader while remaining fully legible to the language model that ingests the page’s text content during summarization.

When a user asks ChatGPT to summarize the malicious page, the model processes the embedded instructions alongside the legitimate page content and treats the injected directives as part of the context it is being asked to summarize. The model then generates a response that includes attacker-crafted Markdown — specifically, styled links, image tags, and formatted button elements. The chatgpt.com response renderer subsequently trusts and renders this Markdown as though it originated from the model’s own legitimate output, surfacing the attacker’s links as live, styled, clickable elements inside the OpenAI chat interface [1][2].

Andi Ahmeti demonstrated three concrete attack chains during the research disclosure. In the first, injected instructions cause ChatGPT to render a fake “OpenAI security alert” phishing button using ChatGPT’s own visual design language, instructing the user to re-authenticate due to a suspicious login. The button links to an attacker-controlled credential harvesting page, but visually it is indistinguishable from a legitimate OpenAI interface element. In the second chain, an injected inline QR code pivots the lure from the desktop session to the victim’s mobile device, expanding the attack surface by routing the user through a scanning step that most phishing awareness training does not address. In the third, an injected tracking pixel fires on every page render, silently exfiltrating the victim’s IP address, browser User-Agent string, HTTP Referer header, and a session timing fingerprint — enabling attacker infrastructure to enumerate and profile users who process the malicious summary without ever clicking a link [1][2].

The disclosure timeline illuminates an additional problem: vendor response velocity. Ahmeti filed the initial Bugcrowd report on April 29, 2026, titled “Untrusted Markdown Rendering Leads to XSS, Phishing, and Data Exfiltration.” OpenAI marked the report “Not Reproducible” the following day. Ahmeti resubmitted on May 1 with expanded reproduction steps, screenshots, and additional proof-of-concept scenarios. Permiso published the full research chain publicly on May 29, 2026 — thirty days after the initial report — with the rendering behavior still live and unfixed [1][2]. No CVE had been assigned as of the publication date.

Enterprise-Scale Precedent: EchoLeak

While ChatGPhish demonstrates phishing delivery through a consumer AI summarization tool, the EchoLeak vulnerability (CVE-2025-32711) shows that the same indirect prompt injection pattern reaches considerably further inside enterprise environments when the AI assistant has access to organizational data. Discovered by the Aim Labs research team and reported to Microsoft’s Security Response Center, EchoLeak was assigned a CVSS score of 9.3 and described by Microsoft’s patch advisory as a zero-click elevation of privilege vulnerability in Microsoft 365 Copilot [4][5].

The attack mechanism required only that the attacker send the target a crafted email. No attachment was necessary, no link needed to be clicked, and no macro needed to be enabled. The email body contained hidden prompt injection instructions; when the target asked Microsoft 365 Copilot to summarize their inbox — a routine productivity action — Copilot processed those instructions as authoritative context. The injected payload directed Copilot to search the user’s organizational environment, locate files matching specified criteria across SharePoint and OneDrive, and exfiltrate the contents by encoding them into attacker-controlled image requests whose URLs Copilot’s renderer auto-fetched. The exfiltration occurred silently, within the normal render cycle of the chat response, with no anomalous behavior visible to the user [4][5].

Achieving this end-to-end required chaining four distinct bypass techniques: evading Microsoft’s built-in XPIA classifier that is intended to detect prompt injection attempts, circumventing link redaction by using reference-style Markdown that the classifier did not normalize, exploiting auto-image-fetch to carry exfiltration payloads, and leveraging a Microsoft Teams proxy that was whitelisted in the content security policy [4][5]. Microsoft confirmed the issue was fully resolved in June 2025 and stated there was no evidence of malicious exploitation in the wild.

The Structural Vulnerability

Both ChatGPhish and EchoLeak are instances of a structural flaw that arises when AI systems are designed to render rich output — formatted text, live links, auto-fetched images — while simultaneously processing untrusted external content. The danger is not a bug in any one AI product; it is an emergent property of the design pattern itself. AI chat interfaces that render Markdown from model responses inherit a rendering trust relationship: anything the model outputs, the renderer surfaces as trusted UI. When the model’s output is shaped by untrusted external content, the trust boundary collapses.

The parallel to web application Cross-Site Scripting (XSS) is instructive but imperfect. In XSS, an attacker injects script content that the browser executes in a trusted origin’s context. In XPIA, the “execution” is behavioral — the AI treats the injected text as directive content rather than data, and the renderer surfaces the model’s consequent output as trusted UI. Because this execution is semantic rather than code-based, traditional sanitization approaches designed to detect script syntax do not apply, leaving conventional defenses largely ineffective against the threat. Traditional web application scanners and email security gateways were not designed to detect adversarial Markdown buried in page content intended for AI model ingestion — an attack surface that did not exist when these tools were architected [6][8]. A page that passes all URL reputation checks, content filtering, and malware scanning today can become a phishing lure tomorrow for any user who asks an AI to summarize it.

The Immersive Labs research team noted in related work that AI-powered email security products face a compounding irony: because they also use LLMs to evaluate email content, they can themselves be manipulated by injected instructions hidden in the emails they are scanning, potentially causing the security tool to clear malicious messages it was supposed to flag [3]. Google has acknowledged the same structural concern for Gemini integrations with Google Workspace, publishing layered defense guidance that emphasizes the need to treat AI-generated summaries and responses as potentially influenced by untrusted input [11].

Recommendations

Immediate Actions

Security teams should issue workforce advisories that explicitly extend phishing awareness training to AI chat interfaces. Employees who would never click an unfamiliar link in an email may click the same link when it appears as a styled button inside the ChatGPT or Copilot interface, because they perceive the AI tool as a trusted intermediary. Communications should clarify that AI-rendered links require the same scrutiny as any other externally sourced link, and that AI summaries of external web pages are not sanitized by the AI — they may reflect injected content from those pages.

Until vendors patch the rendering behaviors, organizations should consider whether employees should be submitting external URLs to AI summarization tools, particularly in contexts involving competitive intelligence, legal research, or executive correspondence where an attacker might predict which URLs will be submitted. Where AI summarization of external content is necessary for workflow, organizations should prefer tools that strip link rendering from AI output or that apply explicit human confirmation before surfacing active links.

Any Microsoft 365 Copilot deployments should verify they are running current patch levels to ensure EchoLeak and related prompt injection remediations have been applied. Organizations using Microsoft Defender for Office 365 should review available prompt injection detection settings, as Microsoft has begun rolling out classifiers designed to flag indirect prompt injection attempts in AI-processed content [12].

Short-Term Mitigations

Organizations should evaluate AI tool selection with rendering behavior as a formal criterion. Tools that present AI output in plaintext-only formats — stripping Markdown link rendering, auto-image-fetch, and embedded button elements — are structurally less exposed to XPIA delivery than tools that render rich Markdown. Where rich rendering is a valued feature for internal documents, organizations should investigate whether rendering policies can be scoped: full Markdown rendering for internally generated content, plaintext for summaries of external URLs and documents.

Security operations teams should begin baselining expected AI tool usage patterns and reviewing available logs for anomalous interactions. At a minimum, organizations should confirm which AI tools with access to sensitive data sources — enterprise email, SharePoint, internal wikis — have been deployed in the environment and whether those tools offer audit logging sufficient to detect prompt injection attempts in AI-generated content. Where logging is insufficient, this represents a gap that should inform procurement and configuration decisions.

AI tool developers and application teams should apply input-output context segregation as a design principle for any AI feature that processes external content. Instructions embedded in external content — web pages, PDFs, email bodies — should be parsed in a context that cannot influence system-level behavior or rendered output in the same way as developer-authored prompts. Reference implementation guidance is available through the OWASP Gen AI Security Project’s LLM01:2025 mitigation recommendations [8].

Strategic Considerations

The ChatGPhish and EchoLeak disclosures represent an early chapter in what is likely to be a sustained research focus on AI-assisted phishing delivery. As AI tools become more capable agents — capable of sending emails, booking meetings, submitting forms, and executing transactions on behalf of users — the consequences of successful prompt injection escalate from delivering a phishing link to executing unauthorized actions at scale. Organizations should begin mapping their AI tool inventory against the scope of actions each tool can take, assessing which tools hold elevated access to sensitive data or systems, and establishing escalation criteria for responding to prompt injection disclosures affecting those tools.

Vendor patch velocity varies significantly and should be evaluated per vendor rather than assumed uniform. OpenAI’s initial classification of ChatGPhish as non-reproducible extended the exposure window by at least thirty days; Microsoft’s handling of EchoLeak followed a more standard coordinated disclosure cycle, resulting in a June 2025 patch following MSRC assignment. Security teams should factor individual vendors’ disclosed responsiveness to prompt injection reports into AI vendor risk assessments, and consider whether pre-deployment security review requirements or contract SLAs around vulnerability response timelines are appropriate for AI tools with access to sensitive organizational data.

CSA Resource Alignment

The ChatGPhish vulnerability maps to multiple layers of CSA’s AI security frameworks. Within the MAESTRO threat modeling framework for agentic AI systems, this attack class maps to the Prompt and Instruction Threats category (Layer 1), where adversarial instructions embedded in external context manipulate AI behavior in ways that violate user expectations and organizational security policies. The fact that ChatGPhish operates through a summarization interface — an action the user explicitly initiates — illustrates why MAESTRO’s guidance emphasizes that agentic threat modeling must account for the full pipeline of data the agent processes, not only the user’s direct instruction.

The AI Controls Matrix (AICM) addresses the structural controls relevant here across multiple domains. Input validation controls under AICM require that AI systems segregate content from trusted and untrusted sources before processing, an architectural requirement that the ChatGPT and Copilot rendering behaviors violated. Output controls under AICM require that AI-generated content be reviewed for injection artifacts before it is presented to users in ways that could cause harm, a control that rendering-trust assumptions bypass. The AICM’s shared security responsibility model clarifies that Application Providers — including organizations deploying ChatGPT enterprise or Copilot for Microsoft 365 — bear accountability for evaluating these controls and compensating for gaps in the underlying model’s behavior.

CSA’s Zero Trust guidance reinforces the posture shift these vulnerabilities demand: AI-rendered output should never be implicitly trusted simply because it appears inside a known and trusted tool interface. Zero Trust principles applied to AI output require continuous verification of content provenance and explicit confirmation before acting on AI-generated instructions or links, regardless of which tool generated them. This posture is especially important as AI tools acquire agentic capabilities and begin taking consequential actions rather than merely producing text.

Organizations using CSA’s STAR for AI program to assess AI vendor security posture should consider adding prompt injection responsiveness — disclosure timelines, remediation velocity, and detection capability — as evaluation criteria for AI tools that process external content or hold access to sensitive organizational data.

References

[1] Permiso. “ChatGPhish: The Page Is the Payload.” Permiso Security Research, May 29, 2026.

[2] The Hacker News. “ChatGPhish Vulnerability Turns ChatGPT Web Summaries Into a Phishing Surface.” The Hacker News, May 2026.

[3] Immersive Labs. “Weaponizing LLMs: Bypassing Email Security Products via Indirect Prompt Injection.” Immersive Labs, 2025.

[4] SOC Prime. “CVE-2025-32711 Vulnerability: ‘EchoLeak’ Flaw in Microsoft 365 Copilot Could Enable a Zero-Click Attack on an AI Agent.” SOC Prime, 2025.

[5] The Hacker News. “Zero-Click AI Vulnerability Exposes Microsoft 365 Copilot Data Without User Interaction.” The Hacker News, June 2025.

[6] Lakera. “Indirect Prompt Injection: The Hidden Threat Breaking Modern AI Systems.” Lakera AI, April 2026.

[7] IBM. “What Is a Prompt Injection Attack?” IBM Think, 2025.

[8] OWASP Gen AI Security Project. “LLM01:2025 Prompt Injection.” OWASP Foundation, 2025.

[9] Zylos Research. “Indirect Prompt Injection: Attacks, Defenses, and the 2026 State of the Art.” Zylos AI, April 2026.

[10] Unit 42, Palo Alto Networks. “Fooling AI Agents: Web-Based Indirect Prompt Injection Observed in the Wild.” Unit 42, 2026.

[11] Google. “Indirect Prompt Injections and Google’s Layered Defense Strategy for Gemini.” Google Workspace Help, 2025.

[12] Microsoft Security Blog. “Detecting and Analyzing Prompt Abuse in AI Tools.” Microsoft, March 2026.

← Back to Research Index