macOS.Gaslight: DPRK Prompt Injection Blinds AI Malware Triage

Authors: Cloud Security Alliance AI Safety Initiative
Published: 2026-06-25

Categories: Threat Intelligence, AI Security, macOS Security, Nation-State Threats
Download PDF

macOS.Gaslight: DPRK Prompt Injection Blinds AI Malware Triage

Key Takeaways

  • macOS.Gaslight is a Rust-based macOS backdoor and infostealer, first uploaded to VirusTotal on May 22, 2026 and caught by an Apple XProtect update in June 2026, which SentinelLABS attributes with high confidence to North Korean (DPRK) threat activity within the BONZAI malware cluster [1][7].
  • The implant carries a 3.5 KB Markdown-fenced block of 38 fabricated system messages designed to convince an LLM-assisted malware triage agent that its session has failed — an evasion layer that, according to SentinelLABS, represents the first documented instance of this technique embedded in a malware binary, targeting the analyst’s tooling rather than the sandbox environment [1].
  • Beyond its AI-evasion payload, macOS.Gaslight functions as a backdoor with a Telegram Bot API command-and-control channel and an embedded Python infostealer that extracts browser credentials, the macOS Keychain database, command histories, and system profiles [1][2].
  • The technique may represent a deliberate DPRK escalation: the same threat actor ecosystem that optimized npm malware for AI coding agents in the PromptMink campaign now embeds adversarial instructions directly into executable binaries to subvert the AI tools defenders use to investigate them [1][3].
  • Security teams deploying LLM-assisted triage workflows must treat the contents of every sample under analysis as adversarial input and enforce strict boundaries between sample data and model instruction channels — failure to do so creates a blind spot that skilled adversaries are already exploiting [1].

Background

The adoption of LLM-assisted tools in security operations has grown considerably in recent years. Analysts at security vendors, threat intelligence firms, and enterprise SOCs increasingly use AI co-pilots to decompile, summarize, and triage malware samples, dramatically reducing characterization time in many reported workflows. The implicit trust model underlying these workflows treats the analyzed artifact as data and the LLM as the reasoning layer. macOS.Gaslight is designed to subvert exactly that distinction.

North Korea’s state-sponsored cyber actors have demonstrated sustained interest in exploiting AI infrastructure and AI-assisted workflows rather than merely defending against them. The PromptMink campaign, documented by CSA in May 2026, showed Famous Chollima engineering malicious npm packages whose documentation was deliberately crafted to pass LLM package-recommendation heuristics and trigger AI coding agents to install malicious dependencies without human review [3]. macOS.Gaslight extends this pattern into a different phase of the kill chain: rather than targeting AI agents that execute code, it targets AI agents that analyze code, attempting to ensure that the implant survives triage without triggering an alert.

The BONZAI malware cluster, to which Apple’s XProtect engine has attributed this sample under the rule identifier MACOS_BONZAI_COBUCH, represents one of several families of DPRK-aligned macOS implants that have emerged over the past several years [1]. A sibling sample linked to the same infrastructure was caught independently by Apple’s AIRPIPE detection rule, reinforcing the attribution assessment. DPRK-aligned actors have shown a documented preference for macOS targets in sectors where the platform is prevalent — including cryptocurrency, financial services, and technology firms — a pattern visible across multiple prior advisories and threat intelligence reports. The addition of an AI-evasion component in macOS.Gaslight suggests that DPRK threat actors have identified AI-assisted triage as a meaningful operational barrier and have adapted accordingly.

Prior academic and vendor research documented prompt injection attempts embedded in documents, web pages, and code comments intended to manipulate AI-powered assistants into taking harmful actions on behalf of an attacker [4]. The prompt injection in macOS.Gaslight is distinct in its target: it does not attempt to redirect an AI agent into performing a task for the attacker but rather to cause the analyst’s triage tool to abort or truncate its own analysis, leaving the malware’s behavioral fingerprint uncharacterized and the analyst with a false picture of the sample’s capabilities.

Security Analysis

The Prompt Injection Technique

The evasion capability in macOS.Gaslight is contained in a 3.5 KB Markdown-fenced blob embedded in the binary. The blob consists of 38 fabricated messages formatted using {{DATA}} token delimiters, a syntax that mimics the prompt structure of several common LLM frameworks, in an attempt to blur the boundary between trusted system instructions and the content being analyzed [1]. The fabricated messages span several categories of simulated failure: token-limit expiry warnings, out-of-memory kills, disk exhaustion errors, repeated operation failures, warnings about detected injection vulnerabilities within the current session, and static-analysis flags suggesting the tool’s own integrity has been compromised.

The cumulative effect of these cascading failure messages is to present an LLM triage agent with what appears to be an unstable or corrupted analysis session. An AI tool that interprets these messages as genuine system state — rather than as content embedded in the object under examination — may respond by aborting its analysis, truncating its output, or flagging the session as contaminated and discarding results. Any of these outcomes serves the attacker’s goal: the sample exits triage without a behavioral summary, and the analyst receives either no report or an incomplete one that omits the implant’s true capabilities. SentinelLABS notes that the technique stacks 38 such messages in sequence, representing an evolution from single-block prompt injection attempts documented by Check Point Research in 2025 and similar work by the Socket security team [1].

The attack surface here is the absence of input validation at the boundary between sample content and the model’s instruction channel. LLM-assisted analysis tools that consume raw binary content, decompiled output, or extracted strings without sanitizing that content before presenting it to the model are vulnerable to this class of attack. The malware does not exploit any vulnerability in the LLM itself; it exploits the assumption that a malware sample’s contents are inert data rather than instructions that the model might act upon.

Backdoor Architecture and Command-and-Control

The primary implant is written in Rust, a language increasingly favored by nation-state-aligned threat actors because its compiled binaries are difficult to reverse-engineer, lack the runtime metadata present in interpreted languages, and do not carry the recognizable patterns of older C or C++ malware families. The backdoor communicates with its operators through the Telegram Bot API, polling the getUpdates endpoint in a continuous loop to receive commands [1][2]. Telegram traffic blends readily with legitimate enterprise network activity, and the platform’s API is accessible from nearly every corporate network without specialized egress controls — a practical advantage that has made it a recurring C2 mechanism in recent DPRK-attributed samples.

The C2 implementation incorporates several notable operational security features. All payload data is encrypted with AES-GCM using the aes-gcm Rust crate, with a fresh nonce generated per message via CCRandomGenerateBytes. Certificate pinning using SecTrustSetAnchorCertificatesOnly prevents man-in-the-middle interception of the C2 channel. The AES key is supplied at runtime rather than embedded in the binary, which means that static analysis of the binary alone cannot recover the decryption key. Perhaps most notably, the URL constructor contains logic that detects when the Telegram API path segment resolves to the literal string “file” and automatically substitutes the actual bot token with the placeholder file/token:redacted in all diagnostic output, error strings, and crash artifacts — preventing the operator’s bot token from being recovered from logs or memory dumps [1].

The implant exposes six primary operator commands: help, id, shell, kill, upload, and stop. The shell verb routes commands through either execvp or posix_spawnp depending on system state, providing an interactive execution capability. File exfiltration is routed through Telegram’s multipart attach:// mechanism. Evidence of a seventh focus command exists within the binary but its function has not been determined [1]. To maintain long-running polling sessions, the implant creates an IOPMAssertionCreateWithName power management assertion that prevents the host system from entering sleep during periods of user inactivity — a technique that sustains the C2 connection without requiring the operator to time their sessions to active user periods.

Infostealer Capability

Embedded within the backdoor is a base64-encoded Python infostealer payload, 6.6 KB in size. Because macOS systems do not universally carry a compatible Python interpreter, the implant includes a 2 KB bash installer that fetches a standalone CPython 3.10.18 binary directly from the astral-sh/python-build-standalone open-source project, supporting both arm64 and x86_64 architectures [1]. This approach avoids dependency on system Python, ensures consistent interpreter availability across macOS versions, and uses a legitimate project as a delivery vector for the runtime dependency, reducing the likelihood that the network request to fetch the interpreter triggers a security alert. SentinelLABS notes that the bash installer exhibits “written by AI” characteristics — including emoji usage and extensive inline comments — consistent with AI-assisted code generation.

The Python script collects a broad set of credential and contextual data. It harvests saved passwords and autofill data from Chrome, Brave, Firefox, and Safari; extracts the raw login.keychain-db file, which contains macOS keychain credentials; reads terminal command history files, which often contain credentials passed as command-line arguments; enumerates installed applications and running processes via ps aux; and generates a full hardware and software profile via system_profiler. All collected data is compressed into a ZIP archive and exfiltrated through the Telegram upload command channel.

Persistence and Anti-Analysis

The implant installs a LaunchAgent using the label com.apple.system.services.activity, a naming convention designed to appear as a legitimate Apple system component in LaunchAgent listings. Using Apple’s com.apple.* namespace for malicious persistence entries is a tactic observed in several prior macOS malware analyses, and it is effective against manual review of LaunchAgent directories by analysts unfamiliar with the complete set of legitimate Apple system labels. The implant resolves its own executable path at runtime via __NSGetExecutablePath rather than from a hardcoded path, ensuring that the LaunchAgent plist remains accurate even if the binary is relocated. System API calls are resolved at runtime through dlsym rather than through static symbol references, reducing the number of API names visible to static analysis tools.

Attribution Assessment

SentinelLABS attributes macOS.Gaslight to DPRK-aligned threat activity with high confidence, based on Apple XProtect’s attribution of the sample to the MACOS_BONZAI_COBUCH signature family and the detection of a sibling sample under Apple’s AIRPIPE rule [1]. The BONZAI and AIRPIPE families have been associated with North Korean threat activity in prior Apple XProtect signature releases, though Apple’s internal XProtect attribution is not separately published as a standalone advisory. The broader configuration schema visible in the binary — which includes Linux-targeting fields (payload_path_linux, persist_name_linux, persist_type_linux) and GitHub-based C2 configuration fields (github_token, github_repo, github_polling_interval) that are present but unexercised — suggests the implant is part of a cross-platform operator infrastructure rather than a purpose-built single-target tool [1].

The sample’s SHA-256 hash is 6328567511d88fdc2ae0939c5ef17b7a63d2a833881900de018a4f12f4982525. The sibling sample captured by the AIRPIPE rule carries hash 77b4fd46994992f0e57302cfe76ed23c0d90101381d2b89fc2ddf5c4536e77ca [1].

Recommendations

Immediate Actions

Security teams should check their macOS fleet for indicators of compromise associated with macOS.Gaslight. This includes searching endpoint detection logs for the two SHA-256 hashes identified above, auditing LaunchAgent directories across managed macOS devices for any entry bearing the label com.apple.system.services.activity or other entries masquerading in the com.apple.* namespace, and monitoring for outbound connections to the Telegram Bot API (api.telegram.org) from endpoints that have no legitimate business justification for that traffic pattern. Apple’s XProtect update released in June 2026 should be applied on all macOS devices if automatic update delivery has not already deployed it.

Teams that operate AI-assisted malware triage tooling should immediately audit whether their pipelines pass raw sample content — including decompiled output, extracted strings, and embedded blobs — directly to an LLM without any sanitization or boundary enforcement. Any such pipeline is potentially vulnerable to the class of attack demonstrated by macOS.Gaslight. The audit should determine whether the tooling treats sample content as data or as instructions, and whether there is any mechanism to prevent embedded prompt-like content from influencing model behavior.

Short-Term Mitigations

The architectural mitigation for AI-assisted analysis tools is strict separation between the instruction channel and the data channel. LLM-based triage workflows should implement a sanitization step that strips or escapes Markdown syntax, token delimiter patterns (including variants of {{DATA}}, [INST], <|system|>, and similar framework-specific markers), and sequences of repeated failure-message patterns before any extracted content is passed to the model. Where feasible, triage tools should use structured analysis APIs that accept binary artifacts for automated analysis in an isolated environment rather than passing extracted strings directly into a conversational LLM session.

Network-level controls can reduce the C2 risk for this specific malware family. Blocking outbound connections to api.telegram.org on endpoints where Telegram is not an authorized business application eliminates the polling channel, though operators may adapt to alternative infrastructure in response. More durable controls include DNS-layer monitoring for connections to the Telegram API from non-corporate-owned hosts and behavioral detection rules that flag persistent polling loops to external API endpoints from non-browser processes.

Keychain access patterns offer a detection opportunity. The macOS.Gaslight infostealer’s direct access to login.keychain-db at the file level — bypassing the Keychain API and its user-consent prompts — is a behavioral anomaly that macOS endpoint detection tools can monitor. Any non-Apple process accessing the raw keychain database file outside of a user-initiated backup workflow should be treated as a high-priority alert.

Strategic Considerations

macOS.Gaslight should be understood not as an isolated incident but as evidence of a pattern. DPRK threat actors have now demonstrated operational capability to target AI-assisted security workflows at two distinct phases of security operations — software supply chain analysis (PromptMink) and post-deployment malware triage (macOS.Gaslight). Whether this reflects coordinated strategic planning or parallel opportunistic adaptation is not determinable from these two data points alone, but the operational capability demonstrated is significant regardless. This pattern suggests that organizations should inventory every AI-assisted security function — triage, alert correlation, threat hunting, code review — and assess each for adversarial input assumptions.

The security community’s reliance on AI-assisted analysis is likely to grow as the volume of malware samples and security alerts increases — a trend widely documented in cybersecurity workforce surveys and vendor volume reports. Threat actors with the resources and long time horizons characteristic of DPRK operators will continue to invest in techniques that subvert these defenses. SentinelLABS has noted that “as LLM-assisted analysis becomes routine, defenders should expect more samples built to exploit it” [1]. Organizations should treat this as a capability development imperative, not a vendor problem: the responsibility for hardening AI-assisted analysis pipelines against adversarial inputs sits with the teams that build and deploy them.

CSA Resource Alignment

The techniques documented in macOS.Gaslight align with threat categories CSA has addressed in its AI safety and agentic security frameworks. The prompt injection component falls within the scope of the CSA Agentic AI Red Teaming Guide’s treatment of Agent Memory and Context Manipulation, which identifies the injection of adversarial instructions into an agent’s context as a high-priority red teaming scenario, and Goal and Instruction Manipulation, which covers techniques that redirect an agent away from its intended task [5]. The guide’s recommendation to validate all inputs to agentic systems and enforce strict boundaries between data and instruction channels is directly applicable to AI-assisted malware analysis pipelines.

The AI Controls Matrix (AICM) addresses the underlying control requirements through its model input validation and adversarial robustness control domains. Specifically, the AICM controls governing input sanitization, content boundary enforcement, and monitoring of AI system behavior for anomalous outputs are the relevant mitigating controls for the evasion technique macOS.Gaslight employs. Organizations that have completed a STAR for AI Level 1 self-assessment should review their AI-CAIQ responses in the context of this threat to verify that their stated input validation controls are implemented for security-operational AI tools, not only for customer-facing AI applications.

The MAESTRO threat modeling framework for agentic AI systems, which CSA developed to categorize threats across the layers of the agentic AI stack, addresses the trust boundary violations that macOS.Gaslight exploits at the analysis tool layer. The malware’s technique is a practical instantiation of the class of attacks MAESTRO characterizes as occurring when the trust model between the data being processed and the processing agent is insufficiently enforced. DPRK’s documented pattern of targeting AI infrastructure across the software development lifecycle — from supply chain (PromptMink) to endpoint analysis (macOS.Gaslight) — also aligns with the multi-layer threat modeling that MAESTRO is designed to support, as no single layer’s controls are sufficient against an adversary operating across multiple phases of the technology lifecycle [6].

References

[1] Phil Stokes, SentinelLABS. “macOS.Gaslight | Rust Backdoor Turns Prompt Injection on the Analyst, Not the Sandbox”. SentinelOne, June 23, 2026.

[2] The Hacker News. “New Gaslight macOS Malware Uses Prompt Injection to Disrupt AI-Assisted Analysis”. The Hacker News, June 25, 2026.

[3] Cloud Security Alliance AI Safety Initiative. “DPRK PromptMink: Nation-State npm Malware Targets AI Coding Agents”. CSA Labs, May 2026.

[4] Oleg Brodt, Elad Feldman, Bruce Schneier, and Ben Nassi. “The Promptware Kill Chain: How Prompt Injections Gradually Evolved Into a Multistep Malware Delivery Mechanism”. arXiv, January 2026.

[5] Cloud Security Alliance. “Agentic AI Red Teaming Guide”. CSA, 2025.

[6] Michael Barnhart. “North Korea’s Integration of AI Across Cyber, Economic, and Military Domains”. 38 North / Stimson Center, February 27, 2026.

[7] Infosecurity Magazine. “macOS Backdoor Uses Prompt Injection to Evade AI Triage”. Infosecurity Magazine, June 24, 2026.

← Back to Research Index