Published: 2026-05-07
Categories: AI Security, Vulnerability Management, Cloud Security
CVE-2026-33626: AI Inference SSRF Exploited Within 12 Hours
Key Takeaways
The LMDeploy CVE-2026-33626 incident makes a pattern unambiguous: the window between public vulnerability disclosure and in-the-wild exploitation of AI infrastructure flaws has collapsed to hours, not days. Security teams operating AI inference stacks should treat this as a forcing function for tightening patch SLAs and network segmentation practices across their full AI deployment footprint.
- A Server-Side Request Forgery (SSRF) flaw in LMDeploy’s vision-language image loader (CVE-2026-33626, CVSS 7.5) was publicly disclosed and exploited within approximately 12 hours [1][2].
- The attack used LMDeploy’s load_image() function as an HTTP SSRF primitive to probe internal networks, access the AWS Instance Metadata Service (IMDS), and enumerate backend services including Redis and MySQL [3].
- All LMDeploy versions prior to 0.12.3 with vision-language support are affected; the fix is available in v0.12.3 [4][5].
- AI inference servers are high-value SSRF targets because in common cloud deployment configurations they run with privileged IAM roles and broad internal network visibility [7].
- Organizations must extend vulnerability patch cadence and network segmentation practices explicitly to AI inference infrastructure, which is too often treated as an internal research system rather than an internet-exposed attack surface.
Background
LMDeploy is an open-source toolkit developed by Shanghai AI Laboratory’s InternLM project for compressing, deploying, and serving large language models [6]. It exposes an OpenAI-compatible REST API, supports both text and vision-language (multimodal) models, and in benchmarks delivers up to 1.8 times higher request throughput than comparable inference frameworks through persistent batching, blocked key-value caching, and optimized GPU kernels [6]. These performance characteristics have made it a practical deployment choice for organizations running multimodal AI at scale, including production workloads built on InternLM, DeepSeek, and other openly distributed model families.
The toolkit’s vision-language support is central to this incident. Multimodal LLMs accept image inputs alongside text, and those images must be fetched and preprocessed before inference begins. LMDeploy’s standard approach is to accept an image_url parameter in client requests and have the inference server retrieve the image directly—a pattern that mirrors the OpenAI API’s image URL handling. This design creates a latent SSRF condition whenever the server does not strictly validate the destination URL, because the model server, not the client, initiates the outbound HTTP request. In cloud-hosted deployments, that distinction is consequential: the model server carries cloud IAM credentials and can reach services on the instance’s local network that are not exposed to the internet.
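To make that trust boundary concrete, the following is a minimal sketch of a vision-language request against an OpenAI-compatible endpoint such as LMDeploy’s. The endpoint address, port, and model name are placeholders, not values from the incident; the detail to notice is that the url field is fetched by the server, not by the client.

```python
import requests

# Placeholder endpoint and model name; LMDeploy exposes an OpenAI-compatible
# API, so the request shape below mirrors the OpenAI multimodal format.
API_URL = "http://inference.example.internal:23333/v1/chat/completions"

payload = {
    "model": "example-vision-model",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                # The *server* fetches this URL before inference begins; the
                # client never touches it. Without destination validation,
                # any address the server can reach may be substituted here.
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/photo.png"}},
            ],
        }
    ],
}

resp = requests.post(API_URL, json=payload, timeout=30)
print(resp.status_code)
```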
The broader AI inference ecosystem has grown rapidly as organizations deploy open-weight models in production. Inference servers such as LMDeploy, vLLM, Ollama, and Text Generation Inference (TGI) are now on the critical path for enterprise AI workloads, yet security hardening practices for these systems have not kept pace with their adoption. Industry reporting and security vendor analyses have documented that AI inference APIs are frequently deployed without authentication, that inference servers often carry broad IAM permissions granted for model artifact access or logging integration, and that operators do not consistently segment AI inference from production backend networks [7]. CVE-2026-33626 is a code-level defect in LMDeploy’s URL handling, but its real-world impact is magnified by the architectural patterns common in AI inference deployments: absent authentication, broad IAM permissions, and insufficient network segmentation. These deployment gaps transform a high-severity SSRF vulnerability into a reliable path to cloud credential theft.
Security Analysis
The Vulnerability
CVE-2026-33626 is a Server-Side Request Forgery vulnerability in the load_image() function located in lmdeploy/vl/utils.py [4][8]. In versions prior to 0.12.3, this function accepts an image URL from an incoming client request and fetches it via HTTP without validating the destination against a blocklist of reserved IP ranges, link-local addresses, or loopback addresses. An attacker who can reach the inference API endpoint—whether it is directly internet-exposed or accessible from within a shared cloud environment—can supply a crafted image_url pointing to any reachable network address, causing the model server to act as an unwitting HTTP intermediary with access to its internal network environment.
The vulnerability was assigned a CVSS base score of 7.5 (High), reflecting the absence of required user interaction, the lack of privilege prerequisites, and the potential for significant information disclosure from the targeted infrastructure [8][9]. Exploitation does not require authentication on the inference endpoint unless the deploying organization has independently enforced token validation—a configuration step not mandated by LMDeploy’s defaults [7]. Vision-language API endpoints that accept image URLs are structurally SSRF candidates; the absence of server-side URL validation is the specific enabling condition. The remediation introduced in v0.12.3 applies URL safety checks that reject requests targeting RFC1918 private address ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), loopback addresses (127.0.0.0/8), and link-local addresses (169.254.0.0/16), which closes the primary exploitation path [4].
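Because the advisory describes the fix in terms of rejected address ranges rather than reproducing the patch, the following is a sketch of an equivalent destination check built on Python’s standard library, not the actual v0.12.3 code.

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_image_url(url: str) -> bool:
    """Reject URLs whose destination resolves to reserved address space.

    A sketch of a check equivalent to the one the advisory describes;
    the actual v0.12.3 patch may differ in structure and coverage.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        # Resolve every address the hostname maps to; checking only the
        # literal URL string would miss DNS-based bypasses.
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    for info in infos:
        addr = ipaddress.ip_address(info[4][0].split("%")[0])
        # is_private covers the RFC1918 ranges; the other two cover
        # 127.0.0.0/8 and 169.254.0.0/16 explicitly.
        if addr.is_private or addr.is_loopback or addr.is_link_local:
            return False
    return True
```

Even a check like this leaves a residual DNS rebinding window between validation and fetch; pinning the resolved address for the subsequent request closes that gap.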
The 12-Hour Exploitation Timeline
The CVE-2026-33626 advisory became publicly available on GitHub on April 21, 2026. Sysdig’s threat research team, operating a network of honeypot systems instrumented to detect exploitation of newly disclosed AI infrastructure vulnerabilities, recorded the first confirmed exploitation attempt against an LMDeploy endpoint at 03:35 UTC on April 22, 12 hours and 31 minutes after the advisory’s initial appearance on the GitHub advisory page [3]. The originating IP address, 103.116.72.119, was geolocated to Kowloon Bay, Hong Kong [3], though IP geolocation does not establish the attacker’s physical location, as threat actors commonly route through proxies and cloud infrastructure.
This timeline is consistent with an accelerating pattern that industry reporting has documented across AI infrastructure vulnerabilities over the past year. Critical flaws in inference servers, model gateways, and agent orchestration tools have repeatedly been weaponized within hours of advisory publication [10]. The compressed exploitation window is consistent with the maturation of automated scanning infrastructure—tools that index security advisories and public commit diffs, correlate them with internet-wide port scans, and probe candidate targets before many administrators have completed patch review. The exact mechanism in this specific incident was not forensically confirmed, but this pattern has been documented across comparable AI infrastructure CVEs. For AI inference endpoints, which are increasingly internet-reachable but infrequently covered by aggressive vulnerability scanning programs, this gap is acutely dangerous.
Attack Sequence
Sysdig’s forensic reconstruction of the exploitation session reveals a systematic, eight-minute reconnaissance campaign that treated the LMDeploy vision-language endpoint as a general-purpose HTTP proxy [3]. The attacker submitted a series of crafted API requests, each specifying an internal destination as the image_url value, and used the model server’s responses—including TCP connection behavior, HTTP status codes, and timing differences—to map the host’s network environment without ever directly connecting to internal resources.
The probe sequence targeted four distinct internal resources in succession:

- Requests directed at 169.254.169.254, the well-known AWS Instance Metadata Service address, attempted to retrieve the temporary IAM role credentials attached to the hosting EC2 instance [3]. On instances configured with IMDSv1, which does not require session tokens, these requests return credentials that can be used immediately to perform actions across the AWS account.
- The attacker enumerated standard database service ports, identifying active Redis and MySQL instances accessible from the model server’s network segment.
- A secondary HTTP administrative interface was discovered: a management plane unreachable from the internet but visible to the model server, illustrating how an inference workload positioned on an internal subnet inherits lateral movement potential that was not part of the original threat model.
- Requests incorporating an out-of-band DNS exfiltration domain confirmed reachability and recorded internal hostnames and IP addresses through DNS queries captured on an attacker-controlled name server [3].

Sysdig’s analysis characterizes the session as consistent with automated exploitation tooling executing a predefined reconnaissance playbook [3]. The eight-minute duration and structured probe sequence both support this assessment, though the session’s brevity is not definitive proof of automation over skilled manual operation.
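For defenders, these probe classes translate directly into a log-review check. The sketch below assumes a gateway or reverse proxy in front of the inference server records one JSON request body per line; the log path and format are illustrative assumptions, not details from the Sysdig report.

```python
import re

# Hypothetical log format: one JSON-encoded request body per line, as an
# API gateway or reverse proxy in front of LMDeploy might record it.
URL_PATTERN = re.compile(r'"url"\s*:\s*"([^"]+)"')

# Destination classes seen in the reconstructed session [3]: cloud metadata,
# backend datastore ports, and loopback probes.
SUSPICIOUS = (
    "169.254.169.254",         # AWS IMDS
    ":6379",                   # Redis default port
    ":3306",                   # MySQL default port
    "127.0.0.1", "localhost",  # loopback
)

def scan_request_log(path: str) -> None:
    with open(path) as fh:
        for lineno, line in enumerate(fh, 1):
            for url in URL_PATTERN.findall(line):
                if any(marker in url for marker in SUSPICIOUS):
                    print(f"line {lineno}: suspicious image_url -> {url}")

scan_request_log("inference_requests.log")
```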
Broader Context
CVE-2026-33626 arrives against a backdrop of expanding AI infrastructure attack surface and accelerating vulnerability discovery. Trend Micro’s 2026 AI security report projects between 2,800 and 3,600 AI-related CVEs for the year—a 31 to 69 percent increase from 2,130 in 2025 [10]. SSRF vulnerabilities have been a persistent theme within this expanding set. CVE-2025-51591 in the Pandoc document converter was exploited in attempts to access AWS IMDS through crafted HTML inputs; attacks documented by Wiz were blocked by IMDSv2 enforcement [11]. CVE-2025-53767 in Azure OpenAI—assigned a perfect CVSS 10.0—demonstrated the catastrophic potential of metadata service exploitation through insufficient URL validation in an AI API endpoint [12]. The World Economic Forum’s April 2026 analysis formally recommended treating AI inference infrastructure as critical infrastructure, in part because of the convergence of high cloud privilege and rapid exploit development targeting these systems [13].
The prevalence of SSRF as an attack class in AI infrastructure follows directly from how these systems are architected. Vision-LLM image loaders, agentic tool-call handlers, retrieval-augmented generation (RAG) document fetchers, and model fine-tuning pipelines share a structural characteristic: they accept external URLs or file paths from users and issue outbound network requests on behalf of those users—precisely the condition that enables SSRF [7]. As organizations integrate AI inference into production architectures, these URL-fetching components introduce network trust boundary violations that were not present in traditional application stacks and may not be covered by existing security controls or threat models. As of 2024, only 32% of EC2 instances had enforced IMDSv2, according to Datadog’s State of Cloud Security report [14]. While AWS began defaulting new instance launches to IMDSv2-required in mid-2024, legacy instances running IMDSv1 remain a meaningful portion of the installed base; on those instances, metadata service access from a compromised model server requires no session token and returns usable credentials immediately.
Recommendations
Immediate Actions
Organizations running LMDeploy with vision-language features should upgrade to version 0.12.3 or later as the primary remediation for CVE-2026-33626 [4]. The patched release is the only complete fix; all other measures described here are defense-in-depth controls that reduce the window or blast radius of exploitation rather than eliminating the vulnerability. While preparing the upgrade, teams should audit whether the inference API is internet-accessible or reachable from shared cloud environment segments, since both conditions allow external actors to supply image_url parameters to the endpoint.
For deployments where patching cannot be completed immediately, applying an egress firewall rule on the inference server host to block outbound HTTP and HTTPS requests to RFC1918 private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), the loopback range (127.0.0.0/8), and the link-local range (169.254.0.0/16) will deny the primary exploitation path. This is a network-layer defense-in-depth measure, not a substitute for the software patch. AWS customers should simultaneously verify that IMDSv2 is enforced on any EC2 instance running LMDeploy or similar inference services, setting HttpTokens=required and HttpPutResponseHopLimit=1 at the instance level, which eliminates the metadata credential theft vector even if the SSRF condition remains [14].
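IMDSv2 enforcement can be scripted. A minimal boto3 sketch, assuming AWS credentials and a region are configured in the environment and using a placeholder instance ID:

```python
import boto3

ec2 = boto3.client("ec2")

# Placeholder instance ID; in practice, enumerate inference hosts
# (for example, by tag) and apply the same change to each.
resp = ec2.modify_instance_metadata_options(
    InstanceId="i-0123456789abcdef0",
    HttpTokens="required",        # require IMDSv2 session tokens
    HttpPutResponseHopLimit=1,    # tokens unusable beyond the instance itself
)
print(resp["InstanceMetadataOptions"])
```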
Short-Term Mitigations
Authentication should be enforced on all inference API endpoints. LMDeploy’s default configuration does not require authentication, which means any actor who can route packets to the service port can submit image URLs and exploit SSRF conditions without needing to compromise any credentials first. Adding a reverse proxy with authentication—nginx with mutual TLS or an API gateway enforcing signed token validation—materially raises the exploitation threshold even when application-layer vulnerabilities exist. Network segmentation should isolate AI inference servers into dedicated subnets with explicit deny rules blocking lateral movement to production databases, secrets managers, administrative interfaces, and other sensitive backend services.
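As one illustration of the reverse-proxy pattern, the sketch below uses FastAPI and httpx to require a bearer token before forwarding requests to a locally bound inference server. The upstream address, route, and token handling are illustrative assumptions; production deployments would more typically use nginx with mutual TLS or a managed API gateway, as noted above.

```python
import os
import secrets

import httpx
from fastapi import FastAPI, Header, HTTPException, Request
from fastapi.responses import JSONResponse

app = FastAPI()
UPSTREAM = "http://127.0.0.1:23333"  # assumed local-only LMDeploy bind address
EXPECTED = f"Bearer {os.environ['INFERENCE_API_TOKEN']}"

@app.post("/v1/chat/completions")
async def proxy(request: Request, authorization: str = Header(default="")):
    # Constant-time comparison; reject before the body ever reaches
    # the inference server.
    if not secrets.compare_digest(authorization, EXPECTED):
        raise HTTPException(status_code=401, detail="invalid token")
    body = await request.body()
    async with httpx.AsyncClient(timeout=120.0) as client:
        upstream = await client.post(
            f"{UPSTREAM}/v1/chat/completions",
            content=body,
            headers={"content-type": "application/json"},
        )
    return JSONResponse(status_code=upstream.status_code,
                        content=upstream.json())
```

Run this behind TLS termination so the bearer token is never transmitted in cleartext.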
Security teams should also review the IAM role permissions attached to instances running AI inference workloads. A model server that requires access to an S3 bucket for model artifacts does not need permissions to access RDS databases, Secrets Manager beyond a narrow scope, or unrelated AWS services. Applying the principle of least privilege to inference workload IAM roles limits the consequence of successful SSRF exploitation of any present or future vulnerability in the inference stack, since stolen credentials will have limited downstream reach.
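Expressed as policy, least privilege for a model server that only needs to read artifacts might look like the boto3 sketch below; the bucket, role, and policy names are placeholders.

```python
import json

import boto3

# Least-privilege sketch: read-only access to a single model artifact
# bucket and nothing else.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-model-artifacts",
                "arn:aws:s3:::example-model-artifacts/*",
            ],
        }
    ],
}

iam = boto3.client("iam")
iam.put_role_policy(
    RoleName="inference-instance-role",
    PolicyName="model-artifact-read-only",
    PolicyDocument=json.dumps(policy),
)
```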
Strategic Considerations
The LMDeploy incident illustrates that AI inference infrastructure must be managed under the same vulnerability management discipline applied to internet-facing application servers. This means tracking security advisories for every AI dependency—inference servers, model-serving frameworks, quantization libraries, and vision preprocessing utilities—with the rigor currently applied to web application framework updates. For organizations with public-facing AI inference deployments, target patch SLAs of 24 hours or less for critical and high-severity findings. Organizations without existing rapid-response capabilities should treat this as a program maturity goal and prioritize network controls as interim compensating measures.
Organizations should conduct systematic SSRF reviews across their full AI inference stack, not only the primary inference server. RAG document fetchers, agent tool-call handlers, vision preprocessing pipelines, and model fine-tuning infrastructure all present SSRF-equivalent surfaces when they accept user-controlled URLs or file paths and issue outbound network requests. Integrating SSRF review as a standard component of threat modeling for AI components—conducted proactively during design rather than reactively following a CVE disclosure—is the appropriate long-term posture for organizations that depend on AI inference in production.
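Such reviews can be codified as regression tests that fail whenever a URL-fetching component stops rejecting internal destinations. A pytest sketch, assuming a validator along the lines of the earlier is_safe_image_url() example in a hypothetical url_safety module:

```python
import pytest

# Hypothetical module; substitute whatever destination check your
# fetcher actually applies.
from url_safety import is_safe_image_url

INTERNAL_URLS = [
    "http://169.254.169.254/latest/meta-data/",  # cloud metadata service
    "http://127.0.0.1:6379/",                    # loopback / Redis
    "http://10.0.0.5/admin",                     # RFC1918
    "http://192.168.1.1/",                       # RFC1918
]

@pytest.mark.parametrize("url", INTERNAL_URLS)
def test_internal_destinations_are_rejected(url):
    assert not is_safe_image_url(url)

def test_public_destination_is_allowed():
    # example.com resolves to public address space.
    assert is_safe_image_url("https://example.com/image.png")
```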
CSA Resource Alignment
The Cloud Security Alliance has developed several frameworks and publications that directly apply to the security considerations raised by CVE-2026-33626.
CSA’s MAESTRO framework for agentic AI threat modeling provides layered threat analysis for AI systems at each architectural tier, including the inference infrastructure layer at which SSRF vulnerabilities like CVE-2026-33626 manifest. MAESTRO’s analysis of AI inference components as attack surfaces is directly applicable to organizations threat-modeling their LMDeploy and similar deployments, particularly for identifying trust boundary violations between vision-language processing components and internal cloud networks. Security teams conducting MAESTRO-based threat analysis should enumerate all URL-fetching operations performed by inference components as a standard step in the threat identification process.
The AI Controls Matrix (AICM), CSA’s comprehensive control framework for AI systems, addresses the infrastructure security, supply chain management, and vulnerability management controls relevant to this incident. AICM controls pertaining to AI infrastructure hardening, network segmentation of AI workloads, and patch management cadence for AI dependencies map directly to the remediation steps recommended in this note. As a superset of the Cloud Controls Matrix (CCM), AICM provides the appropriate control scope for organizations seeking to align AI inference security practices with a recognized framework. Organizations pursuing AICM alignment should explicitly include AI inference servers in scope for infrastructure security controls rather than treating them as exempt research or internal tools.
CSA’s Zero Trust guidance is particularly relevant to the architectural remediation of SSRF risk in AI inference environments. The core zero trust principle—that no component should be implicitly trusted based on network location—applies directly to the implicit assumption that a model server’s internally issued HTTP requests should be permitted to reach cloud metadata endpoints and backend databases without authentication. A zero trust network architecture applied to AI inference would have contained the blast radius of CVE-2026-33626 by requiring explicit, token-authenticated access for the model server to reach internal services, rather than relying on network-layer trust. CSA’s publication Zero Trust & AI: Strengthening Security by Reducing Complexity provides directly applicable guidance for operationalizing these principles in AI-integrated environments.
CSA’s AI Organizational Responsibilities publications address the governance structures organizations need to track and respond to AI security incidents at speed. The 12-hour exploitation timeline documented in this incident underscores that monitoring security advisories for AI dependencies and executing patch deployment must be explicitly assigned to named owners within an organization’s operational structure—responsibility left implicit will not be fulfilled at the cadence this threat environment requires. CSA’s STAR for AI certification program provides a structured mechanism for organizations to assess and attest to the maturity of these operational controls across their AI deployments.
References
[1] The Hacker News. “LMDeploy CVE-2026-33626 Flaw Exploited Within 13 Hours of Disclosure.” The Hacker News, April 2026.
[2] Fyself News. “LMDeploy CVE-2026-33626 Flaw Exploited Within 13 Hours of Publication.” Fyself, April 2026.
[3] Sysdig Threat Research Team. “CVE-2026-33626: How Attackers Exploited LMDeploy LLM Inference Engines in 12 Hours.” Sysdig, April 2026.
[4] GitHub Security Advisories. “CVE-2026-33626: LMDeploy has Server-Side Request Forgery (SSRF) via Vision-Language Image Loading.” GitHub, April 2026.
[5] Tenable. “CVE-2026-33626.” Tenable CVE Database, April 2026.
[6] InternLM. “LMDeploy: A Toolkit for Compressing, Deploying, and Serving LLMs.” GitHub, accessed May 2026.
[7] Indusface. “Exposed LLM Infrastructure: Risks and Exploits.” Indusface Blog, 2026.
[8] SentinelOne. “CVE-2026-33626: InternLM LMDeploy SSRF Vulnerability.” SentinelOne Vulnerability Database, 2026.
[9] CIRCL Vulnerability Lookup. “CVE-2026-33626.” CIRCL, 2026.
[10] Trend Micro. “Fault Lines in the AI Ecosystem: TrendAI State of AI Security Report.” Trend Micro, 2026.
[11] The Hacker News. “Hackers Exploit Pandoc CVE-2025-51591 to Target AWS IMDS and Steal EC2 IAM Credentials.” The Hacker News, September 2025.
[12] Foresiet. “AI Security Incidents: Full Attack Path Analysis (April 2026).” Foresiet, April 2026.
[13] World Economic Forum. “It’s Time to Start Treating AI Infrastructure as Critical Infrastructure.” World Economic Forum, April 2026.
[14] Datadog. “State of Cloud Security 2024.” Datadog, 2024.