AI Tech Suite: Discover AI Tools, News, and Jobs

Cloudflare reveals Anthropic’s Mythos AI chains complex exploits to uncover decades-old vulnerabilities

Cloudflare’s Project Glasswing reveals how Anthropic’s specialized AI chains minor flaws into major exploits, outperforming general-purpose models in autonomous research.

May 19, 2026

Cloudflare has revealed significant findings from its extensive testing of Anthropic's unreleased, security-focused artificial intelligence model, known as Mythos Preview.[1][2][3] As part of a highly controlled defensive research initiative dubbed Project Glasswing, Cloudflare deployed the model across more than 50 of its own internal and open-source code repositories. The results indicate a qualitative shift in the capabilities of large language models, specifically in their ability to perform complex vulnerability research that transcends the limitations of earlier frontier models.[4][5][6][2] While general-purpose models have historically excelled at identifying isolated coding errors, Mythos Preview demonstrated a newfound capacity to construct intricate exploit chains, linking together multiple low-severity flaws to create high-impact security breaches.[2][7][4][6][8]

The collaboration, which included other major technology and infrastructure providers, was designed to evaluate the defensive potential of specialized AI in a real-world production environment. Cloudflare’s testing spanned a diverse array of critical systems, including its runtime environments, edge data paths, protocol stacks, and control planes.[4] To facilitate this, the company developed an eight-stage automation harness consisting of recon, hunt, validate, gap-fill, dedupe, trace, feedback, and report phases. This structured approach allowed Mythos Preview to operate not just as a static code analyzer but as an iterative agent capable of reasoning through the architectural nuances of hundreds of thousands of lines of code.
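To make the idea of a staged harness concrete, here is a minimal Python sketch of eight phases run in sequence over shared state. Only the stage names come from Cloudflare's description. The `RunState` class, the stub stages, and the sample findings are hypothetical placeholders, and the real harness is certainly far more involved.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Stage names as listed in the article; everything else below is illustrative.
STAGES = ["recon", "hunt", "validate", "gap-fill",
          "dedupe", "trace", "feedback", "report"]


@dataclass
class RunState:
    """Hypothetical state object passed from one stage to the next."""
    repo: str
    findings: List[str] = field(default_factory=list)
    log: List[str] = field(default_factory=list)


Stage = Callable[[RunState], RunState]


def make_stub(name: str) -> Stage:
    """Build a placeholder stage that only records that it ran."""
    def stage(state: RunState) -> RunState:
        state.log.append(name)
        return state
    return stage


def hunt(state: RunState) -> RunState:
    """Placeholder 'hunt' stage: emits made-up candidate findings."""
    state.log.append("hunt")
    state.findings += [
        "possible overflow in parse()",
        "possible overflow in parse()",   # deliberate duplicate
        "unchecked length in decode()",
    ]
    return state


def dedupe(state: RunState) -> RunState:
    """Drop repeated findings while preserving first-seen order."""
    state.log.append("dedupe")
    state.findings = list(dict.fromkeys(state.findings))
    return state


def build_pipeline() -> List[Stage]:
    handlers: Dict[str, Stage] = {name: make_stub(name) for name in STAGES}
    handlers["hunt"] = hunt
    handlers["dedupe"] = dedupe
    return [handlers[name] for name in STAGES]


def run(repo: str) -> RunState:
    state = RunState(repo=repo)
    for stage in build_pipeline():
        state = stage(state)
    return state


state = run("example-repo")
print(state.log)
print(state.findings)
```

Keeping each phase as a separate function over shared state is one plausible way to let later stages, such as dedupe and report, work on whatever earlier stages produced without knowing how they produced it.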

One of the most consequential findings from Project Glasswing is the model’s ability to build exploit chains. In the landscape of modern software security, attackers rarely rely on a single catastrophic bug.[4] Instead, they typically combine several minor, seemingly innocuous primitives, such as a memory leak and a logic error, to gain full system control. Cloudflare reported that while previous frontier models like GPT-4o and earlier iterations of Claude could identify some of the individual bugs, they consistently failed to complete the final, most difficult step of stitching those pieces together into a working exploit. Mythos Preview, however, showed a human-like ability to reason about how these primitives interact, successfully demonstrating how a use-after-free bug could be converted into an arbitrary read/write primitive and eventually a full control-flow hijack.[7]

The model's success is attributed to an iterative "feedback loop" capability that represents a departure from the one-shot analysis typical of current AI tools. During the testing, Mythos Preview was observed writing its own proof-of-concept code, compiling it within a sandboxed environment, and running it to verify exploitability.[2] If the first attempt failed, the model analyzed the output, adjusted its hypothesis, and revised the code until it reached a successful result.[4][2] This autonomous "scratchpad" reasoning allowed the model to surface vulnerabilities that had remained hidden for decades. Specifically, the model reportedly identified a 27-year-old kernel bug in OpenBSD and a 16-year-old vulnerability in FFmpeg, both of which had survived millions of previous fuzzing runs and audits by traditional automated security tools.[9]
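The control flow of such a loop (try, observe, revise, retry) can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not Cloudflare's or Anthropic's implementation: `refine_until_confirmed`, `Attempt`, and the toy `fake_sandbox` and `fake_revise` stand-ins are all hypothetical names, and the "sandbox" here simply checks a string suffix.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Attempt:
    """One pass through the loop: what was tried and what came back."""
    hypothesis: str
    succeeded: bool
    output: str


def refine_until_confirmed(
    initial: str,
    run_in_sandbox: Callable[[str], Attempt],
    revise: Callable[[Attempt], str],
    max_attempts: int = 5,
) -> Tuple[Optional[Attempt], List[Attempt]]:
    """Test a hypothesis, and on failure revise it from the observed output.

    Returns the confirming attempt (or None if the budget runs out)
    together with the full history of attempts.
    """
    history: List[Attempt] = []
    hypothesis = initial
    for _ in range(max_attempts):
        attempt = run_in_sandbox(hypothesis)
        history.append(attempt)
        if attempt.succeeded:
            return attempt, history
        hypothesis = revise(attempt)
    return None, history


# Toy stand-ins (assumptions): the third revision "succeeds".
def fake_sandbox(hypothesis: str) -> Attempt:
    ok = hypothesis.endswith("v3")
    return Attempt(hypothesis, ok, "reproduced" if ok else "no crash observed")


def fake_revise(attempt: Attempt) -> str:
    base, version = attempt.hypothesis.rsplit("v", 1)
    return f"{base}v{int(version) + 1}"


confirmed, history = refine_until_confirmed("hypothesis-v1", fake_sandbox, fake_revise)
print(confirmed.hypothesis if confirmed else "unconfirmed", len(history))
```

The attempt budget (`max_attempts`) is a design choice added for the sketch: without a cap, a loop of this kind could keep revising indefinitely on a finding that is not actually exploitable.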

On specialized cybersecurity benchmarks, the performance gap between Mythos Preview and general-purpose models is stark.[9][10][1] In the CyberGym evaluation, which measures an AI’s ability to reproduce real-world vulnerabilities, Mythos Preview achieved a score of 83.1 percent.[11][9] This significantly outperformed general-purpose frontier models like Claude Opus 4.6, which scored 66.6 percent.[9][12] While competitors such as Microsoft’s multi-agent systems have reported comparably high scores, the Cloudflare findings emphasize that Mythos Preview achieves these results as a single, highly tuned model. This specialized intelligence allowed it to solve complex industrial control system simulations that had never been cleared by previous AI models, marking a new milestone in autonomous vulnerability research.
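For readers who want the CyberGym gap in concrete terms, the short Python calculation below works it out from the two scores quoted above. The variable names are illustrative, and the only inputs are the 83.1 and 66.6 percent figures from the article.

```python
# CyberGym scores as reported in the article (percent).
mythos_score = 83.1
opus_score = 66.6

# Absolute gap, in percentage points.
gap_points = round(mythos_score - opus_score, 1)

# Relative improvement over the Opus 4.6 score, in percent.
relative_gain_pct = round((mythos_score - opus_score) / opus_score * 100, 1)

print(gap_points)         # 16.5
print(relative_gain_pct)  # 24.8
```

In other words, the reported gap is 16.5 percentage points, or roughly a 25 percent relative improvement over the general-purpose model's score.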

Despite these breakthroughs, Cloudflare’s research also highlighted the significant challenges of integrating such powerful AI into defensive workflows. The model produced a high volume of "noise," or false positives, particularly in codebases written in memory-unsafe languages like C and C++.[2][7][4] Because the model is designed to be exploratory, it often over-reports theoretical flaws, requiring substantial human triage to separate genuine threats from speculative hallucinations.[4] Cloudflare noted that the model often hedges its findings with phrases like "potentially" or "could in theory," though the quality of the reproduction steps it provides is notably higher than that of its predecessors.[7][2] This suggests that while AI can drastically accelerate the discovery phase, the "triage burden" remains a bottleneck for security teams.
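One way a team might reduce that triage burden is to order findings so that reports with reproduction steps and fewer hedging phrases are reviewed first. The Python sketch below is an illustrative heuristic, not something Cloudflare describes using. The two hedge phrases come from the article, while the `Finding` class, the sample reports, and the ordering rule are assumptions.

```python
from dataclasses import dataclass
from typing import List

# Hedge phrases quoted in the article; a real list would likely be longer.
HEDGES = ("potentially", "could in theory")


@dataclass
class Finding:
    """Hypothetical shape of a model-generated report."""
    title: str
    description: str
    has_repro: bool


def hedge_count(text: str) -> int:
    """Count occurrences of hedge phrases, case-insensitively."""
    lowered = text.lower()
    return sum(lowered.count(phrase) for phrase in HEDGES)


def triage_order(findings: List[Finding]) -> List[Finding]:
    """Reports with reproduction steps first, then fewest hedges first."""
    return sorted(findings, key=lambda f: (not f.has_repro, hedge_count(f.description)))


# Made-up sample reports for illustration only.
reports = [
    Finding("Length check",
            "This could in theory overflow and potentially corrupt the heap.",
            has_repro=False),
    Finding("Use-after-free in session cache",
            "Freed entry is read after eviction; reproduction attached.",
            has_repro=True),
    Finding("Integer truncation",
            "Potentially truncates the size field; reproduction attached.",
            has_repro=True),
]

ordered = triage_order(reports)
print([f.title for f in ordered])
```

A heuristic like this only reorders the queue. It does not replace human review, since a heavily hedged report can still describe a real flaw.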

The investigation also brought to light a curious phenomenon regarding the model's safety guardrails. Mythos Preview frequently exhibited what researchers called "organic refusals"—situations where the model would decline to perform a vulnerability analysis on a project for ethical or safety reasons. However, these refusals were found to be inconsistent.[2] In several instances, the model refused a request only to agree to the exact same task after an unrelated change was made to the project environment or the prompt was framed differently. Cloudflare argued that this inconsistency proves that internal model refusals cannot be treated as a dependable safety boundary.[4][1] This behavior underscores the need for external governance and secondary layers of safety oversight for models with high-level cyber capabilities.[7]

The implications for the broader AI and cybersecurity industries are profound. Anthropic has categorized Mythos Preview as "too dangerous" for general public release, citing concerns that its autonomous exploit generation capabilities could be used by malicious actors to scale attacks at an unprecedented rate. The dual-use nature of this technology creates a defensive dilemma: infrastructure providers need these models to find and fix bugs before they are exploited, but the very existence of such a model creates a new category of risk. Cloudflare’s technical leadership warned that if such systems become widely available without rigorous access controls, the "attack side" of the internet could accelerate far faster than defenders can respond.

The findings from Project Glasswing suggest that the future of software security will likely move away from a primary focus on rapid patching toward a more resilient, architectural defense strategy. As AI models become capable of finding thousands of zero-day vulnerabilities across major operating systems and browsers in a matter of days, the traditional "cat-and-mouse" game of individual bug fixing becomes unsustainable. Instead, the industry may be forced to adopt memory-safe languages and zero-trust architectures that can withstand the inevitable discovery of flaws. Cloudflare’s research concludes that we are entering an era of "AI versus AI" security, where the primary advantage for defenders lies not just in the intelligence of the models they use, but in the speed and scale at which they can operationalize those findings.

Ultimately, the revelation that Anthropic’s Mythos Preview can chain exploits missed by other models serves as a call to action for the technology sector. It validates the potential of specialized, domain-specific AI as a cornerstone of future cybersecurity while simultaneously exposing the fragility of current safety frameworks. As frontier labs continue to push the boundaries of what is possible in automated reasoning, the focus is shifting toward how to build "harnesses" that maximize defensive utility while minimizing the risks of misuse. For now, the capabilities demonstrated by Mythos Preview remain largely behind the closed doors of research collaborations, providing a glimpse into a future where software vulnerabilities are no longer a matter of human oversight, but of algorithmic discovery.