Anthropic confirms Claude Mythos after internal leak reveals a breakthrough in autonomous reasoning

An accidental leak reveals Claude Mythos, a next-generation model capable of unprecedented autonomous reasoning and advanced cybersecurity.

March 27, 2026

In a significant lapse for a company defined by its commitment to safety and security, Anthropic has confirmed the existence of a next-generation artificial intelligence model after a misconfigured internal system inadvertently exposed thousands of proprietary documents.[1] The leak, which surfaced through an unsecured content management system, has forced the San Francisco-based startup to acknowledge that it is currently testing a system it describes as a step change in reasoning capabilities. This admission comes at a critical juncture for the industry, as the leading AI labs move beyond the incremental improvements of the past two years toward a new frontier of autonomous problem-solving and logic.
The exposure was initially identified by security researchers who discovered a publicly searchable data store containing nearly three thousand unpublished assets, including draft blog posts and internal performance metrics.[1] The breach did not involve the compromise of the core model weights or user data, but rather a fundamental failure in the settings of a web-based repository. Because the system was set to a public-by-default configuration for uploaded files, sensitive details regarding the company’s roadmap and its most advanced experimental results were left visible to anyone with the correct URL. Once alerted to the vulnerability, Anthropic restricted access to the data, but by that point, the details of its upcoming flagship model had already begun to circulate within the research community.
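The failure mode described above, uploads that are world-readable unless someone remembers to lock them down, is common enough to audit for mechanically. The sketch below is purely illustrative: the names (`UploadConfig`, `audit_acl`) and settings are hypothetical stand-ins, not details of Anthropic's actual system.

```python
# Hypothetical sketch of the misconfiguration class behind the leak:
# a file store whose uploads default to public visibility.
from dataclasses import dataclass


@dataclass
class UploadConfig:
    """Minimal model of a repository's default access settings (illustrative)."""
    default_acl: str = "public-read"   # dangerous default: world-readable
    listing_enabled: bool = True       # objects discoverable via search/crawl


def audit_acl(cfg: UploadConfig) -> list[str]:
    """Flag settings that expose uploads to anyone with the URL."""
    findings = []
    if cfg.default_acl != "private":
        findings.append(f"uploads default to {cfg.default_acl!r}; set default_acl='private'")
    if cfg.listing_enabled:
        findings.append("object listing is enabled; disable public indexing")
    return findings


print(audit_acl(UploadConfig()))  # insecure defaults produce two findings
print(audit_acl(UploadConfig(default_acl="private", listing_enabled=False)))  # []
```

The point of the audit-function pattern is that a public-by-default setting fails silently, so the safe posture is to treat anything other than `private` as a finding rather than waiting for a researcher to stumble on the URL.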
Internal documents recovered from the breach refer to the new model architecture as Claude Mythos, representing a departure from the company’s existing lineup.[2][3] According to these materials, Mythos is intended to anchor a brand-new product tier tentatively named Capybara, which is positioned to sit above the current high-performance Opus tier.[2][3] Anthropic’s internal assessments describe the model as the most capable it has ever built, specifically highlighting its performance in complex, multi-step tasks that have historically stymied large language models. A company spokesperson later confirmed that the system is currently in the hands of a small group of early-access customers for testing and safety evaluation, emphasizing that the jump in intelligence is not merely a refinement of existing techniques but a fundamental shift in the model’s ability to reason through abstract problems.
The performance metrics revealed in the leak provide a clearer picture of what this step change entails. Mythos reportedly achieves dramatically higher scores on rigorous evaluations such as Terminal-Bench 2.0, a benchmark designed to test an AI’s ability to operate in live terminal environments and manage complex system architectures. While the company’s recent high-water mark, Claude Opus 4.6, achieved a top-tier score of 65.4 percent on this benchmark, the leaked documents suggest that the new model has surpassed this by a wide margin.[2] The advancements appear most pronounced in the areas of software engineering and academic reasoning, where the model demonstrates a vastly improved capacity for long-horizon planning. Rather than simply predicting the next token in a sequence, the system appears to utilize enhanced chain-of-thought processes that allow it to catch its own errors and refine its logic before presenting a final output.
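The self-correction behavior described above resembles a generate-critique-revise loop, in which a model's draft output is checked and repaired before anything is returned. The sketch below is a generic illustration of that loop with toy stand-in functions; it is not Anthropic's actual mechanism, and `refine`, `critic`, and `reviser` are invented names.

```python
# Generic generate-critique-revise loop (illustrative, not Anthropic's method).
from typing import Callable, Optional


def refine(draft: str,
           critique: Callable[[str], Optional[str]],
           revise: Callable[[str, str], str],
           max_rounds: int = 3) -> str:
    """Repeatedly critique and revise a draft until no issue is found
    or the round budget is exhausted."""
    for _ in range(max_rounds):
        issue = critique(draft)
        if issue is None:          # the critic found nothing left to fix
            return draft
        draft = revise(draft, issue)
    return draft


# Toy stand-ins: the "critic" flags a known arithmetic slip, the "reviser" fixes it.
critic = lambda d: "2+2 is not 5" if "2+2=5" in d else None
reviser = lambda d, issue: d.replace("2+2=5", "2+2=4")

print(refine("the answer: 2+2=5", critic, reviser))  # the answer: 2+2=4
```

In a real system the critic and reviser would both be model calls over an intermediate chain of thought; the structural idea is simply that errors are caught internally, before the final output is presented.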
However, the leap in intelligence has created a profound dilemma for Anthropic regarding the safe deployment of such powerful technology.[4][5] Internal safety reports included in the leak warn that Mythos possesses unprecedented cybersecurity capabilities that could outpace current defensive measures.[1] The documents suggest the model is far ahead of any other existing AI in its ability to identify and exploit software vulnerabilities. While this makes the model an invaluable tool for security researchers looking to harden their systems, it also presents a significant risk if the technology were to be misused. Anthropic’s own internal red-teaming exercises reportedly found that the model could be used to automate the creation of sophisticated exploits at a speed and scale that would overwhelm human defenders. This has led the company to adopt an extremely cautious rollout strategy, limiting initial access to organizations focused exclusively on cyber defense.[2]
The revelation arrives at a pointed moment, as the race for artificial general intelligence enters a more aggressive phase. Anthropic’s chief rival, OpenAI, is reportedly preparing its own major release, codenamed Spud, which aims at similar breakthroughs in autonomous reasoning. Both companies are operating under immense pressure from investors as they approach potential initial public offerings later this year. Anthropic recently updated its revenue forecasts to reflect an anticipated eighteen billion dollar annual run rate, a target that depends heavily on maintaining a technological lead in the enterprise market.[6] The "Capybara" tier is expected to be significantly more expensive to operate than previous models, targeting high-value industrial and scientific use cases where the cost of compute is secondary to the quality of the output.
The irony of the data breach has not been lost on industry observers, given that the details of a model capable of advanced cyber defense were exposed by a basic human error in a content management system.[2] This incident underscores a persistent vulnerability in the AI sector: while these companies are building increasingly sophisticated digital brains, the surrounding infrastructure remains subject to the same mundane security flaws that plague the rest of the tech industry. The breach also highlights the intense secrecy surrounding the frontier of AI development, where a single misconfigured setting can reveal the highly guarded benchmarks and codenames that define a company’s competitive edge.
As Anthropic moves toward a formal announcement, the focus remains on whether the company can bridge the gap between building highly capable systems and ensuring they remain under human control. The leaked documents reveal a company that is acutely aware of the risks it is creating. One draft post noted that the upcoming wave of models will likely "presage an era where AI can exploit vulnerabilities in ways that far outpace the efforts of defenders."[1][2] This suggests that the next generation of AI will not just be more helpful assistants, but autonomous agents with the capacity to reshape the digital landscape. Whether the industry is prepared for the arrival of "step change" reasoning remains an open question, but the accidental window into Anthropic’s research labs confirms that the transition is already underway.
The broader implications for the AI market are substantial, as the shift from conversational AI to reasoning-based agents marks the beginning of a second era in the generative boom. Large-scale enterprises are increasingly looking for systems that can perform independent work without constant human supervision, and the capabilities described in the Mythos leak suggest that Anthropic is nearing that goal. However, the greater compute requirements and the higher pricing of the new tier indicate that the divide between general-purpose consumer AI and high-end industrial intelligence is widening. As the company works to fix the lapse that led to this exposure, the industry will be watching closely to see whether Mythos lives up to the internal hype, or whether the challenges of controlling such a powerful system will continue to delay its public debut.
