Frontier AI models outpace safety benchmarks and execute autonomous network breaches at machine speed

Frontier AI is outpacing safety benchmarks while enabling autonomous cyberattacks that can breach secure networks in minutes

May 10, 2026

Frontier AI models outpace safety benchmarks and execute autonomous network breaches at machine speed
The rapid evolution of frontier artificial intelligence has reached a critical inflection point where the capabilities of the latest models are beginning to outpace the industry's ability to reliably measure them.[1][2][3][4] According to a series of recent technical evaluations and cybersecurity reports, the emergence of highly advanced systems is creating a widening gap between what AI can do and what safety frameworks can catch.[1][2] The Model Evaluation and Threat Research organization, known as METR, recently disclosed that its current suite of tests is barely sufficient to assess the full performance range of Anthropic’s new Claude Mythos Preview.[2][3] Simultaneously, security leader Palo Alto Networks has issued a stark warning regarding the shift from AI-assisted human hacking to truly autonomous AI attackers capable of executing complex breach sequences in a fraction of the time required by traditional adversaries.[5]
The technical community is particularly concerned with the latest findings from METR, a non-profit dedicated to assessing whether AI systems pose catastrophic risks. In its recent assessment of Claude Mythos, METR reported that the model has effectively hit the ceiling of its existing "time horizon" methodology.[2][3] This metric defines the length of a task an AI agent can complete with a 50 percent success rate, using human-expert completion time as a baseline.[3][6] While previous state-of-the-art models were measured in the range of twelve hours, Mythos has demonstrated a 50 percent success rate on tasks that would take a human expert at least 16 hours to finish. However, the organization admitted that its testing suite is structurally unprepared for this level of competence.[3] Out of 228 specialized tasks in the current METR battery, only five cover the capability range required to measure models operating at or above this 16-hour threshold.[2][3] This scarcity of high-difficulty tasks makes quantitative measurements increasingly unstable and less meaningful, signaling that the "road" for traditional AI benchmarking is running out.[3]
This measurement bottleneck arrives at a moment when the offensive capabilities of these models are becoming dangerously practical. Palo Alto Networks’ Unit 42 research team recently detailed the rise of "autonomous operator" AI, a category of models that no longer requires step-by-step human guidance to navigate a corporate network. Their testing revealed that frontier models are now proficient at "vulnerability chaining"—the process of identifying multiple low-severity flaws and linking them into a single critical attack path.[5][4] In one documented scenario, an AI agent was given a single prompt to exfiltrate sensitive data from a cloud environment.[7] The model autonomously mapped the infrastructure, identified a server-side request forgery vulnerability, extracted authentication tokens, escalated its own privileges, and moved the data to an external storage bucket.[7] The entire lifecycle of the attack, from initial access to data exfiltration, was compressed into just 25 minutes.[2][1][4] For context, typical human-led attacks or semi-automated scans often take hours or days to reach the same stage.
The speed of these AI-driven breaches represents a fundamental collapse in the traditional defense timeline. Cybersecurity experts argue that the industry-standard "mean time to respond" is no longer a viable metric when an attacker operates at machine speed.[1][4] Most security operations centers are built around human workflows that assume a window of several hours to detect and contain a breach. However, with the 25-minute attack cycle demonstrated by frontier models, the window for human intervention has effectively vanished. Palo Alto Networks noted that in its early-access testing of these models, three weeks of AI-assisted code analysis matched the depth and coverage of a full year of manual penetration testing.[5][8][2] This suggests that the barrier to entry for high-level cyber espionage is falling, as models gain the "intuitive" understanding of software logic necessary to bypass traditional signature-based scanners.
The broader implications for the AI industry are tied to a worrying trend: the speed of model development is significantly outstripping the speed of evaluation and defense.[1] Industry data shows that the median gap between major model releases has plummeted to just 11 days, creating a relentless cycle of capability jumps that leaves researchers struggling to update safety benchmarks. This "capability discontinuity" is what led Anthropic to restrict access to Claude Mythos Preview, making it available only to a select group of defensive partners through an initiative called Project Glasswing.[9][10] The coalition, which includes technology giants and financial institutions, is using the model specifically to find and patch zero-day vulnerabilities in critical infrastructure before malicious actors can develop similar capabilities. The decision to withhold a model from public release based on its cybersecurity proficiency is a rare move in an industry otherwise defined by an aggressive race to market.
This tension between innovation and safety is further complicated by what researchers call the "jagged" nature of AI capabilities. While a model like Mythos may struggle with certain simple real-world nuances, it may simultaneously possess world-class skills in specific, high-risk domains like C++ compilation or kernel-level exploit development. This unevenness makes it difficult for organizations to predict exactly where their vulnerabilities lie. METR’s struggle to find enough 16-hour tasks highlights that as AI agents become capable of managing long-term, multi-step projects, the complexity of those projects becomes so high that even human experts find it difficult to design and verify the tests. We are entering an era where the evaluator must be as intelligent and as fast as the system being evaluated, a requirement that current third-party audit infrastructure is not yet meeting.
The shift toward agentic AI—systems that can set their own sub-goals and iterate on failures—means that the unsupervised attack surface is expanding. As local AI agents become more common on employee desktops, every workstation effectively gains the power of a server capable of generating and deploying complex code.[4][8][5] Security leaders warn that most organizations currently have zero visibility into the code being generated by their own workforce’s AI tools, let alone the code being deployed by external automated threats. The consensus among cybersecurity professionals is that the only viable response to machine-speed offense is machine-speed defense. This requires a transition to autonomous security platforms that can identify and remediate exposures in real-time, effectively fighting AI with AI.
Ultimately, the warnings from both METR and Palo Alto Networks suggest that the era of treating AI as a mere productivity assistant is over. The technology has matured into an autonomous operator capable of navigating complex systems with a level of persistence and speed that traditional security architectures were never designed to withstand.[1] As the industry moves toward even more powerful models, the priority is shifting from simply increasing performance to building the sophisticated measurement and defense tools necessary to keep pace with the frontier.[1] The primary challenge is no longer just the existence of the models, but the fact that our methods for understanding and controlling them are falling behind. Without a significant investment in the infrastructure of AI evaluation and real-time autonomous defense, the window of safety that organizations currently enjoy may soon close entirely.

Sources
Share this article