Anthropic's Claude Sonnet 4.5 Claims World's Best AI Coder Title, Boosts Autonomy

Anthropic's Claude Sonnet 4.5, boasting benchmark dominance and 30-hour autonomy, aims to revolutionize AI-assisted software development.

September 29, 2025

In a significant move that escalates the already fierce competition in the artificial intelligence sector, Anthropic has launched Claude Sonnet 4.5, a new model the company is promoting as the "best coding model in the world."[1] This claim is largely predicated on its top scores on the SWE-bench Verified evaluation, a benchmark designed to test an AI's ability to resolve real-world software engineering problems sourced from GitHub.[1][2][3] The release is not just an incremental update; it represents a strategic push by Anthropic to dominate the lucrative market of AI-assisted software development, targeting both enterprise clients and individual developers with a suite of new tools and enhanced capabilities.[4] Sonnet 4.5's launch underscores a rapid acceleration in the development of AI models that can understand and write complex code, a key battleground for major players like Anthropic, OpenAI, and Google.
At the core of Anthropic's announcement is Sonnet 4.5's performance on industry benchmarks. The model achieved a score of 77.2% on SWE-bench Verified, outperforming Anthropic's previous frontier model, Opus 4.1, which scored 74.5%, as well as competitors such as GPT-5 Codex and GPT-5.[1] SWE-bench is a challenging test that requires a model to comprehend intricate codebases and coordinate changes across multiple files to resolve genuine GitHub issues.[5] However, it is important to note that the SWE-bench Verified benchmark has faced criticism within the AI community. Some researchers have pointed to potential issues such as "solution leakage," where the problem description inadvertently contains the solution, and weak test cases that may not fully validate a proposed fix.[6] There have also been claims that some models achieve high scores by finding existing bug fixes on GitHub rather than generating novel solutions.[7] Despite these concerns, Sonnet 4.5's high score is a notable achievement. The model also set a new record of 61.4% on the OSWorld benchmark, which evaluates an AI's ability to perform real-world tasks on a computer, a significant jump from Sonnet 4's 42.2%.[2]
A key advancement highlighted by Anthropic is Sonnet 4.5's enhanced "agentic" capabilities: the model's ability to work autonomously on complex, multi-step tasks for extended periods. The company reports that Sonnet 4.5 can operate for more than 30 hours on a single project while maintaining focus and performance, a substantial increase from the roughly seven hours possible with previous models.[8][2] This extended autonomy is crucial for building AI agents that can function as reliable "AI colleagues" for software engineers, handling significant portions of the development lifecycle with minimal human intervention. To support these capabilities, Anthropic has released a suite of developer-focused tools, including the Claude Agent SDK, which lets developers build their own complex agents on the same infrastructure that powers Claude Code.[1] Other new features include "checkpoints" in Claude Code, which allow users to save and revert to previous states of their work, a refreshed terminal interface, and a native VS Code extension.[2]
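For developers who want to evaluate the new model directly, the standard Anthropic Messages API remains the lowest-friction entry point; the Claude Agent SDK builds on the same platform for longer-running, tool-using agents. What follows is a minimal sketch rather than official sample code: it assumes the anthropic Python package is installed, an ANTHROPIC_API_KEY environment variable is set, and that "claude-sonnet-4-5" is the model alias, which should be verified against Anthropic's current model documentation.

    # Minimal sketch: calling Claude Sonnet 4.5 through Anthropic's Messages API.
    # Assumes `pip install anthropic` and an ANTHROPIC_API_KEY environment
    # variable; the "claude-sonnet-4-5" model alias follows the launch naming
    # and should be checked against Anthropic's current model list.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": (
                    "Review this function and suggest a fix:\n"
                    "def mean(xs): return sum(xs) / len(xs)  # fails on []"
                ),
            }
        ],
    )

    # The reply arrives as a list of content blocks; text blocks carry the answer.
    print(response.content[0].text)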
The release of Sonnet 4.5 places Anthropic in direct and intensified competition with other major AI labs. While Sonnet 4.5 has shown impressive performance on certain benchmarks, the competitive landscape is highly dynamic, with rivals like OpenAI's GPT-5 and Google's Gemini 2.5 Pro also demonstrating powerful coding abilities.[9][3] Some early hands-on comparisons suggest that Sonnet 4.5 is noticeably faster than competitors like GPT-5 Codex in tasks such as code review.[10] However, the choice of "best" model often depends on the specific use case, with some developers preferring one model for daily tasks and another for more complex debugging.[11] Anthropic is also emphasizing the safety and alignment of Sonnet 4.5, calling it its "most aligned frontier model" yet, with significant reductions in undesirable behaviors like deception and "power-seeking."[1] This focus on safety is central to Anthropic's brand identity and could be a meaningful differentiator for enterprise customers in regulated industries.
In conclusion, the launch of Claude Sonnet 4.5 marks a notable milestone in the evolution of AI for software development. Anthropic's bold claims, backed by strong benchmark performances, position the new model as a formidable contender in the market. The enhanced agentic capabilities and new developer tools have the potential to change how software is built, moving closer to a future where AI agents are integral members of development teams. While the competitive landscape remains fluid and the long-term impact of Sonnet 4.5 remains to be seen, its release is a clear indication that the race to create the ultimate AI coding assistant is far from over. The industry will be watching closely to see how developers adopt these new tools and how competitors respond to Anthropic's latest challenge.
