Anthropic's Sonnet 4.5 Delivers 30-Hour Autonomous AI Coding Breakthrough
Anthropic's Sonnet 4.5 redefines AI coding, working autonomously for 30+ hours on complex tasks and leading benchmarks.
September 29, 2025

Anthropic has unveiled its latest artificial intelligence model, Claude Sonnet 4.5, signaling a significant leap forward in AI-driven coding and complex task execution. The company boldly claims it is the "best coding model in the world," a declaration supported by benchmark data and, most notably, by the model's remarkable ability to operate autonomously on intricate software engineering tasks for more than 30 hours.[1][2][3][4] This breakthrough in endurance and capability positions Sonnet 4.5 not merely as an incremental upgrade but as a potentially transformative tool for developers and a major stride toward more sophisticated, long-running AI agents. The new model arrives with a suite of enhancements to the Claude ecosystem, including new features for its coding assistant, and is being deployed across major platforms, underscoring Anthropic's aggressive push to lead in the highly competitive AI landscape.
The standout feature of Claude Sonnet 4.5 is its profound enhancement in handling long-duration, complex coding challenges. Early trials have demonstrated that the model can maintain focus and performance for over 30 hours on multi-step projects, a massive increase from the roughly seven hours reported for the earlier Claude Opus 4.[4][5][6] This extended operational capacity directly addresses a critical limitation of previous AI models, which often struggled to maintain context and coherence over long periods. For developers, it translates into the ability to delegate substantial, codebase-spanning tasks, such as major refactoring, complex bug hunting, or even architectural planning, freeing engineers to focus on higher-level strategic work.[2] This endurance is complemented by new features in the Claude Code environment, including "checkpoints," a highly requested function that lets users save snapshots of their work and revert to previous states if a line of inquiry proves fruitless.[7][1]
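To make the checkpoint idea concrete, the sketch below shows the general pattern such a feature implies: snapshot the working tree before an autonomous editing session, then restore it if the attempt is abandoned. This is a concept illustration only, not Claude Code's actual implementation; the directory name, file contents, and helper functions are invented for the example.

```python
# Concept sketch of a checkpoint workflow: snapshot the workspace before an
# autonomous edit, restore it if the attempt proves fruitless.
import shutil
import tempfile
from pathlib import Path

def save_checkpoint(workspace: Path) -> Path:
    """Copy the workspace into a fresh temporary snapshot directory and return its path."""
    snapshot = Path(tempfile.mkdtemp(prefix="checkpoint-"))
    shutil.copytree(workspace, snapshot / workspace.name)
    return snapshot

def restore_checkpoint(workspace: Path, snapshot: Path) -> None:
    """Discard the current workspace contents and restore the saved snapshot."""
    shutil.rmtree(workspace)
    shutil.copytree(snapshot / workspace.name, workspace)

# Hypothetical project directory, created here so the sketch is self-contained.
workspace = Path("my_project")
workspace.mkdir(exist_ok=True)
(workspace / "app.py").write_text("print('hello')\n")

snapshot = save_checkpoint(workspace)          # take a checkpoint before the agent edits
(workspace / "app.py").write_text("broken\n")  # stand-in for an unsuccessful editing session
restore_checkpoint(workspace, snapshot)        # the line of inquiry failed, so revert
```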
Beyond its stamina, Anthropic substantiates its "best in the world" claim with impressive performance on industry-standard benchmarks. Claude Sonnet 4.5 has set a new state-of-the-art score on the SWE-bench Verified evaluation, a test that measures an AI's ability to resolve real-world software engineering issues from GitHub.[3][6] On this rigorous test, it has outperformed leading competitor models, including OpenAI's GPT-5 and Google's Gemini 2.5 Pro.[3][5] Furthermore, the model has shown a significant leap in its ability to use computer systems as a human would. On the OSWorld benchmark, which evaluates performance on real-world computer tasks, Sonnet 4.5 achieved a score of 61.4%, a substantial jump from Sonnet 4's previous leading score of 42.2%.[2][5] These quantitative successes are backed by qualitative improvements in reasoning, instruction-following, and the generation of production-ready code, according to early partners who report significant performance gains in their own systems.[8][2]
The implications of Sonnet 4.5 extend beyond writing code; they point toward a future of more capable and reliable AI agents. Anthropic has emphasized that the model is its strongest yet for building complex agents that can interact with software tools and execute tasks on a user's behalf.[7][2][9] Its improved capacity for multi-step reasoning, context management, and parallel tool use allows it to tackle sophisticated workflows in fields such as financial analysis, cybersecurity, and scientific research.[2][9][4] Alongside this added power, Anthropic asserts that Sonnet 4.5 is its "most aligned" model to date, with extensive safety training to reduce undesirable behaviors like deception or power-seeking and to better defend against prompt injection attacks, a critical concern for agentic systems.[7][2][5] This focus on safety and alignment is crucial as AI models become more autonomous and integrated into critical business processes.
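The agent pattern described here can be sketched with the Anthropic Python SDK. The snippet below is a minimal illustration rather than Anthropic's reference implementation: the `run_tests` tool and its schema are hypothetical, the `claude-sonnet-4-5` model alias is an assumption (check the current model list for the exact identifier), and the tool execution itself is stubbed out.

```python
# Minimal sketch of a tool-using agent loop with the Anthropic Python SDK.
# The tool definition and model alias below are assumptions for illustration.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# A hypothetical tool the model may call; the schema follows the SDK's tool format.
tools = [{
    "name": "run_tests",
    "description": "Run the project's test suite and return the names of failing tests.",
    "input_schema": {
        "type": "object",
        "properties": {"path": {"type": "string", "description": "Repo subdirectory to test"}},
        "required": ["path"],
    },
}]

messages = [{"role": "user", "content": "Find and fix the flaky test in services/billing."}]

while True:
    response = client.messages.create(
        model="claude-sonnet-4-5",  # assumed alias for the new model
        max_tokens=4096,
        tools=tools,
        messages=messages,
    )
    if response.stop_reason != "tool_use":
        break  # the model produced a final answer instead of requesting a tool

    # Append the assistant turn, execute each requested tool call, and return the results.
    messages.append({"role": "assistant", "content": response.content})
    tool_results = []
    for block in response.content:
        if block.type == "tool_use":
            result = f"2 failures in {block.input['path']}"  # stand-in for a real test runner
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": result,
            })
    messages.append({"role": "user", "content": tool_results})

print(response.content[0].text)
```

Because the model can emit several `tool_use` blocks in a single turn, the loop above also covers the parallel tool calls mentioned in the paragraph: each requested call is executed and all results are returned together in the next user message.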
In conclusion, the launch of Claude Sonnet 4.5 represents a major milestone in the development of AI for software engineering and autonomous systems. By dramatically extending the duration for which a model can handle complex tasks and by setting new performance standards, Anthropic has raised the bar for what developers can expect from an AI collaborator. The model's availability across the Claude API, developer platforms like Amazon Bedrock and Google Cloud's Vertex AI, and integrations with tools such as GitHub Copilot ensures its capabilities will be widely accessible.[8][10][11] While the rapid pace of AI development means its reign at the top may soon be challenged, Sonnet 4.5's demonstration that a model can work reliably for longer than a standard workday marks a pivotal moment, accelerating the industry's move from simple AI assistants to powerful, persistent, and highly capable AI agents.