Anthropic's Claude Opus 4.6 Achieves Million-Token Context, Crushing Enterprise Benchmarks
A million-token context and improved reliability define the new standard for long-horizon enterprise AI.
February 5, 2026

Anthropic has once again escalated the competitive arms race in frontier artificial intelligence with the release of Claude Opus 4.6, its new flagship large language model, which introduces a one million token context window. This expansion of the model’s working memory moves the industry closer to truly autonomous, long-horizon AI applications and intensifies a memory-capacity race that has become a key battleground among leading AI developers. The context window, which dictates how much information a model can process and remember in a single interaction, has grown five-fold from the 200,000 tokens supported by its predecessor, Claude Opus 4.5. The new window lets the model ingest the equivalent of approximately 1,500 pages of text, or an entire large codebase, and Anthropic states that this significantly enhances the model’s reliability in locating and reasoning over massive document collections.[1][2]
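For developers, the practical question is what a request against the full window looks like. The sketch below shows roughly how such a call might be made with Anthropic's Python SDK; note that the model identifier and beta flag shown here are illustrative assumptions, since the article does not name them.

```python
# Minimal sketch of sending a very large document set through the long-context
# beta via Anthropic's Python SDK. The model id ("claude-opus-4-6") and the
# beta flag ("context-1m") are hypothetical placeholders, not confirmed names.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("entire_codebase.txt") as f:
    corpus = f.read()  # up to ~1M tokens, roughly 1,500 pages of text

response = client.beta.messages.create(
    model="claude-opus-4-6",   # hypothetical identifier
    betas=["context-1m"],      # hypothetical flag for the 1M-token beta
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": (
            "Here is our full repository:\n\n" + corpus +
            "\n\nList every module that touches the billing database."
        ),
    }],
)
print(response.content[0].text)
```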
The key innovation in Opus 4.6 is not just the sheer size of the context window, currently available in a beta rollout, but the model's demonstrated ability to overcome the longstanding problem of "context rot": the performance degradation in which models "forget" or miss crucial details buried in the middle of a very long input, a failure that has plagued previous large-context models. Anthropic's internal evaluations on MRCR v2, a benchmark designed specifically to gauge a model's ability to retrieve "hidden" information from vast amounts of text, show a dramatic improvement. On the 8-needle 1M variant of the test, Opus 4.6 scored 76 percent, a qualitative shift from the 18.5 percent that the older Sonnet 4.5 model scored on the same long-context task. The result suggests the company has engineered the model to maintain focus and attention across hundreds of thousands of tokens, making it a reliable tool for enterprise-scale document analysis and research.[3][4]
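To make the "8-needle" terminology concrete, the sketch below illustrates the general shape of a needle-in-a-haystack evaluation: several key facts are buried at random depths in long filler text, the model is asked to recall them, and accuracy is the fraction recovered. This is a simplified illustration of the technique, not Anthropic's actual MRCR harness.

```python
# Simplified sketch of an MRCR-style "needle in a haystack" evaluation.
# Not Anthropic's harness; it only illustrates the general technique.
import random

FILLER = "The sky was grey and the meeting ran long. " * 50_000  # long-context stand-in
NEEDLES = [f"The secret code for vault {i} is {random.randint(1000, 9999)}."
           for i in range(8)]  # an "8-needle" configuration

def build_haystack(filler: str, needles: list[str]) -> str:
    """Insert each needle at a random character offset in the filler text."""
    text = filler
    for needle in needles:
        pos = random.randint(0, len(text))
        text = text[:pos] + " " + needle + " " + text[pos:]
    return text

def score(model_answer: str, needles: list[str]) -> float:
    """Fraction of needles whose code appears verbatim in the answer."""
    hits = sum(1 for n in needles if n.split()[-1].rstrip(".") in model_answer)
    return hits / len(needles)

haystack = build_haystack(FILLER, NEEDLES)
prompt = haystack + "\n\nList the secret code for every vault mentioned above."
# answer = call_model(prompt)  # model call elided in this sketch
# print(f"retrieval accuracy: {score(answer, NEEDLES):.0%}")
```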
The increased memory and improved retrieval translate directly into a substantial upgrade for professional and enterprise workflows. The model exhibits state-of-the-art performance across several evaluations of economically valuable knowledge work, including finance, law, and other complex domains. On the GDPval-AA benchmark, which assesses performance on real-world professional tasks, Opus 4.6 outperformed one of the industry's next-best models, OpenAI’s GPT-5.2, by approximately 144 Elo points, and surpassed its own predecessor, Opus 4.5, by 190 points. The model has also been optimized for agentic search and coding, leading all frontier models on BrowseComp, an evaluation that measures the ability to locate hard-to-find information through multi-step online searches. New capabilities include spawning teams of agents in Claude Code that collaborate in parallel on complex software development projects, a feature designed to handle large codebases and long-horizon tasks more effectively than a single, sequential agent.[3][5][6]
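Elo margins have a direct interpretation as head-to-head win probabilities via the standard logistic Elo formula. The short calculation below shows what the reported gaps would imply, assuming GDPval-AA uses conventional Elo scaling, which the article does not state.

```python
# Convert an Elo gap into an expected head-to-head win probability using the
# standard logistic formula P = 1 / (1 + 10^(-delta/400)). Assumes GDPval-AA
# uses conventional Elo scaling, which is not stated in the article.
def win_probability(elo_gap: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / 400.0))

print(f"+144 Elo vs GPT-5.2:  {win_probability(144):.1%}")  # ~69.6%
print(f"+190 Elo vs Opus 4.5: {win_probability(190):.1%}")  # ~74.9%
```

In other words, a 144-point gap corresponds to Opus 4.6 being preferred roughly 70 percent of the time in pairwise comparisons on these professional tasks.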
The introduction of the one million token context window throws down the gauntlet in the competitive landscape, positioning Claude Opus 4.6 as a direct and formidable rival to other long-context models, such as Google’s Gemini 1.5 Pro and Flash, which also support up to a million tokens per prompt. This competition around context size marks a new phase in large language model development, shifting the focus from raw parameter count to the practical utility of long-term memory and complex reasoning. For end-users and developers, a context window of this size removes barriers that previously limited LLM adoption in applications requiring comprehensive analysis of large datasets, such as an entire legal brief, a lengthy financial report, or a full repository of medical records. Despite the significant capability gains, Anthropic has maintained its standard pricing for the new model: $5 per million input tokens and $25 per million output tokens for contexts up to 200,000 tokens, with premium rates applying to prompts exceeding that threshold. The pricing is an aggressive strategy aimed at accelerating enterprise adoption.[1][6][4][7]
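At those published standard-tier rates, per-request costs are easy to estimate. The sketch below applies the $5/$25 figures from the article; the premium rate for prompts above 200,000 tokens is not specified, so it is left as a caller-supplied assumption.

```python
# Back-of-the-envelope cost estimate at the published standard-tier rates
# ($5/M input, $25/M output, per the article). The premium rate for prompts
# over 200K input tokens is unpublished, so it must be supplied as an assumption.
STANDARD_INPUT_PER_M = 5.00    # USD per million input tokens (prompts <= 200K)
STANDARD_OUTPUT_PER_M = 25.00  # USD per million output tokens

def estimate_cost(input_tokens: int, output_tokens: int,
                  premium_input_per_m: float | None = None) -> float:
    """Estimate request cost in USD; the premium input rate is an assumption."""
    if input_tokens > 200_000:
        if premium_input_per_m is None:
            raise ValueError("prompts over 200K tokens bill at an unpublished premium rate")
        input_rate = premium_input_per_m
    else:
        input_rate = STANDARD_INPUT_PER_M
    return (input_tokens / 1e6) * input_rate + (output_tokens / 1e6) * STANDARD_OUTPUT_PER_M

# A 150K-token legal brief summarized into 4K tokens at standard rates:
print(f"${estimate_cost(150_000, 4_000):.2f}")  # $0.85
```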
The broader implications for the AI industry are profound, suggesting a future where LLMs function not merely as conversational partners but as persistent collaborators capable of managing and executing complex, multi-step tasks over extended periods. The ability of Opus 4.6 to maintain context and quality across massive projects is poised to make AI a more practical and dependable tool for sustained, high-stakes work, especially in financial analysis, deep research, and the creation of detailed documents, spreadsheets, and presentations. The company has also announced deep integrations with common office tools, including a preview of Claude in PowerPoint that can automatically match corporate branding and generate full presentations from a simple description, further cementing its strategy to embed AI directly into familiar professional workflows. The race for even larger context windows continues, with some research models supporting up to ten million tokens, but Anthropic's breakthrough with Claude Opus 4.6 validates the immediate, real-world utility of a million-token capacity, transforming what an AI's "working memory" means for the global enterprise market.[3][1][6][8]