Xiaomi disrupts AI market with MiMo-V2.5-Pro open-source model for sustained autonomous coding
Xiaomi’s trillion-parameter open-weight model empowers long-horizon autonomous agents with unprecedented token efficiency at a fraction of proprietary costs.
May 3, 2026

The landscape of artificial intelligence is undergoing a fundamental shift from conversational assistants to autonomous agents capable of sustained, multi-hour labor.[1][2] Xiaomi has positioned itself at the forefront of this transition with the release of MiMo-V2.5-Pro, a massive open-weight large language model that directly challenges the dominance of top-tier closed systems.[3] By prioritizing token efficiency and long-horizon coherence, the new model aims to match the coding prowess of industry leaders like Anthropic’s Claude Opus 4.6 while operating at a fraction of the computational and financial cost. This release marks a significant moment for the open-source community, providing developers with a frontier-grade tool for complex software engineering that can run autonomously for extended periods without human intervention.
At the heart of MiMo-V2.5-Pro is a sophisticated Mixture-of-Experts architecture designed to balance raw power with operational efficiency. The model boasts a total of 1.02 trillion parameters, yet it remains surprisingly lean during inference by activating only 42 billion parameters for any given request.[4][2][5] This sparse design is augmented by a hybrid attention mechanism that interleaves local sliding-window attention with global attention at a six-to-one ratio.[6][7] This structural choice reduces the memory overhead of the key-value cache by nearly seven times, a critical factor for maintaining performance across the model's expansive one-million-token context window.[7][6] To further enhance throughput, Xiaomi integrated a multi-token prediction module that allows the system to generate multiple tokens per step, effectively tripling output speeds and accelerating the reinforcement learning loops required for complex task completion.
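The roughly sevenfold cache saving follows directly from the 6:1 layer mix: sliding-window layers retain keys and values only for a short window, while the one global layer in each group of seven keeps the full context. A back-of-the-envelope sketch in Python, in which the layer count, window span, and attention dimensions are purely illustrative assumptions rather than published specifications:

```python
# Back-of-the-envelope KV-cache sizing for a 6:1 hybrid attention stack.
# Every dimension below is an illustrative assumption, not a published spec.

def kv_cache_gib(layers, kv_heads, head_dim, tokens, bytes_per_elem=2):
    """Bytes per layer = 2 (K and V) * kv_heads * head_dim * tokens * element size."""
    return layers * 2 * kv_heads * head_dim * tokens * bytes_per_elem / 2**30

CONTEXT = 1_000_000            # the one-million-token window from the article
WINDOW = 4_096                 # assumed sliding-window span
LAYERS = 70                    # assumed depth, splitting 6 local : 1 global
KV_HEADS, HEAD_DIM = 8, 128    # assumed grouped-query attention dimensions

full = kv_cache_gib(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT)
hybrid = (kv_cache_gib(LAYERS * 6 // 7, KV_HEADS, HEAD_DIM, WINDOW)    # 60 windowed layers
          + kv_cache_gib(LAYERS // 7, KV_HEADS, HEAD_DIM, CONTEXT))    # 10 full-context layers

print(f"full attention: {full:.1f} GiB, 6:1 hybrid: {hybrid:.1f} GiB "
      f"({full / hybrid:.1f}x smaller)")
```

Because the windowed layers' contribution is negligible at long contexts, the ratio approaches seven to one, which is where the "nearly seven times" figure comes from.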
The primary competitive advantage claimed by Xiaomi is a dramatic reduction in token consumption. In evaluations on the ClawEval suite, a benchmark designed to measure agentic performance, MiMo-V2.5-Pro achieved a 64 percent success rate while consuming 40 to 60 percent fewer tokens than Western rivals such as Claude Opus 4.6 and GPT-5.4.[7][3] This efficiency is not merely a matter of brevity but a reflection of what researchers call "harness awareness," where the model actively manages its own memory and structures its reasoning to avoid the repetitive or wandering trajectories that plague many current-generation models. By reaching frontier-tier capability with a smaller token footprint, the model significantly alters the cost-to-benefit ratio for enterprises looking to deploy autonomous coding agents at scale.[8]
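Read as cost per solved task, the two numbers compound: fewer tokens per attempt and a 64 percent success rate both shrink the expected spend. A hedged sketch of that arithmetic, in which the rival's per-attempt token count and success rate are invented purely for illustration:

```python
# Expected tokens per *solved* task = tokens per attempt / success rate,
# assuming independent retries until success. Only the 64% rate and the
# 40-60% savings band come from the article; the rival figures are assumed.

def tokens_per_solved_task(tokens_per_attempt, success_rate):
    return tokens_per_attempt / success_rate

RIVAL_TOKENS = 2_000_000   # assumed tokens one rival attempt consumes
RIVAL_RATE = 0.60          # assumed rival success rate on the same suite

rival = tokens_per_solved_task(RIVAL_TOKENS, RIVAL_RATE)
for saving in (0.40, 0.60):
    mimo = tokens_per_solved_task(RIVAL_TOKENS * (1 - saving), 0.64)
    print(f"{saving:.0%} fewer tokens per attempt -> "
          f"{rival / mimo:.2f}x fewer tokens per solved task")
```

Under these toy assumptions, the per-attempt savings translate into a 1.8x to 2.7x reduction in tokens spent per completed task.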
In practical applications, the model has demonstrated a remarkable capacity for sustained autonomous work, moving beyond simple code snippets to the construction of entire functional systems.[8][9][4] In internal demonstrations, MiMo-V2.5-Pro was tasked with building a full SysY compiler in the Rust programming language—a project typically assigned to advanced computer science students over several weeks.[7] The model completed the task in 4.3 hours, making 672 sequential tool calls and achieving a perfect score against a hidden test suite.[10][4][11][5] Rather than relying on a trial-and-error approach, the agent first established a comprehensive architectural scaffold for the entire pipeline, then methodically implemented each layer.[5][7] In another test, it developed an eight-thousand-line desktop video editor from a few initial prompts, running autonomously for 11.5 hours and managing nearly two thousand tool calls to navigate a complex web of dependencies and refactors.
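The scaffold-first pattern described above is easy to caricature in code. The sketch below is a toy agent loop with stubbed model and tool calls, meant only to illustrate the plan-then-execute structure of a long-horizon session; it is not MiMo's actual interface, and every name in it is hypothetical.

```python
# Toy long-horizon agent loop: plan an architectural scaffold first, then
# work through it with sequential tool calls, verifying each stage.
# All functions here are stubs, not MiMo's real tooling.

from dataclasses import dataclass, field

@dataclass
class AgentState:
    plan: list[str]
    done: list[str] = field(default_factory=list)
    tool_calls: int = 0

def plan_scaffold() -> list[str]:
    # Stand-in for the scaffold phase: fix the whole pipeline up front.
    return ["lexer", "parser", "type-checker", "IR lowering", "codegen"]

def run_tool(step: str) -> bool:
    return True  # stub: pretend the edit/test tool call succeeded

def run_agent(max_calls: int = 2000) -> AgentState:
    state = AgentState(plan=plan_scaffold())
    for step in state.plan:
        while step not in state.done and state.tool_calls < max_calls:
            state.tool_calls += 1
            if run_tool(step):   # a real agent would re-plan on failure
                state.done.append(step)
    return state

state = run_agent()
print(f"completed {len(state.done)}/{len(state.plan)} stages "
      f"in {state.tool_calls} tool calls")
```

The point of the structure is the one the demos highlight: committing to a full pipeline before implementation keeps a multi-hour run from wandering, and the call budget bounds how long the loop may retry.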
The release of MiMo-V2.5-Pro intensifies an ongoing rivalry among Chinese AI providers, particularly between Xiaomi and DeepSeek.[5] The MiMo project is led by Luo Fuli, a former core contributor to the DeepSeek architecture, whose influence is evident in the model's focus on sparse efficiency and agentic reasoning. Prior to its official launch, a version of the model was quietly tested on public API platforms under the codename Hunter Alpha, where it quickly rose to the top of usage rankings. By releasing the weights under a permissive MIT license, Xiaomi is pursuing a strategy of volume and democratization, betting that the future of AI lies in open ecosystems where developers can fine-tune and host models locally to preserve data privacy and minimize latency.
For the broader AI industry, the implications of this release are profound. The focus of competition is shifting from raw benchmark scores toward the sustainability of autonomous workflows.[2][8][5] For years, the metric of success was the one-shot response—how accurately a model could answer a single prompt. Xiaomi’s latest effort suggests that the new gold standard is the multi-hour session, where a model must maintain coherence, follow complex instructions, and self-correct across thousands of steps. This move toward agent-centric design is particularly relevant for the software engineering sector, where the goal is no longer just to assist a developer but to act as an independent contributor capable of handling end-to-end project lifecycles.
The economic impact of such models is hard to overstate. By pricing the model at one dollar per million input tokens and three dollars per million output tokens, Xiaomi is undercutting the established pricing structures of closed-model providers by a significant margin. When coupled with the inherent token efficiency of the architecture, the total cost of running a complex coding project drops from hundreds of dollars to just a few. This democratization of high-level reasoning allows smaller startups and independent developers to build sophisticated automation tools that were previously the exclusive domain of large technology corporations with massive compute infrastructures.
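At those list prices the arithmetic is easy to sketch. In the toy comparison below, the one-dollar and three-dollar rates are the article's; the session token volumes and the rival's rates are assumptions chosen only to show the shape of the gap, not any provider's actual price sheet:

```python
# Session-cost arithmetic at the quoted $1 / $3 per million tokens.
# Token volumes and the rival's rates are illustrative assumptions.

def session_cost(input_tok, output_tok, in_price_per_m, out_price_per_m):
    return (input_tok * in_price_per_m + output_tok * out_price_per_m) / 1_000_000

# Assumed multi-hour coding session for MiMo, plus a rival that (per the
# efficiency claim) burns roughly twice the tokens at higher assumed rates.
mimo = session_cost(2_000_000, 600_000, 1.0, 3.0)
rival = session_cost(4_000_000, 1_200_000, 15.0, 75.0)

print(f"MiMo-V2.5-Pro session: ${mimo:.2f}  vs  assumed closed model: ${rival:.2f}")
```

Even with generous assumptions for the rival, the combination of lower unit prices and a smaller token footprint compounds into an order-of-magnitude gap per session.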
Furthermore, the model’s ability to handle multimodal inputs within a unified architecture—processing text, images, and video in a single pass—extends its utility beyond pure coding into fields like hardware design and industrial automation. Xiaomi’s background as a hardware giant is reflected in the model’s training data, which includes specialized sets for circuit simulation and supply chain logic. This vertical integration allows the model to reason about physical systems with the same precision it applies to software repositories, making it a versatile "brain" for complex, real-world engineering environments.
In conclusion, MiMo-V2.5-Pro represents a maturation of the open-weight model landscape. It provides a credible alternative to the most advanced proprietary systems while introducing a new emphasis on the longevity and cost-efficiency of autonomous agents.[3][2] As the industry continues to move away from simple chatbots and toward integrated digital workers, the ability of a model to stay on task for hours and manage its own computational resources will likely become the defining characteristic of intelligence. Xiaomi’s successful fusion of a trillion-parameter scale with a lightweight active footprint suggests that the path to artificial general intelligence may be paved with efficiency as much as with raw power. This release challenges Western incumbents to rethink their walled-garden strategies and underscores the rising influence of open-source innovation in the global race for AI leadership.