Alibaba Qwen 3.5 outperforms GPT-5 mini in reasoning while slashing costs with open weights
Alibaba’s Qwen 3.5 series challenges proprietary giants, delivering frontier-class reasoning and agentic utility through an efficient open-weight architecture.
February 26, 2026

The release of the Qwen 3.5 model series by Alibaba Cloud’s Tongyi Lab marks a definitive shift in the global artificial intelligence landscape, signaling that the era of raw parameter scaling is being superseded by a focus on architectural efficiency and agentic utility.[1] By introducing a comprehensive family of models including Qwen 3.5-Flash, Qwen 3.5-35B-A3B, Qwen 3.5-122B-A10B, and the dense Qwen 3.5-27B, Alibaba has effectively challenged the dominance of proprietary "frontier" models from United States-based labs. Most notably, the series takes direct aim at OpenAI’s GPT-5 mini and Anthropic’s Claude Sonnet 4.5, offering comparable or superior performance in critical reasoning and multimodal tasks at a fraction of the computational and financial cost.[1] This strategic deployment under the permissive Apache 2.0 license underscores Alibaba’s ambition to democratize high-performance AI while cementing its role as a primary architect of the open-weights ecosystem.
At the heart of the Qwen 3.5 series is a sophisticated hybrid architecture that prioritizes "intelligence density" over sheer volume. The series leans heavily into the Mixture-of-Experts (MoE) design, where only a small subset of parameters is activated for any given task.[2] For instance, the flagship Qwen 3.5-122B-A10B possesses 122 billion total parameters but activates only 10 billion per forward pass, allowing it to deliver massive-scale reasoning capabilities with the speed and latency typical of much smaller models. This efficiency is further bolstered by the integration of Gated Delta Networks, a form of linear attention, which dramatically reduces the memory footprint of the Key-Value (KV) cache.[3][1] By interleaving linear attention layers with standard global attention blocks, Alibaba has managed to maintain high logical consistency across long-context windows—reaching up to one million tokens in the Flash variant—while ensuring that the computational overhead remains manageable even for developers running models on mid-range hardware.
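The KV-cache savings from a hybrid stack can be sketched with simple arithmetic: only the global-attention layers accumulate a cache that grows with sequence length, while linear-attention layers carry a constant-size state. The layer counts, head configuration, and 3:1 linear-to-global ratio below are illustrative assumptions, not published Qwen 3.5 specifications.

```python
# Illustrative KV-cache sizing for a hybrid attention stack.
# NOTE: total_layers, kv_heads, head_dim, and the 3:1 linear-to-global
# ratio are assumptions for illustration, not Qwen 3.5's actual config.

def kv_cache_bytes(global_layers, kv_heads, head_dim, seq_len, bytes_per_param=2):
    """KV cache for standard global attention: keys + values per layer."""
    return 2 * global_layers * kv_heads * head_dim * seq_len * bytes_per_param

total_layers = 48            # assumed
kv_heads, head_dim = 8, 128  # assumed grouped-query attention config
seq_len = 1_000_000          # the Flash variant's advertised context length

# All-global baseline vs. a hybrid stack where 3 of every 4 layers use
# linear attention (whose constant-size state is negligible by comparison).
full = kv_cache_bytes(total_layers, kv_heads, head_dim, seq_len)
hybrid = kv_cache_bytes(total_layers // 4, kv_heads, head_dim, seq_len)

print(f"all-global KV cache: {full / 2**30:.1f} GiB")   # ~183.1 GiB
print(f"hybrid KV cache:     {hybrid / 2**30:.1f} GiB")  # ~45.8 GiB
```

Under these assumed dimensions, the hybrid layout cuts the cache by the same factor as the ratio of global layers retained, which is what makes million-token contexts tractable on mid-range hardware.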
The performance benchmarks released alongside the model series reveal a significant closing of the gap between open-weights and closed-source proprietary systems. In rigorous testing, the Qwen 3.5-122B-A10B model has demonstrated the ability to outperform GPT-5 mini across a diverse spectrum of evaluations, particularly in STEM reasoning and knowledge-based benchmarks.[4] On the MMLU-Pro test, which measures advanced multidisciplinary knowledge, the 122B model achieved a score of 86.7, surpassing GPT-5 mini’s 83.7.[1][4] The discrepancy is even more pronounced in complex STEM reasoning, where Qwen’s GPQA Diamond score reached 86.6 compared to 82.8 for its OpenAI rival.[1] While the US-based models maintain a narrow edge in certain specialized coding and translation tasks, Alibaba’s models have established a new high-water mark for open-weight visual understanding and agentic planning, proving that smaller, better-architected models can effectively "punch up" against trillion-parameter giants.
Beyond raw intelligence scores, the economic implications of the Qwen 3.5 release are poised to disrupt the API-driven business models of major AI providers.[1] Alibaba has priced the Qwen 3.5-Flash model aggressively, charging just ten cents per million input tokens and forty cents per million output tokens.[5] This pricing strategy makes the model approximately sixty percent cheaper than its predecessors and significantly less expensive than the market rate for Claude Sonnet 4.5 or similar mid-range proprietary offerings. For enterprises and startups, this price-to-performance ratio creates an irresistible value proposition, particularly for high-volume production environments that require low latency and high throughput.[1] By providing a production-ready hosted version with official tool integration and massive context support, Alibaba is positioning itself not just as a research entity, but as a critical infrastructure provider for the next generation of AI-native applications.[1]
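The quoted rates make the cost calculus easy to check. The sketch below applies the stated Qwen 3.5-Flash pricing to a hypothetical monthly workload; the comparison rates for a mid-range proprietary model are placeholder assumptions, not quoted figures.

```python
# Back-of-the-envelope API cost comparison. The Qwen 3.5-Flash rates
# ($0.10 in / $0.40 out per million tokens) come from the article; the
# workload volumes and the proprietary-model rates are assumptions.

def monthly_cost(input_tokens, output_tokens, in_rate, out_rate):
    """Dollar cost for one month of traffic; rates are per million tokens."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Hypothetical high-volume workload: 2B input and 500M output tokens/month.
in_tok, out_tok = 2_000_000_000, 500_000_000

qwen_flash = monthly_cost(in_tok, out_tok, 0.10, 0.40)
proprietary = monthly_cost(in_tok, out_tok, 3.00, 15.00)  # assumed rates

print(f"Qwen 3.5-Flash: ${qwen_flash:,.2f}/month")  # $400.00/month
print(f"proprietary:    ${proprietary:,.2f}/month")
```

At this assumed volume the quoted Flash pricing comes to $400 a month, which is the kind of order-of-magnitude gap that makes the value proposition compelling for high-throughput deployments.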
A core pillar of the Qwen 3.5 strategy is its focus on the "agentic era," where AI models move beyond simple chat interfaces to act as autonomous operators within digital environments. The series introduces "visual agentic" capabilities, allowing the models to perceive and interact with mobile and desktop user interfaces independently.[6][7] This is made possible by an early-fusion multimodal training process where the models are exposed to trillions of multimodal tokens from the start of pre-training, rather than adding vision as an afterthought.[1] The result is a system capable of handling complex, multi-step workflows—such as navigating a multi-page document to extract data or operating a GUI to perform a technical task—with a level of reliability previously seen only in the most expensive proprietary models.[1] This shift toward agentic reasoning suggests that Alibaba views the future of AI not as a search replacement, but as a programmable workforce capable of executing sophisticated tasks with minimal human intervention.[1]
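The multi-step GUI workflows described above follow a familiar observe-act loop: capture the screen, ask the model for the next action, execute it, and repeat until the model signals completion. The sketch below is a minimal, hypothetical version of that loop; the `model`, `screenshot`, and `perform` callables and the action schema are stand-ins, not Alibaba's actual hosted API.

```python
# Minimal sketch of a visual-agent loop, assuming a hypothetical action
# schema such as {"type": "click", "x": 120, "y": 88} or {"type": "done"}.
# The real Qwen 3.5 tool/API interface may differ.

def run_agent(goal, model, screenshot, perform, max_steps=20):
    """Drive a UI toward `goal`; returns the list of actions taken.

    model(image, goal, history) -> next action dict (see schema above)
    screenshot() -> current screen capture passed to the model
    perform(action) -> executes the action (click, type, scroll, ...)
    """
    history = []
    for _ in range(max_steps):
        action = model(screenshot(), goal, history)
        history.append(action)
        if action.get("type") == "done":
            break
        perform(action)  # act on the target UI before the next observation
    return history
```

The `max_steps` cap is the standard safeguard in such loops: it bounds cost and prevents an agent that never emits a terminal action from running indefinitely.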
The broader impact of this release on the AI industry cannot be overstated, as it further erodes the "moat" traditionally held by closed-model developers.[1] By releasing the weights for nearly the entire Qwen 3.5 family, Alibaba is empowering a global community of developers to fine-tune, quantize, and deploy frontier-class intelligence on their own terms and infrastructure. This move addresses growing concerns regarding data sovereignty and vendor lock-in, particularly for European and Asian enterprises that are increasingly wary of relying solely on US-based cloud providers.[1] As the Qwen 3.5-35B-A3B demonstrates that a model with only three billion active parameters can outperform a previous-generation system six times its size, the industry is witnessing a "Cambrian explosion" of efficiency that could lead to the widespread adoption of local, specialized AI assistants.
In conclusion, the Qwen 3.5 series represents more than just a successful iteration of a popular model family; it is a declaration of architectural and economic maturity in the open-source community. By outperforming established leaders like GPT-5 mini in critical reasoning sectors while undercutting the market on cost, Alibaba has effectively challenged the narrative that top-tier performance requires closed-source secrecy and astronomical spending. The focus on Mixture-of-Experts and hybrid attention mechanisms provides a blueprint for how AI can continue to advance in capability without becoming prohibitively expensive to run. As these models become integrated into everything from autonomous coding agents to enterprise-scale multimodal workflows, the pressure on Western AI labs to justify their high costs and proprietary barriers will likely intensify, further accelerating the move toward a more transparent and accessible global AI ecosystem.[1]