Alibaba's Qwen3.7-Max AI autonomously optimizes its own custom silicon in 35-hour marathon

Alibaba’s new Qwen3.7-Max model autonomously optimizes custom silicon over a 35-hour run, signaling a shift toward self-improving AI.

May 23, 2026

The global race for artificial intelligence supremacy has shifted from conversational chatbots to autonomous agents capable of independent, long-horizon decision-making[1][2]. Alibaba's Qwen team has now released Qwen3.7-Max, a proprietary flagship model engineered specifically for complex, multi-step workflows[3][2]. To demonstrate the model's capabilities, Alibaba showcased Qwen3.7-Max executing a fully autonomous, 35-hour programming task to optimize code for the company's own custom silicon[2][4]. Operating without human intervention, the AI managed more than a thousand tool calls and iteratively solved complex engineering problems[5][2]. The run marks a turning point in the integration of AI-driven software development and hardware optimization[6][7].
The core showcase of the model’s stamina took place in an isolated server environment equipped with the Zhenwu M890, an AI training and inference processor designed by T-Head, Alibaba's semiconductor design subsidiary[5][4]. Qwen3.7-Max was tasked with optimizing the performance of an attention kernel using the Triton programming language[4][8]. Crucially, the model had no access to the processor's official architecture documentation or performance analysis data beforehand[8]. Over the course of 35 continuous hours, the model operated entirely on its own, executing 1,158 distinct tool calls and performing 432 separate kernel evaluations[4][8]. It diagnosed compilation failures, adjusted code on the fly, and systematically eliminated bottlenecks to achieve a tenfold geometric mean speedup on the target operator[4][8].
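The headline figure is a geometric mean, which averages speedup ratios multiplicatively so that one unusually fast case cannot dominate the result. The sketch below shows how such a figure is typically computed. The function name and the timings are hypothetical and chosen only for illustration; they are not Alibaba's measurements.

```python
import math


def geometric_mean_speedup(baseline_ms, optimized_ms):
    """Geometric mean of per-case speedups (baseline time / optimized time)."""
    if len(baseline_ms) != len(optimized_ms) or not baseline_ms:
        raise ValueError("need two non-empty timing lists of equal length")
    # Averaging the logs of the ratios and exponentiating gives the geometric mean.
    log_sum = sum(math.log(b / o) for b, o in zip(baseline_ms, optimized_ms))
    return math.exp(log_sum / len(baseline_ms))


# Hypothetical kernel timings in milliseconds for three input shapes.
baseline = [8.0, 20.0, 50.0]
optimized = [1.0, 2.0, 4.0]
speedup = geometric_mean_speedup(baseline, optimized)
print(f"geometric mean speedup: {speedup:.2f}x")
```

With these made-up numbers the per-case speedups are 8x, 10x, and 12.5x, so a tenfold geometric mean does not imply that every test case ran exactly ten times faster.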
This marathon optimization run proceeded through several highly technical, self-directed stages[8]. First, the model partitioned the prefix key-value cache along the token dimension using a Split-K technique, allowing it to fully utilize all 36 streaming multiprocessor cores of the custom chip[8]. It then replaced latency-heavy host-device synchronizations with pre-allocated variables[8]. Next, the model eliminated synchronous memory copies used to query prefix lengths by leveraging tensor metadata, removing host-device communication overhead entirely[8]. In its final tuning stage, Qwen3.7-Max restructured the operator to process all four query tokens within a single thread block, sharing memory loads to amortize access costs[8]. Taken together, the stages show the model reasoning about hardware behavior, from core utilization to host-device traffic and memory reuse, without access to vendor documentation[8].
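The first stage, splitting the key-value cache along the token dimension, can be illustrated in plain Python. This is a minimal sketch of one common way to do it: each chunk produces a partial softmax result, and the partials are merged with a running-maximum rescale. It is not the Triton kernel the model produced, and the function names and toy data are hypothetical. In a real kernel each chunk would run on a different core in parallel; here the chunks are processed one after another.

```python
import math


def attend_chunk(q, keys, values):
    """Partial attention over one chunk of the cache.

    Returns (max score, sum of exp weights, unnormalized weighted values).
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    dim = len(values[0])
    acc = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return m, sum(weights), acc


def split_k_attention(q, keys, values, num_splits):
    """Single-query attention with the token dimension split into chunks."""
    chunk = math.ceil(len(keys) / num_splits)
    partials = [
        attend_chunk(q, keys[i:i + chunk], values[i:i + chunk])
        for i in range(0, len(keys), chunk)
    ]
    # Rescale every partial to a shared maximum before combining them.
    m_all = max(m for m, _, _ in partials)
    denom = sum(s * math.exp(m - m_all) for m, s, _ in partials)
    dim = len(values[0])
    return [
        sum(acc[d] * math.exp(m - m_all) for m, _, acc in partials) / denom
        for d in range(dim)
    ]


# Toy data: one query vector and a six-token cache (values are made up).
q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5], [0.3, 0.3], [2.0, -1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, -1.0], [0.0, 0.0], [1.0, 1.0]]

unsplit = split_k_attention(q, keys, values, num_splits=1)
split = split_k_attention(q, keys, values, num_splits=3)
```

Here `unsplit` and `split` agree up to floating-point rounding. That equivalence is what makes the split safe: the work can be spread across more cores without changing the result.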
The performance of Qwen3.7-Max on this task highlights a widening gap between frontier models designed for autonomous agency and traditional systems[2]. In identical tests, other prominent Chinese AI models struggled to match its stamina[4][8]. While Qwen3.7-Max achieved a tenfold performance improvement, alternative systems like GLM 5.1 and Kimi K2.6 capped out at speedups of 7.3 times and 5.0 times, respectively[4][8]. Meanwhile, DeepSeek V4 Pro achieved only a 3.3 times speedup before terminating the task prematurely after failing to initiate tool calls over several consecutive rounds[8]. This stark difference underscores the importance of optimizing model architectures specifically for long-horizon loops rather than simple text generation[7].
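The early termination described above suggests a harness that stops a run once the model goes several rounds without issuing a tool call. The sketch below shows one way such a stopping rule could work. The names `run_agent` and `RunLog` are hypothetical, and so is the idle threshold of three rounds, since the article says only "several". It does not describe Alibaba's actual evaluation setup.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RunLog:
    rounds: int = 0
    tool_calls: int = 0
    stop_reason: str = ""


def run_agent(
    next_action: Callable[[int], Optional[str]],
    max_rounds: int,
    max_idle_rounds: int = 3,  # assumed threshold, not from the article
) -> RunLog:
    """Drive an agent until the round budget is spent or it stalls."""
    log = RunLog()
    idle = 0
    for round_index in range(max_rounds):
        log.rounds += 1
        action = next_action(round_index)
        if action is None:
            # A round with no tool call counts toward the idle streak.
            idle += 1
            if idle >= max_idle_rounds:
                log.stop_reason = "idle"
                return log
        else:
            idle = 0
            log.tool_calls += 1
    log.stop_reason = "budget"
    return log


# A stand-in agent that issues five tool calls and then stops acting.
stalled = run_agent(lambda i: "compile" if i < 5 else None, max_rounds=100)
# A stand-in agent that keeps working until the round budget runs out.
steady = run_agent(lambda i: "benchmark", max_rounds=10)
```

Under a rule like this, a model that stops calling tools ends its run with whatever speedup it had reached, no matter how much of the round budget is left.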
To achieve this level of persistence, Alibaba engineered Qwen3.7-Max with a specialized dual-mode architecture[9]. Under this system, the model can dynamically switch between a thinking mode and a non-thinking mode[9]. The thinking mode is deployed for deep reasoning, advanced coding, and multi-step tasks, allowing the model to actively plan and self-correct[9]. For lightweight tasks requiring rapid responses, the model switches to its non-thinking mode to save computational resources[9]. Additionally, the Qwen team utilized cross-framework reinforcement learning during training, decoupling tasks, execution frameworks, and validators[8]. This approach ensures that the model is scaffold-agnostic, performing consistently whether integrated into proprietary platforms or external developer environments like Claude Code and OpenClaw[2][8].
Alibaba is also demonstrating that the agentic capabilities of Qwen3.7-Max extend beyond virtual environments into physical engineering. In robotic demonstrations, developers used the model to autonomously steer and control a four-legged robot[10]. By translating high-level natural language instructions into real-time motor commands and spatial planning loops, the model showcased its capacity to act as a unified brain for physical automation. This cross-domain adaptability, from low-level accelerator kernel programming to real-time physical robotics, positions the model as a versatile foundation for both software systems and advanced industrial automation[3].
Industry evaluations confirm the model's competitive standing against the world’s leading AI systems[10]. On the Artificial Analysis Intelligence Index, Qwen3.7-Max scored 56.6, placing it fifth globally and ranking as the top model originating from China[6][11]. In direct benchmarks designed for coding agents, such as Terminal-Bench 2.0, Qwen3.7-Max scored 69.7, outperforming Western rivals like Anthropic's Claude Opus 4.6, which scored 65.4[2][12]. It also achieved a strong score of 80.4 on the SWE-bench Verified coding benchmark[2][12]. Notably, Alibaba’s decision to release Qwen3.7-Max as a proprietary, hosted model accessible only via paid APIs represents a strategic shift[1][3]. While previous iterations of the Qwen family were celebrated for their open-weight releases, the massive financial costs associated with training and running frontier-level agentic models have pushed Alibaba to adopt a commercial model similar to those of Western tech giants[1].
The release of Qwen3.7-Max coincided with a broader, full-stack AI infrastructure upgrade by Alibaba[5]. Alongside the model, the cloud giant introduced the Panjiu AL128 Supernode Server, designed to support scalable agent inference and massive model training workloads[5]. By pairing its specialized software models with proprietary physical hardware like the Zhenwu M890 processor, Alibaba is building a closed-loop ecosystem[5][6]. This integration allows the software to be tuned precisely to the underlying silicon, and, as demonstrated by the 35-hour autonomous run, enables the software to play an active role in optimizing its own physical execution layer[5][6].
The implications of Alibaba’s latest development stretch beyond the immediate performance gains of a single chip. As AI systems transition from passive assistants to active, autonomous engineers capable of multi-day operations, software development and system architecture are likely to change with them[1][2]. The 35-hour autonomous run of Qwen3.7-Max demonstrates that AI is no longer just a tool used by human developers, but an independent agent capable of conducting its own hardware-level optimization[6][7]. This loop of software optimizing the hardware it runs on could accelerate the pace of technological advancement in both the global semiconductor industry and enterprise automation.
