
Alibaba's Qwen3.7-Max AI autonomously optimizes its own custom silicon in 35-hour marathon

Alibaba’s new Qwen3.7-Max model autonomously optimizes code for the company's custom silicon over a 35-hour run, signaling a shift toward self-improving AI.

May 23, 2026

The global race for artificial intelligence supremacy has shifted from conversational chatbots to autonomous agents capable of independent, long-horizon decision-making[1][2]. In a major step for this new era, Alibaba's Qwen team has released Qwen3.7-Max, a proprietary flagship model engineered specifically for complex, multi-step workflows[3][2]. To demonstrate the model's capabilities, Alibaba showcased Qwen3.7-Max executing a fully autonomous, 35-hour programming task to optimize code for the company's own custom silicon[2][4]. Operating without human intervention, the AI managed more than a thousand tool calls and iteratively solved complex engineering problems[5][2]. The demonstration marks a turning point in the integration of AI-driven software development and hardware optimization[6][7].

The core showcase of the model’s stamina took place in an isolated server environment equipped with the Zhenwu M890, an AI training and inference processor designed by T-Head, Alibaba's semiconductor design subsidiary[5][4]. Qwen3.7-Max was tasked with optimizing the performance of an attention kernel using the Triton programming language[4][8]. Crucially, the model had no access to the processor's official architecture documentation or performance analysis data beforehand[8]. Over the course of 35 continuous hours, the model operated entirely on its own, executing 1,158 distinct tool calls and performing 432 separate kernel evaluations[4][8]. It diagnosed compilation failures, adjusted code on the fly, and systematically eliminated bottlenecks to achieve a tenfold geometric mean speedup on the target operator[4][8].
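A geometric mean speedup summarizes several per-case speedups by averaging them in log space, so a single input shape with an unusually large gain cannot dominate the headline figure. The article does not say which shapes or which baseline the evaluation used, so the sketch below only illustrates the metric itself; the function name and the timings are hypothetical.

```python
import math


def geometric_mean_speedup(baseline_ms, optimized_ms):
    """Return the geometric mean of per-case speedups (baseline / optimized).

    Hypothetical helper: the inputs are matched lists of run times for the
    same benchmark cases, measured before and after optimization.
    """
    if not baseline_ms or len(baseline_ms) != len(optimized_ms):
        raise ValueError("need two non-empty lists of equal length")
    # Average the log-speedups, then exponentiate. This equals the n-th root
    # of the product of the speedups, without overflow on long lists.
    log_sum = sum(math.log(b / o) for b, o in zip(baseline_ms, optimized_ms))
    return math.exp(log_sum / len(baseline_ms))


# Hypothetical timings for two cases: one sped up 4x, the other 25x.
print(round(geometric_mean_speedup([8.0, 50.0], [2.0, 2.0]), 6))  # 10.0
```

For comparison, the arithmetic mean of the same two speedups is 14.5, pulled upward by the 25x case.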

This marathon optimization run proceeded through several highly technical, self-directed stages[8]. First, the model partitioned the prefix key-value cache along the token dimension using a Split-K technique, so that independent chunks of the cache could be processed in parallel and their partial results merged afterward, allowing it to fully utilize all 36 streaming multiprocessors of the custom chip[8]. It then replaced latency-heavy host-device synchronizations with pre-allocated variables[8]. Next, the model eliminated the synchronous memory copies used to query prefix lengths by reading them from tensor metadata instead, removing that source of host-device communication overhead[8]. In its final tuning stage, Qwen3.7-Max restructured the operator to process all four query tokens within a single thread block, so that each memory load was shared across the tokens and its cost amortized[8]. The Qwen team presents this sequence as evidence of sustained logical reasoning and hardware-level understanding[8].
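Split-K for decode-time attention (closely related to the technique often called Flash-Decoding) works because softmax attention over a long cache can be computed chunk by chunk, as long as each chunk also returns its running maximum score and its sum of exponentials, so the partial results can be rescaled and merged exactly. The pure-Python sketch below is not the Triton kernel from the run; it only checks that idea on random data. It also visits each key/value row once per chunk and reuses it for every query, loosely mirroring the final stage in which four query tokens share one thread block's memory loads. The function names, chunk counts, and dimensions are illustrative assumptions, and the host-side synchronization fixes are not modeled.

```python
import math
import random


def reference_attention(q, keys, values):
    """Softmax attention for one query over the whole KV cache, in one pass."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / z
            for d in range(len(values[0]))]


def partial_attention(queries, keys, values):
    """Process one chunk of the cache for all queries (online softmax).

    Each key/value row is visited once and reused for every query, the way
    one thread block could share a memory load across several tokens.
    Returns, per query, [running max, sum of exponentials, unnormalized output].
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim = len(values[0])
    states = [[-math.inf, 0.0, [0.0] * dim] for _ in queries]
    for k, v in zip(keys, values):
        for q, st in zip(queries, states):
            s = scale * sum(a * b for a, b in zip(q, k))
            m_new = max(st[0], s)
            correction = math.exp(st[0] - m_new)  # rescale earlier terms
            p = math.exp(s - m_new)
            st[1] = st[1] * correction + p
            st[2] = [acc * correction + p * x for acc, x in zip(st[2], v)]
            st[0] = m_new
    return states


def split_k_attention(queries, keys, values, num_splits):
    """Split the cache along the token axis, then merge the partial results."""
    chunk = math.ceil(len(keys) / num_splits)
    # On hardware, each chunk could be assigned to a different multiprocessor.
    partials = [partial_attention(queries, keys[i:i + chunk], values[i:i + chunk])
                for i in range(0, len(keys), chunk)]
    outputs = []
    for qi in range(len(queries)):
        states = [p[qi] for p in partials]
        g = max(m for m, _, _ in states)  # global max across all chunks
        denom = sum(s * math.exp(m - g) for m, s, _ in states)
        outputs.append([sum(acc[d] * math.exp(m - g) for m, _, acc in states) / denom
                        for d in range(len(values[0]))])
    return outputs


def random_vector(dim):
    return [random.gauss(0.0, 1.0) for _ in range(dim)]


random.seed(0)
DIM, N_TOKENS = 8, 144
keys = [random_vector(DIM) for _ in range(N_TOKENS)]
values = [random_vector(DIM) for _ in range(N_TOKENS)]
queries = [random_vector(DIM) for _ in range(4)]  # four query tokens
out = split_k_attention(queries, keys, values, num_splits=36)  # 36 chunks of 4
err = max(abs(a - b)
          for q, o in zip(queries, out)
          for a, b in zip(o, reference_attention(q, keys, values)))
print(f"max abs difference vs. single-pass attention: {err:.2e}")
```

The printed difference should sit at floating-point rounding level. The merge step is the reason the split is safe: each chunk's sums are expressed relative to its own maximum, so multiplying by `exp(m - g)` puts every chunk on the same scale before they are added.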

The performance of Qwen3.7-Max on this task highlights a gap between frontier models designed for autonomous agency and more traditional systems[2]. In identical tests, other prominent Chinese AI models struggled to match its stamina[4][8]. While Qwen3.7-Max achieved a tenfold performance improvement, GLM 5.1 and Kimi K2.6 topped out at speedups of 7.3 times and 5.0 times, respectively[4][8]. DeepSeek V4 Pro reached only a 3.3 times speedup before its run ended prematurely, after it failed to initiate tool calls over several consecutive rounds[8]. This difference underscores the importance of optimizing model architectures specifically for long-horizon loops rather than simple text generation[7].

To achieve this level of persistence, Alibaba engineered Qwen3.7-Max with a specialized dual-mode architecture[9]. Under this system, the model can dynamically switch between a thinking mode and a non-thinking mode[9]. Thinking mode is used for deep reasoning, advanced coding, and multi-step tasks, allowing the model to actively plan and self-correct[9]. For lightweight tasks that need rapid responses, the model switches to non-thinking mode to save computational resources[9]. The Qwen team also used cross-framework reinforcement learning during training, decoupling tasks, execution frameworks, and validators[8]. This approach is intended to make the model scaffold-agnostic, so that it performs consistently whether it is integrated into proprietary platforms or external developer environments like Claude Code and OpenClaw[2][8].
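The article gives no implementation details for this training setup. As a hypothetical illustration of what decoupling tasks, execution frameworks, and validators can mean, the sketch below keeps the three as separate interfaces, so a single task set and a single validator can score rollouts produced by several interchangeable scaffolds. Every class name, the toy policy (`str.upper` standing in for the model), and the exact-match reward are assumptions; the reinforcement-learning update itself is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol

Policy = Callable[[str], str]  # stand-in for the model: prompt in, text out


@dataclass
class Task:
    prompt: str
    expected: str  # hypothetical ground truth used by the validator


class Scaffold(Protocol):
    """Any agent harness that drives the policy to produce a final answer."""
    def run(self, task: Task, policy: Policy) -> str: ...


class Validator(Protocol):
    """Scores an answer without knowing which scaffold produced it."""
    def score(self, task: Task, answer: str) -> float: ...


class SingleShot:
    """Toy harness: one call to the policy."""
    def run(self, task: Task, policy: Policy) -> str:
        return policy(task.prompt)


class RetryUntilNonEmpty:
    """Toy harness: re-ask the policy until it returns a non-empty answer."""
    def __init__(self, attempts: int = 3) -> None:
        self.attempts = attempts

    def run(self, task: Task, policy: Policy) -> str:
        for _ in range(self.attempts):
            answer = policy(task.prompt)
            if answer.strip():
                return answer
        return ""


class ExactMatch:
    """Toy validator: reward 1.0 only for an exact match with the expected text."""
    def score(self, task: Task, answer: str) -> float:
        return 1.0 if answer.strip() == task.expected else 0.0


def collect_rewards(tasks: List[Task], scaffolds: List[Scaffold],
                    validator: Validator, policy: Policy) -> Dict[str, List[float]]:
    """Roll out every task under every scaffold with one shared validator.

    The reward depends only on the task and the validator, so the same
    training signal can be gathered across different harnesses.
    """
    return {type(s).__name__: [validator.score(t, s.run(t, policy)) for t in tasks]
            for s in scaffolds}


tasks = [Task("echo hi", "ECHO HI"), Task("echo bye", "ECHO BYE")]
rewards = collect_rewards(tasks, [SingleShot(), RetryUntilNonEmpty()],
                          ExactMatch(), policy=str.upper)
print(rewards)
```

Because neither the task nor the validator refers to a specific harness, swapping in another scaffold changes only how answers are produced, not how they are judged.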

Alibaba is also demonstrating that the agentic capabilities of Qwen3.7-Max extend beyond virtual environments into physical engineering. In robotic demonstrations, developers used the model to autonomously steer and control a complex four-legged robot[10]. By translating high-level natural language instructions into real-time motor commands and spatial planning loops, the model showed its capacity to act as a unified brain for physical automation. This cross-domain adaptability, from low-level kernel programming on AI accelerators to real-time physical robotics, positions the model as a versatile foundation for both software systems and advanced industrial automation[3].

Industry evaluations confirm the model's competitive standing against the world’s leading AI systems[10]. On the Artificial Analysis Intelligence Index, Qwen3.7-Max scored 56.6, placing it fifth globally and making it the top-ranked model originating from China[6][11]. On benchmarks designed for coding agents, such as Terminal-Bench 2.0, Qwen3.7-Max scored 69.7, outperforming Western rivals like Anthropic's Claude Opus 4.6, which scored 65.4[2][12]. It also scored 80.4 on the SWE-bench Verified coding benchmark[2][12].

Notably, Alibaba’s decision to release Qwen3.7-Max as a proprietary, hosted model accessible only via paid APIs represents a strategic shift[1][3]. While previous iterations of the Qwen family were celebrated for their open-weight releases, the high cost of training and serving frontier-level agentic models has pushed Alibaba toward a commercial model similar to those of Western tech giants[1].

The release of Qwen3.7-Max coincided with a broader, full-stack AI infrastructure upgrade by Alibaba[5]. Alongside the model, the cloud giant introduced the Panjiu AL128 Supernode Server, designed to support scalable agent inference and large-scale model training workloads[5]. By pairing its AI models with proprietary hardware like the Zhenwu M890 processor, Alibaba is building a closed-loop ecosystem[5][6]. This integration allows the software to be tuned precisely to the underlying silicon and, as the 35-hour autonomous run demonstrated, lets the model itself take an active role in optimizing the code that runs on that silicon[5][6].

The implications of Alibaba’s latest development stretch far beyond the immediate performance gains on a single chip. As AI systems transition from passive assistants to active, autonomous engineers capable of multi-day operations, software development and system architecture are poised to change substantially[1][2]. The 35-hour autonomous run of Qwen3.7-Max shows that AI can act not just as a tool used by human developers, but as an independent agent capable of carrying out hardware-level optimization on its own[6][7]. This feedback loop, in which AI optimizes the software that runs on AI hardware, could dramatically accelerate the pace of technological advancement, reshaping both the global semiconductor industry and enterprise automation.