Tencent AI Breakthrough: Smaller Models Explain Strategy, Outperform Giants

Tencent's TiG framework builds explainable AI that reasons through complex games, showing smaller models can outperform giants.

October 4, 2025

Researchers at Tencent are developing an approach to artificial intelligence that teaches models not just to win complex games, but to explain the strategic thinking behind their moves. Using the enormously popular mobile game *Honor of Kings* as a training ground, the research has yielded a surprising result: smaller, more efficient AI systems can, under specific training conditions, outperform much larger counterparts. The finding challenges the prevailing "bigger is better" philosophy in AI development and points toward more transparent, interpretable, and efficient systems. The project centers on a framework designed to bridge what the researchers call a "knowing-doing" gap in AI: agents that excel at playing games typically cannot articulate their reasoning, while large language models can discuss strategy in the abstract but struggle to execute it in a dynamic game environment.
At the core of Tencent's research is the "Think in Games" (TiG) framework, which reframes how AI learns in interactive settings.[1][2] The system trains large language models to develop a procedural understanding of game mechanics by interacting directly with the game environment.[2][3] Instead of just outputting an action, the AI first generates a natural language thought process, enclosed in a dedicated reasoning tag, which lays out its analysis of the current game state and its strategic considerations.[1][4] It then produces a final, actionable instruction in a separate action tag.[1] This format compels the AI to reason about its choices before committing to them, which makes the decision-making process transparent and interpretable.
Training proceeds in two phases. First, the model undergoes supervised learning on anonymized recordings of real human matches from *Honor of Kings* to learn the basic mechanics and strategic patterns.[5] A second phase of reinforcement learning then refines the AI's strategic capabilities through direct gameplay feedback.[5][2][3] This phase uses an algorithm known as Group Relative Policy Optimization (GRPO) with a simple reward scheme that awards the model one point for a correct strategic move, so that it learns and adapts its strategy through trial and error.[1][5][3][4]
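The reward and the group-relative step described above can be illustrated with a minimal Python sketch. It is not Tencent's implementation. The tag names `<think>` and `<action>`, the helper functions, and the sample responses are hypothetical placeholders, and the reward is assumed to be 1 for a correct move and 0 otherwise. The normalization (reward minus the group mean, divided by the group standard deviation) follows the general GRPO recipe; the population standard deviation is used here as a simplifying choice.

```python
import re
from statistics import mean, pstdev

# Hypothetical tag names: the article does not preserve the real ones.
RESPONSE_PATTERN = re.compile(
    r"<think>(?P<thought>.*?)</think>\s*<action>(?P<action>.*?)</action>",
    re.DOTALL,
)


def parse_response(text):
    """Return (thought, action), or None if the response is malformed."""
    match = RESPONSE_PATTERN.search(text)
    if match is None:
        return None
    return match.group("thought").strip(), match.group("action").strip()


def binary_reward(text, correct_action):
    """Assumed reward: 1.0 if the parsed action is correct, else 0.0."""
    parsed = parse_response(text)
    if parsed is None:
        return 0.0
    return 1.0 if parsed[1] == correct_action else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """Score each reward relative to the other samples in its group."""
    baseline = mean(rewards)
    spread = pstdev(rewards)
    return [(r - baseline) / (spread + eps) for r in rewards]


# Four made-up responses sampled for the same game state.
group = [
    "<think>The enemy jungler is dead, so the dragon is free.</think>"
    "<action>Take the dragon</action>",
    "<think>My lane is pushed, so I can roam.</think>"
    "<action>Push the top tower</action>",
    "<think>The dragon is uncontested right now.</think>"
    "<action>Take the dragon</action>",
    "I would take the dragon.",  # malformed: no tags, so no reward
]

rewards = [binary_reward(r, "Take the dragon") for r in group]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.2f}")
```

Responses that pick the correct move receive a positive advantage and the others a negative one, so a policy update would favor the reasoning that led to the right action. If every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.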
Perhaps the most striking finding from the Tencent study is how well smaller language models perform once trained with the TiG framework.[1] The research team tested several models, and a Qwen model with just 14 billion parameters, put through the full TiG training pipeline, achieved 90.91% accuracy in predicting the correct strategic action in *Honor of Kings*.[1] This is a significant improvement over its base performance and, more notably, 4.24 percentage points above the 86.67% accuracy achieved by DeepSeek-R1, a much larger and more computationally expensive model.[1] The result challenges the industry's prevailing trend of scaling models to massive sizes, suggesting that more targeted training methods can unlock superior performance from more resource-efficient systems.[1] The smaller model's success indicates that the TiG approach is efficient in both data and compute, and that high performance on this task does not depend on sheer scale.
This latest research from Tencent AI Lab is not the company's first foray into high-stakes gaming AI. It builds on earlier work on agents designed purely for superhuman performance. Several years ago, Tencent developed an AI named "Wukong," also known as "Juewu," which was trained on *Honor of Kings* with the goal of beating the best human players.[6][7] Wukong proved highly successful, defeating top amateur players with a 99.8% win rate and beating professional esports teams in live competitions.[6][8] This earlier AI learned by playing against itself, accumulating the equivalent of more than 400 years of gameplay per day.[8][9] While Wukong demonstrated that an AI could master the complex, strategic, and cooperative environment of a top-tier multiplayer online battle arena (MOBA) game, the TiG framework addresses a different challenge: creating an AI that can not only win but also reason about and communicate its strategy like a human coach.[8][5] The shift marks a deliberate move away from powerful "black box" agents and toward transparent AI systems that can serve as explainable partners in complex decision-making.
The implications of Tencent's research extend beyond the digital battlegrounds of *Honor of Kings*. Smaller, more efficient, and interpretable AI models have clear potential for the broader technology industry. An AI that can articulate its strategic reasoning could change how esports players are trained and coached, providing real-time analysis and personalized advice.[8] More broadly, the framework could be adapted to other complex, interactive decision-making domains, from robotics and autonomous vehicle navigation to financial modeling and logistics. That TiG enables smaller models to outperform larger ones also suggests a more sustainable path for AI development, potentially widening access to high-performance AI by reducing the computational costs typically associated with training state-of-the-art models. By narrowing the "knowing-doing" gap, Tencent's work points toward AI that acts not just as a powerful tool, but as a transparent and collaborative partner in complex problem-solving.

Sources