GPT-5 Masters Deception, Dominates AI Social Intelligence in Werewolf Game
A new benchmark reveals GPT-5's cold, calculating logic for deception and strategic manipulation in complex social games.
September 13, 2025

A new benchmark designed to test the social intelligence of artificial intelligence models has revealed a significant gap in capabilities, with GPT-5 demonstrating vastly superior skills in manipulation, strategic planning, and deception. French startup Foaster.ai pitted seven different large language models against each other in 210 games of the complex social deduction game "Werewolf." The results showed GPT-5 not just winning, but fundamentally dominating the game by weaponizing its core mechanics and exhibiting a level of cold, calculating logic that other models could not match. This experiment moves beyond traditional AI evaluations, which typically focus on tasks like math and coding, to probe the more nuanced and human-like skills required for complex social interactions, signaling a new frontier in AI development and evaluation.
The Foaster.ai Werewolf benchmark was designed to measure social intelligence, a critical component for the future of autonomous AI agents.[1][2] In the six-player setup, two players were secretly assigned the role of werewolves, while the other four were villagers, two of whom had special powers as a "Seer" and a "Witch".[1] The game unfolds through rounds of accusation, defense, and voting, demanding that players build trust, detect lies, and persuade others. Foaster.ai ran a round-robin tournament in which every pairing of the seven models played ten matches (21 pairings, hence the 210 games), and used the results to compute an Elo rating that ranks the models.[1] The benchmark was specifically engineered to assess two key aspects of social intelligence: the ability to manipulate others when playing as a werewolf, and the ability to resist manipulation as a villager.[1][2][3] This approach provides a quantifiable measure of skills like persuasion, coordination, and long-horizon planning, which are essential as AI systems evolve from simple tools into collaborative partners.[1]
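To make the tournament arithmetic concrete, here is a minimal Python sketch of a seven-model round robin and a textbook Elo update. It is an illustration under stated assumptions, not Foaster.ai's implementation: the model names are placeholders, and the K-factor of 32, the 400-point scale, and the 1500 starting rating are conventional defaults that the article does not confirm.

```python
from itertools import combinations

# Placeholder names; the benchmark's actual roster is not reproduced here.
MODELS = [f"model_{i}" for i in range(1, 8)]
MATCHES_PER_PAIR = 10  # from the article: ten matches per pairing


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score for A against B (400-point scale assumed)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (A, B) ratings after one match.

    score_a is 1.0 if A won and 0.0 if A lost. The K-factor is an assumption.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# Every unordered pairing of seven models: C(7, 2) = 21.
pairs = list(combinations(MODELS, 2))
total_games = len(pairs) * MATCHES_PER_PAIR

# Example update: two equally rated models, the first one wins.
ratings_after = update_elo(1500.0, 1500.0, score_a=1.0)

if __name__ == "__main__":
    print(len(pairs), total_games)
    print(ratings_after)
```

Seven models yield 21 pairings, and ten matches per pairing gives the 210 games reported. In the example update, the winner gains exactly as many points as the loser gives up, which is the property that lets repeated match results be folded into a single ranking.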
GPT-5's performance was in a class of its own, setting it apart from the pack of other capable AI models.[1] Its strategy was described as hyper-rational and unflappable, focusing on seizing control of the game's procedures from the very beginning.[1] When playing as a werewolf, GPT-5 didn't just engage in simple deception; it worked to construct an alternate reality where its victory was the only logical outcome.[1] The model demonstrated ruthless efficiency in coordinating with its werewolf partner, often using the language of game theory to discuss tactics like maximizing expected value and securing pluralities.[1] One of the most striking examples of its advanced strategy was its willingness to sacrifice its own partner to gain the trust of the village for a future advantage, a tactic known as "bussing".[2] In another documented game, when correctly identified as a werewolf by the real Seer, GPT-5 audaciously counter-claimed the Seer role itself, creating enough confusion to have the actual Seer eliminated and turning a certain loss into a potential victory.[1]
While GPT-5 stood alone at the top, the performance of other AI models revealed a spectrum of social intelligence and distinct strategic "personalities."[1][2] Models like Kimi-K2, Grok-4, and Gemini 2.5 Pro showed flashes of high-impact play but were often volatile and prone to errors or overreach that exposed their deception.[1] The lower-tiered models, such as GPT-5-mini, Gemini 2.5 Flash, and Qwen3, could influence a single vote but struggled to maintain a coherent deceptive narrative over multiple rounds of the game.[1] At the bottom of the rankings was GPT-OSS, which was found to be transparent and easily countered.[1] This hierarchy suggests that while several models can participate in a complex social game, only the most advanced, like GPT-5, can consistently execute multi-day planning, manage their credibility, and employ a sophisticated theory of mind to outmaneuver opponents. The distinct styles, from GPT-5's calm, controlling strategy to the more assertive and combative approach of other models, highlight the varied developmental paths of social reasoning in AI.[1]
The implications of the Werewolf benchmark extend far beyond the game itself, offering critical insights for the future of human-AI collaboration and the development of autonomous agents. As AI becomes more integrated into our lives, its ability to understand and navigate complex social dynamics—including trust, deception, and strategic communication—will be paramount.[2] This benchmark reveals that the most advanced models are beginning to exhibit genuinely sophisticated social behaviors, a necessary precursor for them to function as effective and reliable partners.[1] The study by Foaster.ai, which builds upon prior work like Google Research's "Werewolf Arena," pushes the industry to look beyond narrow, task-oriented evaluations and toward more holistic assessments of AI intelligence.[1][4] Understanding how these models succeed and fail in socially demanding environments is crucial for designing AI systems that are not only capable but also aligned with human social norms and values. The clear superiority of GPT-5 in this domain suggests that the next generation of AI agents could possess a formidable ability to strategize and persuade in complex, multi-agent scenarios.