Beyond Parrots: AI Models Independently Construct World Models From Data
Copenhagen's Othello study reveals diverse AI models spontaneously build internal game models, challenging the 'stochastic parrot' theory.
June 22, 2025

A recent experiment from the University of Copenhagen has added significant weight to the "world model" hypothesis, which suggests that large language models can develop internal representations of the environments they are trained on, even without explicit instruction. By training several different language models exclusively on sequences of moves from the board game Othello, researchers found that the models not only learned the rules of the game but also independently constructed a model of the 8x8 game board. This work builds upon and strengthens the findings of earlier studies that first used Othello as a testbed for exploring emergent abilities in AI.
The core of this investigation lies in the world model hypothesis, a concept that probes whether language models do more than just predict the next word or token; it asks if they build an underlying model of the reality that generates the data they see. The initial "Othello-GPT" experiment provided tantalizing evidence for this.[1][2] In that study, researchers trained a GPT variant solely on move sequences from Othello games.[1][2][3] Despite never being explicitly told the rules or that the game is played on a board, the model, dubbed Othello-GPT, learned to make valid moves with near-perfect accuracy.[2] Probing the model's internal activations revealed that it had developed a representation of the board state, which could be used to predict which squares were occupied by black or white pieces.[3] This suggested the model had spontaneously created an internal "world model" of Othello.[1] Subsequent work even showed that this internal representation was linear, meaning the board state was encoded in a relatively straightforward way within the model's neural network.[3] These initial findings were exciting because they hinted that models trained on language might also be developing internal models of the real world.[1]
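To make the probing step concrete, the sketch below trains a simple linear probe on a model's hidden activations to recover the contents of each board square. The arrays are random placeholders standing in for real activations and board labels, and the names, shapes, and scikit-learn setup are illustrative assumptions rather than the original study's code; on real activations, high held-out accuracy would indicate a linearly decodable board representation.

```python
# Minimal linear-probing sketch, assuming we already have per-move hidden
# states from a trained model and the true board state after each move.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

n_positions, hidden_dim, n_squares = 2_000, 512, 64

# Placeholders: random data standing in for real model activations and labels.
hidden_states = np.random.randn(n_positions, hidden_dim)
# Ground-truth contents of each square: 0 = empty, 1 = black, 2 = white.
board_labels = np.random.randint(0, 3, size=(n_positions, n_squares))

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, board_labels, test_size=0.2, random_state=0
)

# One linear probe per board square.
accuracies = []
for sq in range(n_squares):
    probe = LogisticRegression(max_iter=1000)
    probe.fit(X_train, y_train[:, sq])
    accuracies.append(probe.score(X_test, y_test[:, sq]))

print(f"mean probe accuracy across squares: {np.mean(accuracies):.3f}")
```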
The new research from the University of Copenhagen, detailed in a paper by Yifei Yuan and Anders Søgaard, significantly expands on this premise.[4][5] Where the original work focused primarily on a GPT-2 model, the Copenhagen team evaluated a broader array of seven different language models, including encoder-decoder architectures like T5 and BART and decoder-only models like GPT-2, Mistral, and LLaMA-2.[4][5] This was a crucial step in determining whether the emergence of a world model was a quirk of a specific architecture or a more general phenomenon.[6] The models were trained on two datasets: one containing thousands of real championship Othello games and another with millions of synthetically generated games.[4][5] The task was simple: predict the next legal move in a given sequence.[4]
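The general shape of this training setup can be illustrated with a short sketch: each game is encoded as a sequence of square indices and fed to a small, randomly initialized causal language model for next-move prediction. The move encoding, the tiny GPT-2 configuration, and the single training step below are assumptions chosen for illustration, not the paper's actual models or hyperparameters.

```python
# Next-move prediction sketch with a small, randomly initialized GPT-2.
import torch
from transformers import GPT2Config, GPT2LMHeadModel

VOCAB_SIZE = 64   # one token per board square; a real setup would likely
                  # add padding and end-of-game tokens
MAX_MOVES = 60    # an Othello game contains at most 60 moves

config = GPT2Config(vocab_size=VOCAB_SIZE, n_positions=MAX_MOVES,
                    n_embd=128, n_layer=4, n_head=4)
model = GPT2LMHeadModel(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Placeholder batch: random square indices standing in for real game records.
batch = torch.randint(0, VOCAB_SIZE, (8, MAX_MOVES))

# Standard causal-LM step: predict each move from the moves preceding it
# (the labels are shifted internally by the model).
outputs = model(input_ids=batch, labels=batch)
outputs.loss.backward()
optimizer.step()
print(f"next-move cross-entropy loss: {outputs.loss.item():.3f}")
```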
The results were compelling and provided much stronger evidence for the Othello world model hypothesis.[4] All seven models, regardless of their underlying architecture, learned the game and predicted subsequent moves with up to 99% accuracy when trained on a large dataset.[4][5] More importantly, by using representation alignment tools, the researchers found a high degree of similarity in the board features learned by each distinct model.[4][5][6] This convergence suggests that different models, when faced with the same task, independently arrive at a similar internal representation of the game's structure.[5] The study also pushed beyond predicting the next single move, exploring the models' ability to generate sequences of multiple moves, which requires a deeper understanding than knowledge of the immediate rules alone.[6] Visualizations of the models' internal states, through latent move projections, further revealed an understanding of spatial relationships on the board, with models identifying not just the best move but also adjacent positions.[5]
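One common way to quantify this kind of convergence is a representation-alignment measure such as linear centered kernel alignment (CKA), sketched below on activations from two models over the same positions. Whether this matches the paper's exact alignment tooling is an assumption, and the activation arrays here are random placeholders for real hidden states.

```python
# Linear CKA sketch for comparing two models' representations of the
# same Othello positions (higher values mean more similar representations).
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two (n_samples, dim) activation matrices."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    numerator = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    denominator = (np.linalg.norm(X.T @ X, ord="fro")
                   * np.linalg.norm(Y.T @ Y, ord="fro"))
    return float(numerator / denominator)

# Placeholder activations from two hypothetical models on the same positions;
# the hidden dimensions do not need to match.
acts_model_a = np.random.randn(5_000, 512)
acts_model_b = np.random.randn(5_000, 768)

print(f"CKA similarity: {linear_cka(acts_model_a, acts_model_b):.3f}")
```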
The implications of these findings for the artificial intelligence industry are profound. The Othello experiments serve as a simplified, controlled environment for studying a fundamental question: do LLMs understand the world, or are they merely "stochastic parrots" mimicking patterns in data? The evidence from the Copenhagen study strongly supports the former, suggesting that models can induce the underlying principles of a system from raw sequential data. This has significant ramifications for how we view the capabilities of current and future AI. If a model can learn the implicit rules and structure of a board game, it lends credibility to the idea that models trained on vast amounts of text and code can develop coherent internal models of more complex real-world concepts, relationships, and physics. This understanding is crucial for building more robust, reliable, and interpretable AI systems. However, the research also highlights limitations; for instance, the models struggled to generate entire valid game sequences from scratch and required vast amounts of data to achieve high accuracy even for single-move predictions, pointing to the computational expense of inducing these world models.[7]
In conclusion, the detailed and expanded Othello experiment conducted at the University of Copenhagen provides robust new evidence that large language models can indeed form internal world models. By demonstrating this capability across a diverse range of seven models and showing a convergence in their learned representations, the research moves beyond a proof-of-concept to suggest a more fundamental emergent property of these systems.[4][6] While Othello is a far cry from the complexity of the real world, it acts as a valuable laboratory for understanding the hidden mechanics of AI.[8] The findings challenge the view of LLMs as simple pattern-matchers and push the AI community to consider the likelihood that these models are, in fact, building and utilizing internal models of the world to make their predictions. That realization will shape the future of AI development and our understanding of machine intelligence itself.[1][2]