Turing Award Winner Richard Sutton Warns Generative AI Cannot Achieve True Scientific Discovery

Turing Award winner Richard Sutton argues that generative models cannot achieve scientific breakthroughs without feedback loops to validate their discoveries.

June 1, 2026

Richard Sutton, a pioneer of reinforcement learning and Turing Award winner, has voiced a fundamental critique of modern generative artificial intelligence, claiming that these systems are structurally incapable of performing true scientific discovery[1][2]. Speaking at the Science x AI Summit in Palo Alto, California, Sutton highlighted what he sees as a fatal weakness in today's most hyped technology: the inability of conventional generative AI to evaluate its own results[3][4]. According to the researcher, while large language models, image generators, and video systems are highly effective at mimicking human patterns, they lack the active feedback loops required to validate new ideas[2][5]. Without this critical verification step, any creative or novel insight generated by an AI remains a brief flash of potential that cannot be retained or transformed into real scientific progress[3][5].
The foundational issue with pure generative AI, according to Sutton, lies in its reliance on imitation rather than independent evaluation[6][2]. Sutton, who won the prestigious ACM A.M. Turing Award alongside Andrew Barto for developing the mathematical and algorithmic foundations of reinforcement learning, illustrated his critique with an old academic joke[1][7]. He noted that reviewers often describe a manuscript by saying the work is both novel and good, but unfortunately, the parts that are good are not novel, and the parts that are novel are not good[2][5]. Sutton argued that this exact diagnosis applies to today's generative AI models[2][5]. When large language models generate highly accurate or good outputs, they are generally regurgitating information already present in their massive training datasets[2]. Conversely, when these models attempt to produce genuinely novel outputs that go beyond their training material, they frequently invent false information, a phenomenon commonly referred to as hallucination[2]. In Sutton's view, generative systems are trapped in a binary state: they can mimic useful existing concepts or randomly generate new ones, but they cannot autonomously determine which of their new ideas are actually correct[2].
True scientific discovery requires a dynamic, multi-step evolutionary process that today's generative models are structurally unequipped to execute[2]. To explain why generative AI fails at genuine science, Sutton outlined a three-step cycle that governs all discovery, whether in human research, natural evolution, or machine learning: variation, evaluation, and selective retention[2][5]. Under this framework, an intelligent agent must generate a variety of different hypotheses or trajectories, test those options against a standard or environment, and then selectively retain the methods that yield the best results[2]. Sutton pointed out that while generative models possess stochastic characteristics that allow them to produce variation, they completely lack the crucial second step of evaluation[2][5]. Because these models cannot verify their own outputs, they cannot perform selective retention[2][5]. When a language or image model happens to generate a brilliant, novel idea, that novelty is effectively lost because the system has no internal mechanism to recognize its value[5]. The novelty simply flickers into existence for a brief moment before vanishing, unable to contribute to a cumulative body of scientific knowledge[3][5].
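The three-step cycle described above can be sketched as a simple loop. This is only an illustrative toy, not anything Sutton presented: the `evaluate` function, its target value of 3.0, and the `discover` helper are hypothetical stand-ins for a real environment and a real learning system.

```python
import random


def evaluate(candidate: float) -> float:
    """Hypothetical evaluator: scores are higher the closer a candidate is to 3.0."""
    return -(candidate - 3.0) ** 2


def discover(steps: int = 2000, seed: int = 0) -> float:
    """Run a variation, evaluation, and selective-retention loop on one number."""
    rng = random.Random(seed)
    best = 0.0
    best_score = evaluate(best)
    for _ in range(steps):
        # Variation: propose a random change to the current best idea.
        candidate = best + rng.gauss(0.0, 0.5)
        # Evaluation: test the proposal against the (assumed) environment.
        score = evaluate(candidate)
        # Selective retention: keep the proposal only if it scores better.
        if score > best_score:
            best, best_score = candidate, score
    return best


if __name__ == "__main__":
    print(discover())
```

Removing the evaluation and retention lines leaves only random proposals, none of which is ever kept, which is the gap the argument attributes to purely generative systems.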
In contrast to the static nature of pure generative models, Sutton pointed to landmark AI systems that have demonstrated superhuman capabilities by embedding rigorous, closed-loop evaluation[3]. He cited Google DeepMind's AlphaGo and its successor AlphaZero, which learned to play complex games not by copying human moves, but by playing millions of games against themselves and evaluating every single move based on whether it increased the chance of winning[1][8][9]. This feedback loop allowed AlphaGo to discover entirely original strategies, such as its famous move 37 in the match against Lee Sedol, which human experts initially dismissed but ultimately recognized as a work of creative genius[1][9]. Similarly, systems like AlphaFold in protein folding, AlphaProof in formal mathematics, and Sony's GT Sophy in autonomous racing all rely on concrete environments or formal verifiers to test their outputs[10]. AlphaProof, for example, successfully generated over 100 million formal proofs by coupling a language model's predictive capabilities with a reinforcement learning algorithm and the Lean proof assistant, which formally verified the mathematical correctness of each step[11][12].
This critique of generative AI is deeply aligned with Sutton's long-standing advocacy for scalable, feedback-driven computational systems over hand-coded human knowledge, a philosophy that carries massive implications for the future of the AI industry[13][14]. In his highly influential essay, The Bitter Lesson, Sutton argued that 70 years of AI history show that researchers who try to build human-designed knowledge into algorithms are consistently outpaced by those who use massive computation, general search, and learning algorithms[15][13]. More recently, in a paper co-authored with DeepMind reinforcement learning lead David Silver, titled Welcome to the Era of Experience, Sutton argued for a total paradigm shift[16][14]. They asserted that artificial general intelligence will not be achieved by feeding models ever larger human-curated datasets, which represent static and often flawed human knowledge[16][14]. Instead, they argued that AI must learn continuously by interacting with its environment through direct experience and feedback[16][14].
If this perspective is correct, today's commercial preoccupation with scaling up supervised learning models represents a dead end for achieving true reasoning and scientific discovery[1][17]. Rather than simply throwing more computing power at imitation learning, the industry must shift capital and research efforts toward building rich simulated environments and robust verifiers[1][18]. Early signs of this transition are already visible in coding agents like Claude Code, which runs code locally and iteratively corrects errors based on real-time compiler feedback[16][10]. To unlock AI's potential in accelerating breakthroughs in medicine, materials science, and climate modeling, developers must prioritize these domain-specific verification layers that can evaluate AI-generated hypotheses in the real physical world or high-fidelity simulators[19].
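The idea of pairing a generator with a verifier can be illustrated with a small sketch. It is not a description of how Claude Code or any named system works: the `CANDIDATES` list is a hypothetical stand-in for model outputs, and the verifier is simply Python's built-in `compile` plus one assumed test case for a function named `add`.

```python
from typing import Iterable, Optional


def verifies(source: str) -> bool:
    """Toy verifier: the source must compile, run, and pass one assumed test."""
    try:
        code = compile(source, "<candidate>", "exec")
        namespace: dict = {}
        # Toy only: never exec untrusted code outside a sandbox.
        exec(code, namespace)
        return namespace["add"](2, 3) == 5
    except Exception:
        # Syntax errors, runtime errors, and a missing `add` all count as failures.
        return False


def first_verified(candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate that passes verification, or None."""
    for source in candidates:
        if verifies(source):
            return source
    return None


# Hypothetical generator outputs, from worst to best.
CANDIDATES = [
    "def add(a, b) return a + b",        # does not compile
    "def add(a, b):\n    return a - b",  # compiles but gives the wrong answer
    "def add(a, b):\n    return a + b",  # compiles and passes the test
]

if __name__ == "__main__":
    print(first_verified(CANDIDATES))
```

The generator here never learns from the failures; a fuller feedback-driven system would feed the verifier's verdict back into what gets proposed next.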
In conclusion, Richard Sutton's critique serves as a vital reality check for an industry swept up in the hype of generative artificial intelligence[17][2]. By drawing a sharp distinction between imitation and discovery, the Turing Award winner reminds us that true intelligence is not merely the ability to repeat what has been said, but the capacity to test, learn, and adapt through trial and error[1][2]. As artificial intelligence continues to integrate into the scientific workflow, the focus must shift from generating more text to designing systems that can interact with reality, experience consequences, and recognize their own breakthroughs[20]. Only by moving beyond the limits of pure mimicry and embracing the rigorous, closed-loop evaluation demanded by the scientific method can artificial intelligence become a true partner in human discovery[2].
