KAUST AI Rewrites Its Own Code, Achieves Human-Level Software Skills

The Huxley-Gödel Machine realizes a decades-long AI vision, self-rewriting its code to match human engineering prowess.

November 3, 2025

A research group at King Abdullah University of Science and Technology (KAUST) has developed an AI agent capable of rewriting its own code to progressively improve its performance, a significant step toward realizing a long-held vision in artificial intelligence. The new system, named the Huxley-Gödel Machine (HGM), marks a practical advancement on the theoretical "Gödel Machine" concept proposed decades ago by pioneering AI scientist Jürgen Schmidhuber, who is now the director of the AI Initiative at KAUST.[1][2] The HGM has demonstrated its ability to evolve and enhance its problem-solving skills, ultimately achieving a level of performance on complex software engineering tasks that matches the best human-engineered AI agents.[1]
The development revives a decades-long pursuit within the AI community to create a truly self-improving system. Schmidhuber first proposed the Gödel Machine in 2003 as a hypothetical, universal problem-solver that could optimally rewrite any part of its own code if it could first prove the change would be beneficial.[1][2] This requirement of mathematical proof, however, made the original concept practically impossible to implement. The Huxley-Gödel Machine offers a pragmatic approximation. Instead of demanding mathematical certainty, it evaluates the potential of its self-modifications by looking at the long-term success of its descendants, an approach inspired by evolutionary biology.[3][2] This breakthrough addresses a key challenge in the development of self-improving agents, where immediate performance gains do not always lead to the best long-term evolution.
At the heart of the Huxley-Gödel Machine is a novel metric called Clade-Metaproductivity (CMP).[3][4] The researchers identified a critical issue in previous self-improving agents: a modification that yields a high benchmark score right away is not necessarily the best foundation for future improvements. They termed this the "Metaproductivity-Performance Mismatch."[3][5] To address it, the HGM assesses a potential code change not just on its immediate success but on the aggregated performance of all the future agents that descend from it, a group the researchers call a "clade." By estimating the long-term potential of an entire family tree of modifications, the HGM can make more strategic decisions, choosing evolutionary paths that lead to more robust and capable future generations of itself, even if that means sacrificing a small, immediate performance boost.[2][4]
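To make the mismatch concrete, here is a minimal Python sketch of the idea, not the KAUST team's implementation. It assumes that "aggregated performance" means a plain average of descendants' benchmark scores, which may differ from the estimator HGM actually uses. The `Agent` class, the `clade_score` function, and all of the scores are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Agent:
    name: str
    score: float  # hypothetical benchmark success rate in [0, 1]
    children: list["Agent"] = field(default_factory=list)


def descendants(agent):
    """Yield every agent below `agent` in the modification tree."""
    for child in agent.children:
        yield child
        yield from descendants(child)


def clade_score(agent):
    """Assumed aggregate: mean score of all descendants.

    A leaf has no descendants, so it falls back to its own score.
    """
    scores = [d.score for d in descendants(agent)]
    return mean(scores) if scores else agent.score


# Agent A looks better today, but its descendants stagnate.
a = Agent("A", 0.50, [Agent("A1", 0.40), Agent("A2", 0.45)])
# Agent B scores lower today, but its descendants keep improving.
b = Agent("B", 0.42, [Agent("B1", 0.55, [Agent("B1a", 0.60)]), Agent("B2", 0.50)])

best_now = max([a, b], key=lambda agent: agent.score)  # picks by immediate score
best_clade = max([a, b], key=clade_score)              # picks by clade aggregate

print(best_now.name, best_clade.name)
```

In this toy tree, ranking by immediate score selects agent A, while ranking by the clade aggregate selects agent B, whose lineage ends up stronger. That gap between the two rankings is the kind of mismatch the researchers describe.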
The KAUST team rigorously tested their creation against demanding industry benchmarks, most notably SWE-bench and Polyglot. SWE-bench is a highly regarded benchmark that evaluates an AI's ability to solve real-world software engineering problems sourced directly from GitHub issues in popular Python repositories.[6][7] This is considered a more realistic test of an AI's coding ability than solving isolated algorithmic puzzles.[8] On these benchmarks, the HGM consistently outperformed previous self-improving AI methods, such as the Darwin Gödel Machine and SICA.[2] The landmark achievement came when an agent, optimized by HGM on the SWE-bench Verified dataset, was evaluated on the SWE-bench Lite benchmark. It matched the best results of officially verified, human-engineered coding agents. That parity with agents designed by human engineers is the sense in which the system reaches "human-level" performance in this specific, but significant, domain.[1] Furthermore, the HGM achieved these superior results while being more computationally efficient, requiring fewer CPU hours than its predecessors.[1][4]
The introduction of the Huxley-Gödel Machine carries substantial implications for the future of the AI industry. Its success demonstrates a viable path toward creating more autonomous AI systems that can continuously learn and adapt without constant human intervention.[3] Such technology could dramatically accelerate software development, automate complex bug fixes, and potentially discover novel solutions that human developers might miss. However, the advance also brings to the forefront critical questions about safety and control. As these systems become more capable of modifying their own core logic, ensuring their actions remain aligned with human intent becomes paramount.[9][10] While the HGM's performance on benchmarks like SWE-bench is a major milestone, experts note these environments still represent a narrow slice of the full spectrum of software engineering and that performance may not generalize to all real-world scenarios.[8][11] Nonetheless, the Huxley-Gödel Machine stands as a powerful proof-of-concept, transforming a theoretical dream into a tangible reality and setting a new benchmark in the quest for truly intelligent and evolving machines.
