Scientists Show More AI Agents Can Catastrophically Degrade Performance
Groundbreaking research shatters the "more is better" myth, quantifying exactly when adding AI agents helps and when it hurts.
December 13, 2025

In the rapidly advancing field of artificial intelligence, a common assumption has been that deploying more autonomous AI agents to tackle a complex problem will invariably lead to better, faster, and more robust solutions. However, a groundbreaking new study from researchers at Google Research, Google DeepMind, and the Massachusetts Institute of Technology (MIT) systematically dismantles this "more is better" paradigm. The research provides a quantitative, scientific framework for understanding when multi-agent AI systems thrive and when they falter, revealing that in certain common scenarios, adding more agents doesn't just offer diminishing returns: it can catastrophically degrade performance. The findings challenge practitioners to move beyond heuristic design and adopt a more principled, data-driven approach to building agentic systems.
The core discovery of the study, detailed in a paper titled "Towards a Science of Scaling Agent Systems," is that the performance of a multi-agent system is fundamentally dictated by the nature of the task it is assigned.[1] Through a comprehensive analysis of 180 different agent architectures, the researchers found a stark divergence in outcomes between parallel and sequential tasks.[2][1] For tasks that can be broken down into independent sub-problems and worked on simultaneously, such as a market analysis, multi-agent systems demonstrated their power, achieving performance gains of over 80% compared to a single agent.[2][1] Conversely, for sequential tasks that require a series of dependent steps, like the planning required in the game Minecraft, multi-agent systems failed spectacularly. Performance in these scenarios plummeted by as much as 70% relative to a single, well-provisioned AI agent, proving that collaboration can become a critical liability.[2][1]
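To make the distinction concrete, the sketch below contrasts the two task shapes. It is a minimal illustration, not code from the study: agent_solve is a hypothetical stand-in for a single agent's model call, and the sub-task lists are invented.

```python
from concurrent.futures import ThreadPoolExecutor

def agent_solve(subtask: str) -> str:
    """Hypothetical stand-in for one agent's LLM call."""
    return f"result({subtask})"

# Parallel task (e.g. market analysis): independent sub-problems,
# so agents can fan out without waiting on each other.
parallel_subtasks = ["competitors", "pricing", "regulation", "demand"]
with ThreadPoolExecutor() as pool:
    parallel_results = list(pool.map(agent_solve, parallel_subtasks))

# Sequential task (e.g. Minecraft-style planning): each step depends
# on the previous step's output, so splitting the chain across agents
# forces every hand-off through lossy inter-agent communication.
state = "initial world state"
for step in ["gather wood", "craft tools", "build shelter"]:
    state = agent_solve(f"{step}, given {state}")
```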
A primary reason for this dramatic failure in sequential tasks is a phenomenon the researchers identified as "architecture-dependent error amplification."[2] In systems where multiple AI agents operate independently in a decentralized structure, a single error or hallucination from one agent can propagate and multiply throughout the network, much like a game of "telephone."[3] The study quantified this effect, finding that such "independent agent swarms" can amplify an initial error by a staggering 17.2 times.[3][2] This cascade of compounding failures leads to progressively divergent world states among the agents, eventually causing a total collapse of the system's ability to perform the task.[2] The research suggests that a solution lies in the system's topology: a centralized or hierarchical architecture, in which a "boss" agent coordinates and reviews the work of sub-agents, was found to be far more effective at containing these errors, though it still resulted in a 4.4x amplification.[3]
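A toy compounding model makes the topology effect easy to see. The formula and constants below are illustrative assumptions, not the paper's methodology: each hand-off lets an uncaught error spawn derivative errors (the diverging world states described above), and a reviewing coordinator catches some fraction of them.

```python
def error_amplification(steps: int, spawn_rate: float, review_catch: float) -> float:
    """Relative error mass after `steps` hand-offs. Each uncaught error
    spawns `spawn_rate` derivative errors per hand-off; a coordinating
    'boss' agent catches `review_catch` of them. Purely illustrative."""
    per_step = (1.0 + spawn_rate) * (1.0 - review_catch)
    return per_step ** steps

# Illustrative parameters only; the study's measured factors were 17.2x
# for independent swarms and 4.4x under hierarchical coordination.
print(error_amplification(steps=8, spawn_rate=0.4, review_catch=0.0))  # ~14.8x, unreviewed
print(error_amplification(steps=8, spawn_rate=0.4, review_catch=0.2))  # ~2.5x, reviewed
```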
Furthermore, the study introduces the critical concept of the "Tool-Coordination Trade-off," which highlights the inherent inefficiency of multi-agent systems in environments that require the use of numerous tools, such as web browsers or code interpreters.[3][1] The cognitive and computational overhead required for agents to communicate, negotiate, and decide which tool to use can quickly overwhelm the benefits of parallel work.[3] This "coordination tax" becomes particularly punishing in complex scenarios involving many tools, as agents burn through their token budgets and context windows simply trying to align their actions rather than executing the task.[3][1] Another key finding is the "capability saturation" effect. The researchers discovered that if a single, capable AI model can already achieve a success rate of over 45% on a given task, adding more agents to the mix provides minimal or even negative returns.[3][1] In these cases, the cost of coordination outweighs any potential gains from collaboration.
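One way to picture the coordination tax is as a token-budget calculation: pairwise messaging grows quadratically with agent count, and tool selection must be negotiated by every agent for every available tool. Everything in this sketch, from the function name to the cost constants, is an assumption for illustration, not a quantity from the paper.

```python
def useful_token_fraction(num_agents: int, num_tools: int,
                          budget: int = 100_000,
                          msg_cost_per_pair: int = 1_500,
                          tool_negotiation_cost: int = 400) -> float:
    """Fraction of the token budget left for actual task work after the
    'coordination tax' is paid. All constants are illustrative."""
    pairs = num_agents * (num_agents - 1) // 2   # pairwise communication channels
    overhead = (pairs * msg_cost_per_pair
                + num_agents * num_tools * tool_negotiation_cost)
    return max(0.0, (budget - overhead) / budget)

print(useful_token_fraction(num_agents=1,  num_tools=8))   # ~0.97: nearly the whole budget
print(useful_token_fraction(num_agents=6,  num_tools=8))   # ~0.58: the tax starts to bite
print(useful_token_fraction(num_agents=10, num_tools=8))   # ~0.005: agents mostly align, not act
```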
The implications of this research are profound for the AI industry, offering a much-needed reality check on the prevailing hype around "AI swarms." Instead of assuming that scaling agent counts is a universal solution, developers and engineers are now equipped with a predictive framework to make more informed architectural choices.[4][5] The study's model can help determine the optimal system design based on task characteristics, base model capability, and coordination topology.[4] The clear message is that for many problems, particularly those that are linear or sequential in nature, the most effective and efficient approach is not a team of collaborating agents, but a single, highly capable agent.[3] This moves the field away from guesswork and toward a genuine science of scaling agentic AI, where design choices are driven by quantitative evidence rather than industry trends.
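In that spirit, the study's framework can be caricatured as a decision rule over the three inputs it names. The thresholds below are my paraphrase of the reported findings (the 45% saturation point, the sequential-task penalty, the tool trade-off), not the authors' fitted model; the tool cutoff in particular is an assumed value.

```python
def recommend_architecture(task_is_parallel: bool,
                           single_agent_success: float,
                           num_tools: int) -> str:
    """Hedged sketch of a predictive design rule; thresholds are taken
    from the reported findings or assumed, not fitted."""
    if single_agent_success > 0.45:
        return "single agent"          # capability saturation: coordination cost wins
    if not task_is_parallel:
        return "single agent"          # sequential chains amplify errors across hand-offs
    if num_tools > 5:                  # assumed cutoff for the tool-coordination trade-off
        return "single agent"
    return "hierarchical multi-agent"  # parallel sub-problems, centrally reviewed

print(recommend_architecture(task_is_parallel=True,  single_agent_success=0.3, num_tools=2))
# -> 'hierarchical multi-agent'
print(recommend_architecture(task_is_parallel=False, single_agent_success=0.3, num_tools=2))
# -> 'single agent'
```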