Google DeepMind's AI teaches robots to reason, plan, and think before acting.

Google DeepMind's agentic AI empowers robots to reason, plan, and act autonomously, navigating the physical world with unprecedented intelligence.

September 26, 2025

Google DeepMind has unveiled a significant advancement in the field of robotics by introducing two new artificial intelligence models designed to give machines sophisticated reasoning and decision-making abilities. The new systems, named Gemini Robotics 1.5 and Gemini Robotics-ER 1.5, represent a leap forward in creating "agentic" AI, moving robots from merely executing pre-programmed commands to autonomously understanding, planning, and performing complex, multi-step tasks.[1][2] This development is being positioned as a foundational step toward creating general-purpose robots that can intelligently and dexterously navigate the complexities of the physical world.[3][2] The core innovation lies in a framework where the two models work in concert: a high-level model reasons about a task and formulates a plan, while a lower-level system translates that plan into physical actions.
The new approach hinges on the specialized functions of each model. Gemini Robotics-ER 1.5, where "ER" stands for Embodied Reasoning, acts as the high-level brain of the operation.[3] This vision-language model (VLM) is engineered to reason about the physical world, exhibiting a state-of-the-art understanding of spatial relationships.[3][4][5] When presented with a complex command in natural language, such as sorting trash according to local regulations, Gemini Robotics-ER 1.5 can break down the objective into a series of logical, achievable steps.[6] Crucially, it can natively call upon digital tools like Google Search to gather necessary information it lacks, such as looking up those waste management rules, before formulating a plan.[3][5][7] This model orchestrates the robot's overall activities by creating a detailed, multi-step plan, which it then communicates to its counterpart.[3][2] Google is making Gemini Robotics-ER 1.5 available to developers through the Gemini API in Google AI Studio, encouraging broader experimentation and application development.[5][7]
The second component of this symbiotic system is Gemini Robotics 1.5, a highly capable vision-language-action (VLA) model that receives the natural language instructions from the ER model and translates them into motor commands.[3][2] This model's breakthrough feature is its ability to "think before acting."[3][5] Instead of directly converting a command into a movement, Gemini Robotics 1.5 generates an internal sequence of reasoning in natural language.[3][5] This internal monologue allows it to better solve semantically complex tasks and makes its decision-making process more transparent, as it can explain its thinking.[3][5][8] For instance, if tasked with sorting laundry by color, the model first internally identifies the different colors and formulates a strategy before moving the items.[3] This capability marks a significant shift from traditional models that simply translate instructions into actions without this intermediate reasoning step.[5] Currently, this more action-oriented model is available only to select partners.[5][9]
One of the most profound implications of this new architecture is its ability to generalize across different robot forms, a concept known as "learning across embodiments."[3] The models demonstrate a remarkable capacity to transfer skills learned on one type of robot to a completely different one without specialized fine-tuning.[3] For example, a task perfected during training on the dual-arm ALOHA 2 robot can be successfully executed by Apptronik's humanoid robot, Apollo, or a bi-arm Franka robot.[3][5][9] This breakthrough significantly accelerates the learning process for new behaviors and is a critical step toward creating a single, versatile AI brain that can power a wide variety of robotic hardware.[2][8] This versatility, combined with the agentic framework, allows the robots to tackle long-horizon tasks that were previously intractable, such as packing a suitcase appropriately for a trip by first checking the destination's weather forecast.[6]
In conclusion, the introduction of Gemini Robotics 1.5 and Gemini Robotics-ER 1.5 signals a pivotal moment for the AI and robotics industries. By separating high-level reasoning and planning from the low-level execution of actions, Google DeepMind has created a system that endows robots with unprecedented autonomy and adaptability. This move from reactive machines to reasoning agents that can plan, generalize knowledge, and interact with the digital world to inform their physical actions brings the long-held vision of a truly helpful, general-purpose robot significantly closer to reality.[3][2] The ability of these models to think, reason transparently, and transfer skills across different platforms lays the groundwork for a future where intelligent robots can seamlessly integrate into and assist with the complexities of human environments.
