Google DeepMind has been making steady progress in AI with regular, highly regarded updates to Gemini, Imagen, Veo, Gemma, and AlphaFold. Today, Google’s AI team continues to make headlines by announcing its official entry into the robotics industry with the release of two new models built on Gemini 2.0: Gemini Robotics and Gemini Robotics-ER.
Gemini Robotics: Advanced Vision-Language-Action Model
Gemini Robotics is an advanced vision-language-action (VLA) model that builds on Gemini 2.0, adding physical actions as a new output modality for directly controlling robots. Google claims the model can generalize to situations it never encountered during training.
Compared with other leading VLA models, Gemini Robotics performs more than twice as well on a comprehensive generalization benchmark. Because it is built on Gemini 2.0, it understands everyday conversational language across multiple languages, allowing it to interpret human commands more accurately.
In terms of dexterity, Google claims that Gemini Robotics can handle complex, multi-step tasks that require precise manipulation. For example, the model can fold origami or put snacks into Ziploc bags.
Gemini Robotics-ER: A Vision-Language Model Focused on Spatial Reasoning
Gemini Robotics-ER is an advanced vision-language model focused on spatial reasoning, designed so roboticists can connect it to their existing low-level controllers. Out of the box, the model covers the full chain of steps needed to control a robot: perception, state estimation, spatial understanding, planning, and code generation.
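To make that pipeline concrete, here is a minimal sketch of how a spatial-reasoning model might slot in front of an existing low-level controller. Everything here is hypothetical: `StubSpatialModel`, `Detection`, and `plan_grasp` are illustrative names, and the model is stubbed with canned output rather than a real API call, which in practice would return points or bounding boxes from a camera image.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """A located object, as a spatial-reasoning model might return it."""
    label: str
    point: Tuple[float, float]  # normalized (x, y) in the camera frame


class StubSpatialModel:
    """Stand-in for a vision-language model with spatial reasoning.

    A real system would send the camera image and instruction to the
    model; here we return a canned detection so the sketch is runnable.
    """

    def locate(self, image, instruction: str) -> List[Detection]:
        return [Detection("banana", (0.42, 0.61))]


def plan_grasp(detections: List[Detection]) -> List[str]:
    """Turn model output into commands for a (hypothetical) low-level controller."""
    steps = []
    for d in detections:
        x, y = d.point
        steps.append(f"move_to({x:.2f}, {y:.2f})")  # approach the object
        steps.append(f"grasp({d.label!r})")         # close the gripper on it
    return steps


# Perception -> planning -> controller commands
model = StubSpatialModel()
detections = model.locate(image=None, instruction="pick up the banana")
commands = plan_grasp(detections)
print(commands)
```

The point of the sketch is the division of labor: the model handles perception and spatial understanding, while the roboticist's existing controller executes the resulting low-level commands.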
The Future of Gemini Robotics
Google is partnering with Apptronik to build humanoid robots based on the Gemini 2.0 models. Google is also working with a number of trusted testing partners, including Agile Robots, Agility Robotics, Boston Dynamics, and Enchanted Tools, to guide the future development of Gemini Robotics-ER.
By enabling robots to understand and perform complex tasks with greater accuracy and adaptability, Google DeepMind is paving the way for a future where robots can seamlessly integrate into many aspects of our lives.