The idea of a robot that does a wide range of household chores, from unloading the dryer to folding laundry to cleaning up a messy table, has long seemed like pure science fiction—perhaps most famously embodied by the 1960s fantasy that was Rosey in The Jetsons.
Physical Intelligence, a startup in San Francisco, has shown that such a dream might actually not be so far off, demonstrating a single artificial intelligence model that has learned to do a wide range of useful home chores—including all of the above—by being trained on an unprecedented amount of data.
The feat raises the prospect of bringing AI as magical and broadly capable as ChatGPT into the physical world.
The advent of large language models (LLMs)—general-purpose learning algorithms fed vast swaths of text from books and the internet—has given chatbots vastly more general capabilities. Physical Intelligence aims to create something similarly capable in the physical world by training a similar kind of algorithm with enormous amounts of robotic data instead.
“We have a recipe that is very general, that can take advantage of data from many different embodiments, from many different robot types, and which is similar to how people train language models,” says the company’s CEO, Karol Hausman.
The company has spent the past eight months developing its “foundation model,” called π0 or pi-zero. π0 was trained using huge amounts of data from several types of robots doing various domestic chores. The company often has humans teleoperate the robots to generate the necessary training data.
Physical Intelligence, also known as PI or π, was founded earlier this year by several prominent robotics researchers to pursue the new robotics approach inspired by breakthroughs in AI’s language abilities.
“The amount of data we’re training on is larger than any robotics model ever made, by a very significant margin, to our knowledge,” says Sergey Levine, a cofounder of Physical Intelligence and an associate professor at UC Berkeley. “It’s no ChatGPT by any means, but maybe it’s close to GPT-1,” he adds, in reference to the first large language model developed by OpenAI in 2018.
Videos released by Physical Intelligence show a variety of robot models doing a range of household chores with impressive skill. A wheeled robot reaches into a dryer to retrieve clothes. A robot arm buses a table cluttered with cups and plates. A pair of robot arms grab and fold laundry. Another impressive feat mastered by the company’s algorithm is building a cardboard box, which involves a robot gently bending its sides and delicately fitting pieces together.
Folding clothing is especially challenging for robots, requiring more general intelligence about the physical world, Hausman says, because it involves dealing with a wide range of flexible items that deform and crumple unpredictably.
The algorithm displays some surprisingly humanlike quirks, shaking T-shirts and shorts to get them to lie flat, for example.
Hausman notes that the algorithm does not work perfectly, and like modern chatbots, the robots sometimes fail in surprising and amusing ways. When asked to load eggs into a carton, a robot once chose to overfill the box and force it to shut. Another time, a robot suddenly flung a box off a table instead of filling it with things.
Building more generally capable robots is not only a science fiction trope but is, of course, also an enormous commercial opportunity.
Despite stunning AI progress in recent years, robots remain stubbornly dumb and limited. The ones found in factories and warehouses typically go through precisely choreographed routines without much ability to perceive their surroundings or adapt on the fly. The few industrial robots that can see and grasp objects can only do a limited number of things with minimal dexterity due to a lack of general physical intelligence.
More generally capable robots could take on a far wider range of industrial tasks, perhaps after minimal demonstrations. Robots will also need more general abilities in order to cope with the enormous variability and messiness of human homes.
General excitement about AI progress has already translated into optimism about major new leaps in robotics. Elon Musk’s car company, Tesla, is developing a humanoid robot called Optimus, and Musk recently suggested that it would be widely available for $20,000 to $25,000 and capable of doing most tasks by 2040.
Previous efforts to teach robots challenging tasks have typically trained a single machine on a single task, because the learning did not seem to transfer. Some recent academic work has shown that with sufficient scale and fine-tuning, learning can in fact transfer between different tasks and robots. A 2023 Google project called Open X-Embodiment involved sharing robot learning across 22 different robots at 21 different research labs.
A key challenge with the strategy Physical Intelligence is pursuing is that there is not the same scale of robot data available for training as there is for large language models in the form of text. So the company has to generate its own data and come up with techniques to improve learning from a more limited dataset. To develop π0 the company combined so-called vision language models, which are trained on images as well as text, with diffusion modeling, a technique borrowed from AI image generation, to enable a more general kind of learning.
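To make the idea of diffusion-based action generation concrete, here is a minimal illustrative sketch of how a diffusion policy conditioned on a vision-language embedding can produce continuous robot actions. Everything here is hypothetical: the dimensions, the variable names, and the "noise predictor" (a fixed random linear map standing in for what would be a large learned network) are placeholders, not Physical Intelligence's actual π0 design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, for illustration only.
EMBED_DIM = 32    # stand-in for a vision-language embedding of the scene + instruction
ACTION_DIM = 7    # e.g. joint commands for a 7-degree-of-freedom arm
HORIZON = 16      # the policy emits a short "chunk" of future actions at once
STEPS = 50        # number of denoising iterations

# Placeholder "noise predictor": in a real system this is a large neural network
# conditioned on the VLM embedding; here a fixed random linear map stands in.
W = rng.normal(0, 0.1, size=(EMBED_DIM + ACTION_DIM * HORIZON + 1, ACTION_DIM * HORIZON))

def predict_noise(noisy_actions, embedding, t):
    """Toy stand-in for the learned denoiser epsilon(x_t, t, context)."""
    x = np.concatenate([embedding, noisy_actions.ravel(), [t / STEPS]])
    return (x @ W).reshape(HORIZON, ACTION_DIM)

def sample_action_chunk(embedding):
    """DDPM-style ancestral sampling: start from pure noise, iteratively denoise."""
    betas = np.linspace(1e-4, 0.02, STEPS)
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    x = rng.normal(size=(HORIZON, ACTION_DIM))  # start from Gaussian noise
    for t in reversed(range(STEPS)):
        eps = predict_noise(x, embedding, t)
        # Standard denoising update toward the predicted clean trajectory.
        x = (x - betas[t] / np.sqrt(1 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x += np.sqrt(betas[t]) * rng.normal(size=x.shape)
    return x

# Pretend embedding of a camera image plus an instruction like "fold the shirt".
context = rng.normal(size=EMBED_DIM)
chunk = sample_action_chunk(context)
print(chunk.shape)  # (16, 7): a short trajectory of continuous actions
```

The key property this borrows from image generation is that diffusion sampling produces smooth, high-dimensional continuous outputs — well suited to motor trajectories, where next-token-style discrete prediction is a poor fit.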
For robots to take on any chore a person asks of them, such learning will need to be scaled up significantly. “There’s still a long way to go, but we have something that you can think of as scaffolding that illustrates things to come,” Levine says.