Topic of the Month
Robot Brain Spark
Robots can do backflips and parkour. They staff Amazon warehouses. A startup is opening a factory to churn out humanoid robots by the hundreds and thousands. Surely, this is evidence Asimov's future is here. Only, not so much.
Impressive as these feats are, I, Robot they are not. Like machine marionettes, most robots you’ve seen on YouTube hang on invisible strings of code, each dance step planned and choreographed. There is little AI involved. Or rather, there was little AI. Something changed in 2023. And yes, it has to do with ChatGPT. (Surprise!) It turns out the algorithms generating language and images can generate much more.
The key insight is that the models behind impressive AIs like GPT-4 and DALL-E 3 are data-agnostic. In the case of these two models, the training data includes text and images. Given enough of that data—and as we know, it’s an awful lot—the models can generate new, surprisingly proficient examples in response to a prompt. More importantly, the algorithms are general enough to go beyond what’s explicitly included in their datasets.
The question is: What else do we want AI to generate? In theory, anything with enough data—from the molecular structures of proteins to motor control in robots—is fair game. In the former, we’ve seen incredible progress. In the latter, things are just warming up, but the technology could allow us to build more flexible, generalizable robotic platforms.
Here are three areas where generative AI is breaking into robotics.
Interfaces. This is the low-hanging fruit: Take an existing large language model chatbot, say ChatGPT, and tack it onto a robot. Not only does the robot now understand you, it can draw on the entire internet—included in its training—to read between the lines and accomplish tasks it wasn’t explicitly coded to do. In a recent demo, in which Boston Dynamics’ Spot takes on the role of tour guide, the robot fluidly adopts various personalities, and when asked, “Can you show us your parents?” walks over to an exhibit with prior versions of itself, likely equating the word “parents” with the word “older.”
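The plumbing is simpler than it sounds. Here’s a minimal sketch of the pattern in Python, assuming OpenAI’s official client; the say and walk_to stubs, the prompt format, and the GOTO command vocabulary are all our own inventions, standing in for whatever speech and navigation SDK a real robot exposes.

```python
# A minimal LLM-as-robot-interface sketch. The OpenAI client is real;
# say() and walk_to() are hypothetical stand-ins for a robot's SDK.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = (
    "You are a museum tour-guide robot. Answer with one short spoken line, "
    "then on a new line either GOTO:<exhibit name> or STAY."
)

def say(text: str) -> None:
    # Stand-in for the robot's text-to-speech.
    print(f"[robot says] {text}")

def walk_to(exhibit: str) -> None:
    # Stand-in for the robot's navigation stack.
    print(f"[robot walks to] {exhibit}")

def handle_visitor(question: str) -> None:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    reply = response.choices[0].message.content
    speech, _, command = reply.partition("\n")
    say(speech.strip())
    command = command.strip()
    if command.startswith("GOTO:"):
        walk_to(command[len("GOTO:"):].strip())

handle_visitor("Can you show us your parents?")
```

The design choice worth noticing: the model replies in a tiny, constrained command vocabulary that’s trivial to parse, because freeform prose can’t steer motors directly.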
Simulation. A significant roadblock for AI in robotics is the availability of training data. In contrast to language, which humans generate in prodigious amounts online, robots need data on the balance, pressure, and movements required to accomplish new tasks in the real world. There aren’t enough robots out there generating such data, and the ones that exist don’t routinely share it (though this is changing—more in a moment). But from self-driving cars to two-legged robots, researchers have made progress training robots in digital simulations. In a case of AI building on itself, generative algorithms could make these simulations deeper and more varied, more closely matching the real world.
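One widely used trick this builds on is domain randomization: jitter the simulator’s physics each episode so a policy can’t overfit to one idealized world, and it transfers to reality more gracefully. Here’s a sketch of the idea; the sim and policy objects and their methods are hypothetical placeholders, not any particular library’s API.

```python
# Domain randomization sketch: vary physics per episode so a policy
# trained in simulation generalizes to the messier real world.
# `sim` and `policy` are hypothetical interfaces, not a real library.
import random

def run_randomized_episode(sim, policy):
    # Resample physical parameters the policy might otherwise memorize.
    sim.set_friction(random.uniform(0.5, 1.2))
    sim.set_mass_scale(random.uniform(0.8, 1.2))
    sim.set_motor_latency(random.uniform(0.0, 0.03))  # seconds

    obs = sim.reset()
    done = False
    while not done:
        action = policy.act(obs)
        obs, reward, done = sim.step(action)
        policy.record(obs, action, reward)  # experience for later training
```

Generative models could push this further, synthesizing whole new scenes and tasks rather than merely jittering a handful of parameters.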
Control. Along these lines, there have been some notable recent demonstrations of the power of data and AI in robotics control. To build an open-source AI system called Dobb-E, volunteers filmed themselves doing various household tasks with an iPhone mounted on a stick. Using this data, the team trained Dobb-E to take control of a Stretch robot and complete simple tasks like opening a window blind or pulling a book from a shelf. Each task took just 20 minutes to learn, with an 81 percent success rate. Meanwhile, a UC Berkeley team trained a transformer—the type of algorithm behind generative AI—to walk in a simulated environment and then transferred it straight to the real world, where it excelled.
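At its core, the Dobb-E recipe is imitation learning: collect demonstrations, then train a network to map what the robot sees to what the human did. A stripped-down behavior-cloning loop in PyTorch looks something like this; it’s the general technique, not Dobb-E’s actual code, and the dimensions are made up.

```python
# Behavior cloning sketch: supervised learning from (observation, action)
# pairs recorded during human demonstrations. Generic technique, not
# Dobb-E's actual implementation.
import torch
import torch.nn as nn

class Policy(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, act_dim),  # predicted motor command
        )

    def forward(self, obs):
        return self.net(obs)

def train(policy, demos, epochs=10, lr=1e-4):
    # demos: iterable of (observation, expert_action) tensor batches
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for obs, expert_action in demos:
            loss = loss_fn(policy(obs), expert_action)
            opt.zero_grad()
            loss.backward()
            opt.step()

# Toy usage with fabricated data, just to exercise the loop.
policy = Policy(obs_dim=32, act_dim=7)
fake_demos = [(torch.randn(64, 32), torch.randn(64, 7))]
train(policy, fake_demos, epochs=2)
```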
To be clear, this isn’t the first time AI and robotics have crossed paths. Computer vision algorithms have long been at the party. But the most impressive tricks, like humanoid Atlas robots dancing or doing backflips, are mostly thanks to good old-fashioned code. The idea that generative AI could yield more generalized systems is tantalizing.
Of course, there’s more work to do. For robots using ChatGPT—or models like it—the language center of the robot’s brain lives on distant cloud servers. There can be significant latency (as much as six seconds in the Boston Dynamics demo). Other issues include slowdowns when lots of people are using ChatGPT, server crashes, and local WiFi glitches. All this makes the current approach impractical for real-world applications. Also, the amount of robotics training data available is minuscule compared to what language models enjoy.
But the underlying technologies aren't static. A big theme in AI this year is the development of smaller, more efficient—but just as skilled—AI models. It’s not unreasonable to expect generative algorithms small enough to fit on your phone—or in a robot. Local AI could allow faster interactions, maybe even closer to real time.
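For a taste of what that could look like, here’s a sketch of on-device text generation using the llama-cpp-python bindings, one popular way to run small quantized models locally; the model file path is a placeholder, and no robot is attached—the point is only that the language half of the loop can run without a server round trip.

```python
# On-device generation sketch using llama-cpp-python. The model path is
# a placeholder for any local quantized (GGUF) model you have on disk.
from llama_cpp import Llama

llm = Llama(model_path="models/small-model.gguf")

out = llm(
    "You are a home robot. The user says: 'Please open the blinds.' "
    "Reply with a one-line plan.",
    max_tokens=64,
)
print(out["choices"][0]["text"])
```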
Meanwhile, Google and UC Berkeley are assembling the robotics dataset to end all datasets. The RT-X project is convening 32 robotics labs from around the world to record and share data with the goal of building a foundation model for robotics—that is, a single AI system that can control any robot across a wide range of scenarios.
It’s all very early. There’s little reason to expect a ChatGPT moment, where the general public gets its collective mind blown, this year. But as the data builds up and AI models get more efficient, the dream of general-purpose robots will inch steadily closer.
Know someone who might enjoy the Singularity Monthly?
Share this newsletter with them.