Planning is a monumentally difficult thing for robots, largely because of how they perceive and interact with the world. A robot's perception of the world consists of nothing more than the vast array of pixels collected by its cameras, and its ability to act is limited to setting the positions of the individual motors that control its joints and grippers. It lacks an innate understanding of how those pixels relate to what we might consider meaningful concepts in the world.
"That low-level interface with the world makes it really hard to decide what to do," said George Konidaris, an assistant professor of computer science at Brown and the lead author of the new study. "Imagine how hard it would be to plan something as simple as a trip to the grocery store if you had to think about each and every muscle you'd flex to get there, and imagine in advance and in detail the terabytes of visual data that would pass through your retinas along the way. You'd immediately get bogged down in the detail. People, of course, don't plan that way. We're able to introduce abstractions that throw away that huge mass of irrelevant detail and focus only on what is important."
Even state-of-the-art robots aren't capable of that kind of abstraction. When we see demonstrations of robots planning for and performing multistep tasks, "it's almost always the case that a programmer has explicitly told the robot how to think about the world in order for it to make a plan," Konidaris said. "But if we want robots that can act more autonomously, they're going to need the ability to learn abstractions on their own."