Hello Everyone,
Tesla’s Optimus has been in the news lately thanks to a recent update video from Tesla. Here it is:
Jim Fan, one of the most prominent robotics evangelists, said this on LinkedIn:
He covers:
5-Finger Dexterity
Teleoperation Software (i.e. human operators wearing VR gear)
Fleet & Tasks
How to collect training data at scale
The latest Tesla Optimus video gives us a peek at their human data collection farm, which I believe is Optimus' biggest lead. What does it take to build such a pipeline? Optimus nailed all of the following:
1. Optimus hands are among the best 5-finger, dexterous robot hands in the world. It's got tactile sensing, 11 degrees of freedom (DOF) compared to many competitors with only 6-7 DOF, and robustness to withstand lots of object interactions without constant maintenance.
2. Teleoperation software: we can see that the human operators are wearing VR goggles and gloves. It is very non-trivial to set up the software to have first-person video streamed in and precise control streamed out, while maintaining extremely low latency. Humans are highly sensitive to even the smallest delay between their own motions and the robot's. Optimus has a fluid whole body controller that enacts the human poses in real-time.
3. Sizeable fleet: you need more than one robot to collect data in parallel, well-trained human contractors taking multiple shifts per day (preferably 24/7), and an on-call maintenance crew to make sure that the robots are always busy. That's a ton of operational complexity that academic research labs don't even think of.
4. Tasks & environments: it's equally important to figure out *what* to teleoperate. Currently, most such efforts are demo-driven: collect data on the tasks that you want to put into a social media video. But solving general-purpose robots requires us to think carefully about the distribution of tasks and environments. From 43"-51" in the video, we can see factory & household settings like moving batteries, handling laundry, sorting daily objects into shelves.
It's an open-ended research question: if you only have the budget to collect training data for 1,000 tasks, what would you pick to maximize skill transfer and generalization?
Closing thought: teleoperation is a necessary but insufficient condition to solve humanoid robotics. It fundamentally does not scale.
In a post made on the Optimus X account on Sunday, the humanoid robot is seen using its end-to-end neural network to perform basic factory tasks, including sorting 4680 battery cells. The video highlights the robot’s ability to do so autonomously, even fixing its own mistakes as it goes along. Still, many viewers assume the robot was not operating independently and that this footage also relied on teleoperation.
Musk has been hyping up Optimus recently, pledging that Tesla would eventually deliver an amazing new robot that people would buy in stores. It’s not clear if or when this will actually happen, since Tesla suddenly has many worthy competitors, including some in China.
While Optimus keeps evolving, Tesla already faces stiff competition in 2024.
Internet Mocking Tesla Demos
The technique here is called “teleoperation,” and it has been used in robotics since the 1940s. Essentially, someone moves their own hand and the robot mimics the movement. It’s cool for mid-20th-century tech, but it’s not the kind of autonomous movement that people in the 21st century expect from cutting-edge, futuristic products. See the video people are making fun of here.
It’s gotten so bad that robot companies now include notices with their new demo videos to make it clear the machine is operating autonomously.
Chinese robot maker Astribot
How can you verify whether any of these videos are entirely real, or whether and how they were doctored? It’s not clear.
Optimus Training to do Practical Tasks
Using end-to-end neural networks to sort battery cells, with the system autonomously recovering from any failures.
Optimus is currently being trained at one of Tesla’s factories, with fewer human interventions over time, and is now taking longer walks around the office.
It runs in real time on the bot's FSD computer, using only 2D cameras, along with hand tactile and force sensors.
The training data comes from human teleoperation and is scaled across the fleet to perform different tasks.
Again, it’s not clear whether any of this is scalable.
Robot Meme of the Week
“The smarter we make AI, the less it wants to do our jobs”
Optimus Gen 2 is the second generation of Tesla’s humanoid robot. It is designed to be a general-purpose machine that can assist humans in various domains, such as manufacturing, construction, healthcare, and entertainment.
Tesla said in December 2023 that its new prototype is 30% faster, 10 kg lighter, and has sensors on all fingers.
It's best to take Tesla's claims with a grain of salt until they are independently verified in practical, real-world demonstrations. Elon has a lot riding on his promises about Robotaxis and Optimus being things millions of consumers will use soon, which is far from a certainty.