Kinda genius to scale exoskeleton data collection with UMI grippers when most labs are chasing "general" VLMs / VLAs by training on human demonstration videos.
Imo the latter will be very useful for semantic planning and reasoning, but only after manipulation is solved.
A ballpark cost estimate -
- $10 to $20 hourly wages for the data collectors
- $100,000 to $200,000 per day for 10,000 hours of data
- ~1,500 to 2,500 data collectors doing 4 to 6 hours daily
- $750K to $1.25M on hardware costs at $500 per gripper
Fully loaded cost: between $4M and $8M for 270,000 hours of data.
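A quick sanity check on that arithmetic (every input is one of the assumptions above, not a published figure):

    # Sanity-checking the ballpark above; all inputs are this thread's
    # assumptions, not published figures.
    wage_low, wage_high = 10, 20      # $/hour per data collector
    hours_per_day = 10_000            # collected hours per day
    collectors = 2_500                # upper end of the headcount estimate
    gripper_cost = 500                # $ per gripper
    total_hours = 270_000             # target dataset size

    labor_low = total_hours * wage_low    # $2.7M
    labor_high = total_hours * wage_high  # $5.4M
    hardware = collectors * gripper_cost  # $1.25M at the upper end

    print(f"labor: ${labor_low/1e6:.1f}M-${labor_high/1e6:.1f}M")
    print(f"hardware: ${hardware/1e6:.2f}M")
    print(f"days of collection: {total_hours / hours_per_day:.0f}")
    # ~$4M-$6.6M direct cost over ~27 days; overhead and logistics on
    # top of that lands in the quoted $4M-$8M range.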
Not bad considering the alternatives.
For example, teleoperation is way less efficient - it's 5x-6x slower than human demos and 2x-3x more expensive per hour of operator time. But it could become feasible after low-level and mid-level manipulation and task planning are solved.
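Multiplying those two factors out (my arithmetic on the ranges above):

    # Implied cost per hour of *collected* data, relative to human demos:
    # 5x-6x slower means 5-6 operator-hours per demonstration-hour, and
    # each operator-hour costs 2x-3x more. Both ranges are from above.
    slowdown = (5, 6)
    cost_ratio = (2, 3)
    low, high = slowdown[0] * cost_ratio[0], slowdown[1] * cost_ratio[1]
    print(f"{low}x-{high}x the cost per collected hour")  # 10x-18x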
Not teleoperating has its own disadvantages, though, due to mismatches between how humans move vs. how robots move. See here: https://evjang.com/2024/08/31/motors.html
Intuitively, yes. But is it really true in practice?
Thinking about it, I'm reminded of various "additive training" tricks. Teach an AI to do A, and then to do B, and it might just generalize that to doing A+B with no extra training. Works often enough on things like LLMs.
In this case, we use non-robot data to teach an AI how to do diverse tasks, and robot-specific data (real or sim) to teach it how to operate a robot body. That might generalize well enough to "doing diverse tasks through a robot body".
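A toy sketch of that co-training idea, purely illustrative - the data sources, names, and mixing ratio here are all made up:

    import random

    # Hypothetical data sources (names invented for illustration):
    # task-diverse human demos ("A") and robot-embodiment data ("B").
    human_demo_clips = [f"human_clip_{i}" for i in range(1000)]
    robot_trajectories = [f"robot_traj_{i}" for i in range(100)]

    def sample_training_example(p_robot=0.2):
        """Mix both sources into one training stream, so a single set
        of weights sees skills and embodiment together."""
        if random.random() < p_robot:
            return random.choice(robot_trajectories)
        return random.choice(human_demo_clips)

    # The hope is that "do diverse tasks" + "operate this body" composes
    # at inference time into "do diverse tasks through this body".
    for step in range(5):
        print(step, sample_training_example())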
I’m curious how they prompt the model or otherwise tell it what its goal is. They seem to suggest some language processing — perhaps they’re starting with a multimodal text + vision LLM?
If it really is fully autonomous, that first video is insane. I struggle to put those little tags into the slot in the box sometimes, and I'm pretty sure I'm human, but the bot gets it on the first attempt.
Yeah, this company (GeneralistAI) is, in my opinion, the most advanced robotics+AI company in the world. Slightly behind them are Google DeepMind Robotics and Physical Intelligence, and then the rest.