Kinda genius to scale exoskeleton data collection with UMI grippers when most labs are chasing "general" VLMs / VLAs by training on human demonstration videos.
Imo the latter will be very useful for semantic planning and reasoning, but only after manipulation is solved.
A ballpark cost estimate -
- $10 to $20 hourly wages for the data collectors
- $100,000 to $200,000 per day for 10,000 hours of data
- ~1,500 to 2,500 data collectors doing 4 to 6 hours daily
- $750K to $1.25M on hardware costs at $500 per gripper
Fully loaded cost: between $4M and $8M for 270,000 hours of data.
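A quick sanity check on that arithmetic (every input is one of the assumptions above, not a published figure):

    # Sanity-checking the ballpark above; all inputs are this thread's
    # assumptions, not published figures.
    wage_low, wage_high = 10, 20      # $/hour per data collector
    hours_per_day = 10_000            # collected hours per day
    collectors = 2_500                # upper end of the headcount estimate
    gripper_cost = 500                # $ per gripper
    total_hours = 270_000             # target dataset size

    labor_low = total_hours * wage_low    # $2.7M
    labor_high = total_hours * wage_high  # $5.4M
    hardware = collectors * gripper_cost  # $1.25M at the upper end

    print(f"labor: ${labor_low/1e6:.1f}M-${labor_high/1e6:.1f}M")
    print(f"hardware: ${hardware/1e6:.2f}M")
    print(f"days of collection: {total_hours / hours_per_day:.0f}")
    # ~$4M-$6.6M direct cost over ~27 days; overhead and logistics on
    # top of that lands in the quoted $4M-$8M range.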
Not bad considering the alternatives.
For example, teleoperation is way less efficient - it's 5x-6x slower than human demos and 2x-3x more expensive per hour of operator time. But it could become feasible after low-level and mid-level manipulation and task planning are solved.
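Multiplying those two factors out (my arithmetic on the ranges above):

    # Implied cost per hour of *collected* data, relative to human demos:
    # 5x-6x slower means 5-6 operator-hours per demonstration-hour, and
    # each operator-hour costs 2x-3x more. Both ranges are from above.
    slowdown = (5, 6)
    cost_ratio = (2, 3)
    low, high = slowdown[0] * cost_ratio[0], slowdown[1] * cost_ratio[1]
    print(f"{low}x-{high}x the cost per collected hour")  # 10x-18x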
Not teleoperating has its own disadvantages, though, due to mismatches between how humans move vs. how robots move. See here: https://evjang.com/2024/08/31/motors.html
Intuitively, yes. But is it really true in practice?
Thinking about it, I'm reminded of various "additive training" tricks. Teach an AI to do A, and then to do B, and it might just generalize that to doing A+B with no extra training. Works often enough on things like LLMs.
In this case, we use non-robot data to teach an AI how to do diverse tasks, and robot-specific data (real or sim) to teach it how to operate a robot body. That might generalize well enough to "doing diverse tasks through a robot body".
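A toy sketch of that co-training idea, purely illustrative - the data sources, names, and mixing ratio here are all made up:

    import random

    # Hypothetical data sources (names invented for illustration):
    # task-diverse human demos ("A") and robot-embodiment data ("B").
    human_demo_clips = [f"human_clip_{i}" for i in range(1000)]
    robot_trajectories = [f"robot_traj_{i}" for i in range(100)]

    def sample_training_example(p_robot=0.2):
        """Mix both sources into one training stream, so a single set
        of weights sees skills and embodiment together."""
        if random.random() < p_robot:
            return random.choice(robot_trajectories)
        return random.choice(human_demo_clips)

    # The hope is that "do diverse tasks" + "operate this body" composes
    # at inference time into "do diverse tasks through this body".
    for step in range(5):
        print(step, sample_training_example())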
I’m curious how they prompt the model or otherwise tell it what its goal is. They seem to suggest some language processing — perhaps they’re starting with a multimodal text + vision LLM?
If it really is fully autonomous, that first video is insane. I struggle to put those little tags into the slot in the box sometimes, and I'm pretty sure I'm human, but the bot gets it on the first attempt.
Yeah, this company (GeneralistAI) is, in my opinion, the most advanced robotics+AI company in the world. Slightly behind them are Google DeepMind Robotics and Physical Intelligence, and then the rest.