Progress in robotics for household tasks is limited by a scarcity of real-world training data, not mechanical engineering. Companies are now deploying capital-intensive "in-field" teams to collect multi-modal data from inside homes, capturing the complexity of mundane human activities to train more capable robots.
To overcome the data bottleneck in robotics, Sunday developed gloves that capture human hand movements. This allows them to train their robot's manipulation skills without needing a physical robot for teleoperation. By separating data gathering (gloves) from execution (robot), they can scale their training dataset far more efficiently than competitors who rely on robot-in-the-loop data collection methods.
The rapid progress of many LLMs was possible because they could leverage the same massive public dataset: the internet. In robotics, no such public corpus of robot interaction data exists. This “data void” means progress is tied to a company's ability to generate its own proprietary data.
For consumer robotics, the biggest bottleneck is real-world data. By aggressively cutting costs to make robots affordable, companies can deploy more units faster. This generates a massive data advantage, creating a feedback loop that improves the product and widens the competitive moat.
Physical Intelligence demonstrated an emergent capability where its robotics model, after reaching a certain performance threshold, significantly improved by training on egocentric human video. This solves a major bottleneck by leveraging vast, existing video datasets instead of expensive, limited teleoperated data.
The future of valuable AI lies not in models trained on the abundant public internet, but in those built on scarce, proprietary data. For fields like robotics and biology, this data doesn't exist to be scraped; it must be actively created, making the data generation process itself the key competitive moat.
While Figure's CEO criticizes competitors for using human operators in robot videos, this 'wizard of oz' technique is a critical data-gathering and development stage. Just as early Waymo cars had human operators, teleoperation is how companies collect the training data needed for true autonomy.
The adoption of powerful AI architectures like transformers in robotics was bottlenecked by data quality, not algorithmic invention. Only after data collection methods improved to capture more dexterous, high-fidelity human actions did these advanced models become effective, reversing the typical 'algorithm-first' narrative of AI progress.
The robotics field has a scalable recipe for AI-driven manipulation (like GPT), but hasn't yet scaled it into a polished, mass-market consumer product (like ChatGPT). The current phase focuses on scaling data and refining systems, not just fundamental algorithm discovery, to bridge this gap.
Ken Goldberg quantifies the challenge: the text data used to train LLMs would take a human 100,000 years to read. Equivalent data for robot manipulation (vision-to-control signals) doesn't exist online and must be generated from scratch, explaining the slower progress in physical AI.
The "bitter lesson" (scale and simple models win) works for language because training data (text) aligns with the output (text). Robotics faces a critical misalignment: it's trained on passive web videos but needs to output physical actions in a 3D world. This data gap is a fundamental hurdle that pure scaling cannot solve.