Figure is observing that data from one robot performing a task (e.g., moving packages in a warehouse) improves the performance of other robots on completely different tasks (e.g., folding laundry at home). This powerful transfer learning, enabled by deep learning, is a key driver for scaling general-purpose capabilities.
Amazon’s strategic advantage isn't just in developing AI for AWS and robots for warehouses. The real breakthrough is the convergence of these technologies, where AI provides the "brain" that transforms programmed machines into adaptive, learning systems, accelerating automation's impact.
Physical Intelligence demonstrated an emergent capability: once its robotics model crossed a certain performance threshold, training on egocentric human video began to improve it significantly. This sidesteps a major bottleneck by leveraging vast, existing video datasets instead of expensive, limited teleoperated data.
Progress in robotics for household tasks is limited by a scarcity of real-world training data, not mechanical engineering. Companies are now deploying capital-intensive "in-field" teams to collect multi-modal data from inside homes, capturing the complexity of mundane human activities to train more capable robots.
The adoption of powerful AI architectures like transformers in robotics was bottlenecked by data quality, not algorithmic invention. Only after data collection methods improved to capture more dexterous, high-fidelity human actions did these advanced models become effective, reversing the typical 'algorithm-first' narrative of AI progress.
Instead of simulating photorealistic worlds, robotics firm Flexion trains its models on simplified, abstract representations. For example, it uses perception models like Segment Anything to 'paint' a door red and its handle green. By training on this simplified abstraction, the robot learns the core task (opening doors) in a way that generalizes across all real-world doors, bypassing the need for perfect simulation.
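The abstraction step can be sketched as a small pipeline: given per-part segmentation masks (which a model like Segment Anything could supply), replace photorealistic pixels with flat semantic colors so the policy sees the same simplified observation for every real-world door. This is a minimal illustrative sketch, not Flexion's actual pipeline; the function name, palette, and toy masks are assumptions.

```python
import numpy as np

def abstract_observation(image, masks, palette):
    """Paint each segmented part a flat color, discarding photorealistic detail.

    image:   HxWx3 uint8 frame
    masks:   {label: HxW bool} segmentation masks (e.g. from Segment Anything)
    palette: {label: (R, G, B)} semantic color per part
    """
    obs = np.zeros_like(image)  # unsegmented pixels become black background
    for label, mask in masks.items():  # later entries overpaint earlier ones
        obs[mask] = palette[label]
    return obs

# Toy 4x4 "scene": a door region with a handle pixel inside it.
image = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)
door = np.zeros((4, 4), dtype=bool); door[:, 1:3] = True
handle = np.zeros((4, 4), dtype=bool); handle[2, 2] = True
obs = abstract_observation(
    image,
    {"door": door, "handle": handle},            # handle painted last, on top
    {"door": (255, 0, 0), "handle": (0, 255, 0)},
)
```

Because the policy never sees texture, lighting, or wood grain, any door that segments into "door" and "handle" produces the same kind of observation at train and test time.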
Experience in robotics, where systems often fail, cultivates resilience and a deep focus on analyzing data to debug problems. This "gritty" skill set is highly transferable and valuable in the world of large language models, where perseverance and data intuition are key.
A human driver's lesson from a mistake is isolated. In contrast, when one self-driving car makes an error and learns, the correction is instantly propagated to all other cars in the network. This collective learning creates an exponential improvement curve that individual humans cannot match.
Intuition Robotics' core bet is that the transfer from simulated to physical worlds is unlocked by a shared action interface. Since many real-world robots like drones and arms are already operated with game controllers, an agent trained in diverse gaming environments only needs to adapt to a new visual world, not an entirely new action space.
Manipulating deformable objects like towels was long considered one of the final, hardest challenges in robotics due to their infinite variations. The fact that Figure's neural networks can now successfully fold laundry indicates that the core technological hurdles for truly general-purpose robots have been overcome.
Unlike older robots requiring precise maps and trajectory calculations, new robots use internet-scale common sense and learn motion by mimicking humans or simulations. This combination has “wiped the slate clean” for what is possible in the field.