A Fleet of 10,000 Humanoid Robots Can Generate Data at YouTube's Daily Upload Rate

Related Insights

Robotics AI Models Are Bootstrapped with YouTube Videos and Simulations

The primary challenge in robotics AI is the lack of real-world training data. To solve this, models are bootstrapped using a combination of learning from human lifestyle videos and extensive simulation environments. This creates a foundational model capable of initial deployment, which then generates a real-world data flywheel.

Tech Turns to Mining, Meta VR Layoffs, Thinking Machines Shakeup | Matthew Prince, Chirantan Desai, Delian Asparouhov, Deepak Pathak, David Tearse, Blake Resnick

TBPN·6 months ago

Internet Video Is the Best Foundational Training Data for Generalist Robots

To build generalist robots, the most effective approach is pre-training foundation models on internet-scale video datasets, not just simulation or tele-operated data. This vast, diverse data provides a deep, implicit understanding of physics and object interaction that is impossible to replicate in controlled environments, enabling true generalization.

Nvidia Invests in Thinking Machines, Meta Acquires Moltbook, BYD F1 | Olivia Moore, David Paffenholz, Adam Goldstein, Max Junestrand, Allan McLennan, Jagdeep Singh, Scott Hickle

TBPN·4 months ago

AI Capability Improves Non-Linearly With Massive Increases in Training Data

A key surprise in AI development was the non-linear impact of scale. Sebastian Thrun noted that while AI trained on millions of documents is 'fine,' training it on hundreds of billions creates an 'unbelievably smart' system, shocking even its creators and demonstrating data volume as a primary driver of breakthroughs.

Search Engine Presents: Are you a good driver?

Odd Lots·3 months ago

Musk Plans an 'Optimus Academy' With 20,000 Real Robots to Solve the AI Data Problem

Unlike cars, which gather data passively, humanoid robots need active training. To solve this, Musk's strategy is to build a physical 'academy' of 10,000-30,000 Optimus robots performing self-play on various tasks, then use the resulting real-world data to close the 'sim-to-real' gap left by training millions of simulated robots.

Elon Musk on Space GPUs, AI, Optimus, and his manufacturing method

Cheeky Pint·5 months ago

Humanoid Robot Development is Bottlenecked by In-Home Data Collection, Not Hardware

Progress in robotics for household tasks is limited by a scarcity of real-world training data, not mechanical engineering. Companies are now deploying capital-intensive "in-field" teams to collect multi-modal data from inside homes, capturing the complexity of mundane human activities to train more capable robots.

Centific’s Role in AI Boom, Databricks $134B Valuation, Alien Hunter Funding | Dec 16, 2025

The Information's TITV·7 months ago

Scarce, Actively Generated Data Is the New Moat for Robotics and Biology AI

The future of valuable AI lies not in models trained on the abundant public internet, but in those built on scarce, proprietary data. For fields like robotics and biology, this data doesn't exist to be scraped; it must be actively created, making the data generation process itself the key competitive moat.

Josh Wolfe & Brett McGurk – Venture, Geopolitics, and the Next Frontier (EP.476)

Capital Allocators – Inside the Institutional Investment Industry·7 months ago

Robot Startup 1X Prioritizes Home Deployment to Train More General AI

Rather than starting in controlled industrial settings, 1X believes the complex, diverse, and social nature of the home is the best environment for developing true general intelligence. The robot must learn to navigate social context, such as holding a door for someone, and that kind of data is unavailable in a factory.

This 22-Year-Old Built TikTok for Mobile Games, and It’s Growing Fast | E2276

This Week in Startups·3 months ago

Figure AI CEO Claims Data Scarcity, Not Hardware, Is the Main Bottleneck for General Robotics

Brett Adcock states that Figure AI's "Helix 2" neural net provides the right technical stack for general robotics. The biggest remaining obstacle is not hardware but the immense data required to train the robot for a wide distribution of tasks. The company plans to spend nine figures on data acquisition in 2026 to solve this.

Anthropic Hits $380B Valuation, Become Unsloppable, WSJ Mansion Section | Martin Shkreli, Connor Hayes, Alex Bouzari, Brett Adcock

TBPN·5 months ago

Humanoid Robot Companies Sell Hardware at a Loss to Gather Valuable Training Data

Firms are deploying consumer robots not for immediate profit but as a data acquisition strategy. By selling hardware below cost, they collect vast amounts of real-world video and interaction data, which is the true asset used to train more advanced and capable AI models for future applications.

What in the world: predictions for 2026

Economist Podcasts·6 months ago

Humanoid Robots Must Physically Mimic Humans to Learn from Existing Video Data

1X designs its robots with human-like physical properties, down to skin tissue stiffness. This allows them to effectively leverage the internet's vast repository of human video data (e.g., YouTube) as a training set, bootstrapping intelligence without needing to create an entirely new internet-sized dataset.

This 22-Year-Old Built TikTok for Mobile Games, and It’s Growing Fast | E2276

This Week in Startups·3 months ago
