The biggest obstacle to fully AI-driven hardware design is the absence of a large, public training dataset. Unlike software code, circuit board designs are proprietary and siloed within companies like Apple and SpaceX. Until this data is generated or aggregated, model capability will be constrained, regardless of architectural breakthroughs.

Related Insights

Even the most advanced AI model can't accelerate science without practical, real-world data. The current bottleneck is often logistical—knowing reagent lead times, lab inventory, and costs. Superior model intelligence is less critical than having access to this operational context.

The rapid progress of many LLMs was possible because they could leverage the same massive public dataset: the internet. In robotics, no such public corpus of robot interaction data exists. This “data void” means progress is tied to a company's ability to generate its own proprietary data.

Public internet data has been largely exhausted for training AI models. The real competitive advantage, and the source of next-generation specialized AI, will be the vast, untapped reservoirs of proprietary data locked inside corporations, such as R&D data from pharmaceutical or semiconductor companies.

The next leap for hardware—AI generating complex 3D CAD designs—is blocked by a data bottleneck. CAD files are a company's most valuable IP, so firms won't share them to train models. The solution may lie in on-premise models or starting with the hobbyist community.

Instead of training models on scarce circuit board data, Diode Computers built a compiler that makes hardware design look like a Python program. This allows powerful language models, which are expert coders, to design physical hardware by leveraging their existing capabilities, bypassing the data bottleneck.

For years, access to compute was the primary bottleneck in AI development. Now, as public web data is largely exhausted, the limiting factor is access to high-quality, proprietary data from enterprises and human experts. This shifts the focus from building massive infrastructure to forming data partnerships and securing expertise.

True co-design between AI models and chips is currently impossible due to an "asymmetric design cycle": AI models evolve much faster than chips can be designed. Using AI to drastically speed up chip design makes it possible to create a virtuous cycle of co-evolution.

While data was once a major constraint for training AI, models can now effectively create their own synthetic data. This has shifted the critical choke points in the AI supply chain to physical infrastructure like power grids and data center construction, which are now the primary limiters of growth.

The internet is an insufficient training ground for scientific AI because most crucial information—including failed experiments, negative data, and nuanced procedural details—is never published. This undocumented knowledge, what scientists call "good hands," represents a major data bottleneck for building truly intelligent scientific models.

Brett Adcock states that Figure AI's "Helix 2" neural net provides the right technical stack for general robotics. The biggest remaining obstacle is not hardware but the immense data required to train the robot for a wide distribution of tasks. The company plans to spend nine figures on data acquisition in 2026 to solve this.

Lack of Public Data, Not Model Capability, Is AI's Main Bottleneck in Hardware | RiffOn