AI chip startup Talos takes a contrarian approach, casting models "straight into silicon" as fixed, model-specific hardware. The design gives up programmability entirely in exchange for massive gains in speed and cost, betting that frontier models will remain stable for 3-12 months at a time, long enough to make the "cartridge-swap" model economically viable.
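To see why a 3-12 month model lifetime can be enough, here is a back-of-envelope break-even sketch in Python. Every figure in it (NRE cost, per-token serving prices, traffic volume) is a hypothetical illustration, not a number from Talos:

```python
# Sketch of the "cartridge-swap" bet: baking one model into silicon pays off
# only if the model stays in service long enough for its per-token cost
# advantage to repay the one-time chip cost. All figures are hypothetical.

def months_to_break_even(
    nre_cost_usd: float,        # one-time cost to design and tape out the chip
    gpu_cost_per_mtok: float,   # serving cost per million tokens on GPUs
    asic_cost_per_mtok: float,  # serving cost per million tokens on the fixed chip
    mtok_per_month: float,      # monthly traffic, in millions of tokens
) -> float:
    """Months of serving needed before the ASIC's savings repay its NRE."""
    saving_per_month = (gpu_cost_per_mtok - asic_cost_per_mtok) * mtok_per_month
    return nre_cost_usd / saving_per_month

# Hypothetical: $50M chip cost, 10x cheaper tokens, 5M Mtok/month of traffic.
print(f"{months_to_break_even(50e6, 2.0, 0.2, 5e6):.1f} months")  # ~5.6 months
```

Under these made-up numbers the chip pays for itself in under six months, comfortably inside the 3-12 month stability window the bet requires.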
While purpose-built chips (ASICs) like Google's TPU are efficient, the AI industry is still in an early, experimental phase. GPUs offer the programmability and flexibility needed to develop new algorithms, as ASICs risk being hard-coded for models that quickly become obsolete.
New AI models are designed to perform well on available, dominant hardware like NVIDIA's GPUs. This creates a self-reinforcing cycle where the incumbent hardware dictates which model architectures succeed, making it difficult for superior but incompatible chip designs to gain traction.
Designing custom AI hardware is a long-term bet. Google's TPU team co-designs chips with ML researchers to anticipate future needs. They aim to build hardware for the models that will be prominent 2-6 years from now, sometimes embedding speculative features that could provide massive speedups if research trends evolve as predicted.
True co-design between AI models and chips is currently impossible due to an "asymmetric design cycle": models evolve far faster than chips can be designed, so hardware is always a step behind. Using AI to drastically shorten the chip-design cycle would close that gap and enable a virtuous cycle of co-evolution.
NVIDIA's commitment to programmable GPUs over fixed-function ASICs (like a "transformer chip") is a strategic bet on rapid AI innovation. Since models are evolving so quickly (e.g., hybrid SSM-transformers), a flexible architecture is necessary to capture future algorithmic breakthroughs.
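To make the flexibility argument concrete, the toy NumPy sketch below interleaves an attention layer with an SSM-style linear recurrence; the shapes, decay constant, and mixing pattern are invented for illustration, not drawn from any shipping hybrid model. The point is that the two layers have entirely different compute patterns, so a chip that hard-wires only attention cannot run the hybrid:

```python
# Toy illustration: hybrid models mix attention (all-pairs matmuls) with
# SSM-style recurrence (a sequential scan). A fixed-function "transformer
# chip" that hard-codes the former cannot execute the latter.
import numpy as np

def attention_layer(x: np.ndarray) -> np.ndarray:
    """Vanilla self-attention: the op a transformer ASIC would hard-code."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    weights = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)
    return weights @ x

def ssm_layer(x: np.ndarray, decay: float = 0.9) -> np.ndarray:
    """SSM-style linear recurrence: a sequential state update per token,
    not a matmul over all token pairs."""
    state = np.zeros(x.shape[-1])
    out = np.empty_like(x)
    for t, token in enumerate(x):
        state = decay * state + token
        out[t] = state
    return out

def hybrid_block(x: np.ndarray) -> np.ndarray:
    """Alternate the two layer types; only programmable hardware runs both."""
    return ssm_layer(attention_layer(x))

x = np.random.randn(16, 64)   # 16 tokens, 64-dim embeddings (arbitrary sizes)
print(hybrid_block(x).shape)  # (16, 64)
```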
OpenAI is designing its custom chip for flexibility, not just raw performance on current models. The team learned that the biggest efficiency gains, on the order of 100x, come from algorithmic evolution (e.g., dense to sparse transformers), so the hardware must remain adaptable to future architectural changes.
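One way to see where gains of that magnitude come from is a rough FLOP count comparing dense attention (every token attends to every token) with a sparse, fixed-window variant (each token attends only to a local neighborhood). The sequence length and window size below are illustrative choices, not OpenAI's figures:

```python
# Rough cost comparison: switching the attention pattern, not the hardware,
# changes the compute by two orders of magnitude at long context lengths.

def dense_attention_flops(seq_len: int, dim: int) -> int:
    # scores (seq x seq) plus the weighted sum: ~2 * seq^2 * dim multiply-adds
    return 2 * seq_len * seq_len * dim

def sparse_attention_flops(seq_len: int, dim: int, window: int) -> int:
    # each token attends only to a fixed window of neighbors
    return 2 * seq_len * window * dim

seq_len, dim, window = 131_072, 128, 1_024   # illustrative sizes
ratio = dense_attention_flops(seq_len, dim) / sparse_attention_flops(seq_len, dim, window)
print(f"dense / sparse FLOPs: {ratio:.0f}x")  # 128x
```

The ratio is simply seq_len / window, so the longer the context, the larger the algorithmic win, independent of what silicon runs it.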
For a $1B training run, the subsequent inference costs will exceed another $1B. A custom ASIC that saves even 20% of that ($200M+) covers its own tape-out, shifting the hardware bottleneck from manufacturing cost to development timeline.
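The arithmetic is simple enough to spell out. The sketch below uses the section's own figures, with the ~$200M ASIC development cost cited later in this section as the budget to beat:

```python
# Worked example of "the ASIC funds itself", using the figures from the text.
# The ~$200M development cost is the section's own estimate; treat the rest
# as round illustrative numbers.
inference_cost  = 1_000_000_000  # inference spend following a $1B training run
asic_saving_pct = 0.20           # efficiency gain from a task-specific ASIC
asic_nre_cost   = 200_000_000    # ~$200M to design and tape out the chip

savings = inference_cost * asic_saving_pct
print(f"savings ${savings/1e6:.0f}M vs chip cost ${asic_nre_cost/1e6:.0f}M")
# savings $200M vs chip cost $200M -> money stops being the binding
# constraint; the multi-year design timeline does.
```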
The current 2-3 year chip design cycle is a major bottleneck for AI progress: by the time hardware ships, the software needs it was built for are already outdated. Using AI to slash this timeline would enable a massive expansion of custom chips, each optimized for a specific at-scale software workload.
Specialized chips (ASICs) like Google's TPU lack the flexibility needed in the early stages of AI development. AMD's CEO asserts that general-purpose GPUs will remain the majority of the market because developers need the freedom to experiment with new models and algorithms, a capability that cannot be hard-coded into purpose-built silicon.
At a massive scale, chip design economics flip. For a $1B training run, the potential efficiency savings on compute and inference can far exceed the ~$200M cost to develop a custom ASIC for that specific task. The bottleneck becomes chip production timelines, not money.