Despite massive financial incentives, high-frequency trading firms rarely develop custom ASICs. CZ explains that FPGAs offer the best trade-off between speed and flexibility: trading algorithms change too frequently for the long development cycle of custom silicon to be practical, whereas FPGAs can be reprogrammed.

Related Insights

While purpose-built chips (ASICs) like Google's TPU are efficient, the AI industry is still in an early, experimental phase. GPUs offer the programmability and flexibility needed to develop new algorithms, as ASICs risk being hard-coded for models that quickly become obsolete.

Meta is deprioritizing its custom silicon program, opting for large orders of AMD's chips. This reflects a broader trend among hyperscalers: the urgent need for massive, immediate compute power is outweighing the long-term strategic goal of self-sufficiency and avoiding the "Nvidia tax."

For a hyperscaler, the main benefit of designing a custom AI chip isn't necessarily superior performance, but gaining control. It allows them to escape the supply allocations dictated by NVIDIA and chart their own course, even if their chip is slightly less performant or more expensive to deploy.

Just as TSMC enabled "fabless" giants like NVIDIA, Recursive Intelligence envisions a "designless" paradigm. They aim to provide AI-driven chip design as a service, allowing companies to procure custom silicon without the massive overhead of hiring and managing large, specialized hardware engineering teams.

True co-design between AI models and chips is currently impossible due to an "asymmetric design cycle": AI models evolve much faster than chips can be designed. Using AI to drastically speed up chip design could close that gap and create a virtuous cycle of co-evolution.

NVIDIA's commitment to programmable GPUs over fixed-function ASICs (like a "transformer chip") is a strategic bet on rapid AI innovation. Since models are evolving so quickly (e.g., hybrid SSM-transformers), a flexible architecture is necessary to capture future algorithmic breakthroughs.

CZ spent nearly a decade, from his first internship in Tokyo to managing a team at Bloomberg, exclusively building low-latency order execution systems for traditional finance. This deep, niche expertise became his unfair advantage when building Binance's high-performance matching engine.

OpenAI is designing its custom chip for flexibility, not just raw performance on current models. The team learned that major 100x efficiency gains come from evolving algorithms (e.g., dense to sparse transformers), so the hardware must be adaptable to these future architectural changes.

The current 2-3 year chip design cycle is a major bottleneck for AI progress, because by the time a chip ships, the software needs it was designed for are already outdated. Using AI to slash this timeline could enable a massive expansion of custom chips, each optimized for a specific at-scale software workload.

Specialized chips (ASICs) like Google's TPU lack the flexibility needed in the early stages of AI development. AMD's CEO asserts that general-purpose GPUs will remain the majority of the market because developers need the freedom to experiment with new models and algorithms, a capability that cannot be hard-coded into purpose-built silicon.