Cohere's CEO believes that if Google had kept the Transformer paper secret, another team would have invented the architecture within 18 months. The key ideas were already circulating in the research community, making the discovery a synthesis whose time had come rather than a singular stroke of genius.
While more data and compute yield incremental, roughly linear improvements, true step-function advances in AI come from unpredictable algorithmic breakthroughs like the Transformer. These creative leaps are the hardest kind of innovation to produce, and they represent the highest-leverage, yet riskiest, area for investment and research focus.
With industry dominating large-scale compute, academia's function is no longer to train the biggest models. Instead, its value lies in pursuing unconventional, high-risk research in areas like new algorithms, architectures, and theoretical underpinnings that commercial labs, focused on scaling, might overlook.
The hypothesis for ImageNet—that computers could learn to "see" from vast visual data—was sparked by Dr. Li's reading of psychology research on how children learn. This demonstrates that radical innovation often emerges from the cross-pollination of ideas from seemingly unrelated fields.
Unlike progress in traditional engineering, breakthroughs in foundational AI research often feel binary: a model can be completely broken until a handful of key insights are found, at which point it suddenly works. This "all or nothing" dynamic makes timelines impossible to predict, because you don't know whether a solution is a week or two years away.
Fei-Fei Li's lab believed they were the first to combine ConvNets and LSTMs for image captioning, only to discover through a journalist that a team at Google had developed the same breakthrough concurrently. This highlights the phenomenon of parallel innovation in scientific research.
OpenAI, the initial leader in generative AI, is now on the defensive as competitors like Google and Anthropic copy and improve upon its core features. This race demonstrates that being first offers no lasting moat; in fact, it provides a roadmap for followers to surpass the leader, creating a first-mover disadvantage.
The current trend toward closed, proprietary AI systems is a misguided and ultimately ineffective strategy. Ideas and talent circulate regardless of corporate walls, and truly defensible innovation is fostered by openness and the rapid exchange of research, not by secrecy.
The "Attention is All You Need" paper's key breakthrough was an architecture designed for massive scalability across GPUs. This focus on efficiency, anticipating the industry's shift to larger models, was more crucial to its dominance than the attention mechanism itself.
Google authored the seminal Transformer paper but failed to capitalize on it, allowing outsiders to build the next wave of AI. This shows how incumbents can be so "lost in the sauce" of their current paradigm that they miss the fundamental shift their own research has created.
A key strategy for labs like Anthropic is automating AI research itself. By building models that can perform the tasks of AI researchers, they aim to create a feedback loop that dramatically accelerates the pace of innovation.