Sergey Edunov, former Llama team lead, claims that LLM architectures have not fundamentally changed since the 2017 Transformer paper. He pivoted to drug discovery AI because the model architectures required for the physical sciences are more diverse and complex, and they present more interesting research challenges.
Similar to how an LLM uses a 'chain of thought' to reason, Genesis's model 'thinks' by iteratively refining an in-memory representation of a crystal structure. This process is guided by physics-based principles, significantly improving the final prediction's accuracy.
Generating truly novel and valid scientific hypotheses requires a specialized, multi-stage AI process. This involves using a reasoning model for idea generation, a literature-grounded model for validation, and a third system for checking originality against existing research. This layered approach overcomes the limitations of a single, general-purpose LLM.
While GANs failed for protein systems, diffusion models became the key primitive. The frontier of diffusion research is now in specialized scientific areas like 3D structure prediction, where innovation outpaces that seen in more mainstream AI applications like image generation.
Current LLM agents are effective at executing and optimizing experiments within a defined research track, like hyperparameter tuning. However, they lack the crucial scientific skill of 'lateral thinking'—recognizing when a research path is a dead end and strategically pivoting to a fundamentally new approach.
Despite concerns about the limits of Large Language Models, Microsoft AI's CEO is confident the current transformer architecture is sufficient for achieving superintelligence. Future leaps will come from new methods built on top of LLMs, such as advanced reasoning, memory, and recurrence, rather than from a fundamental architectural shift.
Contrary to the prevailing 'scaling laws' narrative, leaders at Z.AI believe that simply adding more data and compute to current Transformer architectures yields diminishing returns. They operate under the conviction that a fundamental performance 'wall' exists, necessitating research into new architectures for the next leap in capability.
Dr. Juraji argues against a single "do-it-all" AI. Instead, he envisions a future of "speciated" AI systems where different modules, like the lobes of a brain (e.g., LLMs, causal AI), work together to tackle the multifaceted challenges of drug development.
While acknowledging the power of Large Language Models (LLMs) for linear biological data like protein sequences, CZI's strategy recognizes that biological processes are highly multidimensional and non-linear. The organization is focused on developing new types of AI that can accurately model this complexity, moving beyond the one-dimensional, sequential nature of language-based models.
Turing Award winner Yann LeCun's departure from Meta and his public criticism of its 'LLM-pilled' strategy are more than corporate drama. They represent a vital, oppositional viewpoint arguing for 'world models' over scaling LLMs. This intellectual friction is crucial for preventing stagnation and advancing the entire field of AI.
The era of simply scaling up Transformer-based models is ending. AI21's Jamba model, which combines Transformer and Mamba architectures, points to a new innovation wave focused on hybrid designs. This shift aims to improve efficiency and specialized capabilities like long-context processing, moving beyond the 2017 paradigm.