BioHub's Alex Rives Bets on Scaling Laws, Not Human Priors, to Model Proteins

Related Insights

Boost Biology AI Accuracy By Massively Sampling and Then Ranking Results

A key strategy for improving results from generative protein models is "inference-time scaling." This involves generating a vast number of potential structures and then using a separate, fine-tuned scoring model to rank them. This search-and-rank process uncovers high-quality solutions the model might otherwise miss.

🔬Beyond AlphaFold: How Boltz is Open-Sourcing the Future of Drug Discovery

Latent Space: The AI Engineer Podcast·5 months ago
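
The sample-then-rank idea in the insight above can be sketched in a few lines. This is a minimal illustration, not Boltz's pipeline: `sample_and_rank`, `make_toy_generator`, and `toy_score` are hypothetical names, the "structures" are plain floats, and the scorer is a toy function standing in for a fine-tuned scoring model.

```python
import random
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def sample_and_rank(
    generate: Callable[[], T],
    score: Callable[[T], float],
    n_samples: int,
    top_k: int = 1,
) -> List[Tuple[float, T]]:
    """Draw n_samples candidates, score each one, return the top_k by score."""
    candidates = [generate() for _ in range(n_samples)]
    scored = [(score(c), c) for c in candidates]
    # Highest score first; the scorer is assumed to be "higher is better".
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def make_toy_generator(seed: int) -> Callable[[], float]:
    """Hypothetical stand-in for a generative model: samples a float in [0, 1]."""
    rng = random.Random(seed)
    return lambda: rng.uniform(0.0, 1.0)


def toy_score(candidate: float) -> float:
    """Hypothetical stand-in for a scoring model: closer to 0.5 is better."""
    return -abs(candidate - 0.5)


# Same seed, so the larger run sees the smaller run's samples plus many more.
best_small = sample_and_rank(make_toy_generator(0), toy_score, n_samples=10)[0]
best_large = sample_and_rank(make_toy_generator(0), toy_score, n_samples=1000)[0]
top_three = sample_and_rank(make_toy_generator(0), toy_score, n_samples=1000, top_k=3)
```

Because the larger run includes every sample from the smaller one, its best score can only match or improve, which is the basic argument for spending more compute at inference time. In practice the quality of the final pick is limited by how well the scoring model tracks real quality.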

AlphaFold's Success Shows Machine Learning on Experimental Data Beats First-Principles Simulation

D. E. Shaw Research (DESRES) invested heavily in custom silicon for molecular dynamics (MD) to solve protein folding. In contrast, DeepMind's AlphaFold, using ML on experimental data, solved it on commodity hardware. This demonstrates that data-driven approaches can be vastly more effective than brute-force simulation for complex scientific problems.

🔬 Automating Science: World Models, Scientific Taste, Agent Loops — Andrew White

Latent Space: The AI Engineer Podcast·6 months ago

Noisy Metagenomic Data, Not Curated Sequences, Unlocked Protein Model Scaling

The ESM-C model's performance leap came from adding billions of "noisy" protein sequences from environmental samples. This vast, diverse dataset overcame the limitations of curated databases like UniRef, removing the data bottleneck and revealing clear scaling laws.

🔬ESMFold2: The Bitter Lesson is Coming for Proteins - Alex Rives, BioHub

Latent Space: The AI Engineer Podcast·2 months ago

AI Models Like AlphaFold May Represent a New Type of Scientific Explanation

Unlike classic theories based on simple equations, large AI models represent a new kind of scientific object. Rather than being mere predictive tools, they could be a novel form of explanation that we must learn to manipulate through new operations like distillation and merging, much like Mathematica made massive equations workable.

Michael Nielsen – How science actually progresses

Dwarkesh Podcast·3 months ago

AI Models Are Designing Successful Proteins That Defy Human Structural Intuition

An anecdote about a "wonky" BindCraft design with disconnected beta sheets, which experts predicted would fail, highlights a key trend. The resulting binder was one of the best ever produced, suggesting AI models are extracting structural principles that go beyond traditional human "protein literacy" and intuition.

Martin Pacesa on BindCraft: An Automated Pipeline for De Novo Protein Binder Design

The Chain: Protein Engineering Podcast·3 months ago

Protein Models Learn Function by Applying a 1954 Linguistic Theory to Amino Acids

The success of protein language models can be explained by Zellig Harris's 1954 linguistic theory. Just as a word's meaning is defined by its contexts, an amino acid's biological role is determined by the sequences it can appear in. The model learns this deep statistical structure, effectively learning biology.

🔬ESMFold2: The Bitter Lesson is Coming for Proteins - Alex Rives, BioHub

Latent Space: The AI Engineer Podcast·2 months ago

Specialized Architectures Still Beat Transformers for Protein Structure Prediction

Contrary to trends in other AI fields, structural biology problems are not yet dominated by simple, scaled-up transformers. Specialized architectures that bake in physical priors, like equivariance, still yield vastly superior performance, as the domain's complexity requires strong inductive biases.

🔬Beyond AlphaFold: How Boltz is Open-Sourcing the Future of Drug Discovery

Latent Space: The AI Engineer Podcast·5 months ago

Protein Language Models Spontaneously Learn Biology's Textbook Hierarchy

Trained only on sequence prediction, ESM-C independently developed a hierarchical feature space mirroring decades of human scientific discovery. Its learned representations range from basic biochemical properties to complex, abstract functional concepts, all without prior biological knowledge.

🔬ESMFold2: The Bitter Lesson is Coming for Proteins - Alex Rives, BioHub

Latent Space: The AI Engineer Podcast·2 months ago

BioHub Designs Therapeutic Antibodies by Searching, Not Generating, Its Protein World Model

ESM-C is used as a predictive "world model" rather than a direct generator. Protein design, including for complex antibody formats such as single-chain variable fragments (scFvs), is framed as a search problem: find molecules within the model's learned space that satisfy desired criteria. This approach is achieving therapeutically relevant binding affinities.

🔬ESMFold2: The Bitter Lesson is Coming for Proteins - Alex Rives, BioHub

Latent Space: The AI Engineer Podcast·2 months ago
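
Framing design as search over a predictive model, as the insight above describes, can be illustrated with a simple mutate-and-keep loop. This is a sketch under loud assumptions, not BioHub's method: `toy_predicted_fitness` is a hypothetical stand-in for a learned model's prediction, `HIDDEN_TARGET` is an arbitrary made-up string, and greedy single-site mutation is just one of many possible search strategies.

```python
import random
from typing import Callable, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

# Hypothetical target used only by the toy scorer below; not a real protein.
HIDDEN_TARGET = "MKTAYIAKQR"


def toy_predicted_fitness(seq: str) -> float:
    """Stand-in for a predictive model: counts positions matching the target."""
    return float(sum(a == b for a, b in zip(seq, HIDDEN_TARGET)))


def greedy_search(
    score: Callable[[str], float],
    start: str,
    n_steps: int,
    seed: int = 0,
) -> Tuple[str, float]:
    """Propose single-residue mutations and keep any that do not lower the score."""
    rng = random.Random(seed)
    best, best_score = start, score(start)
    for _ in range(n_steps):
        pos = rng.randrange(len(best))
        candidate = best[:pos] + rng.choice(AMINO_ACIDS) + best[pos + 1:]
        candidate_score = score(candidate)
        # The predictive model is only queried, never sampled from.
        if candidate_score >= best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


start_seq = "A" * len(HIDDEN_TARGET)
start_score = toy_predicted_fitness(start_seq)
designed_seq, designed_score = greedy_search(toy_predicted_fitness, start_seq, n_steps=2000)
```

The point of the sketch is the division of labor: the model acts as an oracle that evaluates candidates, and a separate search procedure decides which candidates to try next.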

AI Learns Biology's "Grammar" to Design Proteins Beyond Evolution's Scope

Generate Biomedicines' AI learns the fundamental rules of protein structure and function, much like a language's grammar. This allows it to design entirely new proteins by generating novel "sentences" (sequences) that are biologically coherent and functional, rather than just mimicking existing ones found in nature.

Molly Gibson: Superintelligence and the Future of Drug Development

Behind the Breakthroughs·3 months ago
