AlphaFold 2 was a breakthrough for predicting single protein structures. However, this success highlighted the much larger, unsolved challenges of modeling protein interactions, their dynamic movements, and the actual folding process, which are critical for understanding disease and drug discovery.

Related Insights

Instead of building from scratch, ProPhet leverages existing transformer models to create unique mathematical 'languages' for proteins and molecules. Their core innovation is an additional model that translates between them, creating a unified space to predict interactions at scale.

DE Shaw Research (DESRES) invested heavily in custom silicon for molecular dynamics (MD) to solve protein folding. In contrast, DeepMind's AlphaFold, using ML on experimental data, solved it on commodity hardware. This demonstrates data-driven approaches can be vastly more effective than brute-force simulation for complex scientific problems.

Modern protein models use a generative approach (diffusion) instead of regression. Instead of predicting one "correct" structure, they model a distribution of possibilities. This better handles molecular dynamism and avoids averaging between multiple valid states, which is a flaw of regression models.

Models like AlphaFold don't solve protein folding from physics alone. They heavily rely on co-evolutionary data, where correlated mutations across species provide strong hints about which amino acids are physically close. This dramatically constrains the search space for the final structure.

AlphaFold's success in identifying a key protein for human fertilization (out of 2,000 possibilities) showcases AI's power. It acts as a hypothesis generator, dramatically reducing the search space for expensive and time-consuming real-world experiments.

Contrary to trends in other AI fields, structural biology problems are not yet dominated by simple, scaled-up transformers. Specialized architectures that bake in physical priors, like equivariance, still yield vastly superior performance, as the domain's complexity requires strong inductive biases.

John Jumper uses an analogy to explain the leap in complexity from prediction to design. Predicting a protein's structure is like recognizing a bicycle's parts. Designing a new, functional protein is like building a working bicycle—requiring every detail to be correct.

Users on Twitter figured out how to use AlphaFold to predict protein-protein interactions—a key capability the DeepMind team was still developing separately. This highlights the power of open models to unlock emergent capabilities discovered by the community.

Following the success of AlphaFold in predicting protein structures, Demis Hassabis says DeepMind's next grand challenge is creating a full AI simulation of a working cell. This 'virtual cell' would allow researchers to test hypotheses about drugs and diseases millions of times faster than in a physical lab.

Generative AI alone designs proteins that look correct on paper but often fail in the lab. DenovAI adds a physics layer to simulate molecular dynamics—the "jiggling and wiggling"—which weeds out false positives by modeling how proteins actually interact in the real world.