Contrary to trends in other AI fields, structural biology problems are not yet dominated by simple, scaled-up transformers. Specialized architectures that bake in physical priors, like equivariance, still yield vastly superior performance, as the domain's complexity requires strong inductive biases.

Related Insights

The AI industry is hitting data limits for training massive, general-purpose models. The next wave of progress will likely come from creating highly specialized models for specific domains, similar to DeepMind's AlphaFold, which can achieve superhuman performance on narrow tasks.

Unlike with LLMs, parameter count is a misleading metric for AI models in structural biology. These models have fewer than a billion parameters, yet they are far more computationally expensive to run than their size suggests, due to cubic operations that model pairwise interactions. This makes inference cost, not model size, the key bottleneck.
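The cubic cost can be seen in a toy pairwise update (a loose sketch in the spirit of AlphaFold's triangle operations, not the actual implementation): every entry (i, j) of an N×N pair representation aggregates over a third residue index k, so FLOPs grow as O(N³) even with essentially no parameters.

```python
import numpy as np

def pair_update(z):
    """Toy pairwise update: entry (i, j) aggregates over a third index k,
    loosely analogous to a triangle operation. Cost scales as O(N^3 * C)
    even though this function has zero learned parameters."""
    # z: (N, N, C) pair representation
    return np.einsum('ikc,kjc->ijc', z, z)

N, C = 64, 8
z = np.random.rand(N, N, C)
out = pair_update(z)
print(out.shape)  # (64, 64, 8)
# Doubling the number of residues N multiplies the work by ~8x while the
# parameter count is unchanged -- inference compute, not size, dominates.
```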

Instead of building from scratch, ProPhet leverages existing transformer models to create unique mathematical 'languages' for proteins and molecules. Their core innovation is an additional model that translates between them, creating a unified space to predict interactions at scale.
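ProPhet's internals are not described in detail here, but the "translator between two embedding languages" idea can be sketched as follows. Everything below is a hypothetical stand-in: random vectors play the role of frozen encoder outputs, and a least-squares linear map plays the role of the learned translation model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: two frozen transformer encoders emit embeddings in
# different spaces -- a 16-dim "protein language" and a 12-dim "molecule
# language". Random vectors stand in for real encoder outputs.
P = rng.standard_normal((100, 16))       # protein embeddings
true_map = rng.standard_normal((16, 12)) # unknown relationship (toy)
M = P @ true_map                         # paired molecule embeddings

# The translator: here fit by least squares on known pairs; a real system
# would train a neural network on experimentally observed interactions.
W, *_ = np.linalg.lstsq(P, M, rcond=None)

def interaction_score(p_emb, m_emb):
    """Translate the protein into molecule space, then compare there."""
    t = p_emb @ W
    return float(t @ m_emb / (np.linalg.norm(t) * np.linalg.norm(m_emb)))

print(round(interaction_score(P[0], M[0]), 3))  # paired: near 1.0
```

The point of the unified space is that scoring a new protein against millions of molecules becomes a cheap embedding comparison rather than a full structural computation.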

A classical, bottom-up simulation of a cell is infeasible, according to John Jumper. He sees the more practical path forward as fusing specialized models like AlphaFold with the broad reasoning of LLMs to create hybrid systems that understand biology.

A key strategy for improving results from generative protein models is "inference-time scaling." This involves generating a vast number of potential structures and then using a separate, fine-tuned scoring model to rank them. This search-and-rank process uncovers high-quality solutions the model might otherwise miss.
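The search-and-rank loop is straightforward to sketch. Below, a toy sampler and a toy scorer stand in for the generative protein model and the fine-tuned scoring model; only the best-of-N pattern itself is the point.

```python
import random

def generate_structure(rng):
    """Stand-in for a generative protein model: one sampled candidate."""
    return [rng.gauss(0, 1) for _ in range(8)]

def score_structure(s):
    """Stand-in for a separate fine-tuned scoring model; higher is
    better. Here: closeness of all coordinates to zero."""
    return -sum(x * x for x in s)

def best_of_n(n, seed=0):
    """Inference-time scaling: sample n candidates, keep the top-ranked."""
    rng = random.Random(seed)
    candidates = [generate_structure(rng) for _ in range(n)]
    return max(candidates, key=score_structure)

# More samples -> the search uncovers better-scoring solutions.
print(score_structure(best_of_n(512)) >= score_structure(best_of_n(1)))  # True
```

No retraining is involved: all of the extra quality comes from spending more compute at inference time and trusting the ranker to find the needles.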

DE Shaw Research (DESRES) invested heavily in custom silicon for molecular dynamics (MD) to solve protein folding. In contrast, DeepMind's AlphaFold, using ML on experimental data, solved it on commodity hardware. This demonstrates that data-driven approaches can be vastly more effective than brute-force simulation for complex scientific problems.

Modern protein models use a generative approach (diffusion) instead of regression. Rather than predicting one "correct" structure, they model a distribution of possibilities. This better handles molecular dynamism and avoids averaging between multiple valid states, which is a flaw of regression models.
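The averaging flaw has a simple numerical illustration. Suppose a torsion angle flips between two valid conformations, +60° and -60° (a made-up example): a regression-style point estimate collapses to their mean, an angle the molecule never adopts, while sampling from the distribution always lands on a valid state.

```python
import numpy as np

rng = np.random.default_rng(1)

# A torsion angle observed in two valid states: +60 and -60 degrees.
samples = rng.choice([60.0, -60.0], size=1000)

# Regression-style answer: the mean-squared-error-optimal point estimate
# is the average -- an angle (~0 deg) that is never physically adopted.
print(np.mean(samples))           # near 0.0, far from both true states

# Generative-style answer: draw from the modeled distribution, so every
# prediction is one of the valid conformations.
draw = rng.choice(samples)
print(float(draw) in (60.0, -60.0))  # True
```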

Models like AlphaFold don't solve protein folding from physics alone. They heavily rely on co-evolutionary data, where correlated mutations across species provide strong hints about which amino acids are physically close. This dramatically constrains the search space for the final structure.
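A classic way to extract that co-evolutionary signal is to measure statistical coupling between columns of a multiple sequence alignment (MSA); mutual information is the simplest such measure (real pipelines use more careful statistics that correct for phylogeny). A toy MSA makes the idea concrete:

```python
from collections import Counter
import math

# Toy MSA: columns 0 and 3 co-vary (A pairs with E, G pairs with D),
# a compensating-mutation pattern hinting those residues are in contact.
msa = ["AKLE", "GKLD", "ARLE", "GRLD", "AKLE", "GKLD"]

def mutual_information(col_i, col_j):
    """MI (in bits) between two alignment columns; high MI suggests the
    positions mutate together and may be spatially close."""
    n = len(msa)
    pi = Counter(s[col_i] for s in msa)
    pj = Counter(s[col_j] for s in msa)
    pij = Counter((s[col_i], s[col_j]) for s in msa)
    mi = 0.0
    for (a, b), c in pij.items():
        p = c / n
        mi += p * math.log2(p / ((pi[a] / n) * (pj[b] / n)))
    return mi

print(mutual_information(0, 3))  # high: the columns co-vary
print(mutual_information(0, 1))  # ~0: the columns vary independently
```

Predicted contacts like these act as distance constraints that collapse the astronomically large conformational search space down to something tractable.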

AlphaFold 2 was a breakthrough for predicting single protein structures. However, this success highlighted the much larger, unsolved challenges of modeling protein interactions, their dynamic movements, and the actual folding process, which are critical for understanding disease and drug discovery.

Generative AI alone designs proteins that look correct on paper but often fail in the lab. DenovAI adds a physics layer to simulate molecular dynamics—the "jiggling and wiggling"—which weeds out false positives by modeling how proteins actually interact in the real world.
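DenovAI's actual pipeline is not specified here, but the generate-then-physics-filter pattern can be sketched generically. Everything below is a stand-in: random candidates play the role of generated designs, and a noisy random walk plays the role of a short molecular-dynamics run probing whether a design stays put when it jiggles.

```python
import random

def generate_design(rng):
    """Stand-in for a generative designer. 'plausibility' is what the
    model itself scores; 'stiffness' is the design's actual physical
    behavior, which only simulation can probe."""
    return {"plausibility": rng.random(), "stiffness": rng.random()}

def short_md_check(candidate, rng, steps=100):
    """Toy surrogate for a short MD run: perturb the structure repeatedly
    and accept it only if it stays near where it started. Real MD would
    integrate the equations of motion for every atom."""
    x = 0.0
    for _ in range(steps):
        x += rng.gauss(0, 1.0 - candidate["stiffness"])
    return abs(x) < 5.0  # survived the jiggling and wiggling

rng = random.Random(42)
designs = [generate_design(rng) for _ in range(200)]
on_paper = [d for d in designs if d["plausibility"] > 0.8]     # look correct
validated = [d for d in on_paper if short_md_check(d, rng)]    # behave correctly
print(len(on_paper), len(validated))  # the physics filter prunes candidates
```

The design choice is a two-stage funnel: the cheap generative stage proposes broadly, and the expensive physics stage is run only on survivors, catching false positives before anything reaches the wet lab.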