AI thrives on learning from the vast, structured dataset that evolution provides for proteins. Molly Gibson explains that small molecules lack this clear "language" or evolutionary history. This fundamental data gap is a primary reason generative AI has been slower to transform small molecule drug discovery than biologics.

Related Insights

For AI to evolve from pattern matching to understanding the physics of protein engineering, structural data alone is insufficient. Models need physical parameters like Gibbs free energy (delta-G), obtainable from affinity measurements, to become truly predictive and transformative for therapeutic development.
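The link between affinity measurements and Gibbs free energy is the standard thermodynamic relation ΔG° = RT·ln(Kd / c°), with c° = 1 M. A minimal sketch of that conversion (the function name and defaults are illustrative, not from the episode):

```python
import math

# Convert a measured dissociation constant (Kd) into the standard
# Gibbs free energy of binding: ΔG° = R * T * ln(Kd / 1 M).
# Tighter binding (smaller Kd) gives a more negative ΔG°.

R = 8.314  # gas constant, J/(mol*K)

def delta_g_from_kd(kd_molar: float, temp_k: float = 298.0) -> float:
    """Standard binding free energy in kJ/mol from Kd in molar units."""
    # Kd is taken relative to the 1 M standard state, so it is
    # dimensionless inside the logarithm.
    return R * temp_k * math.log(kd_molar) / 1000.0

print(round(delta_g_from_kd(1e-9), 1))  # ≈ -51.3 kJ/mol for a 1 nM binder
```

This is why affinity screens are such a rich training signal: each measured Kd is, up to a constant, a direct physical label for the model to learn from.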

The lengthy timelines of drug development create a significant perception lag for AI's impact. Molly Gibson clarifies that molecules currently in clinical trials were designed years ago using nascent AI models. The true capabilities of today's more advanced AI platforms won't be evident in approved drugs for several more years.

Unlike traditional methods that simulate physical interactions like a key in a lock, ProPhet's AI learns the fundamental patterns governing why certain molecules and proteins interact. This allows for prediction without needing slow, expensive, and often impossible physical or computational simulations.

To break the data bottleneck in AI protein engineering, companies now generate massive synthetic datasets. By creating novel "synthetic epitopes" and measuring their binding, they can produce thousands of validated positive and negative training examples in a single experiment, massively accelerating model development.
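A toy sketch of how one screen yields many labeled examples: each measured variant becomes a positive or negative training pair once a binding threshold is chosen. All names, sequences, and the cutoff below are hypothetical illustrations, not details of any company's actual pipeline:

```python
# Illustrative only: turning a single high-throughput binding screen
# into labeled (sequence, label) training pairs. The 0.5 cutoff on a
# normalized binding signal is an assumed, hypothetical threshold.

BINDING_THRESHOLD = 0.5

def label_screen(measurements):
    """measurements: list of (epitope_sequence, binding_signal) pairs
    from one experiment; returns (sequence, label) training examples."""
    dataset = []
    for seq, signal in measurements:
        label = 1 if signal >= BINDING_THRESHOLD else 0  # binder vs non-binder
        dataset.append((seq, label))
    return dataset

screen = [("ACDEFGH", 0.92), ("ACDEFGY", 0.08), ("MKLVNQW", 0.61)]
print(label_screen(screen))
```

The point of the pattern is scale: one plate of measurements produces validated positives and negatives in bulk, rather than one data point per publication.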

The progress of AI in predicting cancer treatment is stalled not by algorithms, but by the data used to train them. Relying solely on static genetic data is insufficient. The critical missing piece is functional, contextual data showing how patient cells actually respond to drugs.

Current AI for protein engineering relies on small public datasets like the PDB (~10,000 structures), causing models to "hallucinate" or default to known examples. This data bottleneck, orders of magnitude smaller than data used for LLMs, hinders the development of novel therapeutics.

The bottleneck for AI in drug development isn't the sophistication of the models but the absence of large-scale, high-quality biological datasets. Without comprehensive data on how drugs interact within complex human systems, even the best AI models cannot make accurate predictions.

ProPhet's strategy is to focus on 'hard-to-drug' proteins, which are often avoided because they lack the structural data required for traditional discovery. Because ProPhet's AI model needs very little protein information to predict interactions, this data scarcity becomes a competitive advantage.

Beyond accelerating timelines, AI's real value lies in its ability to design molecules for targets previously considered 'hard-to-drug.' These models operate on different principles than traditional lab methods and are indifferent to historical challenges, opening up entirely new therapeutic possibilities.

Generate Biomedicines' AI learns the fundamental rules of protein structure and function, much like a language's grammar. This allows it to design entirely new proteins by generating novel "sentences" (sequences) that are biologically coherent and functional, rather than just mimicking existing ones found in nature.