AI cannot yet revolutionize drug discovery because its strength lies in synthesizing existing knowledge. The problem is that humans understand only about 20% of the human body's biology, which leaves the foundational dataset too incomplete for AI to reliably predict outcomes in the unknown 80%.
While AI excels at screening vast compound libraries for potential drug candidates, it cannot overcome the ultimate bottleneck: the messy, complex, and poorly documented reality of human biology. The need for physical clinical trials remains the fundamental constraint on medical progress.
The bottleneck for AI in drug discovery is not the algorithms but the lack of high-quality, large-scale biological data. New platforms are needed to generate this data, the "substrate" AI models learn from, which challenges the narrative that better models alone are the solution.
AI models trained on descriptive data (e.g., RNA-seq) can classify cell states but fail to predict how to transition a diseased cell to a healthy one. True progress requires generating massive "causal" datasets that show the effects of specific genetic perturbations.
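The gap between descriptive and causal data can be made concrete with a toy simulation. This is a minimal sketch under loudly stated assumptions: the gene names `A` and `B`, the threshold, and the `simulate_cell` model are all hypothetical, invented only for illustration. Gene `B` drives the disease and gene `A` is merely a downstream marker of it. A classifier built from unperturbed (descriptive) cells reads the disease state perfectly from `A`, yet only a knockdown (perturbation) experiment shows that silencing `A` makes cells look healthy without healing them, while silencing `B` actually moves them to the healthy state.

```python
import random

# Hypothetical toy model, assumed only for illustration:
# gene "B" drives the disease; gene "A" is a downstream marker of the diseased state.
THRESHOLD = 0.5

def simulate_cell(b_level, knockdown=None):
    """Return (expression dict, diseased flag) for one simulated cell.

    knockdown: optional gene name ("A" or "B") whose expression is forced to 0.
    """
    if knockdown == "B":
        b_level = 0.0  # silencing the driver removes the cause
    diseased = b_level > THRESHOLD
    a_level = 1.0 if diseased else 0.0  # marker simply reflects the state
    if knockdown == "A":
        a_level = 0.0  # silencing the marker does not feed back onto B
    return {"A": a_level, "B": b_level}, diseased

def classify_by_marker(expr):
    """Descriptive classifier: calls a cell diseased when marker A is high."""
    return expr["A"] > 0.5

# Observational ("descriptive") dataset: unperturbed cells with random B levels.
rng = random.Random(0)
baseline_b = [rng.random() for _ in range(1000)]
observed = [simulate_cell(b) for b in baseline_b]
accuracy = sum(classify_by_marker(expr) == diseased for expr, diseased in observed) / len(observed)

def rescue_rates(gene):
    """Perturbation ("causal") experiment on diseased cells.

    Returns (fraction that look healthy to the classifier,
             fraction that are actually healthy) after knocking down `gene`.
    """
    diseased_b = [b for b in baseline_b if b > THRESHOLD]
    looks_healthy = is_healthy = 0
    for b in diseased_b:
        expr, diseased = simulate_cell(b, knockdown=gene)
        looks_healthy += not classify_by_marker(expr)
        is_healthy += not diseased
    return looks_healthy / len(diseased_b), is_healthy / len(diseased_b)

print(f"marker classifier accuracy: {accuracy:.2f}")  # marker classifier accuracy: 1.00
print("knock down A:", rescue_rates("A"))  # knock down A: (1.0, 0.0)
print("knock down B:", rescue_rates("B"))  # knock down B: (1.0, 1.0)
```

In the unperturbed data both `A` and `B` perfectly predict disease, so nothing distinguishes the marker from the driver; only the knockdown experiment separates them, which is the sense in which descriptive data supports classification but not intervention.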
Despite the buzz, a clinical development expert cautions that AI's impact on drug development remains limited. The primary bottleneck isn't the algorithms but the lack of sufficient high-quality human biological data that can be translated into reliable predictions, data that animal models often fail to provide.
Despite AI's power, roughly 90% of drugs still fail in clinical trials. John Jumper argues that the bottleneck isn't finding molecules that target proteins but our fundamental lack of understanding of disease causality, as with Alzheimer's, which makes it a biology problem, not a technology one.
Progress in using AI to predict cancer treatment response is stalled not by the algorithms but by the data used to train them. Static genetic data alone is insufficient. The critical missing piece is functional, contextual data showing how patients' cells actually respond to drugs.
Current AI for protein engineering relies on small public datasets such as the Protein Data Bank (PDB, ~10,000 structures), causing models to "hallucinate" or default to known examples. This data bottleneck, orders of magnitude smaller than the data used to train LLMs, hinders the development of novel therapeutics.
The bottleneck for AI in drug development isn't the sophistication of the models but the absence of large-scale, high-quality biological data sets. Without comprehensive data on how drugs interact within complex human systems, even the best AI models cannot make accurate predictions.
Early AI drug discovery platforms built robust models but often failed to generate relevant outputs. Their lack of deep biological understanding led to flawed data collection and training sets, creating a "garbage in, garbage out" problem where models were disconnected from real-world biology.
AI thrives on the vast, structured data that evolution provides for proteins. Molly Gibson explains that small molecules lack this clear "language" or evolutionary history. This fundamental data gap is a primary reason generative AI has been slower to transform small-molecule drug discovery than biologics.