ProPhet's strategy is to focus on 'hard-to-drug' proteins, which are often avoided because they lack the structural data required for traditional discovery. Because ProPhet's AI model needs very little protein information to predict interactions, this data scarcity becomes a competitive advantage.
Recursion's CEO outlines a two-pronged pipeline strategy. The first prong uses phenomics to uncover novel biological insights for new targets, like their FAP program. The second uses their AI-driven small molecule design platform to improve the therapeutic index for known but historically 'hard-to-drug' targets, like CDK7. This balanced portfolio approach de-risks development by leveraging different strengths of their end-to-end platform.
Unlike traditional methods that simulate physical interactions like a key in a lock, ProPhet's AI learns the fundamental patterns governing why certain molecules and proteins interact. This allows for prediction without needing slow, expensive, and often impossible physical or computational simulations.
Instead of building from scratch, ProPhet leverages existing transformer models to create unique mathematical 'languages' for proteins and molecules. Their core innovation is an additional model that translates between them, creating a unified space to predict interactions at scale.
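The "translate between two languages" idea can be sketched as two projections into one shared vector space, followed by a similarity score. This is only an illustration under stated assumptions, not ProPhet's actual architecture: the embeddings are random stand-ins for the outputs of pretrained encoders, the projection matrices are untrained, and every name here (`project`, `interaction_score`, the dimensions) is hypothetical.

```python
import math
import random


def project(vec, matrix):
    """Map a vector into the shared space by multiplying it with a matrix (list of rows)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def interaction_score(protein_emb, molecule_emb, w_protein, w_molecule):
    """Score a protein-molecule pair by similarity after projecting both into the shared space."""
    return cosine(project(protein_emb, w_protein), project(molecule_emb, w_molecule))


# Assumption: random stand-ins for embeddings produced by two separate pretrained encoders.
rng = random.Random(0)
protein_emb = [rng.gauss(0, 1) for _ in range(8)]   # 8-dim protein "language"
molecule_emb = [rng.gauss(0, 1) for _ in range(6)]  # 6-dim molecule "language"

# Assumption: untrained projection matrices into a shared 4-dim space.
# In a real system these would be learned from known interacting pairs.
w_protein = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
w_molecule = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(4)]

score = interaction_score(protein_emb, molecule_emb, w_protein, w_molecule)
print(f"interaction score: {score:.3f}")
```

Because the two encoders stay separate and only the projections connect them, each protein and molecule needs to be embedded once; scoring a new pair is then a cheap vector comparison, which is what makes prediction at scale plausible.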
ProPhet's CEO notes his conviction in AI wasn't a sudden breakthrough. Instead, it was a growing understanding that machine learning's ability to handle noisy, incomplete data at scale directly solves the primary bottlenecks of traditional pharmaceutical research.
To break the data bottleneck in AI protein engineering, companies now generate massive synthetic datasets. By creating novel "synthetic epitopes" and measuring their binding, they can produce thousands of validated positive and negative training examples in a single experiment, massively accelerating model development.
The primary barrier to AI in drug discovery is the lack of large, high-quality training datasets. The emergence of federated learning platforms, which keep raw data private while still training models collectively, is a critical and underappreciated development for advancing the field.
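To make the federated idea concrete, here is a minimal sketch of federated averaging on a toy one-parameter model. It is an illustration under assumptions, not any particular platform's protocol: the sites, their data, and the function names are hypothetical, and real deployments add secure aggregation and other privacy safeguards. The point is that only model weights leave each site, never the raw data.

```python
def local_update(w, data, lr):
    """One gradient step for the model y = w * x, computed on a single site's private data."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad


def federated_round(global_w, sites, lr):
    """Each site trains locally; the server averages the returned weights by dataset size."""
    updates = [(local_update(global_w, data, lr), len(data)) for data in sites]
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


# Assumption: three hypothetical sites whose private (x, y) pairs all follow y = 2x.
sites = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
    [(0.5, 1.0), (1.5, 3.0), (2.5, 5.0)],
]

w = 0.0
for _ in range(200):
    w = federated_round(w, sites, lr=0.05)

print(f"learned weight: {w:.4f}")
```

The server in this sketch sees only one number per site per round, yet the shared model still converges toward the relationship present in the combined data.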
Current AI for protein engineering relies on small public datasets like the PDB (~10,000 structures), causing models to "hallucinate" or default to known examples. These datasets are orders of magnitude smaller than those used to train LLMs, and the resulting data bottleneck hinders the development of novel therapeutics.
The bottleneck for AI in drug development isn't the sophistication of the models but the absence of large-scale, high-quality biological datasets. Without comprehensive data on how drugs interact within complex human systems, even the best AI models cannot make accurate predictions.
Profluent CEO Ali Madani frames the history of medicine as one of chance discovery: finding useful molecules, such as penicillin, in nature. His company uses AI language models to move beyond this "caveman-like" approach. By designing novel proteins from scratch, they are shifting the paradigm from finding a needle in a haystack to engineering the exact needle required.
ProPhet uses its AI not just for efficacy (finding a molecule for a target protein) but also for safety. By reversing the query, taking a promising molecule and asking which other proteins it might bind to, they can identify potential off-target interactions, a primary source of toxicity.
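The reversed query can be sketched as a scan over a protein panel with the molecule held fixed. This is a hedged illustration, not ProPhet's pipeline: `find_off_targets`, the 0.7 threshold, the protein and molecule names, and the score values are all made up for the example, and `score_fn` stands in for whatever interaction model is available.

```python
def find_off_targets(molecule, proteins, score_fn, intended_target, threshold=0.7):
    """Return (protein, score) pairs, other than the intended target, predicted to bind the molecule."""
    hits = []
    for protein in proteins:
        if protein == intended_target:
            continue  # binding the intended target is the desired effect, not a liability
        score = score_fn(protein, molecule)
        if score >= threshold:
            hits.append((protein, score))
    # Strongest predicted off-target interactions first
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Assumption: a toy lookup table standing in for a trained interaction model.
toy_scores = {
    ("target_X", "mol_A"): 0.92,
    ("protein_B", "mol_A"): 0.74,
    ("protein_C", "mol_A"): 0.81,
    ("protein_D", "mol_A"): 0.12,
}


def toy_score(protein, molecule):
    """Look up a made-up interaction score; unknown pairs default to 0.0."""
    return toy_scores.get((protein, molecule), 0.0)


panel = ["target_X", "protein_B", "protein_C", "protein_D"]
flagged = find_off_targets("mol_A", panel, toy_score, intended_target="target_X")
print(flagged)
```

The same scoring function serves both directions: fixing the protein and scanning molecules is the efficacy search, while fixing the molecule and scanning proteins is the safety check.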