Today's "virtual cell" models represent their training data well but cannot predict outcomes for novel interventions. The next frontier is building models that generalize well enough to serve as true predictive oracles for experiments that haven't yet been performed. This is a key focus for BioHub.

Related Insights

Future progress in biology requires moving beyond static models. The new paradigm involves an AI that reasons over hypotheses, prioritizes experiments, learns from the empirical outcomes, and updates its internal world model. This creates a scalable, closed-loop system for scientific discovery.

To create a predictive "virtual cell," data collection must shift from passive observation to active intervention. The strategy is to massively scale perturbation experiments (like Perturb-seq) across countless contexts and measure multi-modal responses, teaching the model cause and effect.

Instead of pursuing a purely academic goal of simulating every biochemical process, Noetik's "virtual cell" models are practical tools. They focus on understanding cell biology through heuristics that are useful for making drugs, like predicting a cell's transcriptome or protein expression in a specific context.

AI models trained on descriptive data (e.g., RNA-seq) can classify cell states but fail to predict how to transition a diseased cell to a healthy one. True progress requires generating massive "causal" datasets that show the effects of specific genetic perturbations.

The primary obstacle to creating sophisticated AI models of cells isn't the AI itself, but the data. Existing datasets often perturb only one cellular variable at a time, failing to capture the complex interactions that arise from simultaneous changes. New platforms are needed to generate this multi-dimensional data.

Drawing an analogy from neuroscience, Noetik argues for a top-down modeling approach. Instead of building a perfect simulation of a single cell and scaling up, they model the functional interactions at the tissue level first. This abstraction is more likely to predict patient-level outcomes, which is the ultimate goal.

It's impossible to generate human data at the scale of in silico experiments. The key is to create highly accurate simulations of human physiology (digital twins) and then validate their predictions with limited, strategic human data. If the model proves reliable, it could drastically accelerate R&D.

The next frontier in preclinical research involves feeding multi-omics and spatial data from complex 3D cell models into AI algorithms. This synergy will enable a crucial shift from merely observing biological phenomena to accurately predicting therapeutic outcomes and patient responses.

The most fundamental challenge in AI today is not scale or architecture, but the fact that models generalize dramatically worse than humans. Solving this sample efficiency and robustness problem is the true key to unlocking the next level of AI capabilities and real-world impact.

While petabytes of observational DNA sequence data exist, they are insufficient for the next wave of AI. The key to creating powerful, functional models is generating causal data from experiments that systematically test function, and the shortage of such data is the current bottleneck.