
Contrary to the "data is the new oil" adage, historical oncology data has a short shelf life. Treatments and data-generation technologies evolve continuously, so recent, contextual data is far more valuable for training AI models than large but outdated archives.
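
One way to operationalize this shelf life is to weight training samples by recency rather than discarding archives outright. A minimal sketch in Python, assuming an exponential decay with a hypothetical one-year half-life (the half-life is a tunable assumption, not a known constant):

```python
from datetime import date

def recency_weight(sample_date: date, today: date, half_life_days: float = 365.0) -> float:
    """Down-weight older records exponentially: a record's weight halves
    every `half_life_days`."""
    age_days = (today - sample_date).days
    return 0.5 ** (age_days / half_life_days)

today = date(2024, 6, 1)
print(recency_weight(date(2021, 6, 1), today))  # ~0.125: three-year-old data
print(recency_weight(date(2024, 5, 1), today))  # ~0.94: last month's data
```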

Related Insights

The biggest limitation in precision medicine is the systemic failure to capture and learn from longitudinal data on how patients respond to treatments over time. Without this critical feedback loop, even the most sophisticated diagnostic models will fall short of their potential to improve care.
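
What that feedback loop requires, concretely, is a record structure that ties each treatment to the assessments that follow it. A minimal sketch with hypothetical field names (RECIST is the real oncology standard for response categories):

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class TreatmentEvent:
    drug: str
    start: date

@dataclass
class ResponseAssessment:
    assessed_on: date
    recist: str  # RECIST category, e.g. "PR" (partial response) or "PD" (progression)

@dataclass
class PatientTimeline:
    patient_id: str
    treatments: list[TreatmentEvent] = field(default_factory=list)
    responses: list[ResponseAssessment] = field(default_factory=list)

    def feedback_pairs(self):
        """Pair each treatment with the assessments that follow it:
        the longitudinal signal the insight says is rarely captured."""
        for t in self.treatments:
            yield t, [r for r in self.responses if r.assessed_on >= t.start]
```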

We possess millions of data points on interventions, but they are useless to AI models because they're trapped in thousands of disparate EMRs in varied formats. The challenge is not generating more data, but solving the human incentive and alignment problems required to create unified data registries.
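
The technical half of that problem is mundane but unforgiving: the same intervention arrives in incompatible shapes. A minimal sketch, using hypothetical record formats, of what a unified registry's normalization layer does:

```python
from datetime import datetime

# Hypothetical exports: two EMRs encoding the same infusion in different shapes.
emr_a = {"pt": "123", "rx": "Pembrolizumab", "given": "2024-03-05"}
emr_b = {"patient_id": "123", "medication": {"name": "pembrolizumab"},
         "administered_at": "05/03/2024"}

def normalize_a(rec: dict) -> dict:
    return {"patient_id": rec["pt"],
            "drug": rec["rx"].lower(),
            "date": datetime.strptime(rec["given"], "%Y-%m-%d").date()}

def normalize_b(rec: dict) -> dict:
    return {"patient_id": rec["patient_id"],
            "drug": rec["medication"]["name"].lower(),
            "date": datetime.strptime(rec["administered_at"], "%d/%m/%Y").date()}

registry = [normalize_a(emr_a), normalize_b(emr_b)]
assert registry[0] == registry[1]  # the same event, finally comparable
```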

The bottleneck for AI in drug discovery is not the algorithm but the lack of high-quality, large-scale biological data. New platforms are needed to generate this necessary "substrate" for AI models to learn from, challenging the narrative that better models alone are the solution.

A key pillar of human-centric AI is ensuring data is "future-proof." Because models are trained on historical data, they can quickly become irrelevant or harmful as market conditions change. This requires a proactive strategy to prevent model decay, not just reactive fixes after failures occur.
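
One standard way to make that strategy proactive is to monitor distribution drift between training data and live inputs, for example with the population stability index (PSI). A minimal sketch on synthetic data:

```python
import numpy as np

def population_stability_index(expected: np.ndarray, actual: np.ndarray,
                               bins: int = 10) -> float:
    """Compare a feature's training-time distribution with its live distribution.
    A common rule of thumb treats PSI above 0.25 as drift worth retraining for."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e, _ = np.histogram(expected, bins=edges)
    a, _ = np.histogram(actual, bins=edges)
    e = np.clip(e / e.sum(), 1e-6, None)
    a = np.clip(a / a.sum(), 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 10_000)  # what the model saw during training
live = rng.normal(0.5, 1.2, 10_000)   # a shifted production distribution
print(population_stability_index(train, live))  # compare against the 0.25 threshold
```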

AI models trained on descriptive data (e.g., RNA-seq) can classify cell states but fail to predict how to transition a diseased cell to a healthy one. True progress requires generating massive "causal" datasets that show the effects of specific genetic perturbations.
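
The distinction shows up in what a single training example contains. A sketch of the causal record the insight calls for, with hypothetical field names:

```python
from dataclasses import dataclass

@dataclass
class PerturbationExample:
    """One causal training example from a hypothetical perturbation screen."""
    expression_before: list[float]  # transcriptomic profile of the starting state
    perturbation: str               # intervention applied, e.g. "CRISPR_KO:TP53"
    expression_after: list[float]   # profile measured after the intervention

def to_supervised_pair(ex: PerturbationExample):
    # Descriptive data only supports "what state is this?" classification.
    # Causal data supports "what will this intervention do?" prediction:
    return (ex.expression_before, ex.perturbation), ex.expression_after
```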

Progress in AI for predicting cancer treatment response is stalled not by algorithms but by the data used to train them. Relying solely on static genetic data is insufficient; the critical missing piece is functional, contextual data showing how patient cells actually respond to drugs.

The pharmaceutical industry risks repeating Kodak's failure of inventing but ignoring a disruptive technology. For Kodak, it was digital photography; for pharma, it's AI. The industry possesses vast amounts of data (the new "film"), but the real danger lies in failing to embrace the AI-driven intelligence layer that can interpret and act on it.

Demonstrating extreme conviction, Noetik invested a year and a half in lab setup, tumor sourcing, and data processing before having a dataset large enough to train its first models. This highlights the immense upfront investment and risk required for a data-first approach in bio-AI, where no off-the-shelf data exists.

To truly understand biological systems, data scale is less important than data quality. The most informative data comes from capturing the dynamic interactions of a system *while* it's being perturbed (e.g., by a drug), not from static snapshots of a system at rest.

Frontier AI models excel in medicine less because of their encyclopedic knowledge and more because of their ability to integrate huge amounts of context. They can synthesize a patient's entire medical history with the latest research—a task difficult for any single human. This highlights that the key to unlocking AI's value is feeding it comprehensive data, as context is the primary driver of superhuman performance.
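
In practice, that "feeding" step is an aggregation problem before it is a modeling one. A minimal sketch, with hypothetical inputs, of assembling a patient's history and current literature into a single context block:

```python
def build_clinical_context(patient_history: list[str],
                           recent_literature: list[str],
                           max_chars: int = 400_000) -> str:
    """Concatenate the full longitudinal record with current literature into one
    context block. Hypothetical helper: real pipelines would also chunk,
    deduplicate, and attach provenance to every source."""
    sections = (["Patient history:"] + patient_history
                + ["Recent literature:"] + recent_literature)
    return "\n".join(sections)[:max_chars]  # respect the model's context window

prompt_context = build_clinical_context(
    patient_history=["2022-01: stage III NSCLC diagnosed",
                     "2022-02: platinum doublet started"],
    recent_literature=["(placeholder abstract of a relevant trial)"],
)
```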