RiffOn - 🔬 Training Transformers to solve 95% failure rate of Cancer Trials — Ron Alfa & Daniel Bear, Noetik

Noetik trains transformers on massive, proprietary multimodal data to solve the 95% cancer trial failure rate by improving patient selection.

Noetik Argues Intentional Data Generation Trumps Brute-Force Collection in Biology AI

Unlike general AI, which leverages vast existing datasets, Noetik believes progress in biology requires designing and generating specific, high-quality data with foresight into the models that will be trained on it. They compare this to the intentional, decades-long creation of the Protein Data Bank (PDB) that made protein-folding models possible.

🔬 Training Transformers to solve 95% failure rate of Cancer Trials — Ron Alfa & Daniel Bear, Noetik

Latent Space: The AI Engineer Podcast·2 months ago

Noetik Bets Cancer Drugs Fail from Poor Patient Selection, Not Flawed Pharmacology

Noetik's core thesis is that the 95% failure rate in cancer trials is due not to bad drug design but to an inability to identify the correct patient sub-population. Their models aim to solve this patient selection problem from the outset, rescuing potentially effective drugs.

Noetik Models Patient Heterogeneity Top-Down, Sidestepping Complex Cell Simulation

Drawing an analogy from neuroscience, Noetik argues for a top-down modeling approach. Instead of building a perfect simulation of a single cell and scaling up, they model the functional interactions at the tissue level first. This abstraction is more likely to predict patient-level outcomes, which is the ultimate goal.

Noetik Defines "Virtual Cell" Practically to Solve Drug Discovery, Not Perfectly Simulate Biology

Instead of pursuing a purely academic goal of simulating every biochemical process, Noetik's "virtual cell" models are practical tools. They focus on understanding cell biology through heuristics that are useful for making drugs, like predicting a cell's transcriptome or protein expression in a specific context.

Preclinical Research Relies on "Frankensteinian" Cell Lines Unrelated to Human Biology

A major cause of clinical trial failure is that preclinical testing uses immortalized cancer cell lines that have been cultured for decades. These cells have abnormal genomes and gene expression profiles that don't represent actual tumors, creating a massive translational gap that Noetik's patient-derived data aims to close.

Noetik’s GSK Deal Pioneers a Foundation Model Licensing Business Model in Pharma

Noetik's $50M deal with GSK licenses their OctoVC foundation model, not a drug candidate or a collaborative project. This shifts the business model from bespoke services to a scalable software-like approach, allowing pharma partners to use the model across their entire pipeline and even fine-tune it on proprietary data.

Noetik "Humanizes" Mouse Models In Silico for Better Translational Research

To bridge the gap between animal models and human trials, Noetik trains models on its human data and then runs inference on mouse histology (H&E) images. This allows them to predict human-relevant biology and gene expression directly from the mouse model, overcoming a key translational hurdle in drug development.

Noetik Randomizes Patient Samples Across Slides to Control for Experimental Batch Effects

To mitigate data variations caused by running experiments on different days (batch effects), Noetik employs a sophisticated arraying strategy. They take dozens of samples from a single tumor and distribute them across multiple, randomized arrays, ensuring each patient is represented in different batches for robust calibration and model training.
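The arraying idea can be illustrated with a minimal sketch. This is not Noetik's actual protocol: the function name `assign_cores_to_arrays`, the patient IDs, the core counts, and the number of arrays are all hypothetical, and the only property taken from the description above is that each patient's samples are spread across several randomized arrays.

```python
import random
from collections import defaultdict


def assign_cores_to_arrays(cores_per_patient, n_arrays, seed=0):
    """Spread each patient's cores across arrays, then shuffle slot order.

    Hypothetical illustration: each patient's cores are dealt round-robin
    over a randomly ordered list of arrays, so a patient with at least
    n_arrays cores appears on every array (i.e., in every batch).
    """
    rng = random.Random(seed)
    arrays = defaultdict(list)
    for patient, n_cores in cores_per_patient.items():
        order = list(range(n_arrays))
        rng.shuffle(order)  # random starting array per patient
        for core_idx in range(n_cores):
            arrays[order[core_idx % n_arrays]].append((patient, core_idx))

    layout = {}
    for array_id in range(n_arrays):
        slots = arrays[array_id]
        rng.shuffle(slots)  # randomize physical position within the array
        layout[array_id] = slots
    return layout


# Made-up example: three patients, three arrays (batches).
cores = {"P1": 6, "P2": 4, "P3": 5}
layout = assign_cores_to_arrays(cores, n_arrays=3)
for array_id, slots in layout.items():
    print(array_id, slots)
```

Because every patient lands on more than one array, differences between arrays that show up within the same patient can be attributed to the batch rather than to biology, which is what makes the calibration described above possible.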

Noetik Uses Standard H&E Pathology Images for Powerful, Low-Cost Clinical Inference

While Noetik's models are trained on complex, multimodal data like spatial transcriptomics, they are designed to run inference using only standard, ubiquitous H&E pathology slides. This creates a highly scalable and practical path to a clinical diagnostic without requiring expensive, novel assays for every patient.

Noetik Found Autoregressive Models Excel on Biological Data Only at Longer Context Lengths

When developing their Tario transformer model, Noetik discovered a key scaling behavior: larger autoregressive models only outperform smaller ones when given a longer context window (i.e., when they see more tissue at once). This suggests that capturing broader spatial relationships is critical for learning complex biological patterns.

Noetik Spent 18 Months Generating Data Before Training Its First Foundation Model

Demonstrating extreme conviction, Noetik invested a year and a half in lab setup, tumor sourcing, and data processing before having a dataset large enough to train its first models. This highlights the immense upfront investment and risk required for a data-first approach in bio-AI, where no off-the-shelf data exists.
