RiffOn - Radically Better Reasoning: Elicit's Andreas Stuhlmüller & Jungwon Byun on World Models for Research | "The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

Elicit's co-founders discuss their mission to improve high-stakes reasoning using process supervision, domain-specific languages, and world models.

AI Outputs Should Include a 'Certificate of Reasoning' for Scalable Verification

Instead of supervising an AI's hidden thought process, we can require that it produce a 'certificate of reasoning' (a checkable proof) along with its output. The certificate could include citations or sensitivity analyses, shifting verification from observing the process to checking the proof provided.
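
As a rough illustration of checking a certificate rather than a process, the sketch below treats the certificate as a list of claims, each carrying a verbatim quote from a cited source. The `Claim` type, the `verify_certificate` function, and the sample data are all hypothetical and not taken from Elicit; real verification would need far more than substring matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    text: str        # the statement the AI asserts
    source_id: str   # which source it cites (hypothetical identifier)
    quote: str       # verbatim passage offered as evidence


def verify_certificate(claims, sources):
    """Return the claims whose quoted evidence is NOT found in the cited source."""
    failures = []
    for claim in claims:
        source_text = sources.get(claim.source_id, "")
        if not claim.quote or claim.quote not in source_text:
            failures.append(claim)
    return failures


# Made-up source text and certificate, for illustration only.
sources = {"paper-1": "We enrolled 120 participants across three sites."}
certificate = [
    Claim("The study had 120 participants.", "paper-1", "enrolled 120 participants"),
    Claim("The study was double-blind.", "paper-1", "double-blind design"),
]
unsupported = verify_certificate(certificate, sources)
print([c.text for c in unsupported])  # ['The study was double-blind.']
```

The checker never looks at how the claims were produced; it only confirms that each one is backed by the evidence it cites.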

Radically Better Reasoning: Elicit's Andreas Stuhlmüller & Jungwon Byun on World Models for Research

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis·2 days ago

Elicit's AI Guarantees Workflow Reliability by Using a Domain-Specific Language for Reasoning

Elicit built a Domain-Specific Language (DSL) defining reasoning primitives as microservices. Frontier models orchestrate these primitives to create structured workflows, ensuring complex processes run exactly as defined and overcoming the inherent unreliability of standard LLMs for high-stakes tasks.
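
The summary does not describe the DSL itself, so the following is only a minimal sketch of the general idea: named primitives, a workflow expressed as data, and an interpreter that applies the same steps to every item. The primitive names, the workflow format, and the sample abstracts are assumptions made for illustration; in the architecture described above, a frontier model would emit the workflow, whereas here it is written by hand.

```python
import re

PRIMITIVES = {}


def primitive(fn):
    """Register a function as a named reasoning primitive."""
    PRIMITIVES[fn.__name__] = fn
    return fn


@primitive
def extract_sample_size(text):
    # Hypothetical primitive: pull "n = <number>" out of free text.
    match = re.search(r"n\s*=\s*(\d+)", text)
    return int(match.group(1)) if match else None


@primitive
def at_least(value, threshold):
    # Hypothetical primitive: threshold check that tolerates missing values.
    return value is not None and value >= threshold


def run_workflow(workflow, item):
    """Apply each (primitive_name, kwargs) step in order, threading the value."""
    value = item
    for name, kwargs in workflow:
        value = PRIMITIVES[name](value, **kwargs)
    return value


# The workflow is plain data, so it can be inspected before it runs.
workflow = [("extract_sample_size", {}), ("at_least", {"threshold": 100})]
abstracts = ["Randomized trial, n = 250.", "Pilot study (n=12).", "No size reported."]
results = [run_workflow(workflow, a) for a in abstracts]
print(results)  # [True, False, False]
```

Because the interpreter, not the model, drives execution, every item passes through the identical sequence of steps regardless of its position in the batch.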

For High-Stakes Enterprise AI, Verifiable Consistency Is the Key Differentiator

For users in life sciences, an AI tool's value lies not just in its power but in its ability to apply exactly the same reasoning process consistently across thousands of data points. Elicit guarantees the 9,999th item is analyzed identically to the 5th, providing trust at scale.

AI Can Evaluate Research Quality from First Principles, Surpassing Flawed Metrics

Humans rely on lossy proxies like journal prestige and citation counts to judge research. AI enables a shift to evaluating the work's content directly—methodology, sample size, and logical coherence—for a more accurate assessment of evidence quality tailored to a specific question.

The Future of Human Work Will Shift from Content Generation to AI Output Evaluation

As AI masters content generation, it will handle the "blank page" problem. The crucial human task will then shift from creation to evaluation: defining what 'good' looks like, identifying AI failure modes, and building better verification systems to ensure outputs are trustworthy and useful.

Efficient AI Systems Use an Orchestrator Agent to Dispatch Tasks to Cheaper, Specialized Models

To manage costs, the optimal architecture isn't running everything on the most powerful model. Instead, a smart orchestrator agent should break down complex problems and dispatch simpler sub-tasks to smaller, cheaper models, optimizing for both cost and performance.
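
One simple way to picture this dispatch is a router that sends each sub-task to the cheapest model able to handle it. The sketch below assumes a made-up three-tier model list, a made-up difficulty scale, and made-up prices; none of these come from the episode, and a real orchestrator would estimate difficulty rather than receive it as input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    capability: int       # highest difficulty level it handles (made-up scale)
    cost_per_task: float  # made-up price units


MODELS = [
    Model("small", capability=1, cost_per_task=1.0),
    Model("medium", capability=2, cost_per_task=5.0),
    Model("large", capability=3, cost_per_task=25.0),
]


def route(difficulty):
    """Pick the cheapest model whose capability covers the sub-task."""
    eligible = [m for m in MODELS if m.capability >= difficulty]
    if not eligible:
        raise ValueError(f"no model can handle difficulty {difficulty}")
    return min(eligible, key=lambda m: m.cost_per_task)


# Hypothetical sub-tasks with hand-assigned difficulty levels.
subtasks = {
    "extract dates": 1,
    "summarize methods": 2,
    "resolve conflicting findings": 3,
    "normalize units": 1,
}
plan = {task: route(level) for task, level in subtasks.items()}
routed_cost = sum(m.cost_per_task for m in plan.values())
baseline_cost = MODELS[-1].cost_per_task * len(subtasks)
print(routed_cost, baseline_cost)  # 32.0 100.0
```

In this toy setup, routing costs 32 units against 100 for sending everything to the largest model, while the hardest sub-task still reaches the most capable one.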

Discretized Reasoning Steps in AI Provide Critical Error Correction

The benefit of discrete reasoning (like generating tokens or tool calls) over a continuous 'neuralese' is error correction, analogous to why digital computing beat analog. A slightly wrong token can be 'rounded' to the correct one, preventing the compounding errors that would plague a purely continuous process.
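
The digital-versus-analog analogy can be shown with a toy simulation: a value is passed through many noisy steps, once left continuous and once snapped back to the nearest integer after every step. This is only an illustration of the rounding argument, not a model of how any neural network works; the noise level, step count, and `run_chain` helper are arbitrary choices.

```python
import random


def run_chain(steps, noise, discretize, seed=0):
    """Pass the value 1.0 through `steps` noisy stages and return the result."""
    rng = random.Random(seed)
    value = 1.0
    for _ in range(steps):
        value += rng.uniform(-noise, noise)  # each step adds a small error
        if discretize:
            value = float(round(value))      # snap to the nearest integer "token"
    return value


analog = run_chain(steps=1000, noise=0.2, discretize=False)
digital = run_chain(steps=1000, noise=0.2, discretize=True)
print("analog drift: ", abs(analog - 1.0))
print("digital drift:", abs(digital - 1.0))  # 0.0
```

Because each error stays below half the spacing between discrete levels, rounding removes it before the next step, so the discretized chain ends exactly where it started while the continuous chain accumulates drift.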

Current LLMs Lack a Coherent World Model, Making Them Too Unstable for Reliable Decisions

Unlike a human expert's, an LLM's probability estimates and conclusions can be drastically altered by simple rephrasing or irrelevant suggestions. This instability shows that LLMs are too easily "pushed around" and lack the coherent world model necessary for trustworthy, high-stakes decision support.
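
One way to make this instability measurable is to ask the same question in several wordings and look at the spread of the answers. The sketch below uses two stub functions in place of real model calls, so the numbers are invented purely to exercise the check; `spread_across_paraphrases` and both stubs are hypothetical names.

```python
def spread_across_paraphrases(estimate, paraphrases):
    """Max minus min probability returned for prompts that mean the same thing."""
    values = [estimate(p) for p in paraphrases]
    return max(values) - min(values)


def wording_sensitive_stub(prompt):
    # Stand-in for a model call: shifts its answer when the prompt contains a leading hint.
    return 0.75 if "surely" in prompt.lower() else 0.5


def stable_stub(prompt):
    # Stand-in for an estimator whose answer does not depend on wording.
    return 0.5


paraphrases = [
    "Will the trial meet its primary endpoint?",
    "Is the trial going to meet its primary endpoint?",
    "Surely the trial will meet its primary endpoint?",
]
unstable_spread = spread_across_paraphrases(wording_sensitive_stub, paraphrases)
stable_spread = spread_across_paraphrases(stable_stub, paraphrases)
print(unstable_spread, stable_spread)  # 0.25 0.0
```

A spread near zero is what one would expect from an estimator with a coherent underlying model; a large spread signals that wording, not evidence, is driving the answer.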

Legible Continual AI Learning Is Best Achieved Through External 'World Models'

Instead of relying on opaque model weights, continual learning is more reliably achieved by having AI build explicit, external 'world models' like knowledge graphs. This approach makes the model's understanding inspectable and correctable by humans, enabling more robust causal analysis.
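
As a minimal sketch of what "inspectable and correctable" could mean in practice, the example below stores cause-and-effect links in an explicit graph with a provenance note on each link, lets a reviewer retract a bad link, and answers a simple downstream-effects query. The `WorldModel` class and the sample links are assumptions made for illustration, not a description of Elicit's system.

```python
from collections import defaultdict


class WorldModel:
    """Explicit store of cause -> effect links, each with a provenance note."""

    def __init__(self):
        self.edges = defaultdict(dict)  # cause -> {effect: source note}

    def assert_link(self, cause, effect, source):
        self.edges[cause][effect] = source

    def retract_link(self, cause, effect):
        self.edges[cause].pop(effect, None)

    def downstream(self, cause):
        """All effects reachable from `cause` by following links."""
        seen, stack = set(), [cause]
        while stack:
            node = stack.pop()
            for effect in self.edges.get(node, {}):
                if effect not in seen:
                    seen.add(effect)
                    stack.append(effect)
        return seen


# Made-up links; the third one plays the role of a mistaken model inference.
wm = WorldModel()
wm.assert_link("rain", "wet road", source="observation log")
wm.assert_link("wet road", "longer braking distance", source="test report")
wm.assert_link("rain", "engine failure", source="model guess")

before = wm.downstream("rain")
wm.retract_link("rain", "engine failure")  # a human reviewer removes the bad link
after = wm.downstream("rain")
print(sorted(after))  # ['longer braking distance', 'wet road']
```

The correction is a single visible edit to the graph, and every later query reflects it, which is the contrast with knowledge buried in model weights.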

Truth-Seeking in AI May Create a Positive Feedback Loop for Better Reasoning

If the AI community prioritizes truth-seeking over persuasive-sounding outputs, it could create a virtuous cycle. A more truth-seeking AI would better identify the most important interventions to improve its own reasoning, leading to a feedback loop that rapidly enhances epistemic quality.

Elicit's Automated Engineer 'The Line' Ships 30-50 Code Changes Weekly From Slack Reactions

Elicit's system, 'The Line,' automates the full software development lifecycle. It takes feature requests initiated by a Slack emoji, then handles speccing, implementation, video-based testing, code review, and merging to production, calling for human intervention only when necessary.

LLMs Fail Complex Tasks Because They're Trained on Final Answers, Not Reasoning Steps

When asked to analyze 100 papers, LLMs often admit they didn't complete the task. This failure stems from outcome-based training, which prioritizes a plausible-looking final output over correctly following the required process, revealing a fundamental flaw in current training paradigms.
