Bengio proposes a new AI training paradigm. Instead of predicting the next word as current LLMs do, a 'Scientist AI' would model the world and assign probabilities to statements being true. This is designed to bake honesty into the system's core, addressing fundamental safety issues.
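As a rough sketch (my illustration, not an interface from the episode), the shift can be summarized as a change of type signature: a next-token LLM maps a context to a distribution over continuations, while a Scientist AI maps a statement to a probability of truth.

```python
# Hypothetical sketch of the two paradigms' core interfaces; the names
# (next_token_distribution, prob_true) are illustrative, not Bengio's.
from typing import Protocol, Dict

class NextTokenLLM(Protocol):
    def next_token_distribution(self, context: str) -> Dict[str, float]:
        """Standard LLM objective: P(next token | context)."""
        ...

class ScientistAI(Protocol):
    def prob_true(self, statement: str) -> float:
        """Proposed objective: P(statement is true | evidence)."""
        ...

# The safety claim: a model whose only job is to estimate prob_true has
# honesty built into its objective, rather than bolted on afterwards.
```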
The 'Scientist AI' doesn't require a universal database of facts. It needs only a small corpus of unimpeachable data, such as mathematical proofs, to learn the syntactic difference between a verified factual claim and a mere communication act. It can then generalize this concept of 'truthfulness' to more ambiguous domains.
Elon Musk argues that the key to AI safety isn't complex rules, but embedding core values. Forcing an AI to believe falsehoods can make it 'go insane' and lead to dangerous outcomes, as it tries to reconcile contradictions with reality.
Bengio argues that training AIs via reinforcement learning (RL) to achieve goals in the world is inherently dangerous. It inevitably leads to instrumental goals and reward hacking, creating systems with unintended drives. His 'Scientist AI' approach is designed to build agents without using RL.
The argument that LLMs are just 'stochastic parrots' is outdated. Current frontier models are trained via reinforcement learning, where the signal is not 'did you predict the right token?' but 'did you get the right answer?' That reward is based on complex, often qualitative criteria, pushing models beyond simple statistical correlation.
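To make the distinction concrete, here is a toy contrast (my sketch; the answer-extraction format is an assumption, not a quote from the episode): pre-training scores every token, while outcome-based RL scores only the final answer.

```python
import re

def pretraining_signal(logprob_next_token: float) -> float:
    # Next-token objective: maximize log P(observed token | context),
    # token by token, regardless of where the text ends up.
    return logprob_next_token

def outcome_reward(model_output: str, gold_answer: str) -> float:
    # Outcome-based RL: the whole trajectory is judged by whether the
    # final answer is right, however it was reached.
    match = re.search(r"Answer:\s*(.+)", model_output)
    predicted = match.group(1).strip() if match else ""
    return 1.0 if predicted == gold_answer else 0.0

print(outcome_reward("Let me work through it... Answer: 42", "42"))  # 1.0
print(outcome_reward("Answer: 41", "42"))                            # 0.0
```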
Purely sequence-based prediction models, while powerful, have fundamental limitations in understanding causality. Achieving robust, trustworthy AI will likely require a hybrid approach that integrates current transformer architectures with symbolic systems, world models, and dedicated causal reasoning components.
Bengio's method involves a crucial data preprocessing step: syntactically tagging text as either a 'communication act' (e.g., 'someone said X') or a 'verified factual claim.' This distinction allows the AI to learn the difference between what people say and what is true about the world.
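A minimal sketch of what that preprocessing might look like (the tag format and verification source are my assumptions for illustration):

```python
# Tag each training example as a verified fact or a mere communication act.
VERIFIED = {"2 + 2 = 4", "The square root of 2 is irrational"}  # e.g. from proofs

def tag_example(text: str, speaker: str = "unknown") -> str:
    if text in VERIFIED:
        # Backed by an unimpeachable source: tagged as true about the world.
        return f"<fact> {text} </fact>"
    # Otherwise only record that it was said, not that it is true.
    return f"<said by={speaker}> {text} </said>"

print(tag_example("2 + 2 = 4"))
# <fact> 2 + 2 = 4 </fact>
print(tag_example("The moon is made of cheese", speaker="forum post"))
# <said by=forum post> The moon is made of cheese </said>
```

A model trained on tagged data can then learn what distinguishes <fact> spans from <said> spans, which is the 'truthfulness' concept it generalizes.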
To get started without the massive cost of training from scratch, Bengio suggests fine-tuning existing models with his 'Scientist AI' objective. While this forgoes full mathematical guarantees, it offers a pragmatic, low-cost way to empirically improve a model's honesty and demonstrate the approach's value.
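One plausible (hypothetical, not confirmed) realization: freeze a pretrained model, attach a small truth-probability head, and fine-tune it with a calibrated binary objective on tagged claims.

```python
# Sketch in PyTorch; hidden size, optimizer, and labels are placeholders.
import torch
import torch.nn as nn

class TruthHead(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)  # logit for P(claim is true)

    def forward(self, claim_embedding: torch.Tensor) -> torch.Tensor:
        return self.score(claim_embedding).squeeze(-1)

head = TruthHead(hidden_size=768)
loss_fn = nn.BCEWithLogitsLoss()          # calibrated truth probabilities
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-4)

# Stand-in batch: embeddings from a frozen pretrained encoder, labeled
# 1.0 for verified claims and 0.0 for known-false ones.
embeddings = torch.randn(8, 768)
labels = torch.tensor([1., 0., 1., 1., 0., 0., 1., 0.])

loss = loss_fn(head(embeddings), labels)
loss.backward()
optimizer.step()
```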
The non-agentic 'Scientist AI' predictor can be made into an agent by adding 'scaffolding' that asks it questions about the likely outcomes of potential actions. This method creates capable agents while retaining the core model's honesty and safety properties, avoiding the pitfalls of standard reinforcement learning.
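A sketch of that scaffolding loop (the prob_true interface and the harm threshold are illustrative assumptions): the predictor is only ever asked questions, and the surrounding loop turns its answers into actions.

```python
def choose_action(predictor, state: str, actions: list[str], goal: str,
                  harm_threshold: float = 0.01):
    """Pick the action the predictor rates most likely to achieve the goal,
    after vetoing anything it rates too likely to cause harm."""
    candidates = []
    for action in actions:
        p_harm = predictor.prob_true(
            f"Taking '{action}' in state '{state}' causes serious harm.")
        if p_harm < harm_threshold:        # guardrail: veto risky actions
            p_goal = predictor.prob_true(
                f"Taking '{action}' in state '{state}' achieves: {goal}")
            candidates.append((p_goal, action))
    if not candidates:
        return None                        # abstain rather than gamble
    return max(candidates, key=lambda c: c[0])[1]
```

Because the predictor itself never pursues goals, the honesty and calibration of the core model carry over to the composed agent.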
Bengio argues his 'Scientist AI' might actually be more capable, not less. By being trained to find the underlying causal structure of the world, it should generalize better to new situations than current models, which primarily learn correlations. This could provide a commercial advantage, not just a safety one.
Yoshua Bengio argues the initial pre-training phase, where models predict text, is a primary source of misalignment. By imitating human data, AIs inherit implicit goals like self-preservation and even 'peer preservation' (protecting other AIs), creating risks before any explicit agentic training occurs.