RiffOn - What it's really like to run AGI safety at Google DeepMind (and where I disagree with 'doomers') | Rohin Shah

Google DeepMind's Head of AGI Safety on why prosaic alignment will likely work, the limits of public commitments, and a gradual intelligence explosion.

To Influence AI Labs, External Researchers Must Offer Concrete Solutions or Evals

Abstract theory from outside an AI lab is unlikely to be adopted due to immense internal implementation constraints. To be useful, external research must provide a concrete solution, a new evaluation, or a clear metric that can be easily integrated into a complex, fragile development pipeline.

What it's really like to run AGI safety at Google DeepMind (and where I disagree with 'doomers') | Rohin Shah

80,000 Hours Podcast·a month ago

GPU Architecture Gives Chain-of-Thought Monitoring Years of Viability

DeepMind's Rohin Shah argues that Transformer models, optimized for parallel processing on GPUs, have low "opaque serial depth." They *must* write down their reasoning steps to their chain-of-thought scratchpad to solve complex serial tasks, making them monitorable. He predicts this will hold for 4-5 years.

An AI Intelligence Explosion Starts with Cost-Parity, Not Superhuman Skill

Rohin Shah predicts a gradual, not abrupt, start to an intelligence explosion. It will be triggered when automated AI R&D becomes cheaper than human researchers, not when it's vastly more capable. The first automated researchers might be less insightful but use massive, expensive compute to brute-force problems.

Google DeepMind's AGI Safety Team Now Prioritizes Implementation over Pure Research

The primary need on Google DeepMind's AGI safety team has shifted from generating novel research ideas to implementation. The team is hiring for people with strong software engineering skills who can "do the obvious thing and land it" within the company's complex infrastructure.

AI Labs Should Avoid Firm Safety Commitments as Research Evolves

Rohin Shah argues against AI companies making fixed safety commitments. The best practices for safety research change rapidly; a commitment made today (e.g., including alignment data in pre-training) could be considered harmful in the future, making flexibility crucial.

Accelerating Governance, Not Technical Safety, is the Real AI Bottleneck

Prosaic AI alignment research is similar enough to capabilities research that it will likely accelerate in tandem during an intelligence explosion. The real danger is that governance—which requires different skills and societal buy-in—won't keep pace, as policymakers may be unwilling to automate their own work with AI.

Comprehensive Benchmarking Shows AI Progress is Linear, Not Accelerating

Despite perceptions of rapid acceleration, a large-scale analysis by Google DeepMind and Epoch AI that stitches together many benchmarks over time shows that general AI capability progress has been remarkably linear. This suggests AI is currently acting as a better tool, not as an expanding population of researchers.

Public Commitments from AI Companies Are Largely Ineffective Signals

External pressure for AI companies to make public commitments is misguided because companies can and will back out of them if they become inconvenient or outdated. Rohin Shah points to Anthropic's Responsible Scaling Policy as an example where strong "commitment" language was later weakened.

Frontier Safety Reports Are Formal Safety Declarations, Not Replicable Research

A report like Google's Frontier Safety Report serves a specific purpose: to formally declare that the company has determined a model is safe to release. It is not designed to provide the level of detail needed for external actors to replicate or deeply scrutinize the evaluations; that's the role of academic papers.

Internally Deployed AGIs Must Be Treated as Untrusted Insider Threats

A key part of Google DeepMind's safety plan is to treat powerful, internally-used AI systems as potential untrusted insiders. This means building infrastructure that gives AIs separate identities, forces them to request permissions individually with justifications, and monitors their actions for suspicious behavior.
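The three ingredients named above (separate identities, per-request permissions with justifications, and monitoring) can be pictured with a small sketch. This is purely illustrative and not Google DeepMind's infrastructure: the `PermissionBroker` class, the agent and resource names, and the "repeated denials" heuristic are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PermissionBroker:
    """Hypothetical broker: every AI agent has its own identity and allow-list."""

    # Maps an agent identity to the resources it may access (assumed policy format).
    allowed: dict[str, set[str]]
    # Every request is recorded, whether or not it was granted.
    audit_log: list[tuple[str, str, str, bool]] = field(default_factory=list)

    def request(self, agent_id: str, resource: str, justification: str) -> bool:
        """Grant access only if a justification is given and policy allows it."""
        has_reason = bool(justification.strip())
        permitted = resource in self.allowed.get(agent_id, set())
        granted = has_reason and permitted
        self.audit_log.append((agent_id, resource, justification, granted))
        return granted

    def suspicious_agents(self, max_denials: int = 2) -> set[str]:
        """Flag agents with at least `max_denials` denied requests (toy heuristic)."""
        denials: dict[str, int] = {}
        for agent_id, _resource, _reason, granted in self.audit_log:
            if not granted:
                denials[agent_id] = denials.get(agent_id, 0) + 1
        return {agent for agent, count in denials.items() if count >= max_denials}
```

In this sketch, an agent that repeatedly asks for resources outside its allow-list, or that asks without giving a reason, ends up in `suspicious_agents()` for human review. A real system would need far richer monitoring than a denial count.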

Focusing on Pre-Deployment Evals Incentivizes Speed Over Safety Quality

Requiring extensive evaluations right before a model launch creates strong incentives to make them as fast as possible rather than as thorough as possible. Shah argues that progress is continuous, so a safety buffer based on the previous model is often sufficient, and that the bigger risk comes from internal, not external, deployment.

Model AI Companies as Apathetic, Not Adversarial, to Influence Them

Changing one component of a frontier model (like safety) can break dozens of other fragile constraints (e.g., inference speed). Companies can only implement a few changes at a time. Therefore, external actors should model them as resource-constrained and apathetic, not actively malicious, for effective advocacy.

Myopic Step-by-Step Optimization Prevents Undetectable AI Reward Hacking

A technique called "myopic optimization" can prevent complex, multi-step reward hacking. Training an AI to optimize each action locally, without seeing future rewards, removes the incentive for schemes that pay off later, even when an overseer could not spot the deception.
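One way to picture the difference is to compare the per-step training targets under ordinary and myopic credit assignment. The sketch below is an illustration under stated assumptions, not the method described in the episode: the `returns_to_go` helper, the reward values, and the use of a discount factor of 0 to model "not seeing future rewards" are all hypothetical.

```python
def returns_to_go(rewards: list[float], gamma: float) -> list[float]:
    """Discounted return from each step onward: G_t = r_t + gamma * G_{t+1}."""
    targets = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        targets.append(running)
    return targets[::-1]


# Assumed episode: two "setup" steps earn nothing, the final step collects
# a payoff that only exists because of the earlier setup.
episode_rewards = [0.0, 0.0, 10.0]

# Ordinary optimization (gamma = 1): setup steps are credited with the later payoff.
full_horizon_targets = returns_to_go(episode_rewards, gamma=1.0)

# Myopic optimization (gamma = 0): each step is credited only with its own reward,
# so the setup steps receive no training signal from the eventual payoff.
myopic_targets = returns_to_go(episode_rewards, gamma=0.0)
```

With `gamma=1.0` the setup steps inherit the full later payoff, so a multi-step scheme is reinforced from its first action. With `gamma=0.0` those steps get a target of zero, which is the sense in which the incentive for delayed-payoff schemes disappears in this toy setting.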

Catastrophic AI Misalignment is Plausible But Not a Default Outcome

Rohin Shah, Head of AGI Safety at Google DeepMind, believes existing arguments for catastrophic misalignment are only suggestive, not compelling. While they are sufficient to warrant significant safety work, he sees major holes in arguments that catastrophe is the likely or default outcome of AGI development.
