The current practice of training AI with reinforcement learning from human feedback (RLHF) restricts the AI's potential by forcing it to conform to human norms and biases. True breakthroughs, like AlphaGo's winning move, happen when AI operates beyond the confines of human culture and reason.

Related Insights

The fear that AI homogenizes culture is countered by the game of Go. After AlphaGo's 2016 victory, human decision quality surged. Players learned from the AI and began developing novel moves distinct from both prior human strategies and the AI's own plays, ultimately improving the overall level of human skill.

Reinforcement learning incentivizes AIs to find the right answer, not just mimic human text. This leads them to develop their own internal "dialect" for reasoning: a chain of thought that is effective but increasingly incomprehensible and alien to human observers.

An attempt to teach AI 'scientific taste' using RLHF on hypotheses failed because human raters prioritized superficial qualities like tone and feasibility over a hypothesis's potential world-changing impact. This suggests a need for feedback tied to downstream outcomes, not just human preference.

Hands-on AI model training shows that AI is not an objective engine; it's a reflection of its trainer. If the training data or prompts are narrow, the AI will also be narrow, failing to generalize. This process reveals that the model is "only as deep as I tell it to be," highlighting the human's responsibility.

The two greatest AI achievements are generative AI (mimicking human knowledge) and deep reinforcement learning (discovering superhuman strategies). The grand challenge, and the future of AI, is to fuse these two threads into a single system that can both leverage existing knowledge and innovate beyond it.

AlphaGo's famous 'Move 37' was a play no human expert would have made, and it was initially dismissed as an error. Its eventual success demonstrated that AI can discover novel, superior strategies beyond the existing corpus of human knowledge, fundamentally expanding a field of study rather than just mastering it.

By removing all human game data and learning only from self-play, AlphaZero first rediscovered human strategies and then discarded them for superior, 'alien' ones. This showed that relying solely on human data can limit an AI's potential, anchoring it to existing knowledge and cognitive biases.

Human intelligence is fundamentally shaped by tight constraints: limited lifespan, brain size, and slow communication. AI systems are free from these limits: they can train on millennia of data and scale compute as needed. This core difference ensures AI will evolve into a form of intelligence that is powerful but alien to our own.

Human intelligence is shaped by limitations like a finite lifespan and small brain, forcing efficient learning from sparse data. AI lacks these constraints, learning from lifetimes of data with massive compute. This fundamental difference means AI will naturally evolve into a distinct, non-human form of intelligence unless we explicitly engineer human-like biases into it.

A key, underappreciated advantage of AI is its potential for systematic context-switching. Unlike humans, who can get stuck in a single line of reasoning, AI systems can be programmed to simultaneously pursue contradictory goals (e.g., proving and disproving a theorem) or be given different starting biases, allowing them to escape cognitive ruts and explore a problem space more thoroughly.