© 2026 RiffOn. All rights reserved.
Captaining IMO Gold, Deep Think, On-Policy RL, Feeling the AGI in Singapore — Yi Tay 2

Latent Space: The AI Engineer Podcast · Jan 23, 2026

Yi Tay on captaining IMO Gold with Gemini, the power of on-policy RL, and why Transformers are here to stay. Insights from the AI frontier.

RecSys and IR Research Feels Like Modeling in a World with Different Laws of Physics

Research in Recommendation Systems (RecSys) and Information Retrieval (IR) is described as uniquely unintuitive. Experimental feedback feels "rude" and disconnected from the changes a researcher makes, as if the principles of cause and effect that hold in other ML domains do not apply.

Elite AI Labs Use "AGI" in Team Names for Vibes and Signaling, Not Concrete Roadmaps

Naming AI research teams with terms like "AGI" is more about signaling a long-term "north star" and creating "vibes" to attract ambitious talent, rather than reflecting a concrete, step-by-step plan to achieve artificial general intelligence.

The Transformer Architecture Will Likely Persist to AGI Due to a Decade of Ecosystem Investment

Despite its age, the Transformer architecture is likely here to stay on the path to AGI. A massive ecosystem of optimizers, hardware, and techniques has been built around it, creating a powerful "local minimum" that makes it more practical to iterate on Transformers than to replace them entirely.

Humans Should Increase Their 'Learning Rate' Drastically When Core Beliefs Are Falsified

Applying the machine learning concept of a "learning rate" to human cognition suggests that when a single counterexample disproves a core assumption, you should sharply increase your learning rate and question every related belief, rather than make a small, incremental update.

Rejoining Google Is Like Continuing a Saved Game, With Infrastructure and Identity Preserved

Returning to a large tech company like Google after a period away is akin to resuming a saved video game. Your digital identity, username, and access to the vast internal infrastructure remain intact, allowing for a remarkably seamless re-integration despite significant organizational changes.

AI Researchers Can Rapidly Innovate in New Fields by Relying on Universal Research Skills

Deep expertise in one AI sub-field, like model architectures, isn't a prerequisite for innovating in another, such as Reinforcement Learning. Fundamental research skills are universal and transferable, allowing experienced researchers to quickly contribute to new domains even with minimal background knowledge.

AI Progress Still Relies on Game-Changing Ideas, Not Just Blind Scaling

Contrary to the popular reading of the "bitter lesson" as saying that scale is all that matters, novel ideas remain a critical driver of AI progress. The field is not yet seeing diminishing returns on new concepts: game-changing ideas are still being invented, and they are what make scaling effective in the first place.

AI Coding Assistants Now Autonomously Debug Complex Job Failures for Expert Researchers

AI coding tools have moved beyond simple assistance. Expert ML researchers now delegate debugging entirely: they feed a failed job's error log to the model and apply its proposed fix without inspecting it. This marks a shift toward AI as an autonomous problem-solver rather than just a helper.

Google's IMO Gold Win Required Abandoning a Specialized System for a Single End-to-End LLM

A key decision behind Google DeepMind's IMO Gold medal was to abandon its previously successful specialized system (AlphaGeometry) in favor of a single end-to-end LLM. This reflects a core AGI philosophy: a truly general model should solve complex problems without relying on separate, specialized tools.

Modern LLMs Let Researchers Achieve Breakthroughs in Fields like Advanced Math Without Domain Expertise

A remarkable feature of the current LLM era is that AI researchers can contribute to solving grand challenges in highly specialized domains, such as winning an IMO Gold medal, without possessing deep personal knowledge of that field. The model acts as a universal tool that transcends the operator's expertise.

Google's IMO Gold Model Was Trained via an Ad-Hoc 24/7 Collaboration Across Global Time Zones

Training the IMO Gold model was a fluid, hackathon-like effort among four "captains" in London, Mountain View, and Singapore. The workflow was highly ad hoc: responsibilities were handed across continents as individuals traveled, which kept progress continuous.

Top AI Labs Proactively Recruit Talent Who Independently Demonstrate Strong Research Taste

Getting hired at a premier AI lab like Google DeepMind often bypasses traditional applications. Top researchers actively scout and directly contact individuals who produce work that demonstrates excellent "research taste." The key is to independently identify and pursue fruitful research directions, signaling an innate ability to innovate.

On-Policy RL Mirrors Human Learning by Rewarding Self-Generated Actions, Unlike Imitative Off-Policy Methods

On-policy reinforcement learning, where a model learns from its own generated actions and their consequences, is analogous to how humans learn from direct experience and mistakes. This contrasts with off-policy methods like supervised fine-tuning (SFT), which resemble simply imitating others' successful paths.
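To make the contrast concrete, here is a toy sketch, assuming a single-step, three-action softmax policy and plain REINFORCE as the on-policy method. The helper names (`softmax`, `grad_log_prob`, `sft_step`, `on_policy_step`) and the reward function are hypothetical illustrations, not how Gemini or Deep Think is actually trained. The SFT-style step raises the probability of an action chosen by a demonstrator, whatever the model itself would have done; the on-policy step samples an action from the model's current policy and weights that action's log-probability gradient by the reward it earned.

```python
import math
import random

# Toy illustration only: a single-step, 3-action "policy" parameterized by logits.
# All names here are hypothetical; this is not any lab's actual training code.


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_prob(logits, action):
    """Gradient of log softmax(logits)[action] w.r.t. the logits: onehot - probs."""
    probs = softmax(logits)
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]


def sft_step(logits, demo_action, lr=0.1):
    """Off-policy imitation (SFT-style): gradient ascent on the log-prob of an
    action taken by a demonstrator, regardless of what the model would sample."""
    g = grad_log_prob(logits, demo_action)
    return [x + lr * gi for x, gi in zip(logits, g)]


def on_policy_step(logits, reward_fn, rng, lr=0.1, baseline=0.0):
    """On-policy REINFORCE: sample an action from the model's *current* policy,
    score that self-generated action, and scale its log-prob gradient by the advantage."""
    probs = softmax(logits)
    action = rng.choices(range(len(logits)), weights=probs)[0]
    advantage = reward_fn(action) - baseline
    g = grad_log_prob(logits, action)
    new_logits = [x + lr * advantage * gi for x, gi in zip(logits, g)]
    return new_logits, action


if __name__ == "__main__":
    rng = random.Random(0)

    def reward(action):
        # Hypothetical task: only action 2 is "correct".
        return 1.0 if action == 2 else 0.0

    # On-policy: the model learns only from outcomes of its own samples.
    logits = [0.0, 0.0, 0.0]
    for _ in range(500):
        logits, _ = on_policy_step(logits, reward, rng, lr=0.5)
    print("on-policy p(action 2):", round(softmax(logits)[2], 3))

    # Off-policy imitation: the model copies a demonstrator's choice.
    demo_logits = [0.0, 0.0, 0.0]
    for _ in range(50):
        demo_logits = sft_step(demo_logits, demo_action=2, lr=0.5)
    print("SFT p(action 2):", round(softmax(demo_logits)[2], 3))
```

Both updates use the same log-probability gradient; the difference is where the action comes from. With a zero baseline and zero reward for wrong answers, only the model's own successful attempts move this toy policy; practical on-policy methods typically subtract a baseline so that failed attempts are actively pushed down as well.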
