AI Labs Are Paying Experts Millions Daily to Train Their Replacements in Simulated "RL Gyms"

Related Insights

Mercor’s $10B Valuation Proves Human Expertise Is AI's Most Valuable Fuel

AI startup Mercor's valuation quintupled to $10B by connecting AI labs with domain experts to train models. This reveals that the most critical bottleneck for advanced AI is not just data or compute, but reinforcement learning from highly skilled human feedback, creating a new "RL economy."

#178: OpenAI’s Automated AI Researcher, OpenAI Restructuring, The Fed Warns About AI’s Impact on Hiring, Nvidia Hits $5 Trillion & Wharton Data on AI ROI

The Artificial Intelligence Show·7 months ago

The Emergence of the "AI Trainer" Role for Niche Expertise

To move beyond general knowledge, AI firms are creating a new role: the "AI Trainer." These are not contractors but full-time employees, typically PhDs with deep domain expertise and a computer science interest, tasked with systematically improving model competence in specific fields like physics or mathematics.

Why data is the biggest AI bottleneck (feat. Arthur Mensch of Mistral AI) | E2212

This Week in Startups·6 months ago

Agentic AI Training Requires Simulated 'RL Environments,' Not Just Traditional RLHF

Training AI agents to execute multi-step business workflows demands a new data paradigm. Companies create reinforcement learning (RL) environments—mini world models of business processes—where agents learn by attempting tasks, a more advanced method than simple prompt-completion training (SFT/RLHF).

20VC: Scale, Surge, Turing, Mercor: Who Wins & Who Loses in Data Labelling | Is Revenue in Data Labelling Real or GMV? | Why 99% of Knowledge Work Will Go and What Happens Then? | Why SaaS is Dead in a World of AI with Jonathan Siddharth @ Turing

The Twenty Minute VC (20VC): Venture Capital | Startup Funding | The Pitch·6 months ago
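To make the idea of an RL environment for a business workflow concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than anything described in the episode: the `InvoiceEnv` class, its three-step invoice workflow, and the `run_episode` helper are hypothetical names. The environment rewards the agent only when it completes every step in order, which is what distinguishes this setup from single prompt-completion training.

```python
from dataclasses import dataclass


@dataclass
class InvoiceEnv:
    """Hypothetical toy workflow: validate an invoice, approve it, then pay it."""
    steps: tuple = ("validate", "approve", "pay")
    position: int = 0

    def reset(self) -> int:
        # Start a new episode at the first workflow step.
        self.position = 0
        return self.position

    def step(self, action: str):
        """Apply one action and return (observation, reward, done)."""
        if action == self.steps[self.position]:
            self.position += 1
            done = self.position == len(self.steps)
            # Reward is sparse: 1.0 only when the whole workflow is finished.
            return self.position, (1.0 if done else 0.0), done
        # A wrong action ends the episode with no reward.
        return self.position, 0.0, True


def run_episode(env, policy) -> float:
    """Roll out one episode; `policy` maps an observation to an action."""
    obs = env.reset()
    total = 0.0
    done = False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total
```

A policy that picks the correct step for each observation earns the full reward, while one that always answers "pay" fails on the first step and earns nothing. Real environments would model far richer state (documents, tools, permissions), but the reset/step/reward loop is the same shape.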

Simulated RL Environments Are the Next Frontier for Training Capable AI Agents

Beyond supervised fine-tuning (SFT) and human feedback (RLHF), reinforcement learning (RL) in simulated environments is the next evolution. These "playgrounds" teach models to handle messy, multi-step, real-world tasks where current models often fail catastrophically.

The 100-person AI lab that became Anthropic and Google's secret weapon | Edwin Chen (Surge AI)

Lenny's Podcast: Product | Career | Growth·5 months ago

AI Training Is Shifting from Human Feedback (RLHF) to Expert-Defined AI Feedback (RLAIF)

The frontier of AI training is moving beyond humans ranking model outputs (RLHF). Now, high-skilled experts create detailed success criteria (like rubrics or unit tests), which an AI then uses to provide feedback to the main model at scale, a process called RLAIF.

Why experts writing AI evals is creating the fastest-growing companies in history | Brendan Foody (CEO of Mercor)

Lenny's Podcast: Product | Career | Growth·8 months ago
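The rubric idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the method any lab or vendor uses: in RLAIF the grader would be an AI model applying the expert's criteria, whereas here plain Python checks stand in for that grader. The `rubric_reward` function, the `RUBRIC` list, and its two criteria are all hypothetical.

```python
from typing import Callable, List, Tuple

# Each criterion: (description, weight, check). The check is a stand-in for
# an AI grader's judgment; a real system would call a model here instead.
Criterion = Tuple[str, float, Callable[[str], bool]]


def rubric_reward(output: str, rubric: List[Criterion]) -> float:
    """Return the weighted fraction of rubric criteria that the output meets."""
    total = sum(weight for _, weight, _ in rubric)
    earned = sum(weight for _, weight, check in rubric if check(output))
    return earned / total


# Hypothetical expert-written rubric for a medical-advice answer.
RUBRIC: List[Criterion] = [
    ("states a dosage in mg", 2.0, lambda text: "mg" in text),
    ("advises seeing a doctor", 1.0, lambda text: "doctor" in text.lower()),
]
```

The expert's effort goes into writing the criteria once; the score can then be computed for any number of model outputs and used as a reward signal, which is where the scale advantage over per-output human ranking comes from.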

OpenAI's Mission to Upskill Workers Clashes with Its Goal to Build Human-Replacing AGI

OpenAI is launching initiatives to certify millions of workers for an AI-driven economy. However, its core mission is to build artificial general intelligence (AGI) designed to outperform humans, creating a paradox in which the company is both the cause of job displacement and a proposed solution to it.

#166: OpenAI Jobs Platform, Salesforce AI Job Cuts, White House AI Education Initiative & OpenAI Secondary Sale and Cash Burn

The Artificial Intelligence Show·8 months ago

RL Environment Startups Command Seven-Figure Deals Selling Simulations to AI Labs

A niche, services-heavy market has emerged where startups build bespoke, high-fidelity simulation environments for large AI labs. These deals command at least seven-figure price tags and are critical for training next-generation agentic models, despite the customer base being only a few major labs.

Why Fine-Tuning Lost and RL Won

Latent Space: The AI Engineer Podcast·7 months ago

Mercor's Rise Signals a New "Reinforcement Learning Economy" for Elite Human Experts

Mercor's $500M revenue in 17 months highlights a shift in AI training. The focus is moving from low-paid data labelers to a marketplace of elite experts like doctors and lawyers providing high-quality, nuanced data. This creates a new, lucrative gig economy for top-tier professionals.

#170: How ChatGPT Is Used at Work, New GDPval Benchmark, AI “Workslop,” ChatGPT Pulse, Meta Vibes & More AI Economy Warnings

The Artificial Intelligence Show·8 months ago

AI Labs Are Automating Their Own Research to Create Compounding Progress

A key strategy for labs like Anthropic is automating AI research itself. By building models that can perform the tasks of AI researchers, they aim to create a feedback loop that dramatically accelerates the pace of innovation.

#172: Sora 2, Claude Sonnet 4.5, ChatGPT Instant Checkout, How OpenAI Uses AI, Grokipedia & Mercor’s AI Productivity Index

The Artificial Intelligence Show·7 months ago

OpenAI's 'Project Mercury' Reveals Its Playbook for Automating High-Skilled Professions

By paying over 100 former Wall Street bankers to train its models on complex financial tasks, OpenAI is creating a template for vertical AI dominance. This 'expert-as-a-contractor' model will be replicated across law, accounting, and consulting to systematically automate lucrative knowledge work sectors.

#176: ChatGPT Atlas, ChatGPT Atlas Security Issues, Letter to Pause Superintelligence, Amazon’s Plan to Automate 600,000 Jobs & New Data on AI Relationships

The Artificial Intelligence Show·7 months ago
