Companies like OpenAI and Anthropic are spending billions creating simulated enterprise apps (RL gyms) where human experts train AI models on complex tasks. This has created a new, rapidly growing "AI trainer" job category, but its ultimate purpose is to automate those same expert roles.
AI startup Mercore's valuation quintupled to $10B by connecting AI labs with domain experts to train models. This reveals that the most critical bottleneck for advanced AI is not just data or compute, but reinforcement learning from highly skilled human feedback, creating a new "RL economy."
To move beyond general knowledge, AI firms are creating a new role: the "AI Trainer." These are not contractors but full-time employees, typically PhDs with deep domain expertise and a computer science interest, tasked with systematically improving model competence in specific fields like physics or mathematics.
Training AI agents to execute multi-step business workflows demands a new data paradigm. Companies create reinforcement learning (RL) environments—mini world models of business processes—where agents learn by attempting tasks, a more advanced method than simple prompt-completion training (SFT/RLHF).
Beyond supervised fine-tuning (SFT) and human feedback (RLHF), reinforcement learning (RL) in simulated environments is the next evolution. These "playgrounds" teach models to handle messy, multi-step, real-world tasks where current models often fail catastrophically.
The frontier of AI training is moving beyond humans ranking model outputs (RLHF). Now, high-skilled experts create detailed success criteria (like rubrics or unit tests), which an AI then uses to provide feedback to the main model at scale, a process called RLAIF.
OpenAI is launching initiatives to certify millions of workers for an AI-driven economy. However, their core mission is to build artificial general intelligence (AGI) designed to outperform humans, creating a paradox where they are both the cause of and a proposed solution to job displacement.
A niche, services-heavy market has emerged where startups build bespoke, high-fidelity simulation environments for large AI labs. These deals command at least seven-figure price tags and are critical for training next-generation agentic models, despite the customer base being only a few major labs.
Mercore's $500M revenue in 17 months highlights a shift in AI training. The focus is moving from low-paid data labelers to a marketplace of elite experts like doctors and lawyers providing high-quality, nuanced data. This creates a new, lucrative gig economy for top-tier professionals.
A key strategy for labs like Anthropic is automating AI research itself. By building models that can perform the tasks of AI researchers, they aim to create a feedback loop that dramatically accelerates the pace of innovation.
By paying over 100 former Wall Street bankers to train its models on complex financial tasks, OpenAI is creating a template for vertical AI dominance. This 'expert-as-a-contractor' model will be replicated across law, accounting, and consulting to systematically automate lucrative knowledge work sectors.