Why Fine-Tuning Lost and RL Won

Latent Space: The AI Engineer Podcast · Oct 16, 2025

OpenPipe founder Kyle Corbitt discusses his journey from YC to a CoreWeave acquisition, covering the evolution of fine-tuning, RL, LoRAs, and GRPO.

Reproducible Sandbox Environments Are RL's Biggest Bottleneck, Not Algorithms

Algorithms like GRPO are powerful but require parallel rollouts in a reproducible environment. Building and maintaining these high-fidelity sandboxes, complete with realistic data and failure modes, is the hardest part of implementing RL today and a significant barrier for most companies.
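
A minimal sketch of the kind of environment contract this implies, assuming a text-in/text-out agent. The names (AgentEnv, reset, step, collect_group) are illustrative, not OpenPipe's API; the point is that every rollout in a GRPO group has to start from the same seeded snapshot so the rollouts can be compared against each other.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class StepResult:
    observation: str   # what the agent sees next (tool output, page state, ...)
    reward: float      # often 0.0 until the episode ends
    done: bool


class AgentEnv(Protocol):
    """Minimal contract a sandbox must satisfy for GRPO-style training."""

    def reset(self, seed: int) -> str:
        """Restore a known snapshot (same data, same failure modes) and
        return the initial observation."""
        ...

    def step(self, action: str) -> StepResult:
        """Apply one agent action (an API call, a SQL query, a click)."""
        ...


def collect_group(env_factory, policy, prompt: str, group_size: int = 8):
    """Run group_size rollouts from an identical start state.

    GRPO scores each rollout relative to the others in its group, which is
    only meaningful if the environment is reproducible."""
    trajectories = []
    for _ in range(group_size):
        env = env_factory()
        obs = env.reset(seed=42)          # identical starting snapshot
        episode, done = [], False
        while not done:
            action = policy(prompt, obs)  # sample from the current model
            result = env.step(action)
            episode.append((obs, action, result.reward))
            obs, done = result.observation, result.done
        trajectories.append(episode)
    return trajectories
```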

LoRAs Became Unpopular with Fine-Tuning's Decline, Despite Superior Inference Economics

The perception of LoRAs as a lesser fine-tuning method is a marketing problem. Technically, for task-specific customization, they offer massive operational upside at inference time: many adapters can be multiplexed on a single GPU, which enables per-token pricing models, a benefit that is often overlooked.
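
A minimal sketch of why that multiplexing is cheap, in plain PyTorch with made-up shapes: the large base weight matrix is shared by every tenant, and each adapter is just a pair of small low-rank matrices that can be swapped in per request.

```python
import torch

hidden, rank = 1024, 16                       # toy sizes; real models are far larger
base_W = torch.randn(hidden, hidden)          # shared, frozen base weight: one copy in GPU memory


def make_adapter():
    # A LoRA adapter is just two small matrices; at rank 16 this is
    # 2 * 16 * 1024 = ~33k parameters vs ~1M for the full matrix.
    A = torch.randn(rank, hidden) * 0.01
    B = torch.randn(hidden, rank) * 0.01
    return A, B


adapters = {"tenant_a": make_adapter(), "tenant_b": make_adapter()}


def forward(x: torch.Tensor, tenant: str) -> torch.Tensor:
    # One GPU can serve many tenants: pick the adapter per request and add
    # its low-rank delta on top of the shared base projection.
    A, B = adapters[tenant]
    return x @ base_W.T + (x @ A.T) @ B.T


x = torch.randn(1, hidden)
print(forward(x, "tenant_a").shape)           # torch.Size([1, 1024])
```

Because the marginal memory cost of each adapter is tiny relative to the base model, a provider can bill per token rather than per dedicated deployment.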

OpenPipe Grew to $1M ARR by Distilling Expensive GPT-4 Workflows

OpenPipe's initial value proposition was clear: GPT-4 was powerful but prohibitively expensive for production. They offered a managed flow to distill those expensive workflows into cheaper, smaller models, which resonated with early customers facing massive OpenAI bills and carried OpenPipe to $1M ARR in eight months.
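
A minimal sketch of that distillation flow, assuming you already log prompt/completion pairs from the expensive model; the log fields and file name here are hypothetical. The captured outputs become a chat-format JSONL training set for a smaller model.

```python
import json

# Hypothetical production log: prompts sent to the expensive frontier model
# and the completions it returned, captured by request logging.
logged_calls = [
    {"system": "Extract the invoice total as JSON.",
     "user": "Invoice #123 ... Total due: $410.22",
     "assistant": '{"total": 410.22}'},
]

# Convert each logged call into the chat-style JSONL format commonly used
# for supervised fine-tuning, so a cheaper model learns to imitate the
# expensive model's outputs on this specific workflow.
with open("distillation_train.jsonl", "w") as f:
    for call in logged_calls:
        example = {"messages": [
            {"role": "system", "content": call["system"]},
            {"role": "user", "content": call["user"]},
            {"role": "assistant", "content": call["assistant"]},
        ]}
        f.write(json.dumps(example) + "\n")
```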

Fine-Tuning Startups Are Squeezed by Lower Model Prices, Not by GPU Providers

OpenPipe's founder felt pressure from frontier labs continually lowering token prices, which eroded their value prop. However, competition from GPU providers never materialized because their fine-tuning services were too difficult to use, highlighting the persistent value of good developer experience.

Continual Learning Can Unlock 90% of AI Projects Stuck in Proof-of-Concept

Many AI projects fail to reach production because of reliability issues. The vision for continual learning is to deploy agents that are 'good enough,' then use RL to correct their behavior based on real-world errors, much like training a human. This solves the last-mile reliability problem and could unlock a vast market.
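
A rough sketch of that deploy-then-correct loop. Every callable here (collect_traces, score_trace, rl_update) is a placeholder for infrastructure you would have to supply, not an existing API.

```python
def continual_learning_loop(agent, collect_traces, score_trace, rl_update,
                            rounds: int = 10, interval_hours: float = 24.0):
    """Ship a 'good enough' agent, then keep correcting it from production.

    collect_traces pulls recent real-world runs, score_trace turns errors or
    user feedback into a reward in [0, 1], and rl_update applies an RL step
    (e.g. GRPO) to the agent's weights using those scored traces.
    """
    for _ in range(rounds):
        traces = collect_traces(since_hours=interval_hours)
        scored = [(trace, score_trace(trace)) for trace in traces]
        # Only retrain when real failures show up, mirroring how a person
        # would be coached on the mistakes they actually made on the job.
        if any(reward < 0.5 for _, reward in scored):
            agent = rl_update(agent, scored)
    return agent
```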

RL Environment Startups Command Seven-Figure Deals Selling Simulations to AI Labs

A niche, services-heavy market has emerged where startups build bespoke, high-fidelity simulation environments for large AI labs. These deals command at least seven-figure price tags and are critical for training next-generation agentic models, despite the customer base being only a few major labs.

LLM-as-Judge Stack Ranking Solves the RL Reward Problem for GRPO

OpenPipe's RULER library leverages a key insight: GRPO only needs relative rankings, not absolute scores. By having an LLM judge stack-rank a group of agent runs, one can generate effective rewards. This approach works phenomenally well, even with weaker judge models, effectively solving the reward assignment problem.
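
A minimal sketch of the rank-to-reward step, assuming a judge LLM has already returned a stack ranking of one rollout group (the judge prompt itself is omitted). This illustrates the idea rather than RULER's actual implementation.

```python
import statistics


def ranks_to_advantages(ranking: list[int]) -> list[float]:
    """Turn a judge's stack ranking (best first) into GRPO-style advantages.

    ranking[i] is the index of the i-th best rollout in the group. The best
    rollout gets reward 1.0, the worst 0.0; advantages are those rewards
    normalized within the group, which is all GRPO needs: no absolute
    quality score is ever required.
    """
    n = len(ranking)
    rewards = [0.0] * n
    for place, rollout_idx in enumerate(ranking):
        rewards[rollout_idx] = (n - 1 - place) / (n - 1)
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0
    return [(r - mean) / std for r in rewards]


# Example: the judge ranked four rollouts as #2 best, then #0, #3, #1 worst.
print(ranks_to_advantages([2, 0, 3, 1]))
```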

Prompt Optimizer GEPA Failed to Outperform RL Fine-Tuning in OpenPipe's Tests

While prompt optimization is theoretically appealing, OpenPipe's team evaluated methods like GEPA and found they provided only minor boosts. Their RL fine-tuning methods delivered vastly superior results (96% vs 56% on a benchmark), suggesting weight updates still trump prompt engineering for complex tasks.

Fine-Tuning's Best ROI is for Latency-Critical Apps Forced Onto Smaller Models

The primary driver for fine-tuning isn't cost but necessity. When applications like real-time voice demand low latency, developers are forced to use smaller models. These models often lack quality for specific tasks, making fine-tuning a necessary step to achieve production-level performance.
