The demand for AI inference is insatiable. As models become cheaper and more efficient, developers and businesses find more ways to embed intelligence, creating a perpetually growing market. Even with AGI, the core need will be running inference.
Despite the hype around enterprise AI, the vast majority of current inference workloads are driven by new, AI-native application companies. This indicates that the broader enterprise adoption market is still in its infancy, representing a massive future growth opportunity.
Instead of selling directly to enterprises initially, AI infrastructure companies can learn enterprise needs by proxy. By serving fast-moving AI startups who sell to the enterprise, they receive a "translation" of requirements for data retention, latency, and transparency, preparing them for that market.
At scale, companies rarely deploy open-source models "off the shelf." Instead, virtually all production workloads involve custom modifications: post-training on proprietary data to improve quality, or compiling and quantizing the model to raise throughput and cut cost.
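To make the quantization step concrete, here is a minimal sketch using PyTorch's post-training dynamic quantization. The toy two-layer model is a hypothetical stand-in for an open-source checkpoint; production pipelines typically use heavier compiler and quantizer toolchains, but the principle is the same: shrink weights after training, without retraining.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for an open-source checkpoint; any nn.Module
# built from Linear layers quantizes the same way.
model = nn.Sequential(
    nn.Linear(4096, 4096),
    nn.ReLU(),
    nn.Linear(4096, 4096),
)
model.eval()

# Post-training dynamic quantization: weights are stored as int8 and
# activations are quantized on the fly. No retraining required; memory
# footprint drops roughly 4x for the quantized layers.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 4096)
with torch.no_grad():
    y = quantized(x)
print(y.shape)  # torch.Size([1, 4096])
```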
Startups can compete with large AI labs by capturing unique user interaction data from specialized workflows. This proprietary "user signal" enables post-training of models for specific tasks, creating a defensible advantage that labs, lacking that specific context, cannot easily replicate.
The first step for an AI startup is to prove value using the best off-the-shelf models, even if they are expensive. Investing in custom models and post-training is a form of optimization that should only happen after product-market fit is established and there is a clear user signal to optimize for.
Providing GPUs-as-a-Service is not a durable business because customers can easily switch providers. The key to customer retention and high net dollar retention (NDR) is the software layer built on top of the hardware. This software, which handles the complexities of inference, creates the actual stickiness.
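For readers unfamiliar with the metric, NDR measures how much revenue an existing customer cohort generates a year later, expansion included and churn netted out; the dollar figures below are hypothetical, chosen only to show why a sticky software layer pushes the number above 100%.

```python
def net_dollar_retention(starting_arr: float, expansion: float,
                         contraction: float, churn: float) -> float:
    """Standard NDR: revenue retained from an existing cohort over a
    period, including expansion, net of downgrades and churn."""
    return (starting_arr + expansion - contraction - churn) / starting_arr

# Hypothetical cohort: $10M ARR at the start of the year, $3.5M of
# expansion driven by the software layer, $0.5M of downgrades, $1M churned.
ndr = net_dollar_retention(10e6, 3.5e6, 0.5e6, 1e6)
print(f"{ndr:.0%}")  # 120% -> the cohort spends more over time despite churn
```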
The widely discussed GPU supply crunch is only half the problem. There's a severe shortage of suppliers who can operate data centers with the high reliability and SLAs required for mission-critical inference. Out of many providers, only a handful meet the "gold tier" for operational excellence.
When running AI inference at extreme scale, the most surprising and difficult challenges are often not unique to LLMs. Instead, they are classic distributed systems problems, such as kernel panics triggered by logging overload, that only manifest under immense load. The relative immaturity of AI serving runtimes compounds these issues.
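The logging failure mode is a good illustration: a hot error path that logs once per request can saturate disk or syslog long before the model itself is the bottleneck. A minimal mitigation sketch, assuming Python's standard logging module (this is an illustrative pattern, not the provider's actual fix), is to enforce a per-second log budget:

```python
import logging
import time

class RateLimitFilter(logging.Filter):
    """Drop log records beyond a per-second budget so a hot error path
    cannot flood disk or syslog under extreme request volume."""

    def __init__(self, max_per_sec: int = 100):
        super().__init__()
        self.max_per_sec = max_per_sec
        self._window = 0   # current one-second window (unix time)
        self._count = 0    # records emitted in the current window

    def filter(self, record: logging.LogRecord) -> bool:
        now = int(time.time())
        if now != self._window:
            self._window, self._count = now, 0
        self._count += 1
        return self._count <= self.max_per_sec

logger = logging.getLogger("inference")
logger.addFilter(RateLimitFilter(max_per_sec=100))
```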
Accessing next-generation GPUs at scale is no longer a simple purchase. The market now demands three-to-five-year commitments with a significant portion (20-30%) of the total contract value paid upfront. This makes a company's cost of capital a critical competitive factor in acquiring compute capacity.
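A back-of-the-envelope calculation shows why cost of capital becomes decisive under these terms. The contract size and rates below are hypothetical, chosen only to illustrate the mechanics:

```python
def upfront_capital(contract_value: float, upfront_frac: float,
                    cost_of_capital: float, years: float):
    """Cash due at signing, and what financing that cash costs over
    the contract term at a given annual cost of capital."""
    upfront = contract_value * upfront_frac
    financing_cost = upfront * ((1 + cost_of_capital) ** years - 1)
    return upfront, financing_cost

# Hypothetical: a 3-year, $300M GPU commitment with 25% paid upfront.
up, fin = upfront_capital(300e6, 0.25, 0.10, 3)
print(f"upfront: ${up/1e6:.0f}M, financing at 10%: ${fin/1e6:.1f}M")
# upfront: $75M, financing at 10%: $24.8M
# At a 5% cost of capital the same $75M costs only ~$11.8M to carry,
# so cheaper capital translates directly into cheaper compute.
```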
