Vertical AI Leader Abridge Prioritizes Latency Over Cost in Building Custom Models

Related Insights

NeoCloud Providers Avoid AMD Chips Due to Customer Performance Demands

Emerging cloud providers (“NeoClouds”) are sticking exclusively with NVIDIA, despite alternatives from AMD. The perceived performance risk is too high: customers demand state-of-the-art inference speed, and providers can't risk a multi-billion-dollar investment on a non-NVIDIA stack that might deliver lower throughput.

Nvidia’s $2B Nebius Deal, Oracle’s Q3 Comeback, OpenAI to Launch Sora in ChatGPT

The Information's TITV·5 months ago

Ad-Tech Giant AppLovin's Bootstrapped Culture Forces GPU Optimization Over Massive CapEx

Unlike compute-rich giants, AppLovin's bootstrapped culture enforces extreme efficiency in its AI infrastructure. Engineers don't have unlimited GPUs, forcing them to optimize code and models for cost and performance. This constraint-driven approach leads to significant cost savings and a lean operational model.

Claude Sonnet 4.5 Reactions, David Senra Live in The Ultradome | Dylan Field, Adam Foroughi, Mike Krieger, Jeff Weinstein, Adam Draper, James Hawkins, Erik Bernhardsson

TBPN·10 months ago

Cerebras' Niche Is Price-Insensitive Users Needing Speed-Ups for Agentic AI Tasks

For complex, long-running AI agent tasks, some users will pay 10x the price for a 10x speed improvement. Cerebras' hardware is ideal for this specific, high-value use case within larger platforms like OpenAI's Codex, compressing tasks from hours to minutes.

FULL INTERVIEW: Dylan Patel Says We’re Still Underestimating AI

TBPN·6 months ago

Abridge Solves Real-Time AI's Cost-Latency Dilemma with a "Constellation of Models"

To provide high-quality AI insights in real time without prohibitive costs, Abridge employs a "fast and slow" thinking approach. It uses a constellation of models: a cheaper, faster model first triages a situation and hands off complex tasks to a more powerful, expensive model only when necessary.

AI-Native Healthcare: 100M Doctor Visits, 10–20 Hours Saved, Prior Auth in Minutes — Janie Lee & Chai Asawa, Abridge

Latent Space: The AI Engineer Podcast·3 months ago
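The "fast and slow" triage described above can be sketched as a simple two-tier cascade. This is a minimal illustration under stated assumptions, not Abridge's implementation: `fast_model`, `slow_model`, the self-reported `confidence` score, and the 0.7 `threshold` are all hypothetical, and the fast model's toy rule (short inputs count as easy) stands in for whatever triage logic a real system would use.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Triage:
    answer: str
    confidence: float  # assumed self-reported score in [0, 1] (hypothetical)


def fast_model(text: str) -> Triage:
    """Hypothetical cheap, low-latency model (a toy heuristic, not a real model)."""
    # Toy rule: short inputs are treated as easy, long ones as uncertain.
    words = len(text.split())
    confidence = 0.9 if words <= 20 else 0.4
    return Triage(answer=f"fast: {text}", confidence=confidence)


def slow_model(text: str) -> str:
    """Hypothetical expensive, higher-quality model."""
    return f"slow: {text}"


def route(text: str,
          fast: Callable[[str], Triage] = fast_model,
          slow: Callable[[str], str] = slow_model,
          threshold: float = 0.7) -> Tuple[str, str]:
    """Return (answer, tier). Escalate to the slow model only when the fast one is unsure."""
    triage = fast(text)
    if triage.confidence >= threshold:
        return triage.answer, "fast"
    return slow(text), "slow"


if __name__ == "__main__":
    print(route("short note"))
    print(route(" ".join(["word"] * 30)))
```

Raising `threshold` sends more traffic to the expensive tier (more quality, more cost and latency); lowering it keeps more requests on the cheap, fast tier.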

Nvidia's Groq Acquisition Targets High-Margin, Low-Latency AI Market

Nvidia bought Groq not just for its chips but for its specialized SRAM-based architecture. This technology excels at low-latency inference, a segment where users are now willing to pay a premium for speed. The purchase diversifies Nvidia's portfolio to capture the emerging, high-value market for agentic reasoning workloads.

Dan Wang's Annual Letter, Meta Acquires Manus, Nvidia's $20B Groq Deal | Justin Mares

TBPN·7 months ago

Low Latency, Not Performance or Cost, Is the Primary Driver for Enterprise Fine-Tuning

The most compelling business reason for enterprises to adopt custom fine-tuning is the need for low latency. For real-time applications like voice bots, large frontier models are too slow. This practical constraint forces companies to use smaller, specialized open-source models.

The RL Fine-Tuning Playbook: CoreWeave's Kyle Corbitt on GRPO, Rubrics, Environments, Reward Hacking

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis·3 months ago
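The latency argument above amounts to a constrained selection: among candidate models, pick the best one whose measured latency fits the application's budget. The sketch below is illustrative only; the model names, quality scores, latency figures, and the 300 ms voice-bot budget are hypothetical assumptions, not measurements from the source.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelProfile:
    name: str
    quality: float         # offline eval score in [0, 1] (hypothetical)
    p95_latency_ms: float  # measured p95 response latency (hypothetical)


def pick_model(models: List[ModelProfile], budget_ms: float) -> Optional[ModelProfile]:
    """Return the highest-quality model whose p95 latency fits the budget.

    Returns None when no candidate is fast enough.
    """
    eligible = [m for m in models if m.p95_latency_ms <= budget_ms]
    return max(eligible, key=lambda m: m.quality, default=None)


# Hypothetical candidates: a large hosted frontier model, a mid-size open
# model, and a small open model fine-tuned for the task.
CANDIDATES = [
    ModelProfile("frontier-hosted", quality=0.95, p95_latency_ms=1800),
    ModelProfile("open-70b", quality=0.88, p95_latency_ms=650),
    ModelProfile("open-8b-finetuned", quality=0.85, p95_latency_ms=180),
]

if __name__ == "__main__":
    # Assumed turn budget for a voice bot; real budgets vary by product.
    choice = pick_model(CANDIDATES, budget_ms=300)
    print(choice.name if choice else "no model fits the budget")
```

Under a tight budget the fine-tuned small model wins even though larger models score higher; fine-tuning is then the lever for raising the quality of the model that already fits the latency constraint.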

Nvidia's Groq Acquisition Targets Low-Latency AI Agents, Not a Full Pivot to ASICs

Nvidia's integration of Groq technology is a strategic move to serve exploding demand for low-latency inference from AI agents. It complements the core GPU business by targeting a specific 25% of the inference market, rather than signaling a wholesale shift away from general-purpose architectures.

FULL INTERVIEW: Why I Think Nvidia Is Perfectly Positioned In The AI Race

TBPN·4 months ago

Agentic AI's High Costs and Latency Are Forcing a Shift from Cloud to Local PC Chips

The evolution of AI towards complex, autonomous "agents" makes relying solely on the cloud slow and expensive, as users burn through token budgets. Nvidia's bet is that running these agents locally on powerful new PC chips will be faster and cheaper for consumers, driving a major hardware shift away from pure cloud computing.

Head out of the cloud: Nvidia’s personal-computer shift

Economist Podcasts·2 months ago

NVIDIA Specializes General AI Models for Niche Verticals Like Surgical Robotics

NVIDIA is creating customized versions of its general-purpose AI models, such as Cosmos and GR00T, for specific industries. By fine-tuning them on specialized data like surgical videos, NVIDIA can power high-value, niche applications such as surgical robots, demonstrating a vertical-focused go-to-market strategy.

AI Agents Helping Robots, Amazon Acquires Robotics Startup, White House Unveils AI Legislation

The Information's TITV·4 months ago

Billion-Dollar Training Runs Justify Designing Single-Use Custom ASICs for One Model

At massive scale, chip design economics flip. For a $1B training run, the potential efficiency savings on compute and inference can far exceed the ~$200M cost of developing a custom ASIC for that specific model. The bottleneck becomes chip production timelines, not money.

Inside AI’s $10B+ Capital Flywheel — Martin Casado & Sarah Wang of a16z

Latent Space: The AI Engineer Podcast·5 months ago
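The break-even arithmetic behind this claim is simple enough to write out. The sketch below treats the ASIC's design cost as a one-time expense and assumes savings scale linearly with the compute spend the chip replaces; the 30% efficiency gain and the $2B of follow-on inference spend are illustrative assumptions, not figures from the source.

```python
def asic_net_savings(compute_spend_usd: float,
                     efficiency_gain: float,
                     asic_cost_usd: float) -> float:
    """Net savings from a custom ASIC, assuming savings scale linearly.

    compute_spend_usd: spend the ASIC would replace (training + inference)
    efficiency_gain:   fraction of that spend saved, e.g. 0.3 (assumed)
    asic_cost_usd:     one-time design cost (~$200M per the source)
    """
    return compute_spend_usd * efficiency_gain - asic_cost_usd


def break_even_gain(compute_spend_usd: float, asic_cost_usd: float) -> float:
    """Minimum efficiency gain at which the ASIC pays for itself."""
    return asic_cost_usd / compute_spend_usd


if __name__ == "__main__":
    training = 1e9    # $1B training run (from the source)
    inference = 2e9   # hypothetical follow-on inference spend
    asic = 200e6      # ~$200M design cost (from the source)

    print(f"break-even, training only: {break_even_gain(training, asic):.0%}")
    print(f"break-even, training + inference: "
          f"{break_even_gain(training + inference, asic):.1%}")
    net = asic_net_savings(training + inference, 0.30, asic)  # assumed 30% gain
    print(f"net savings at 30% gain: ${net / 1e6:,.0f}M")
```

With only the $1B training run in scope, the chip must cut costs by at least 20% ($200M / $1B) to pay for itself; folding in inference spend lowers that threshold, which is why the economics flip at scale.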
