Chipmaker Etched Uses Low-Voltage Inference to Sidestep Thermal GPU Bottlenecks

Related Insights

GPU Performance-Per-Watt Is Plateauing, Demanding New Architectures

The performance gains from Nvidia's Hopper to Blackwell GPUs come from increased size and power, not efficiency. This signals a potential scaling limit, creating an opportunity for radically new hardware primitives and neural network architectures beyond today's matrix-multiplication-centric models.

After LLMs: Spatial Intelligence and World Models — Fei-Fei Li & Justin Johnson, World Labs

Latent Space: The AI Engineer Podcast·7 months ago

Disaggregating Inference Extends GPU Lifespans to Over 10 Years

Separating inference into "prefill" (compute-bound) and "decode" (memory-bandwidth-bound) phases is a game-changer for hardware longevity. It allows older GPUs to keep handling prefill tasks long after they would otherwise be retired, extending their useful economic life from 3-4 years to 10-15 years, a boon for data centers and their financiers.

Gavin Baker - Watts and Wafers - [Invest Like the Best, EP.473]

Invest Like the Best with Patrick O'Shaughnessy·a month ago

Chipmakers Are Finally Breaking a Decades-Old Power Density Limit of 1 Watt/mm²

For two decades, silicon chips have been thermally constrained to a power density of about 1 watt per square millimeter. New R&D efforts are finally overcoming this barrier, which could lead to smaller, more powerful chips, despite significant thermal and electrical engineering challenges.

Why Hardware-Software Co-Design Is AI's Real 100x: Dylan Patel of SemiAnalysis

Training Data·20 hours ago

Power Scarcity Benefits Top AI Chipmakers by Making Price Irrelevant

When power (watts) is the primary constraint for data centers, the total cost of compute becomes secondary. The crucial metric is performance-per-watt. This gives a massive pricing advantage to the most efficient chipmakers, as customers will pay anything for hardware that maximizes output from their limited power budget.

Gavin Baker - Nvidia v. Google, Scaling Laws, and the Economics of AI - [Invest Like the Best, EP.451]

Invest Like the Best with Patrick O'Shaughnessy·7 months ago
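
To make the power-constrained argument above concrete, here is a minimal sketch of how total output is capped by a site's power budget rather than by chip count. Every number below (the 1 MW budget, the two chips' wattage and throughput) is a hypothetical placeholder chosen for illustration and does not describe any real product.

```python
def throughput_under_power_cap(power_budget_w: int, chip_power_w: int,
                               chip_tokens_per_s: int) -> int:
    """Total tokens/s a site can produce when watts, not money, is the limit."""
    chips_that_fit = power_budget_w // chip_power_w  # whole chips only
    return chips_that_fit * chip_tokens_per_s


# Hypothetical site with a fixed 1 MW budget for accelerators.
BUDGET_W = 1_000_000

# Hypothetical chip A: 1000 W, 10,000 tokens/s (10 tokens per joule).
site_a = throughput_under_power_cap(BUDGET_W, 1000, 10_000)

# Hypothetical chip B: 700 W, 6,000 tokens/s (about 8.6 tokens per joule).
site_b = throughput_under_power_cap(BUDGET_W, 700, 6_000)

print(site_a, site_b)
```

Under these assumptions the site fits more units of chip B but still produces fewer tokens, which is the sense in which performance-per-watt, not unit price, sets the ceiling on output.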

GPUs Are Cheap for Slow AI Tokens but Extremely Expensive for Fast Ones

The GPU architecture is economically optimized for slow AI inference, offering a very low cost per token. However, this efficiency plummets when speed is required: the cost and power per token climb steeply as per-user token rates rise, creating a market for alternative architectures in high-speed applications.

Why Cerebras CEO Andrew Feldman Built The World's Largest Computer Chip

Odd Lots·a month ago

Co-designing LLMs with Target Hardware Unlocks Major Inference Efficiency Gains

Model architecture decisions directly impact inference performance. AI company Zyphra pre-selects target hardware and then chooses model parameters, such as a hidden dimension divisible by a large power of two, to align with how GPUs split up workloads, maximizing efficiency from day one.

How Zyphra went all-in on AMD + Why Devs feel faster with AI but are slower — with Quentin Anthony

Latent Space: The AI Engineer Podcast·8 months ago
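
As a rough illustration of the hardware-alignment idea in the insight above, the sketch below checks how cleanly a candidate hidden dimension divides across devices. The helper names, the 128-wide tile, and the 8-way shard count are assumptions made for this example; they are not Zyphra's actual selection procedure or the parameters of any particular GPU.

```python
def largest_power_of_two_factor(n: int) -> int:
    """Largest power of two that divides n (bit trick: isolates the lowest set bit)."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return n & -n


def splits_evenly(hidden_dim: int, num_shards: int, tile: int = 128) -> bool:
    """True if each shard's slice of the hidden dimension is a whole number of tiles.

    `tile` is an assumed hardware tile width, used here only for illustration.
    """
    return hidden_dim % (num_shards * tile) == 0


for dim in (4096, 5120, 5000):
    print(dim, largest_power_of_two_factor(dim), splits_evenly(dim, num_shards=8))
# 4096 4096 True
# 5120 1024 True
# 5000 8 False
```

A dimension such as 5000 leaves ragged remainders once it is divided across shards and tiles, which is the kind of mismatch that picking the hardware first is meant to avoid.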

Future AI Performance Gains Will Come From Low-Voltage Chip Architectures

Adding more FLOPS to current AI chips is useless due to thermal throttling. Etched realized the solution is lowering voltage, which quadratically reduces power consumption. Inspired by bitcoin miners, they created a new power delivery system enabling chips to run at under half the voltage of GPUs.

Etched - Building AI Hardware to Make Inference Faster and Cheaper - [Invest Like the Best, EP.480]

Invest Like the Best with Patrick O'Shaughnessy·14 hours ago
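
The quadratic relationship mentioned in the insight above follows from the standard CMOS switching-power approximation, P ≈ a · C · V² · f. The sketch below applies that textbook formula; the capacitance, frequency, and voltage values are illustrative placeholders rather than figures from Etched or any GPU, and the model ignores leakage power and the clock-speed reductions that usually accompany lower voltage.

```python
def dynamic_power(capacitance_f: float, voltage_v: float,
                  frequency_hz: float, activity: float = 1.0) -> float:
    """Textbook CMOS switching power: P = a * C * V^2 * f (leakage ignored)."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


def power_ratio(v_new: float, v_old: float) -> float:
    """Relative switching power after a voltage change, all else held equal."""
    return (v_new / v_old) ** 2


# Illustrative values only: 1 nF switched capacitance at 2 GHz.
baseline = dynamic_power(1e-9, 0.8, 2e9)
lowered = dynamic_power(1e-9, 0.4, 2e9)

print(power_ratio(0.4, 0.8))  # 0.25
```

In this simplified model, halving the supply voltage cuts switching power to a quarter, which is why a lower-voltage design can fit more compute inside the same thermal envelope.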

Lab-Grown Diamonds Offer a Cheaper, More Efficient Cooling Solution for AI Servers

Leveraging technology developed for satellites, Akash Systems places a thin layer of synthetic diamond—the world's most thermally conductive material—directly onto GPUs. This dramatically lowers temperatures, increases inference speed, and reduces data center energy costs without expensive liquid cooling systems.

Ellison's Media Empire, Ken Burns Joins, Cursor Mic Drop | Matthew Belloni, Gokul Rajaram, Nik Seetharaman, Raj Rajamani, James Everingham, Dr. Felix Ejeckam

TBPN·4 months ago

Microsoft's Maia 200 AI Chip Is Optimized for Inference, Not Training

Unlike general-purpose Nvidia GPUs, Microsoft's custom Maia 200 chip focuses specifically on running existing AI models (inference). Microsoft claims this makes it cheaper for certain tasks, like its own Copilot tools, creating a cost-saving value proposition for potential customers like Anthropic.

Anthropic in Talks to Use Microsoft AI Chips, Biggest Reveals in SpaceX IPO Filing

The Information's TITV·a month ago

AI Inference Bottlenecks Are Solved at the Cluster, Not Chip Level

Instead of focusing on on-chip memory bandwidth, Etched optimized for cluster-scale memory. They built a custom interconnect that cuts chip-to-chip latency by over 5x compared to GPUs. This allows the memory of the entire cluster to function as a single, low-latency pool, dramatically improving performance.

Etched - Building AI Hardware to Make Inference Faster and Cheaper - [Invest Like the Best, EP.480]

Invest Like the Best with Patrick O'Shaughnessy·14 hours ago
