Petabyte-Scale Data Storage Is a Major Hidden Cost in Video Model Training

Related Insights

A 100x Cost Reduction Is Possible by Accepting One Non-Critical Performance Trade-Off

TurboPuffer achieved its massive cost savings by building on slow S3 storage. While this increased write latency by 1000x—unacceptable for transactional systems—it was a perfectly acceptable trade-off for search and AI workloads, which prioritize fast reads over fast writes.

He built a new database in his bedroom—now he powers Cursor, Notion and Anthropic. | Simon Eskildsen, Founder of turbopuffer

A Product Market Fit Show | Startup Podcast for Founders·9 months ago

AI's Biggest Network Impact Isn't Downstream Content, It's the Upstream Flood from Video Sensors

The proliferation of sensors, especially cameras, will generate massive amounts of video data. This data must be uploaded to cloud AI models for processing, making robust upstream bandwidth—not just downstream—the critical new infrastructure bottleneck and a significant opportunity for telecom companies.

HIGHLIGHTS: John Stankey - CEO of AT&T

In Good Company with Nicolai Tangen·7 months ago

The Biggest Cost Driver in AI Is Data Storage, Not Model Inference

An advanced user reveals their largest new expense from building AI agents isn't tokens, but database and storage costs. AI makes vast amounts of previously inert data useful, creating a surge in demand for storage solutions, which is where the real economic leverage lies.

Elon Musk vs OpenAI Trial, Google Cloud Surge, Meta’s Blocked Acquisition, Anthropic Winning

More or Less·3 months ago

Generative Video is 10,000x More Compute-Intensive Than an LLM Prompt

The computational requirements for generative media scale dramatically across modalities. If a 200-token LLM prompt costs 1 unit of compute, a single image costs 100x that, and a 5-second video costs another 100x on top of that—a 10,000x total increase. 4K video adds another 10x multiplier.

The Rise of Generative Media: fal's Bet on Video, Infrastructure, and Speed

Training Data·7 months ago
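
The multipliers quoted in the insight above compound, and a short calculation makes the totals easy to check. This is only an illustrative sketch: the variable names are hypothetical, the baseline of 1 unit for a 200-token prompt comes from the summary, and applying the 4K multiplier on top of the 5-second video figure is an assumption about how the summary should be read.

```python
# Relative compute units; a 200-token LLM prompt is the baseline (1 unit).
LLM_PROMPT_UNITS = 1
IMAGE_MULTIPLIER = 100    # one image vs. one LLM prompt (from the summary)
VIDEO_MULTIPLIER = 100    # one 5-second video vs. one image (from the summary)
UHD_4K_MULTIPLIER = 10    # assumption: applied on top of the 5-second video

image_units = LLM_PROMPT_UNITS * IMAGE_MULTIPLIER
video_units = image_units * VIDEO_MULTIPLIER
video_4k_units = video_units * UHD_4K_MULTIPLIER

for name, units in [
    ("LLM prompt", LLM_PROMPT_UNITS),
    ("Image", image_units),
    ("5-second video", video_units),
    ("5-second 4K video", video_4k_units),
]:
    print(f"{name}: {units:,}x baseline")
```

Under that reading, a 5-second 4K clip lands at roughly 100,000 times the compute of the baseline prompt.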

Video Data's Low Intelligence-Per-Bit Is Offset by Its Immense Volume

The Sora team views video as having lower "intelligence per bit" compared to text. However, the total volume of available video data is vastly larger and less tapped. This suggests that, unlike LLMs facing a data crunch, video models can scale with more data for a very long time.

OpenAI Sora 2 Team: How Generative Video Will Unlock Creativity and World Models

Training Data·8 months ago

Public Data for AI Models Carries a Hidden $15M+ Compute Cost

While OpenFold trains on public datasets, the pre-processing and distillation to make the data usable requires massive compute resources. This "data prep" phase can cost over $15 million, creating a significant, non-obvious barrier to entry for academic labs and startups wanting to build foundational models.

An AI Collaborative that Welcomes All into the Fold

The Bio Report·8 months ago

Your AI Data Costs Are Rising for Two Reasons

Data is becoming more expensive not from scarcity, but because the work has evolved. Simple labeling is over. Costs are now driven by the need for pricey domain experts for specialized data preparation and creative teams to build complex, synthetic environments for training agents.

20VC: Cohere's Chief Scientist on Why Scaling Laws Will Continue | Whether You Can Buy Success in AI with Talent Acquisitions | The Future of Synthetic Data & What It Means for Models | Why AI Coding is Akin to Image Generation in 2015 with Joelle Pineau

The Twenty Minute VC (20VC): Venture Capital | Startup Funding | The Pitch·8 months ago

AI Has Scaled the Definition of a 'Large' Data Center by 1000x in Just Two Years

The infrastructure demands of AI have caused an exponential increase in data center scale. Two years ago, a 1-megawatt facility was considered a good size. Today, a large AI data center is a 1-gigawatt facility—a 1000-fold increase. This rapid escalation underscores the immense and expensive capital investment required to power AI.

Europe in the Global AI Race

Thoughts on the Market·8 months ago

AI's Real Network Strain Comes from Upstream Video Data, Not Downstream Content Consumption

The next wave of data growth will be driven by countless sensors (like cameras) sending video upstream for AI processing. This requires a fundamental shift to symmetrical networks, like fiber, that have robust upstream capacity.

AT&T CEO: Connecting the Future, Embracing AI and Driving Cultural Change

In Good Company with Nicolai Tangen·7 months ago

AI Data Center CapEx Can Exceed Rocket and Satellite Manufacturing Costs

Counterintuitively, the capital expenditure for building AI data centers can be significantly higher than for manufacturing complex physical hardware like rockets and satellites. SpaceX's xAI division spent 50% more on CapEx than its rocket and satellite divisions combined, highlighting the immense cost of AI infrastructure at scale.

SpaceX Financials, Does AI Increase Unemployment or Leisure, Chimp Civil War | Diet TBPN

TBPN·3 months ago
