At scale, companies rarely deploy open-source models "off the shelf." Instead, virtually all production workloads involve custom modifications, ranging from post-training on proprietary data to improve quality, to compiling and quantizing the model to boost performance and cut cost.
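As a minimal sketch of the quantize-to-cut-cost path, PyTorch's built-in dynamic quantization converts Linear layers to int8 with no retraining (the toy model below is a stand-in, not any particular LLM):

```python
import torch

# Stand-in for a small transformer block; any nn.Module with Linear layers works.
model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 4096),
)

# Post-training dynamic quantization: weights stored as int8,
# activations quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 4096)
print(model(x).shape, quantized(x).shape)  # same interface, ~4x smaller weights
```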
For specialized, high-stakes tasks like insurance underwriting, enterprises will favor smaller, on-prem models fine-tuned on proprietary data. These models can be faster, more accurate, and more secure than general-purpose frontier models, creating a lasting market for custom AI solutions.
Quantization and distillation don't simply create a smaller version of an LLM. These optimization processes alter the model's behavior to the point where it becomes a new entity, a "cousin": coherent and functional, but no longer producing the same outputs as the original.
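The divergence is easy to see even with naive int8 quantization of a single toy weight matrix; the small per-layer error below compounds across dozens of layers (pure NumPy, all values synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)  # toy weight matrix
x = rng.normal(size=256).astype(np.float32)

# Naive symmetric int8 quantization of the weights.
scale = np.abs(w).max() / 127.0
w_q = np.round(w / scale).astype(np.int8)
w_deq = w_q.astype(np.float32) * scale

y_full = w @ x
y_quant = w_deq @ x
# Small per-layer errors like this compound layer after layer,
# which is why the quantized model's outputs drift from the original's.
print("max abs diff:", np.abs(y_full - y_quant).max())
```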
The Chinchilla scaling law optimizes pre-training compute alone, prescribing roughly 20 training tokens per parameter. However, production models must also account for inference costs. By training smaller models on much more data (~100x the Chinchilla optimum), labs create models that are cheaper to run for users, effectively amortizing the higher training cost over the model's lifetime.
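A back-of-the-envelope sketch using the standard approximations (training ≈ 6ND FLOPs, inference ≈ 2N FLOPs per token; the lifetime serving volume is an assumed figure):

```python
# Rough FLOP accounting with the usual approximations:
# training ~ 6 * N * D, inference ~ 2 * N per generated token.
def training_flops(n_params, n_tokens):
    return 6 * n_params * n_tokens

def inference_flops(n_params, tokens_served):
    return 2 * n_params * tokens_served

CHINCHILLA_RATIO = 20   # tokens per parameter at compute-optimal
served = 1e13           # assumed lifetime tokens served to users

# Chinchilla-optimal 70B model vs. a 7B model over-trained ~100x.
big   = training_flops(70e9, CHINCHILLA_RATIO * 70e9) + inference_flops(70e9, served)
small = training_flops(7e9, 100 * CHINCHILLA_RATIO * 7e9) + inference_flops(7e9, served)
print(f"70B total: {big:.2e} FLOPs, over-trained 7B total: {small:.2e} FLOPs")
# At this serving volume the small model wins overall, despite the extra training.
```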
The public-facing models from major labs are likely efficient Mixture-of-Experts (MoE) versions distilled from much larger, private, and computationally expensive dense models. This means the model users interact with is a smaller, optimized copy, not the original frontier model.
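For intuition on why MoE serving is cheaper, here is a minimal top-1-routed mixture of expert MLPs (an illustrative toy, not any lab's actual design): only the chosen expert's weights touch each token, so active compute per token is a fraction of total parameters.

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Top-1 routed mixture of expert MLPs (illustrative toy)."""
    def __init__(self, d_model=64, n_experts=4):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (batch, d_model)
        top1 = self.gate(x).argmax(dim=-1)  # pick one expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top1 == i
            if mask.any():
                # Only this expert runs for these tokens, so active FLOPs
                # per token are ~1/n_experts of the layer's total parameters.
                out[mask] = expert(x[mask])
        return out

moe = TinyMoE()
print(moe(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```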
Customizing a base model with proprietary data is only effective if a company possesses a massive corpus. At least 10 billion high-quality tokens are needed *after* aggressive deduplication and filtering. This high threshold means the strategy is only viable for the largest corporations, a much higher bar than most businesses realize.
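To make the 10-billion-token bar concrete, a rough sizing sketch (the words-per-token, document-length, and filter-survival ratios are all assumptions):

```python
# How big is "10B high-quality tokens after filtering"? Rough, assumed ratios.
TARGET_TOKENS   = 10e9
WORDS_PER_TOKEN = 0.75   # common English rule of thumb
WORDS_PER_DOC   = 500    # assumed average document length
SURVIVAL_RATE   = 0.25   # assume dedup + quality filters keep 1 in 4 raw tokens

clean_docs = TARGET_TOKENS * WORDS_PER_TOKEN / WORDS_PER_DOC
raw_tokens = TARGET_TOKENS / SURVIVAL_RATE
print(f"~{clean_docs:.1e} clean documents, from ~{raw_tokens:.1e} raw tokens")
# ~1.5e7 (15 million) substantial documents surviving filtration: a corpus
# very few companies actually possess.
```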
Contrary to past momentum, the most advanced AI startups are increasingly adopting and fine-tuning open-source models. This shift is driven by the need for cost-effective speed and deep customization as their workloads mature and scale.
The most compelling business reason for enterprises to adopt custom fine-tuning is the need for low latency. For real-time applications like voice bots, large frontier models are too slow. This practical constraint forces companies to use smaller, specialized open-source models.
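A hypothetical latency budget shows why voice forces the issue (every number below is an assumption, not a benchmark):

```python
# Hypothetical latency budget for one voice-bot turn.
BUDGET_MS      = 500   # target delay before the bot starts speaking
ASR_AND_TTS_MS = 250   # assumed speech-to-text + text-to-speech overhead
NETWORK_MS     = 50    # assumed round-trip overhead

llm_budget = BUDGET_MS - ASR_AND_TTS_MS - NETWORK_MS  # 200 ms left for the model

# Illustrative time-to-first-token figures, not measurements.
for name, ttft_ms in [("frontier model", 600), ("fine-tuned 8B on-prem", 90)]:
    verdict = "fits" if ttft_ms <= llm_budget else "blows the budget"
    print(f"{name}: {ttft_ms} ms time-to-first-token, {verdict}")
```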
As enterprises scale AI, the high inference costs of frontier models become prohibitive. The strategic trend is to use large models for novel tasks, then shift 90% of recurring, common workloads to specialized, cost-effective Small Language Models (SLMs). This architectural shift dramatically improves both speed and cost.
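In practice this often looks like a thin router in front of two model tiers; a minimal sketch with made-up model names and intents:

```python
# Minimal router sketch: send recurring, well-understood intents to an SLM
# and anything novel to a frontier model. Names and intents are hypothetical.
KNOWN_INTENTS = {"summarize_ticket", "extract_fields", "classify_sentiment"}

def route(intent: str) -> str:
    if intent in KNOWN_INTENTS:
        return "slm-finetuned-8b"   # cheap, fast, specialized
    return "frontier-model"         # expensive, reserved for novel tasks

for intent in ["extract_fields", "draft_novel_contract"]:
    print(intent, "->", route(intent))
```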
Despite constant new model releases, enterprises don't frequently switch LLMs. Prompts and workflows become highly optimized for a specific model's behavior, creating significant switching costs. A new model's performance gains must be substantial to justify the re-engineering effort.
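One way the switching cost shows up concretely is a prompt regression suite that must pass before any migration; a sketch (`call_model` and the checks are hypothetical):

```python
# Prompt regression sketch: each case encodes a behavior the current
# workflow depends on. A new model must pass them all before switching.
CASES = [
    ("Return JSON with keys name, date.", lambda out: out.strip().startswith("{")),
    ("Answer yes or no only.",            lambda out: out.strip().lower() in {"yes", "no"}),
]

def regression_suite(call_model):
    """call_model: hypothetical prompt -> completion callable."""
    return [prompt for prompt, check in CASES if not check(call_model(prompt))]

# Any non-empty failure list means prompt re-engineering before migration:
# failures = regression_suite(lambda p: new_model_client.complete(p))
```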
Misha Laskin, CEO of Reflection AI, states that large enterprises turn to open-source models for two key reasons: to dramatically reduce the cost of high-volume tasks, or to fine-tune performance on niche data where closed models are weak.