Bloomberg reportedly spent eight figures on BloombergGPT, only for GPT-4, released within weeks of it, to make it obsolete. This is a cautionary tale: the upfront cost, maintenance burden, and opportunity cost of custom training and fine-tuning often outweigh marginal performance gains, especially as foundation models advance relentlessly. Most teams should avoid it.
Fine-tuning creates model-specific optimizations that quickly become obsolete. Blitzy favors developing sophisticated, system-level "memory" that captures enterprise-specific context and preferences. This approach is model-agnostic and more durable as base models improve, unlike fine-tuning, which requires constant rework.
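A minimal sketch of what "model-agnostic memory" could look like in code, not a description of Blitzy's actual system. The `MemoryStore` class, the `ask` helper, and the two stand-in models are all hypothetical names introduced for illustration. The assumption is that context lives as plain text outside any model's weights, so the same store can be handed to whichever model is current.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical type: any function that maps a prompt string to a completion.
Model = Callable[[str], str]


@dataclass
class MemoryStore:
    """Plain-text enterprise context kept outside the model's weights."""
    entries: Dict[str, str] = field(default_factory=dict)

    def remember(self, key: str, value: str) -> None:
        self.entries[key] = value

    def as_context(self) -> str:
        # Render entries in a stable order so every model sees the same text.
        return "\n".join(f"- {k}: {v}" for k, v in sorted(self.entries.items()))


def ask(model: Model, memory: MemoryStore, task: str) -> str:
    """Prepend the stored context to the task and call whichever model is passed in."""
    prompt = f"Known preferences:\n{memory.as_context()}\n\nTask: {task}"
    return model(prompt)


# Stand-ins for two model generations; a real system would call an LLM API here.
def model_v1(prompt: str) -> str:
    return "v1 uses tabs" if "indentation: tabs" in prompt else "v1 guesses"


def model_v2(prompt: str) -> str:
    return "v2 uses tabs" if "indentation: tabs" in prompt else "v2 guesses"


memory = MemoryStore()
memory.remember("indentation", "tabs")
memory.remember("language", "Java 17")

reply_v1 = ask(model_v1, memory, "Format this file.")
reply_v2 = ask(model_v2, memory, "Format this file.")
```

Swapping `model_v1` for `model_v2` needs no retraining in this sketch: the preferences travel with the prompt, whereas fine-tuned weights would have to be rebuilt for each new base model.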
Early-stage AI startups should resist spending heavily on fine-tuning foundational models. With base models improving so rapidly, the defensible value lies in building the application layer, workflow integrations, and enterprise-grade software that makes the AI useful, allowing the startup to ride the wave of general model improvement.
The opportunity cost of building custom internal AI can be massive. By the time a multimillion-dollar project is complete, off-the-shelf tools like ChatGPT are often far more capable, dynamic, and cost-effective, rendering the custom solution outdated on arrival.
The "bitter lesson" of AI applies to product development: complex scaffolding built around model limitations (like early vector stores or agent frameworks) will inevitably become obsolete as the models themselves get smarter and absorb those functions. Don't over-engineer solutions that a future model will solve natively.
OpenAI favors "zero gradient" prompt optimization because serving thousands of unique, fine-tuned model snapshots is operationally very difficult. Prompt-based adjustments deliver performance gains without the immense infrastructure burden, making them a more practical and scalable approach for both OpenAI and developers.
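One way to picture prompt optimization without gradient updates is a simple search: score several candidate prompts against a small labeled set and keep the best one, leaving the model weights untouched. The sketch below is an illustrative assumption, not OpenAI's actual method. `score_prompt`, `select_best_prompt`, and `toy_model` are hypothetical names, and `toy_model` is a stub standing in for a real LLM call.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical type: any function that maps a prompt string to a completion.
Model = Callable[[str], str]
Example = Tuple[str, str]  # (question, expected answer)


def score_prompt(model: Model, template: str, examples: Sequence[Example]) -> float:
    """Fraction of examples answered exactly right under this prompt template."""
    correct = 0
    for question, expected in examples:
        answer = model(template.format(question=question))
        if answer.strip().lower() == expected.strip().lower():
            correct += 1
    return correct / len(examples)


def select_best_prompt(
    model: Model, templates: Sequence[str], examples: Sequence[Example]
) -> Tuple[float, str]:
    """Evaluate every template and return (score, template) for the best one."""
    scored = [(score_prompt(model, t, examples), t) for t in templates]
    return max(scored, key=lambda pair: pair[0])


def toy_model(prompt: str) -> str:
    # Stub: answers tersely only when the prompt asks for one word.
    question = prompt.rsplit("Q: ", 1)[-1]
    answers = {"2+2": "4", "capital of France": "Paris"}
    answer = answers.get(question, "unknown")
    return answer if "one word" in prompt else f"The answer is {answer}."


examples = [("2+2", "4"), ("capital of France", "Paris")]
templates = ["Q: {question}", "Answer in one word. Q: {question}"]

best_score, best_template = select_best_prompt(toy_model, templates, examples)
```

Only the prompt text changes between candidates, so a single shared model can serve every customer; there is no per-customer snapshot to store or deploy.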
The traditional wisdom to "build what's core" to your business is becoming obsolete for AI. The immense cost and rapid advancement of foundational models by major labs mean most companies are better off buying or partnering for core AI capabilities rather than attempting to build them in-house.
Fine-tuning remains relevant but is not the primary path for most enterprise use cases. It's a specialized tool for situations with unique data unseen by foundation models or when strict cost and throughput requirements for a high-volume task justify the investment. Most teams should start with retrieval-augmented generation (RAG).
Richard Sutton's "Bitter Lesson" argues that general methods that scale with compute ultimately win. Applied to LLMs, building complex workflows or fine-tuning yields only temporary gains that the next-generation general model will erase. Always bet on the more general model.
For use cases demanding strict fidelity to a complex knowledge domain like Catholic theology, fine-tuning existing models proves inadequate over the long tail of user queries. This necessitates the more expensive path of training a model from scratch.
Despite constant new model releases, enterprises don't frequently switch LLMs. Prompts and workflows become highly optimized for a specific model's behavior, creating significant switching costs. Performance gains of a new model must be substantial to justify this re-engineering effort.