Over time, prompts become long and complex, accumulating contradictions from multiple contributors. Chip Huyen suggests treating them like a codebase: use another AI to analyze the prompt for inconsistencies and "refactor" it for better performance and clarity.
When an AI agent performs web searches, it often generates multiple queries for a single task. These queries frequently return overlapping URLs, so the agent re-fetches and re-processes the same content, which sharply increases token consumption and cost.
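One way to avoid the repeated work is to cache pages by a normalized URL so each page is fetched and processed only once per task. The following is a minimal sketch, not a description of any particular agent framework: `normalize_url`, `PageCache`, and the `fetch` callable are hypothetical names introduced for illustration, and the normalization rules (lowercasing the host, dropping fragments and trailing slashes) are assumptions that may need adjusting for real sites.

```python
from typing import Callable, Dict
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Collapse trivially different forms of a URL into one cache key.

    Assumption: a trailing slash and a #fragment do not change the page.
    That holds for many sites but not all of them.
    """
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


class PageCache:
    """Fetch each distinct page at most once across all search queries."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch          # hypothetical fetcher supplied by the caller
        self._pages: Dict[str, str] = {}
        self.fetch_count = 0         # number of real fetches performed

    def get(self, url: str) -> str:
        key = normalize_url(url)
        if key not in self._pages:
            self._pages[key] = self._fetch(key)
            self.fetch_count += 1
        return self._pages[key]
```

In use, the agent would route every URL from every query through `cache.get(url)`; overlapping results then cost one fetch and one pass through the model instead of several.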
As AI automates narrow skills like writing code snippets, the ability to think at a system level becomes paramount. Designing how different components—including classical ML models, LLMs, and traditional software—fit together is a skill that is harder to automate and increasingly valuable.
Using one LLM to evaluate another's output ("LLM as a Judge") is a common but deceptively difficult technique. Chip Huyen highlights that companies can spend up to 80% of their development time just writing and refining the complex evaluation guidelines for the judge LLM.
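To make the role of the guidelines concrete, here is a minimal sketch of a judge harness in which the written criteria are the main input. Everything in it is an assumption made for illustration: the `Criterion` type, the PASS/FAIL reply format, and the `call_model` callable, which stands in for whatever LLM client is actually used.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Criterion:
    name: str       # short identifier the judge echoes back
    guideline: str  # the written rule; this is where most of the effort goes


def build_judge_prompt(criteria: List[Criterion], question: str, answer: str) -> str:
    """Assemble the grading prompt from the written guidelines."""
    lines = ["You are grading an answer. Apply each guideline strictly."]
    lines += [f"- {c.name}: {c.guideline}" for c in criteria]
    lines += [
        "",
        f"Question: {question}",
        f"Answer: {answer}",
        "",
        "Reply with one line per criterion in the form 'name: PASS' or 'name: FAIL'.",
    ]
    return "\n".join(lines)


def parse_verdicts(criteria: List[Criterion], reply: str) -> Dict[str, bool]:
    """Read one verdict per criterion; a missing or malformed line counts as a fail."""
    verdicts = {}
    for c in criteria:
        match = re.search(
            rf"^{re.escape(c.name)}\s*:\s*(PASS|FAIL)\s*$",
            reply,
            flags=re.MULTILINE | re.IGNORECASE,
        )
        verdicts[c.name] = bool(match) and match.group(1).upper() == "PASS"
    return verdicts


def judge(criteria, question, answer, call_model: Callable[[str], str]) -> Dict[str, bool]:
    """call_model is a hypothetical function: prompt text in, reply text out."""
    reply = call_model(build_judge_prompt(criteria, question, answer))
    return parse_verdicts(criteria, reply)
```

The code itself is short; the hard part is the text inside each `guideline`, which is consistent with the point that most of the development time goes into writing and refining those rules.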
AI Engineering builds on pre-trained foundation models, typically consumed as a service, so integration is fast. Traditional Machine Learning Engineering, by contrast, involves building a model from scratch, from data collection to deployment, which results in a much slower time-to-market.
Adopt a "start simple" approach to AI development. Master prompting first. If prompting alone is not enough, add Retrieval-Augmented Generation (RAG). Treat fine-tuning as the last resort, because it complicates deployment and serving and makes it hard to keep up with rapidly evolving base models.
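The escalation order can be sketched as a loop that tries the cheapest approach first and moves on only when the output fails an evaluation check. This is an illustrative sketch under stated assumptions: the strategy callables and the `passes` check are hypothetical placeholders, and in practice each step is a project in its own right, not a function call.

```python
from typing import Callable, Optional, Sequence, Tuple

# A strategy is a label plus a function that turns a task into an output.
Strategy = Tuple[str, Callable[[str], str]]


def first_passing_strategy(
    task: str,
    strategies: Sequence[Strategy],
    passes: Callable[[str], bool],
) -> Optional[Tuple[str, str]]:
    """Try strategies in the order given (simplest first).

    Returns (strategy name, output) for the first output that passes the
    evaluation check, or None if every strategy fails.
    """
    for name, run in strategies:
        output = run(task)
        if passes(output):
            return name, output
    return None
```

Listing the strategies as prompting, then RAG, then fine-tuning encodes the advice directly: the expensive option is reached only after the cheaper ones have been shown to fall short.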
Robotic intelligence has two components. "Reasoning," which involves creating a plan, is quickly being solved by AI. The other, harder part is "movement"—the robot's physical dexterity to execute that plan reliably in a complex environment without tripping or failing.
Unlike traditional software with deterministic outputs, generative AI systems require a new paradigm. Chip Huyen calls this "evaluation-driven development," where the focus shifts from writing fixed tests to building robust systems and guidelines for evaluating ambiguous, generative outputs.
AI makes software incredibly easy to build and replicate, eroding traditional business moats. Chip Huyen argues the next frontier for durable value is in physical AI and robotics, where hardware development cycles and real-world complexities prevent instant copying.
Improving AI isn't just about better models; it's also about adapting the environment. Chip Huyen suggests making the world "AI ready" by creating APIs for physical infrastructure, such as a city offering a streetlight API so a delivery robot can request a green light.
When Chip Huyen's first book, which lacked code snippets, was released, some dismissed it as "not technical." Its massive success indicates a crucial industry shift: system design and architectural thinking are now recognized as fundamental engineering skills, separate from pure coding.
The true cost of fine-tuning isn't the initial training but the ongoing maintenance. Base foundation models experience significant capability improvements every 2-3 months. This pace means a custom fine-tuned model can quickly fall behind, forcing a continuous and expensive re-tuning cycle.
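The maintenance burden can be put in rough numbers. The sketch below assumes, purely for illustration, that a team re-tunes once for every significant base-model improvement and that each re-tune costs a fixed amount; the function names and the cost figure in the usage note are hypothetical, not figures from the source.

```python
def retunes_per_year(release_interval_months: float) -> float:
    """How many re-tuning cycles a year if every base-model jump triggers one."""
    return 12 / release_interval_months


def yearly_maintenance_cost(cost_per_retune: float, release_interval_months: float) -> float:
    """Recurring cost of keeping a fine-tuned model current.

    Assumes a flat cost per re-tune (compute, data refresh, evaluation, redeploy).
    """
    return cost_per_retune * retunes_per_year(release_interval_months)
```

With improvements every 2-3 months, `retunes_per_year` gives between 4 and 6 cycles a year, so the recurring cost is four to six times whatever a single re-tune costs, on top of the initial training.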
