The most valuable AI systems are built by people with deep knowledge in a specific field (like pest control or law), not by engineers. This expertise is crucial for identifying the right problems and, more importantly, for creating effective evaluations to ensure the agent performs correctly.

Related Insights

Anthropic's David Hershey states it's "deeply unsurprising" that AI is great at software engineering because the labs are filled with software engineers. This suggests AI's capabilities are skewed by its creators' expertise, and achieving similar performance in fields like law requires deeper integration with domain experts.

Instead of choosing a career based on its perceived "safety" from AI, individuals should pursue their passions to quickly become domain experts. AI tools augment this expertise, increasing the value of experienced professionals who can handle complex, nuanced situations that AI cannot.

To move beyond general knowledge, AI firms are creating a new role: the "AI Trainer." These are not contractors but full-time employees, typically PhDs who pair deep domain expertise with an interest in computer science, tasked with systematically improving model competence in specific fields like physics or mathematics.

As AI tools become operable via plain English, the key skill shifts from technical implementation to effective management. People managers excel at providing context, defining roles, giving feedback, and reporting on performance—all crucial for orchestrating a "team" of AI agents. Their skills will become more valuable than pure AI expertise.
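As a sketch of what that management framing can look like in practice, the structure below turns a manager's habits (defining the role, supplying context, feeding back corrections) into a system prompt for an agent. The `AgentBrief` name and field layout are illustrative assumptions, not any vendor's API.

```python
from dataclasses import dataclass, field

@dataclass
class AgentBrief:
    """A manager-style brief for one AI agent: role, context, feedback."""
    role: str                # who the agent is, as a manager would define a job
    context: str             # background the agent needs to do the job well
    feedback: list = field(default_factory=list)  # corrections from past runs

    def to_system_prompt(self) -> str:
        # Assemble the brief into a single system prompt for the model.
        parts = [f"Role: {self.role}", f"Context: {self.context}"]
        if self.feedback:
            parts.append("Feedback on previous work:\n- " + "\n- ".join(self.feedback))
        return "\n\n".join(parts)

# The management loop: brief the agent, review its output, feed corrections back.
brief = AgentBrief(
    role="You draft customer-facing replies for a pest-control company.",
    context="Customers are homeowners; keep replies under 150 words.",
)
brief.feedback.append("Last draft used jargon ('IPM'); spell out technical terms.")
print(brief.to_system_prompt())
```

The point of the sketch is that nothing in it requires AI expertise: it is the same brief a manager would give a new hire, serialized into text.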

With AI agents automating raw code generation, an engineer's role is evolving beyond pure implementation. To stay valuable, engineers must now cultivate a deep understanding of business context and product taste to know *what* to build and *why*, not just *how*.

Despite hype in areas like self-driving cars and medical diagnosis, AI has not replaced expert human judgment. Its most successful application is as a powerful assistant that augments human experts, who still make the final, critical decisions. This is a key distinction for scoping AI products.
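One way to encode that scoping decision is a hard gate in the control flow: the model proposes, the expert disposes. A minimal sketch, assuming a placeholder `model_suggest` call; nothing here is specific to any real diagnostic system.

```python
def model_suggest(case_id: str) -> tuple[str, float]:
    # Stand-in for a real model call; returns (suggestion, confidence).
    return "benign", 0.72

def assist_not_replace(case_id: str, expert_decides) -> str:
    suggestion, confidence = model_suggest(case_id)
    # The model only proposes; the human expert always makes the final call.
    return expert_decides(case_id, suggestion, confidence)

def radiologist(case_id, suggestion, confidence):
    print(f"AI suggests '{suggestion}' ({confidence:.0%}) for {case_id}")
    return suggestion  # in practice: accept, edit, or reject

final_read = assist_not_replace("scan-0042", radiologist)
```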

Building an AI application is becoming trivial and fast ("under 10 minutes"). The true differentiator and the most difficult part is embedding deep domain knowledge into the prompts. The AI needs to be taught *what* to look for, which requires human expertise in that specific field.
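To make that contrast concrete, here is a hypothetical before/after for the pest-control example from the opening. The second prompt is where the differentiation lives; its checklist items are illustrative only, not professional pest-control guidance.

```python
GENERIC_PROMPT = "Review this service report and flag any problems."

EXPERT_PROMPT = """You review pest-control service reports. Flag a report if:
- rodent activity is noted but no entry points were inspected
- a pesticide is listed without an EPA registration number
- a follow-up visit is recommended but not scheduled
For each flag, quote the sentence that triggered it."""

def build_request(report_text: str, prompt: str) -> list:
    # Chat-message structure accepted by most LLM APIs.
    return [
        {"role": "system", "content": prompt},
        {"role": "user", "content": report_text},
    ]
```

The two requests cost the same to build; only the second one knows *what* to look for.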

While choosing a leading vendor is important, the ultimate success of an AI agent hinges on the deep, continuous training you invest in it. An average tool with excellent, hands-on training will outperform a top-tier tool with zero effort put into its refinement.

AI evaluation shouldn't be confined to engineering silos. Subject matter experts (SMEs) and business users hold the critical domain knowledge to assess what's "good." Providing them with GUI-based tools, like an "eval studio," is crucial for continuous improvement and building trustworthy enterprise AI.
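A lightweight way to give SMEs that ownership is to store eval cases as plain data that a form or spreadsheet (the "eval studio" front end) writes and code merely reads. The field names below (`must_mention`, `must_not_mention`) are an assumed schema for illustration, not a standard.

```python
import json

# Eval cases as plain data: simple enough for an SME to author through a
# form or spreadsheet, with no code involved.
cases = json.loads("""[
  {"input": "Ants in the kitchen, near the dog's food bowl.",
   "must_mention": ["bait placement", "pet safety"],
   "must_not_mention": ["fumigation"]},
  {"input": "Termite swarmers spotted by a window in spring.",
   "must_mention": ["inspection"],
   "must_not_mention": []}
]""")

def grade(answer: str, case: dict) -> bool:
    # Pass only if every required term appears and no forbidden term does.
    text = answer.lower()
    return (all(term in text for term in case["must_mention"])
            and not any(term in text for term in case["must_not_mention"]))
```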

Building a functional AI agent is just the starting point. The real work lies in developing a set of evaluations ("evals") to test if the agent consistently behaves as expected. Without quantifying failures and successes against a standard, you're just guessing, not iteratively improving the agent's performance.
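A minimal harness makes "quantifying failures and successes" concrete: run every case, count passes, and watch the number move as you iterate. Everything here (`run_evals`, the demo agent and grader) is a hypothetical sketch, not a real framework.

```python
def run_evals(agent, cases, grade) -> float:
    """Run every case through the agent and report the pass rate."""
    results = [grade(agent(case["input"]), case) for case in cases]
    passed = sum(results)
    print(f"{passed}/{len(results)} cases passed ({passed / len(results):.0%})")
    return passed / len(results)

# Minimal demo with stand-ins for the real agent and grader.
def demo_agent(question: str) -> str:
    return "Use enclosed bait stations; bait placement should respect pet safety."

def demo_grade(answer: str, case: dict) -> bool:
    return all(term in answer.lower() for term in case["must_mention"])

demo_cases = [{"input": "Ants near the dog's food bowl.",
               "must_mention": ["bait placement", "pet safety"]}]
run_evals(demo_agent, demo_cases, demo_grade)
```

Track that single pass-rate number across prompt and model changes; if it never moves, you are iterating blind.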