It's easy to get distracted by the complex capabilities of AI. Starting with a minimal version of an AI product (high human control, low agency) forces teams to define the specific problem they are solving rather than getting lost in the complexity of the solution.
AI tools are dramatically lowering the cost of implementation and "rote building." As a result, the most expensive and critical part of product creation becomes the design phase: deeply understanding the user pain point, exercising good judgment, and having product taste.
To avoid failure, launch AI agents with high human control and low agency, such as suggesting actions to an operator. As the agent proves reliable and you collect performance data, you can gradually increase its autonomy. This phased approach minimizes risk and builds user trust.
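A rough sketch of that phased approach (the `AutonomyGate` class, thresholds, and helper below are illustrative assumptions, not from the source): the agent only suggests actions at first, and auto-execution unlocks only after operators have approved a large share of its recent suggestions.

```python
# Illustrative sketch: an agent starts in suggest-only mode, and an autonomy gate
# unlocks auto-execution only after enough operator reviews show a high approval rate.
class AutonomyGate:
    def __init__(self, auto_execute_threshold: float = 0.95, min_reviews: int = 200):
        self.approvals = 0
        self.total = 0
        self.auto_execute_threshold = auto_execute_threshold
        self.min_reviews = min_reviews

    def record_review(self, approved: bool) -> None:
        # Each operator decision becomes performance data for the rollout decision.
        self.total += 1
        self.approvals += int(approved)

    def can_auto_execute(self) -> bool:
        if self.total < self.min_reviews:
            return False  # not enough evidence yet: stay in suggest-only mode
        return self.approvals / self.total >= self.auto_execute_threshold

def handle(proposed_action: str, gate: AutonomyGate) -> str:
    if gate.can_auto_execute():
        return f"auto-executing: {proposed_action}"
    return f"suggesting to operator for approval: {proposed_action}"
```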
In the AI era, leaders' decades-old intuitions may be wrong. To lead effectively, they must become practitioners again, actively learning and using AI daily. The CEO of Rackspace blocks out 4-6 a.m. for "catching up with AI," demonstrating the required commitment to rebuild foundational knowledge.
The popular concept of multiple specialized agents collaborating in a "gossip protocol" is a misunderstanding of what currently works. A more practical and successful pattern for multi-agent systems is a hierarchical structure where a single supervisor agent breaks down a task and orchestrates multiple sub-agents to complete it.
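A minimal sketch of the hierarchical pattern, with hypothetical sub-agents standing in for real LLM calls: a single supervisor decomposes the task, dispatches subtasks to specialized workers, and aggregates the results, with no peer-to-peer coordination between sub-agents.

```python
from typing import Callable

# Hypothetical sub-agents: each handles a narrow, well-defined subtask.
SubAgent = Callable[[str], str]

def research_agent(task: str) -> str:
    return f"[research notes for: {task}]"

def drafting_agent(task: str) -> str:
    return f"[draft based on: {task}]"

SUB_AGENTS: dict[str, SubAgent] = {
    "research": research_agent,
    "draft": drafting_agent,
}

def supervisor(task: str) -> str:
    # In a real system an LLM call would produce this plan; it is hard-coded here
    # to show the shape of the hierarchy: one supervisor, many workers, no peer gossip.
    plan = [("research", task), ("draft", task)]
    results = [SUB_AGENTS[agent_name](subtask) for agent_name, subtask in plan]
    # The supervisor owns aggregation, so sub-agents never coordinate directly.
    return "\n".join(results)

print(supervisor("summarize recent changes to the onboarding flow"))
```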
In a world where AI implementation is becoming cheaper, the real competitive advantage isn't speed or features. It's the accumulated knowledge gained through the difficult, iterative process of building and learning. This "pain" of figuring out what truly works for a specific problem becomes a durable moat.
Teams often mistakenly debate whether to use offline evals or online production monitoring. This is a false choice. Evals are crucial for testing against known failure modes before deployment; production monitoring is essential for discovering new, unexpected failure patterns from real user interactions. Both are required for a robust feedback loop.
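One way to picture the two halves of that loop (the eval cases, model stub, and checks below are illustrative placeholders, not a real eval framework): offline evals score the model against known failure cases before release, while production monitoring flags suspicious traces that can later be promoted into the eval set.

```python
# Offline evals: run the model against known failure modes before deployment.
EVAL_CASES = [
    {"input": "Cancel my subscription", "must_contain": "confirm"},
    {"input": "What's my refund status?", "must_contain": "order number"},
]

def model_fn(prompt: str) -> str:
    return "Please confirm you'd like to cancel."  # stand-in for the real LLM call

def run_offline_evals() -> float:
    passed = sum(case["must_contain"] in model_fn(case["input"]) for case in EVAL_CASES)
    return passed / len(EVAL_CASES)

# Online monitoring: log production traces that fail a cheap check so new,
# unexpected failure patterns can be reviewed and folded back into EVAL_CASES.
FLAGGED_TRACES: list[dict] = []

def monitor_production(prompt: str, response: str) -> None:
    # A simple heuristic; real systems might use an LLM judge or user feedback.
    if not response or "sorry, i can't" in response.lower():
        FLAGGED_TRACES.append({"prompt": prompt, "response": response})
```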
Unlike traditional software, AI products have unpredictable user inputs and LLM outputs (non-determinism). They also require balancing AI autonomy (agency) with user oversight (control). These two factors fundamentally change the product development process, requiring new approaches to design and risk management.
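As a small sketch of one common way to impose control on non-deterministic output (the schema, retry count, and human fallback below are assumptions for illustration): validate each response against an expected structure, retry on failure, and escalate to a human rather than acting autonomously when validation keeps failing.

```python
import json
from typing import Optional

# The same prompt can yield differently shaped responses, so the output is
# validated against a required schema before the product acts on it.
REQUIRED_FIELDS = {"intent", "reply"}

def call_llm(prompt: str) -> str:
    # Stand-in for a real, non-deterministic model call.
    return '{"intent": "cancel_subscription", "reply": "Please confirm."}'

def get_structured_reply(prompt: str, max_retries: int = 2) -> Optional[dict]:
    for _ in range(max_retries + 1):
        raw = call_llm(prompt)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed output: retry
        if REQUIRED_FIELDS <= parsed.keys():
            return parsed
    return None  # escalate to a human operator instead of acting autonomously
```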
The word "evals" has been stretched to mean many different things: expert-written error analysis, PM-defined test cases, performance benchmarks, and LLM-based judges. This "semantic diffusion" causes confusion. Teams need to be specific about what part of the feedback loop they're discussing instead of using the generic term.
An AI product's job is never done because user behavior evolves. As users become more comfortable with an AI system, they naturally start pushing its boundaries with more complex queries. This requires product teams to continuously go back and recalibrate the system to meet these new, unanticipated demands.
Vendors' promises of "one-click" AI agents that deliver immediate gains are likely just marketing. Due to messy enterprise data and legacy infrastructure, any meaningful AI deployment that provides significant ROI will take at least four to six months of work to build a flywheel that learns and improves over time.
