Move beyond manual agent improvement by creating an automated loop. In this process, an agent runs, its performance is evaluated, failures are identified, and another process suggests and implements code fixes. This creates a foundation for self-improving systems.
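
The loop described above can be sketched in a few lines. This is a toy under stated assumptions, not a production design: `ToyAgent`, `evaluate`, `propose_fix`, and the `EXPECTED` answer table are all hypothetical names invented for illustration, and the "fix" step simply patches a lookup table where a real system would have a coding agent propose a code change.

```python
# Toy run -> evaluate -> collect failures -> fix loop. All names are hypothetical.
EXPECTED = {"2+2": "4", "3*3": "9", "10-7": "3"}


class ToyAgent:
    """Stand-in agent that answers from a lookup table."""

    def __init__(self, known):
        self.known = dict(known)

    def __call__(self, case):
        return self.known.get(case, "unknown")


def evaluate(case, output):
    """Return True when the output matches the expected answer."""
    return output == EXPECTED[case]


def propose_fix(agent, failures):
    """Stand-in for the process that suggests and implements a fix.

    It repairs one failing case per round and returns a new agent.
    """
    case, _bad_output = failures[0]
    patched = dict(agent.known)
    patched[case] = EXPECTED[case]
    return ToyAgent(patched)


def improvement_loop(agent, cases, max_rounds=5):
    """Repeat the loop until no case fails or the round budget runs out."""
    failure_counts = []
    for _ in range(max_rounds):
        failures = []
        for case in cases:
            output = agent(case)
            if not evaluate(case, output):
                failures.append((case, output))
        failure_counts.append(len(failures))
        if not failures:
            break
        agent = propose_fix(agent, failures)
    return agent, failure_counts


final_agent, failure_counts = improvement_loop(ToyAgent({"2+2": "4"}), list(EXPECTED))
print(failure_counts)
```

The returned list of failure counts per round is the part worth watching: if it stops shrinking, the fix step is not helping.
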
With AI making code generation cheap, product taste is the key differentiator. In top AI teams, PMs are increasingly technical, using tools like Claude Code to build and iterate, making their role nearly identical to an engineer's.
Since AI makes coding cheap, the real advantage lies in 'product taste.' One way to develop it is to build an agent that consumes and synthesizes feedback from every channel you have, such as GitHub, Slack, Gong transcripts, and Twitter, to identify key user pains and roadmap priorities.
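
A minimal sketch of the synthesis step, assuming feedback has already been pulled from each channel into plain records. The `Feedback` type, the `THEME_KEYWORDS` table, and the sample items are hypothetical, and keyword matching stands in for the LLM-based theme extraction a real agent would likely use.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Feedback:
    source: str  # e.g. "github", "slack", "gong", "twitter"
    text: str


# Hypothetical theme table; a real agent would likely ask a model to label themes.
THEME_KEYWORDS = {
    "onboarding": ("setup", "install", "getting started"),
    "latency": ("slow", "timeout", "latency"),
}


def tag_themes(item):
    """Return every theme whose keywords appear in the feedback text."""
    text = item.text.lower()
    return [
        theme
        for theme, keywords in THEME_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]


def rank_pains(items):
    """Count mentions per theme and record which channels raised it."""
    counts = Counter()
    sources = defaultdict(set)
    for item in items:
        for theme in tag_themes(item):
            counts[theme] += 1
            sources[theme].add(item.source)
    return [(theme, n, sorted(sources[theme])) for theme, n in counts.most_common()]


SAMPLE = [
    Feedback("github", "Setup fails on a fresh install"),
    Feedback("slack", "Dashboard is slow to load"),
    Feedback("gong", "Customer hit a timeout during the demo"),
    Feedback("twitter", "Love the new UI"),
]

for theme, mentions, channels in rank_pains(SAMPLE):
    print(theme, mentions, channels)
```

Tracking which channels raised a theme matters as much as the count: a pain that shows up in both sales calls and chat is usually a stronger roadmap signal than one confined to a single source.
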
Don't aim for a 100% accurate evaluation system. A good system reveals a 'healthy percentage' of incorrect outputs. Treat those failures as good news: each one is a clear, actionable opportunity to improve your AI agent.
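
One reading of the 'healthy percentage' idea is that an eval run with zero failures tells you little. The sketch below summarizes a run on that assumption; the `(case_id, passed, reason)` tuple shape and the `saturated` flag are hypothetical conventions, not part of any particular eval tool.

```python
def summarize(results):
    """Summarize an eval run.

    `results` is a list of (case_id, passed, reason) tuples (hypothetical shape).
    """
    total = len(results)
    failures = [(case_id, reason) for case_id, passed, reason in results if not passed]
    failure_rate = len(failures) / total if total else 0.0
    return {
        "failure_rate": failure_rate,
        # Each failure is a concrete item to investigate and fix.
        "failures": failures,
        # No failures at all suggests the cases may be too easy to teach you anything.
        "saturated": total > 0 and not failures,
    }


RUN = [
    ("refund-policy", True, ""),
    ("cancel-order", False, "cited the wrong policy section"),
    ("track-package", True, ""),
    ("change-address", True, ""),
]

print(summarize(RUN))
```
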
The modern product development cycle for AI is a tight, iterative loop executed within a coding agent. This involves creating the agent, tracing every step for observability, running evaluations (evals) to find weaknesses, and then improving the agent based on those findings.
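
To make "tracing every step" concrete, here is a hand-rolled sketch that records one span per agent step. It is an illustration only: the `traced` decorator, the in-memory `TRACE` list, and the `retrieve` and `answer` steps are hypothetical, and a real setup would more likely use a tracing library such as OpenTelemetry and export spans to an observability platform.

```python
import functools
import time

TRACE = []  # in-memory span store, for illustration only


def traced(step_name):
    """Record inputs, output, error, and duration for each call of a step."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            span = {"step": step_name, "input": args, "output": None, "error": None}
            start = time.perf_counter()
            try:
                span["output"] = fn(*args, **kwargs)
                return span["output"]
            except Exception as exc:
                span["error"] = repr(exc)
                raise
            finally:
                span["duration_s"] = time.perf_counter() - start
                TRACE.append(span)  # appended when the step finishes

        return wrapper

    return decorator


@traced("retrieve")
def retrieve(query):
    return [f"doc about {query}"]


@traced("answer")
def answer(query):
    docs = retrieve(query)
    return f"Based on {len(docs)} document(s): {docs[0]}"


answer("evals")
for span in TRACE:
    print(span["step"], span["output"])
```

Because spans are appended on completion, inner steps appear in `TRACE` before the outer step that called them. These recorded spans are the raw material the evaluation stage reads.
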
While the goal is autonomous improvement, deploying these systems safely in production requires human oversight. Implement mandatory human-in-the-loop steps, specifically code reviews for any proposed changes to the agent or its evaluation logic, before shipping to users.
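
The human-in-the-loop requirement can be enforced in code rather than left to convention. A minimal sketch, assuming a hypothetical `ProposedChange` record and `ship` function: a change to the agent or its eval logic cannot ship until someone other than its author has approved it.

```python
from dataclasses import dataclass, field


class ReviewRequired(Exception):
    """Raised when a change is shipped without enough independent approvals."""


@dataclass
class ProposedChange:
    target: str  # "agent" or "eval" (hypothetical labels)
    author: str  # who or what proposed the change, e.g. "fixer-bot"
    diff: str
    approvals: set = field(default_factory=set)


def approve(change, reviewer):
    change.approvals.add(reviewer)


def ship(change, min_approvals=1):
    """Ship only if enough reviewers other than the author have approved."""
    independent = change.approvals - {change.author}
    if len(independent) < min_approvals:
        raise ReviewRequired(
            f"{change.target} change needs {min_approvals} independent approval(s), "
            f"has {len(independent)}"
        )
    return f"shipped {change.target} change"


change = ProposedChange(target="eval", author="fixer-bot", diff="- threshold 0.8\n+ threshold 0.7")
approve(change, "fixer-bot")  # self-approval is ignored by ship()
approve(change, "alice")
print(ship(change))
```

Excluding the author from the approval count is the key design choice: it prevents the automated fixer from approving its own changes, including changes that loosen the evals it is graded by.
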
The new frontier for product management is diving deep into AI systems' operational data. According to Arize's CPO, PMs who regularly analyze traces and evaluations to understand agent behavior are far ahead of their peers and represent the top 1% of the field.
Eliminate the engineering bottleneck for setting up observability by using pre-built 'skills' within coding agents like Claude Code. A single command can analyze an agent's code and automatically instrument it to send trace data to platforms like Arize, with no engineer required.
Don't start building evaluations from a blank slate. Use an AI agent to analyze your production traces and automatically generate a baseline 'vibe eval.' This initial evaluation won't be perfect, but it provides a starting point for refinement and accelerates the improvement loop.
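
As a rough illustration of bootstrapping an eval from traces, the sketch below turns a handful of production traces into a judge prompt and runs it over cases. Everything here is an assumption made for the example: the trace fields `input`, `output`, and `feedback`, the prompt wording, and the `stub_judge` function, which stands in for a real model call.

```python
def build_judge_prompt(traces, max_examples=2):
    """Build a first-draft grading prompt from liked and disliked traces."""
    liked = [t for t in traces if t.get("feedback") == "up"][:max_examples]
    disliked = [t for t in traces if t.get("feedback") == "down"][:max_examples]
    lines = ["Grade the assistant's answer. Reply with exactly PASS or FAIL.", ""]
    for label, group in (("Answers users liked", liked), ("Answers users disliked", disliked)):
        lines.append(f"{label}:")
        for trace in group:
            lines.append(f"- Q: {trace['input']} | A: {trace['output']}")
    return "\n".join(lines)


def run_eval(judge, prompt, cases):
    """Apply the judge to each case; `judge` is any callable returning PASS or FAIL."""
    return {
        case["input"]: judge(prompt, case["input"], case["output"]).strip().upper() == "PASS"
        for case in cases
    }


def stub_judge(prompt, question, answer):
    """Placeholder for a model call: passes any non-empty answer."""
    return "PASS" if answer.strip() else "FAIL"


TRACES = [
    {"input": "reset password", "output": "Go to Settings > Security.", "feedback": "up"},
    {"input": "export data", "output": "", "feedback": "down"},
    {"input": "change plan", "output": "Open Billing.", "feedback": None},
]

prompt = build_judge_prompt(TRACES)
print(prompt)
print(run_eval(stub_judge, prompt, TRACES))
```

The generated prompt is deliberately crude. Its value is that it exists on day one and can be refined as you review the cases it gets wrong.
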
The biggest AI opportunity for large companies is breaking down data silos. By building a 'context graph,' you give AI agents access to information from different departments and systems. This enables agents to perform cross-functional tasks and surface insights that were previously impossible.
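
A 'context graph' can be pictured as records from different systems linked by shared entities. The sketch below is a minimal in-memory version under that assumption; the `ContextGraph` class, the node IDs, and the department names are hypothetical, and a real deployment would also need access control and a persistent store.

```python
from collections import defaultdict, deque


class ContextGraph:
    """Undirected graph linking records that live in different systems."""

    def __init__(self):
        self.nodes = {}
        self.edges = defaultdict(set)

    def add_node(self, node_id, system, summary):
        self.nodes[node_id] = {"system": system, "summary": summary}

    def link(self, a, b):
        self.edges[a].add(b)
        self.edges[b].add(a)

    def context_for(self, start, max_hops=2):
        """Breadth-first walk returning node IDs within `max_hops` of `start`."""
        seen = {start}
        queue = deque([(start, 0)])
        found = []
        while queue:
            node_id, hops = queue.popleft()
            found.append(node_id)
            if hops == max_hops:
                continue
            for neighbor in sorted(self.edges.get(node_id, ())):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, hops + 1))
        return found


def build_example():
    graph = ContextGraph()
    graph.add_node("ticket:42", "support", "Customer reports a double charge")
    graph.add_node("account:acme", "crm", "Acme Corp, enterprise tier")
    graph.add_node("invoice:9", "billing", "March invoice, paid twice")
    graph.add_node("contract:3", "legal", "Master services agreement")
    graph.link("ticket:42", "account:acme")
    graph.link("account:acme", "invoice:9")
    graph.link("invoice:9", "contract:3")
    return graph


graph = build_example()
for node_id in graph.context_for("ticket:42", max_hops=2):
    print(node_id, graph.nodes[node_id]["system"])
```

Starting from a support ticket, a two-hop walk reaches the CRM account and the billing invoice, which is the cross-departmental context an agent confined to one system would not see.
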
Shipping velocity at elite AI-native companies has increased radically. It is now possible to identify a critical user request, have a PM or engineer prototype a solution using tools like Claude Code, and ship a production-ready feature all within the same day.
