Product managers at Anthropic use internal agents to understand their product more deeply, for example by tracking PRs and diagnosing field issues. This significantly reduces their dependence on engineers for information and lets them work more autonomously.
The 'harness' is the scaffolding around a model that provides its tools and memory. Anthropic's product lead argues that separating model development from harness development is impossible if you want maximum performance, because models are always tested, and ultimately perform, in conjunction with a harness.
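To make the idea of a harness concrete, here is a minimal sketch in Python. It is not Anthropic's harness: the `Harness` class, the `toy_model` function, and the `count_words` tool are all hypothetical names invented for illustration, and the "model" is a scripted stub rather than a real model call. It only shows the shape of the scaffolding: a loop that hands the model its memory, runs whichever tool it asks for, and records the result.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical model interface (an assumption, not a real API): given the
# memory so far, return ("tool", name, arg) or ("final", text).
ModelFn = Callable[[list], tuple]


@dataclass
class Harness:
    model: ModelFn
    tools: dict                      # tool name -> function taking and returning str
    memory: list = field(default_factory=list)

    def run(self, task: str, max_steps: int = 5) -> str:
        self.memory.append(f"task: {task}")
        for _ in range(max_steps):
            action = self.model(self.memory)
            if action[0] == "final":
                return action[1]
            _, name, arg = action
            result = self.tools[name](arg)          # execute the requested tool
            self.memory.append(f"{name}({arg}) -> {result}")  # remember the outcome
        return "step budget exhausted"


def toy_model(memory: list) -> tuple:
    # Scripted behaviour for illustration only: call one tool, then answer
    # with the last thing written to memory.
    if len(memory) == 1:
        return ("tool", "count_words", "separate context windows")
    return ("final", memory[-1])


harness = Harness(
    model=toy_model,
    tools={"count_words": lambda text: str(len(text.split()))},
)
answer = harness.run("how many words?")
print(answer)
```

Even in this toy version, the model's behaviour depends on what the harness puts in memory and which tools it exposes, which is the coupling the paragraph above describes.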
A powerful workflow is to spin up temporary agents for specific, short-term needs. An Anthropic PM created a disposable agent to parse and prioritize a large feature waitlist, automating weeks of work without building a polished, long-term product.
Prioritize qualitative 'vibe testing' over quantitative evals in early agent development. The most crucial first step is getting the agent in front of users to see if it 'feels' right and is useful before investing in formal, scalable quality checks.
Many companies try automating massive, multi-team processes from day one. A better strategy is to first empower individual employees to build their own agents, fostering a culture of innovation before tackling complex, cross-functional automation.
Unlike simple prompting loops that fail on the first error, modern agentic systems are built to be resilient. They can identify when they've gone off-course, revise their thinking, and re-steer themselves toward the goal, which is a crucial capability for long-running autonomous tasks.
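The difference between a loop that fails on error and one that re-steers can be sketched in a few lines of Python. This is an illustrative pattern only, under stated assumptions: `run_with_self_correction`, `toy_attempt`, and `toy_check` are hypothetical names, and the "agent" is a stub that changes its output once it sees feedback. A real system would call a model where the stub is.

```python
from typing import Callable, Optional


def run_with_self_correction(
    attempt: Callable[[str, list], str],
    check: Callable[[str], Optional[str]],
    goal: str,
    max_attempts: int = 3,
) -> tuple:
    """Retry toward a goal, feeding each detected problem back into the next attempt."""
    feedback: list = []
    for _ in range(max_attempts):
        output = attempt(goal, feedback)
        problem = check(output)      # None means the output is on course
        if problem is None:
            return output, feedback
        feedback.append(problem)     # revise: the next attempt sees what went wrong
    raise RuntimeError(f"gave up after {max_attempts} attempts: {feedback}")


def toy_attempt(goal: str, feedback: list) -> str:
    # Stub agent (assumption): the first draft omits citations, and it
    # corrects itself once feedback is present.
    return "draft: cited" if feedback else "draft: no citations"


def toy_check(output: str) -> Optional[str]:
    # Stub self-check (assumption): flags drafts that are not cited.
    return None if output.endswith("cited") else "add citations"


result, notes = run_with_self_correction(toy_attempt, toy_check, "write a summary")
print(result, notes)
```

A simple prompting loop would have returned or crashed on the first draft; here the detected problem becomes input to the next attempt, and the attempt budget bounds how long the agent keeps trying.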
Traditional evals fall short for sophisticated agents. A more effective method is a built-in evaluation loop in which one agent grades the output of another. This allows for continuous, automated quality assessment, especially when the grader runs in a separate context window, so that it judges the output without being biased by the first agent's reasoning.
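A minimal sketch of that grading loop, assuming a hypothetical `Context` type that stands in for a context window. The `toy_worker` and `toy_grader` functions are stubs invented for illustration, not real model calls. The point is the separation: the grader's context is built from scratch and contains only the task and the candidate output, never the worker's scratch notes.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Context:
    """Stand-in for one agent's context window (an assumption for this sketch)."""
    messages: list = field(default_factory=list)


def evaluate(task: str,
             worker: Callable[[Context], str],
             grader: Callable[[Context], float]) -> tuple:
    worker_ctx = Context([f"task: {task}"])
    output = worker(worker_ctx)      # the worker may fill its context with notes

    # Fresh context for the grader: only the task and the candidate output.
    grader_ctx = Context([f"task: {task}", f"candidate: {output}"])
    score = grader(grader_ctx)
    return output, score, grader_ctx


def toy_worker(ctx: Context) -> str:
    ctx.messages.append("scratch: I am sure this is right")  # stays in the worker's context
    return "4"


def toy_grader(ctx: Context) -> float:
    # Stub grader (assumption): knows the expected answer for this toy task.
    return 1.0 if "candidate: 4" in ctx.messages else 0.0


output, score, grader_ctx = evaluate("2 + 2", toy_worker, toy_grader)
print(output, score, grader_ctx.messages)
```

Because the grader never sees the line beginning with `scratch:`, its score reflects the output alone rather than the worker's confidence in it.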
Instead of demanding specific JSON schemas, advanced agent prompting involves describing the final, desired outcome (e.g., 'a beautiful and interactive report'). The agent, equipped with self-correction capabilities, then figures out the necessary steps to create that rich end-product.
