
Early on, a central AI team managed a single, complex few-shot prompt, creating a bottleneck. The key shift was to a tool-calling architecture where individual product teams own their agent's tools and definitions. This distributed ownership, enabled by strong evaluation frameworks, dramatically increased development velocity.
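The ownership shift described above can be sketched as a shared tool registry where each product team registers and maintains its own tool definitions, while the platform layer only dispatches calls. All names here (`ToolRegistry`, `search_docs`) are illustrative assumptions, not the actual architecture:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ToolRegistry:
    """Platform-owned dispatcher; each product team owns its own tool entries."""
    tools: dict = field(default_factory=dict)

    def register(self, team: str, name: str, fn: Callable, description: str):
        # The owning team writes the definition the model sees, not a central AI team.
        self.tools[name] = {"team": team, "fn": fn, "description": description}

    def dispatch(self, name: str, **kwargs):
        # The platform only routes tool calls; it doesn't own tool behavior.
        return self.tools[name]["fn"](**kwargs)

registry = ToolRegistry()

# Hypothetical example: the search team owns its tool end to end.
registry.register(
    team="search",
    name="search_docs",
    fn=lambda query: [f"doc matching {query!r}"],
    description="Search workspace documents by keyword.",
)
```

With strong evals gating each team's tools, teams can ship changes to their own entries without coordinating through a central prompt owner.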

Related Insights

Anthropic's new "Agent Teams" feature moves beyond the single-agent paradigm by enabling users to deploy multiple AIs that work in parallel, share findings, and challenge each other. This represents a new way of working with AI, focusing on the orchestration and coordination of AI teams rather than just prompting a single model.

Getting high-quality results from AI doesn't come from a single complex command. The key is "harness engineering"—designing structured interaction patterns between specialized agents, such as creating a workflow where an engineer agent hands off work to a separate QA agent for verification.
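A minimal sketch of that engineer-to-QA handoff pattern, with placeholder functions standing in for real model calls (the agent logic and retry policy here are assumptions for illustration):

```python
def engineer_agent(task: str) -> str:
    # Stand-in for a model call that produces a patch for the task.
    return f"patch for {task}"

def qa_agent(patch: str) -> bool:
    # Stand-in for a separate model call that verifies the patch.
    return patch.startswith("patch for")

def harness(task: str, max_retries: int = 2) -> str:
    """Structured handoff: the engineer's output only ships if QA approves it."""
    for _ in range(max_retries + 1):
        patch = engineer_agent(task)
        if qa_agent(patch):
            return patch
    raise RuntimeError("QA agent rejected all attempts")
```

The value is in the interaction structure: verification is a distinct role with its own context, not an afterthought appended to one long prompt.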

The widespread use of coding agents at Notion has amplified engineering output, leading to what co-founder Simon Last calls a 'more messy and chaotic' environment. This 'productive chaos' shows up as more ambitious pull requests and as non-engineering teams, such as design, building their own sophisticated prototyping tools.

An AI coding agent's performance is driven more by its "harness" (the system for prompting, tool access, and context management) than by the underlying foundation model. This orchestration layer is where products create their unique value and where the most critical engineering work lies.
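The three harness responsibilities named above can be made concrete in a minimal agent loop. The scripted model below is a stand-in assumption so the loop is runnable without an API:

```python
def run_harness(model, task, tools, max_steps=8):
    """Minimal harness: prompting, tool access, and context management in one loop."""
    context = [{"role": "user", "content": task}]        # context management
    for _ in range(max_steps):
        action = model(context)                          # prompting (model call)
        if action["type"] == "tool_call":
            result = tools[action["name"]](**action["args"])   # tool access
            context.append({"role": "tool", "content": str(result)})
        else:
            return action["content"]
    return None

# Scripted stand-in model: requests a tool once, then answers from the tool result.
def scripted_model(context):
    if any(m["role"] == "tool" for m in context):
        return {"type": "final", "content": f"done: {context[-1]['content']}"}
    return {"type": "tool_call", "name": "add", "args": {"a": 2, "b": 3}}

answer = run_harness(scripted_model, "add 2 and 3", {"add": lambda a, b: a + b})
```

Swapping the foundation model changes one function; the harness (what goes into context, which tools exist, when to stop) is the part the product team engineers.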

Notion treats its entire evaluation process as a coding agent problem. The system is designed for an agent to download a dataset, run an eval, identify a failure, debug the issue, and implement a fix, all within an automated loop. This turns quality assurance into a meta-problem for agents to solve.
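The download-eval-debug-fix loop described above can be sketched as follows; the failure model here (an agent that patches one failing case per pass) is a toy assumption, not Notion's actual system:

```python
def run_eval(system, dataset):
    """Return the cases the system currently fails."""
    return [case for case in dataset if case not in system["fixed"]]

def debug_and_fix(system, failure):
    # Stand-in for a coding agent that diagnoses one failure and patches the system.
    system["fixed"].add(failure)

def eval_agent_loop(system, dataset, max_iters=10):
    """Automated loop: eval, pick a failure, fix, re-eval, until the suite is green."""
    for i in range(max_iters):
        failures = run_eval(system, dataset)
        if not failures:
            return i  # number of fix iterations that were needed
        debug_and_fix(system, failures[0])
    return None

system = {"fixed": set()}
iters = eval_agent_loop(system, ["case_a", "case_b"])
```

Framing QA this way means improving quality becomes a task an agent can be pointed at, rather than a manual review step.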

The most powerful AI systems consist of specialized agents with distinct roles (e.g., individual coaching, corporate strategy, knowledge base) that interact. This modular approach, exemplified by the Holmes, Mycroft, and 221B agents, creates a more robust and scalable solution than a single, all-knowing agent.

Instead of creating one monolithic "Ultron" agent, build a team of specialized agents (e.g., Chief of Staff, Content). This parallels existing business mental models, making the system easier for humans to understand, manage, and scale.
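A sketch of routing requests to role-specific agents instead of one monolith. The roles mirror the business mental model above; the keyword router is a placeholder for what would be a model-based routing call:

```python
# Each specialist agent maps to a familiar business role (illustrative stand-ins).
AGENTS = {
    "chief_of_staff": lambda req: f"scheduling: {req}",
    "content": lambda req: f"drafting: {req}",
}

def route(request: str) -> str:
    """Pick the specialist for a request (keyword-based here; a model call in practice)."""
    role = "content" if "draft" in request else "chief_of_staff"
    return AGENTS[role](request)
```

Because each agent's scope matches a role humans already reason about, adding a new capability means adding a new specialist, not growing one prompt.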

To fully leverage rapidly improving AI models, companies cannot just plug in new APIs. Notion's co-founder reveals they completely rebuild their AI system architecture every six months, designing it around the specific capabilities of the latest models to avoid being stuck with suboptimal implementations.

To avoid the rapid depreciation of hard-coded systems as LLMs improve, Blitzy's architecture is dynamic. Agents are generated just-in-time, with prompts written and tools selected by other agents based on the latest model capabilities and the specific task requirements.
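The just-in-time pattern can be sketched as a meta-agent that assembles another agent's prompt and tool set per request. The selection logic below is a toy assumption (real systems would use a model call), and all names are hypothetical:

```python
def meta_agent(task, tool_catalog):
    # Stand-in for an agent that writes another agent's prompt and picks its tools
    # at request time, based on the task (and, in practice, current model capabilities).
    chosen = {name: fn for name, fn in tool_catalog.items() if name in task}
    prompt = f"You are a just-in-time specialist. Task: {task}. Tools: {sorted(chosen)}"
    return prompt, chosen

def spawn_agent(task, tool_catalog):
    """Agents are ephemeral: generated fresh per request, never hard-coded."""
    prompt, tools = meta_agent(task, tool_catalog)
    return {"prompt": prompt, "tools": tools}

agent = spawn_agent("search the wiki", {"search": lambda q: [q], "write": lambda t: t})
```

Because nothing about the agent is fixed at build time, a better underlying model improves both the generated prompts and the generated agents without a rewrite.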

The belief that adding people to a late project makes it later (Brooks's Law) may not apply in an AI-assisted world. Early reports from OpenAI suggest that when using agents, adding more developers actually increases velocity, a potential paradigm shift for engineering management and team scaling.