For its Custom Agents feature, Notion rejected the goal of making it "as easy as possible to use." They realized simplifying the interface would abstract away critical interpretability and diminish the tool's power, so they aligned on building a deep, sophisticated product for "the top of the class."
Instead of creating a bespoke memory or messaging protocol for agent-to-agent communication, Notion leverages its core primitives. Agents compose by writing to and reading from shared Notion pages and databases, creating a decoupled, human-editable, and transparent system for coordination.
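The shared-surface pattern above can be sketched in a few lines. This is a minimal illustration, not Notion's actual SDK: `SharedPage` is a hypothetical stand-in for a page used as a coordination log that agents append to and humans can edit or inspect.

```python
# Minimal sketch (hypothetical API, not Notion's SDK): agents coordinate by
# appending to and reading from a shared page rather than a bespoke protocol.

class SharedPage:
    """A stand-in for a Notion page used as a coordination surface."""
    def __init__(self, title):
        self.title = title
        self.blocks = []  # each block is plain, human-editable text

    def append(self, author, text):
        self.blocks.append({"author": author, "text": text})

    def read(self):
        return list(self.blocks)

# A "research" agent posts findings; a "writer" agent picks them up later.
page = SharedPage("Q3 launch plan")
page.append("research-agent", "Competitor ships feature X next month.")
page.append("writer-agent", "Drafted positioning section from research notes.")

# A human (or another agent) can inspect the same transparent log.
for block in page.read():
    print(f'{block["author"]}: {block["text"]}')
```

Because the coordination medium is an ordinary page, the system stays decoupled: agents never need to know about each other, only about the page they share.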
In the rapidly evolving AI space, rebuilding systems is common. Notion actively fosters a culture where engineers are driven by company goals, not attached to their past work. This prevents friction and allows the team to swarm problems and pivot quickly as capabilities change.
To avoid saturated evaluations that only confirm existing capabilities, Notion's team creates difficult test suites on which they expect current models to fail 70% of the time. This "headroom" provides a clear signal to model providers about frontier needs and helps the team anticipate where the technology is heading.
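A headroom check of this kind is easy to express in code. The sketch below is illustrative; the only number taken from the source is the ~70% expected failure rate, and the tolerance is an assumption.

```python
# Sketch of a "headroom" check (thresholds are illustrative; only the ~70%
# expected failure rate comes from the description above).

def headroom_report(results, target_fail_rate=0.70, tolerance=0.10):
    """results: list of booleans, True = test case passed."""
    fail_rate = 1 - sum(results) / len(results)
    # If the model passes far more than expected, the suite is saturated
    # and no longer measures the frontier.
    saturated = fail_rate < target_fail_rate - tolerance
    return {"fail_rate": round(fail_rate, 2), "saturated": saturated}

# A suite where the current model solves 9 of 10 hard cases is saturated:
print(headroom_report([True] * 9 + [False]))
# One where it solves only 3 of 10 still has frontier headroom:
print(headroom_report([True] * 3 + [False] * 7))
```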
Notion treats its entire evaluation process as a coding agent problem. The system is designed for an agent to download a dataset, run an eval, identify a failure, debug the issue, and implement a fix, all within an automated loop. This turns quality assurance into a meta-problem for agents to solve.
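The download-eval-debug-fix loop can be sketched as a driver function. All names here are hypothetical placeholders for the agent's actual tooling; this shows the loop's shape, not Notion's implementation.

```python
# Skeleton of the eval-as-coding-agent loop (all function names are
# hypothetical placeholders for the agent's real tooling).

def run_eval_loop(dataset, run_eval, propose_fix, apply_fix, max_iters=5):
    """Drive an agent through eval -> identify failure -> fix cycles."""
    failures = dataset
    for _ in range(max_iters):
        failures = [case for case in dataset if not run_eval(case)]
        if not failures:
            return "all passing"
        fix = propose_fix(failures[0])   # agent debugs the first failure
        apply_fix(fix)                   # lands a patch before re-running
    return f"{len(failures)} failures remain"

# Toy demo: each "fix" makes one more case pass on the next iteration.
state = {"fixed": set()}
result = run_eval_loop(
    dataset=["case-a", "case-b"],
    run_eval=lambda c: c in state["fixed"],
    propose_fix=lambda c: c,
    apply_fix=lambda c: state["fixed"].add(c),
)
print(result)  # "all passing" after two fix iterations
```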
As AI coding agents become more capable, the primary skill for engineers is evolving. It's less about writing individual lines of code and more about the managerial skills of delegation, context switching, and designing and overseeing systems of agents, mirroring the transition managers go through.
By empowering anyone to build and ship functional prototypes, Notion's culture shifts the burden of proof. With many working demos competing for attention, product managers must have a strong, clear vision to prioritize and ensure the team is focusing on a single tower, not building a "flat hill."
Notion's AI strategy extends beyond the AI team. Every product engineering team is tasked with ensuring their features are usable by both humans and AI agents. This anticipates a future where the majority of traffic will come from agents interfacing with Notion's tools, making agent-compatibility a core requirement.
Recognizing that evaluating and steering AI models requires a unique skillset, Notion created a non-traditional technical role. These engineers, often from non-CS backgrounds, focus on qualitative analysis, defining evaluation journeys, and understanding model taste, bridging the gap between product and pure software engineering.
Notion sees value in both agent tool protocols: CLIs and MCPs. CLIs are powerful because agents can debug and extend their own tools within the same terminal environment. MCPs, by contrast, are better for narrow use cases that require a strong, simple permission model, where the agent can only call predefined tools.
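The permission model that makes MCP-style tooling attractive can be sketched in a few lines. This is an illustration of the idea, not the actual MCP SDK: the agent can only reach tools registered up front, unlike a CLI where it can touch anything in the shell.

```python
# Sketch of a narrow, whitelist-style permission model (hypothetical names,
# not the real MCP SDK): the agent can only invoke pre-registered tools.

class ToolServer:
    def __init__(self):
        self._tools = {}

    def register(self, name, fn):
        self._tools[name] = fn

    def call(self, name, *args):
        if name not in self._tools:
            raise PermissionError(f"tool '{name}' is not exposed to the agent")
        return self._tools[name](*args)

server = ToolServer()
server.register("search_pages", lambda query: [f"page matching {query!r}"])

print(server.call("search_pages", "roadmap"))
# server.call("delete_workspace") raises PermissionError: unlike a CLI,
# the agent has no way to reach anything outside the registered surface.
```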
Early on, a central AI team managed a single, complex few-shot prompt, creating a bottleneck. The key shift was to a tool-calling architecture where individual product teams own their agent's tools and definitions. This distributed ownership, enabled by strong evaluation frameworks, dramatically increased development velocity.
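Distributed tool ownership of this kind is often built on a shared registry. The sketch below is illustrative (team names, decorator, and schema are all assumptions): each product team publishes and maintains its own tool definition, and the central agent simply assembles them into the model's tool-calling interface.

```python
# Sketch of distributed tool ownership (illustrative, not Notion's code):
# each product team registers the tools it owns; the AI team's agent just
# collects the registered schemas for the model's tool-calling interface.

TOOL_REGISTRY = {}

def owns_tool(team):
    """Decorator a product team uses to publish a tool it maintains."""
    def register(fn):
        TOOL_REGISTRY[fn.__name__] = {
            "owner": team,
            "description": fn.__doc__,
            "fn": fn,
        }
        return fn
    return register

@owns_tool("databases-team")
def query_database(database_id, filter_expr):
    """Run a filtered query against a database."""
    return f"rows from {database_id} where {filter_expr}"

@owns_tool("calendar-team")
def list_events(date):
    """List calendar events for a date."""
    return f"events on {date}"

# The AI team ships the agent; tool definitions evolve team-by-team,
# each change gated by that team's own evaluation suite.
tool_specs = [
    {"name": name, "description": meta["description"]}
    for name, meta in TOOL_REGISTRY.items()
]
print([spec["name"] for spec in tool_specs])
```

The design choice this captures: no single team edits one giant prompt, so teams ship tool changes independently as long as their evals stay green.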
Notion's journey to a working AI agent involved multiple failed attempts. A key lesson was to stop forcing models to use Notion-specific data formats and instead give them familiar interfaces like Markdown and SQLite, which they already understand well from pretraining.
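The "familiar interfaces" idea can be made concrete with SQLite: rather than teaching a model a proprietary block format, export the data into a database and let the model write ordinary SQL. The schema below is purely illustrative.

```python
# Sketch: expose workspace data via SQLite so the model can query it with
# plain SQL it learned in pretraining. (Schema is illustrative.)
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id TEXT, title TEXT, parent_id TEXT)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?)",
    [("p1", "Roadmap", None), ("p2", "Q3 Goals", "p1")],
)

# The model answers questions in standard SQL instead of a bespoke query
# language it has never seen.
rows = conn.execute(
    "SELECT title FROM pages WHERE parent_id = 'p1'"
).fetchall()
print(rows)  # [('Q3 Goals',)]
```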
