Brex's automated expense auditing employs a multi-agent system. An "audit agent" is optimized for recall, flagging every potential policy violation. A second "review agent" then applies judgment and business context to decide which cases are significant enough to pursue.
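This two-stage split can be sketched as a simple pipeline: a high-recall first pass followed by a judgment pass that filters for significance. The agents, fields, and scoring here are hypothetical stand-ins for the LLM-backed components the text describes.

```python
from dataclasses import dataclass

@dataclass
class Flag:
    expense_id: str
    reason: str
    confidence: float  # review agent's judged significance, 0-1

def audit_agent(expenses):
    """High-recall pass: flag anything that might violate policy."""
    return [Flag(e["id"], "missing receipt", 0.0)
            for e in expenses if not e.get("receipt")]

def review_agent(flags, threshold=0.5):
    """Second pass: keep only flags judged significant enough to pursue."""
    for f in flags:
        # A real review agent would apply business context via an LLM;
        # here a fixed score stands in for that judgment.
        f.confidence = 0.9 if "receipt" in f.reason else 0.2
    return [f for f in flags if f.confidence >= threshold]

expenses = [{"id": "e1", "receipt": None}, {"id": "e2", "receipt": "r.pdf"}]
significant = review_agent(audit_agent(expenses))
```

The key design point is that the two objectives (recall vs. precision) live in separate components, so tuning one never degrades the other.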

Related Insights

For its user assistant, Brex moved beyond a single agent with many tools. Instead, it built a network where specialized sub-agents (e.g., policy, travel) hold multi-turn conversations with an orchestrator agent to collaboratively solve complex user requests.

In regulated industries, AI's value isn't perfect breach detection but efficiently filtering millions of calls to identify a small, ambiguous subset needing human review. This shifts the goal from flawless accuracy to dramatically improving the efficiency and focus of human compliance officers.

Rather than programming AI agents with a company's formal policies, a more powerful approach is to let them observe thousands of actual 'decision traces.' This allows the AI to discover the organization's emergent, de facto rules—how work *actually* gets done—creating a more accurate and effective world model for automation.

The effectiveness of enterprise AI agents is limited not by data access, but by the absence of context for *why* decisions were made. 'Context graphs' aim to solve this by capturing 'decision traces'—exceptions, precedents, and overrides that currently live in Slack threads and employees' heads—creating a true source of truth for automation.
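One way to picture a context graph is as a store of trace records linked by precedent edges, so the "why" behind any decision can be reconstructed by walking the chain. This data model is a hypothetical sketch, not a description of any specific product:

```python
from dataclasses import dataclass, field

@dataclass
class DecisionTrace:
    """One node in a hypothetical context graph: what was decided,
    plus the exception/precedent context explaining *why*."""
    decision_id: str
    outcome: str
    rationale: str                                   # the 'why' that usually lives in Slack
    overrides: list = field(default_factory=list)    # policy rules set aside
    precedents: list = field(default_factory=list)   # earlier trace ids cited

traces = {
    "t1": DecisionTrace("t1", "approved", "client dinner, pre-cleared by VP"),
    "t2": DecisionTrace("t2", "approved",
                        "same client, same context as t1",
                        overrides=["meal-limit-75"], precedents=["t1"]),
}

def why(trace_id, graph):
    """Walk precedent edges to reconstruct the chain of reasoning."""
    chain, stack = [], [trace_id]
    while stack:
        t = graph[stack.pop()]
        chain.append(t.rationale)
        stack.extend(t.precedents)
    return chain
```

An agent querying `why("t2", traces)` recovers both the override and the precedent that justified it, which is exactly the context a raw policy document lacks.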

Brex initially invested in a sophisticated reinforcement learning model for credit underwriting but found it was inferior to a straightforward web research agent. For operational tasks requiring auditable processes, simpler LLM applications are often superior.

Run HR, finance, and legal using AI agents that operate based on codified rules. This creates an autonomous back office where human intervention is only required for exceptions, not routine patterns. The mantra is: "patterns deserve code, exceptions deserve people."
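The mantra translates directly into a routing layer: codified rules dispose of routine cases, and anything unmatched lands in a human queue. The rules and request shapes below are invented for illustration.

```python
# "Patterns deserve code, exceptions deserve people": each rule is a
# (predicate, action) pair; unmatched requests escalate to a person.
RULES = [
    (lambda req: req["type"] == "expense" and req["amount"] <= 50, "auto-approve"),
    (lambda req: req["type"] == "pto" and req["days"] <= 2, "auto-approve"),
]

def route(request, human_queue):
    for matches, action in RULES:
        try:
            if matches(request):
                return action
        except KeyError:
            pass  # a malformed request is itself an exception
    human_queue.append(request)  # exception: a person decides
    return "escalated"
```

Note that failures inside a rule also escalate rather than auto-approving, so the human queue is the safe default.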

AI agents solve the classic "recall vs. precision" problem in site reliability. Vercel's CTO explains that monitoring thresholds can be set very aggressively: instead of paging a human, an agent investigates first, filtering out false positives and escalating only true emergencies, eliminating alert fatigue.
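The triage step can be sketched as a function that sits between the monitor and the pager, with a stand-in investigation function playing the role of the LLM agent (the alert fields and verdicts here are assumptions, not Vercel's actual schema):

```python
def triage(alert, investigate):
    """investigate() stands in for an agent that gathers context
    (recent deploys, traffic, dependency status) and returns a verdict."""
    verdict = investigate(alert)
    if verdict["false_positive"]:
        return ("suppressed", verdict["reason"])
    return ("page_oncall", verdict["reason"])

def fake_investigation(alert):
    # Hypothetical heuristic: a latency spike during a deploy is
    # expected and self-healing, so not worth waking anyone for.
    if alert["metric"] == "latency" and alert["during_deploy"]:
        return {"false_positive": True, "reason": "deploy in progress"}
    return {"false_positive": False, "reason": "no benign explanation found"}
```

Because the agent absorbs the false positives, the threshold upstream can be set far more aggressively than a human-paging system would tolerate.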

Create AI agents that embody key executive personas to monitor operations. A 'CFO agent' could audit for cost efficiency while a 'brand agent' checks for compliance. This system surfaces strategic conflicts that require a human-in-the-loop to arbitrate, ensuring alignment.

To improve the quality and accuracy of an AI agent's output, spawn multiple sub-agents with competing or adversarial roles. For example, a code review agent finds bugs, while several "auditor" agents check for false positives, resulting in a more reliable final analysis.
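The adversarial pattern amounts to a voting scheme: findings survive only if enough independent auditors fail to knock them down. The auditor predicates below are toy stand-ins for separate LLM calls.

```python
def adversarial_review(findings, auditors, quorum=2):
    """Keep only findings that at least `quorum` auditors confirm."""
    confirmed = []
    for finding in findings:
        votes = sum(1 for audit in auditors if audit(finding))
        if votes >= quorum:
            confirmed.append(finding)
    return confirmed

# Hypothetical auditors, each probing a different failure mode of the
# primary reviewer (simple predicates stand in for LLM judgments).
auditors = [
    lambda f: f["has_repro"],          # is the bug reproducible?
    lambda f: not f["in_test_code"],   # test-only code is lower stakes
    lambda f: f["severity"] >= 2,      # filter trivial nits
]

findings = [
    {"id": "b1", "has_repro": True, "in_test_code": False, "severity": 3},
    {"id": "b2", "has_repro": False, "in_test_code": True, "severity": 1},
]
confirmed = adversarial_review(findings, auditors)
```

Because the auditors are adversarial to the reviewer rather than to each other, their errors are less correlated, which is what makes the aggregate more reliable than any single pass.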

Treat accountability as an engineering problem. Implement a system that logs every significant AI action, decision path, and triggering input. This creates an auditable, attributable record, ensuring that in the event of an incident, the 'why' can be traced without ambiguity, much like a flight recorder after a crash.
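A minimal "flight recorder" is just an append-only log keyed by agent, action, inputs, and decision path, queryable after the fact. The class and field names are illustrative:

```python
import time

class FlightRecorder:
    """Append-only log of significant AI actions, so the 'why' behind
    an incident can be replayed later without ambiguity."""

    def __init__(self):
        self.log = []

    def record(self, agent, action, inputs, decision_path):
        entry = {
            "ts": time.time(),
            "agent": agent,
            "action": action,
            "inputs": inputs,                 # what the agent saw
            "decision_path": decision_path,   # rules/steps it followed
        }
        self.log.append(entry)
        return entry

    def trace(self, agent):
        """Replay everything one agent did, in order."""
        return [e for e in self.log if e["agent"] == agent]

rec = FlightRecorder()
rec.record("audit-agent", "flag_expense", {"id": "e1"}, ["no-receipt-rule"])
```

In production this would write to durable, tamper-evident storage rather than a list, but the contract is the same: no significant action without an attributable record.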