RiffOn - How Braintrust uses AI agents, evals, and CI to ship better software | Ankur Goyal

Braintrust CEO Ankur Goyal on using AI agents and rigorous evals to solve deep technical problems that are impractical for human engineers to tackle.

AI Agents Outperform Staff Engineers in Rigorous Software Benchmarking

Ankur Goyal argues that AI agents can run far more exhaustive benchmarks, and test far more algorithms, than even the best staff engineers could by hand. That exhaustive coverage eliminates the common practice of prioritizing a few key benchmarks and "bullshitting" the rest, leading to more robust and performant software.
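
To make "exhaustive" concrete, here is a minimal sketch of a benchmark grid that times every candidate algorithm on every workload instead of a hand-picked few. It is not Braintrust's tooling or anything described in the episode: the sorting candidates, the workload names, and the helpers `make_workloads` and `run_grid` are all hypothetical, chosen only to illustrate the idea.

```python
import itertools
import random
import time


# Hypothetical candidates; any callables sharing one interface would work.
def builtin_sort(data):
    return sorted(data)


def insertion_sort(data):
    out = []
    for x in data:
        i = len(out)
        # Walk left until the insertion point for x is found.
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out


ALGORITHMS = {"builtin_sort": builtin_sort, "insertion_sort": insertion_sort}


def make_workloads(seed=0, size=500):
    """Build a few illustrative input shapes from one seeded random list."""
    rng = random.Random(seed)
    base = [rng.randint(0, 10_000) for _ in range(size)]
    return {
        "random": base,
        "sorted": sorted(base),
        "reversed": sorted(base, reverse=True),
    }


def run_grid(algorithms, workloads, repeats=3):
    """Time every (algorithm, workload) pair; keep the best of `repeats` runs."""
    results = {}
    pairs = itertools.product(algorithms.items(), workloads.items())
    for (algo_name, fn), (load_name, data) in pairs:
        expected = sorted(data)
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            out = fn(list(data))  # copy so each run sees the same input
            best = min(best, time.perf_counter() - start)
            if out != expected:
                raise ValueError(f"{algo_name} is wrong on {load_name}")
        results[(algo_name, load_name)] = best
    return results


if __name__ == "__main__":
    grid = run_grid(ALGORITHMS, make_workloads())
    for (algo_name, load_name), seconds in sorted(grid.items()):
        print(f"{algo_name:>15} on {load_name:<9} {seconds * 1e3:8.3f} ms")
```

Because the grid is the full cross product, adding one more algorithm or workload grows the number of runs multiplicatively. That growth is what makes full coverage tedious by hand and a natural fit for an automated agent.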