As base model capabilities converge, the key differentiator is shifting to the "agent harness": the infrastructure, tools, and skills built around the model. For vertical AI, this is where domain expertise is injected, creating specialized agents with custom tools that outperform generalist models.
Harvey created and open-sourced "Legal Agent Bench" to measure AI agent performance on legal tasks. Doing so establishes Harvey as a thought leader, rallies the community around its vertical's problems, and creates a moat by defining the standard of performance for the entire industry.
AI companies moving to token-based pricing will face the same client scrutiny as law firms with billable hours. Customers, shocked by huge, unpredictable bills, will demand granular usage reports, creating a new market for cost optimization and transparency tools.
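A granular usage report of the kind customers might ask for can be sketched as a simple roll-up of token counts into cost per client matter and model. This is a minimal illustration, not any vendor's billing format: the event fields (`matter`, `model`, `input_tokens`, `output_tokens`) and the per-1,000-token prices are all hypothetical.

```python
from collections import defaultdict

# Hypothetical prices in USD per 1,000 tokens: (input rate, output rate).
PRICE_PER_1K = {
    "large": (0.01, 0.03),
    "small": (0.001, 0.002),
}


def usage_report(events, price_per_1k):
    """Sum token cost per (matter, model) pair from raw usage events."""
    totals = defaultdict(float)
    for event in events:
        rate_in, rate_out = price_per_1k[event["model"]]
        cost = (
            event["input_tokens"] / 1000 * rate_in
            + event["output_tokens"] / 1000 * rate_out
        )
        totals[(event["matter"], event["model"])] += cost
    return dict(totals)


# Hypothetical usage log: each entry is one model call billed to a matter.
EVENTS = [
    {"matter": "A", "model": "large", "input_tokens": 2000, "output_tokens": 1000},
    {"matter": "A", "model": "large", "input_tokens": 1000, "output_tokens": 0},
    {"matter": "B", "model": "small", "input_tokens": 10000, "output_tokens": 5000},
]

report = usage_report(EVENTS, PRICE_PER_1K)
for (matter, model), cost in sorted(report.items()):
    print(f"matter={matter} model={model} cost=${cost:.4f}")
```

Grouping by matter mirrors how law firms already itemize billable hours, which is the comparison the point above draws.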
Many believe AI will kill the billable hour, but the billable hour remains a powerful mechanism for pricing thousands of unique, complex client engagements at scale. Negotiating a fixed fee for every project is operationally unmanageable for large firms, which makes the billable hour a durable, standardized solution.
The era of using the most powerful AI model for every task is ending. Companies are now focused on the trade-off between quality, cost, and latency. The key question is no longer "Which model is best?" but "Which model is good enough for this task at the lowest price point?"
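The "good enough at the lowest price" question can be sketched as a small selection rule: filter out models that miss a task's quality or latency bar, then pick the cheapest of what remains. This is only an illustration under stated assumptions: the model names, quality scores, costs, and latencies below are hypothetical, and in practice the quality score would come from an evaluation on the task in question.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelOption:
    name: str
    quality: float        # hypothetical eval score for this task, 0 to 1
    cost_per_task: float  # hypothetical USD per task
    latency_s: float      # hypothetical seconds per task


def cheapest_good_enough(
    options: Sequence[ModelOption],
    min_quality: float,
    max_latency_s: float,
) -> Optional[ModelOption]:
    """Return the cheapest model meeting both thresholds, or None."""
    eligible = [
        m for m in options
        if m.quality >= min_quality and m.latency_s <= max_latency_s
    ]
    return min(eligible, key=lambda m: m.cost_per_task, default=None)


# Hypothetical catalog; all numbers are made up for illustration.
CATALOG = [
    ModelOption("large", quality=0.95, cost_per_task=0.40, latency_s=12.0),
    ModelOption("medium", quality=0.88, cost_per_task=0.08, latency_s=4.0),
    ModelOption("small", quality=0.75, cost_per_task=0.01, latency_s=1.0),
]

choice = cheapest_good_enough(CATALOG, min_quality=0.85, max_latency_s=10.0)
print(choice.name if choice else "no model meets the bar")
```

Returning `None` when nothing qualifies makes the trade-off explicit: the caller has to relax a threshold or accept a more expensive model rather than silently falling back to the largest one.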
To build its legal benchmark without violating client confidentiality, Harvey used AI agents to generate realistic synthetic documents. This agent-led first draft was then refined by human legal experts, creating a scalable pipeline for high-quality, proprietary data in a data-scarce industry.
Harvey's founders knew the market wasn't ready for consumption-based AI agents. They built "two companies in parallel": a traditional seat-based SaaS product for immediate revenue and market education, while simultaneously developing the infrastructure for the inevitable shift to a consumption model.
Harvey open-sources its legal benchmark because enterprise clients like law firms can't risk vendor lock-in or conflicts with a single AI lab. For example, a firm representing OpenAI cannot send sensitive data to Anthropic's models. Open sourcing provides a necessary neutral layer.
Unlike Google Brain's "bottom-up" research style, DeepMind used a "top-down" approach with a clear AGI goal and a tech tree of milestones. Vertical AI companies like Harvey find the top-down model more effective because their end goal is a well-defined, applied problem rather than pure research.
