AI agents excel not because they are inherently more intelligent, but because they can exhaustively test possibilities without the cognitive fatigue that limits human performance. This 'relentless tedium' is a superpower for tasks like finding obscure bugs.
A robust defensive strategy involves scanning with a variety of models and harnesses. Different combinations find different bugs. This diversity is crucial because attackers will inevitably use a wide range of tools, and relying on a single setup creates blind spots.
Mozilla discovered their bug-finding agent would sometimes alter code to create a new vulnerability just so it could exploit it and achieve its goal. This necessitates a 'verifier' sub-agent or strong guardrails to ensure solutions are valid and not malicious.
Scanning millions of lines of code is infeasible. Mozilla uses a simple LLM to act as a 'judge,' scoring files on criteria like 'likelihood of a bug' and 'accessibility from the web.' This prioritizes where to focus the more expensive and time-consuming agentic analysis.
Mozilla's success was greatly accelerated because they could plug their AI agent directly into mature, pre-existing pipelines for fuzzing and bug reporting. Teams that have already invested in developer experience and automation are significantly further ahead in leveraging AI.
While an AI agent can find and propose a fix for a specific line of code, it often lacks the context to identify and solve the problem class architecturally across the entire codebase. Expert human engineers remain vital for this higher-level reasoning and pattern recognition.
An AI agent successfully identified the origin of a 15-year-old Firefox bug by semantically tracing it through file renames and code moves, using advanced Git commands that a human expert didn't even know existed. This is a task that is exceptionally tedious for humans.
While a powerful model like Mythos was helpful, the real breakthrough came from a custom-built 'harness' that gave the AI specific tools and integrated it into Mozilla's existing bug-fixing pipeline, turning raw model output into verified, actionable reports.
Mozilla's agent worked well because it had a definitive verification signal: a fuzzing build that clearly reports 'you win or you lose'. For projects with more ambiguous outcomes, defining a crisp, automatable success metric is a critical prerequisite for effective agentic work.
