We scan new podcasts and send you the top 5 insights daily.
A former OpenAI security expert argues that even if AI makes codebases more secure, hacking won't become harder. Attackers exploit the entire system—runtime behavior, configurations, authentication—not just static code. Looking only at code is like seeing a dinosaur's bones; you miss the muscles, feathers, and behavior that define the real-world attack surface.
The rapid evolution of AI makes reactive security obsolete. The new approach involves testing models in high-fidelity simulated environments to observe emergent behaviors from the outside. This allows mapping attack surfaces even without fully understanding the model's internal mechanics.
Defenders of AI models are "fighting against infinity" because as model capabilities and complexity grow, the potential attack surface area expands faster than it can be secured. This gives attackers a persistent upper hand in the cat-and-mouse game of AI security.
Claiming a "99% success rate" for an AI guardrail is misleading. The number of potential attacks (i.e., prompts) is nearly infinite. For GPT-5, it's 'one followed by a million zeros.' Blocking 99% of a tested subset still leaves a virtually infinite number of effective attacks undiscovered.
Unlike human attackers, AI can ingest a company's entire API surface to find and exploit combinations of access patterns that individual, siloed development teams would never notice. This makes it a powerful tool for discovering hidden security holes that arise from a lack of cross-team coordination.
AI tools aren't just lowering the bar for novice hackers; they are making experts more effective, enabling attacks at a greater scale across all stages of the "cyber kill chain." AI is a universal force multiplier for offense, making even powerful reverse engineers shockingly more effective.
Unlike traditional software where a bug can be patched with high certainty, fixing a vulnerability in an AI system is unreliable. The underlying problem often persists because the AI's neural network—its 'brain'—remains susceptible to being tricked in novel ways.
Current AI safety solutions primarily act as external filters, analyzing prompts and responses. This "black box" approach is ineffective against jailbreaks and adversarial attacks that manipulate the model's internal workings to generate malicious output from seemingly benign inputs, much like a building's gate security can't stop a resident from causing harm inside.
Hackers are exploiting AI models not just to write malicious code, but by circumventing safety protocols to extract sensitive or useful information embedded within the AI's training data. This represents a novel attack surface.
The old security adage was to be better than your neighbor. AI attackers, however, will be numerous and automated, meaning companies can't just be slightly more secure than peers; they need robust defenses against a swarm of simultaneous threats.
AI agents are a security nightmare due to a "lethal trifecta" of vulnerabilities: 1) access to private user data, 2) exposure to untrusted content (like emails), and 3) the ability to execute actions. This combination creates a massive attack surface for prompt injections.