Security expert Alex Komorowski argues that current AI systems are fundamentally insecure. The lack of a large-scale breach is a temporary illusion created by the early stage of AI integration into critical systems, not a testament to the effectiveness of current defenses.
Unlike traditional software where a bug can be patched with high certainty, fixing a vulnerability in an AI system is unreliable. The underlying problem often persists because the AI's neural network—its 'brain'—remains susceptible to being tricked in novel ways.
The AI security market is ripe for a correction as enterprises realize current guardrail products don't work and that free, open-source alternatives are often superior. Companies acquired for high valuations based on selling these flawed solutions may struggle as revenue fails to materialize.
Claiming a "99% success rate" for an AI guardrail is misleading. The number of potential attacks (i.e., prompts) is nearly infinite. For GPT-5, it's 'one followed by a million zeros.' Blocking 99% of a tested subset still leaves a virtually infinite number of effective attacks undiscovered.
Instead of relying on flawed AI guardrails, focus on traditional security practices. This includes strict permissioning (ensuring an AI agent can't do more than necessary) and containerizing processes (like running AI-generated code in a sandbox) to limit potential damage from a compromised AI.
Jailbreaking is a direct attack where a user tricks a base AI model. Prompt injection is more nuanced; it's an attack on an AI-powered *application*, where a malicious user gets the AI to ignore the developer's original system prompt and follow new, harmful instructions instead.
Unlike static guardrails, Google's CAMEL framework analyzes a user's prompt to determine the minimum permissions needed. For a request to 'summarize my emails,' it grants read-only access, preventing a malicious email from triggering an unauthorized 'send' action. It's a more robust, context-aware security model.
The world's top AI researchers at labs like OpenAI, Google, and Anthropic have not solved adversarial robustness. It is therefore highly unlikely that third-party B2B security vendors, who typically lack the same level of deep research capability, possess a genuine solution.
If your AI application only reads public data (like FAQs) and cannot take actions (like sending emails or editing databases), the security risk is low. A malicious user can only cause reputational damage by making it say something bad, which they could do with any public model anyway.
