A pilot AI certification program revealed that even simplified criteria were interpreted inconsistently. This suggests AI systems are too dynamic for static, checklist-based certification. The proposed solution is to give auditors discretion and to invest heavily in their specialized training and education.
AI audits are not a one-time, "risk-free" certification but an iterative process with quarterly re-audits. They quantify risk by finding vulnerabilities (which can initially have failure rates as high as 25%) and then measuring the improvement—often a 90% drop—after safeguards are implemented, giving enterprises a data-driven basis for trust.
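To make those numbers concrete, here is a minimal sketch of how such a pre/post comparison might be computed, assuming a red-team test suite with a boolean pass/fail outcome per case (all names and figures below are illustrative, not from the source):

```python
# Hypothetical sketch: quantify risk reduction across audit cycles.

def failure_rate(results: list[bool]) -> float:
    """Fraction of red-team test cases that failed (True = failure)."""
    return sum(results) / len(results)

# Outcomes from the same adversarial suite, run before and after
# safeguards were implemented (illustrative data only).
baseline = [True] * 25 + [False] * 75    # 25% failure rate
post_fix = [True] * 2 + [False] * 98     # 2% failure rate

drop = 1 - failure_rate(post_fix) / failure_rate(baseline)
print(f"Baseline failure rate: {failure_rate(baseline):.0%}")
print(f"Post-safeguard rate:   {failure_rate(post_fix):.0%}")
print(f"Relative improvement:  {drop:.0%}")  # ~92%, i.e. a ~90% drop
```

Re-running the same fixed suite each quarter is what turns "trust us" into a trend line an enterprise can act on.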
Beyond model capabilities and process integration, a key challenge in deploying AI is the "verification bottleneck": a new layer of work in which humans review edge cases and ensure final accuracy, creating a need for entirely new quality assurance processes.
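One common shape for that QA layer is a triage step that auto-accepts high-confidence outputs and routes the rest to people. A minimal sketch, assuming the model emits a confidence score (all names here are hypothetical):

```python
# Hypothetical sketch of a verification queue: AI output is accepted
# automatically only when confidence clears a threshold; everything
# else becomes human review work (the "verification bottleneck").

from dataclasses import dataclass, field

@dataclass
class VerificationQueue:
    threshold: float = 0.9
    pending: list = field(default_factory=list)

    def triage(self, item_id: str, output: str, confidence: float) -> str:
        if confidence >= self.threshold:
            return "auto-accepted"
        self.pending.append((item_id, output))  # edge case: human reviews
        return "queued for human review"

queue = VerificationQueue()
print(queue.triage("doc-1", "Invoice total: $1,240.00", confidence=0.97))
print(queue.triage("doc-2", "Invoice total: unclear", confidence=0.41))
print(f"{len(queue.pending)} item(s) awaiting human verification")
```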
Unlike traditional software that produces identical, auditable results, AI is non-deterministic and often can't explain its reasoning. This poses a major challenge for finance, an industry where processes must be repeatable and transparent because regulators and clients expect the work to be shown.
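One mitigation is to pin whatever sampling parameters can be pinned and record a full audit trail for every call, so a result can at least be traced even when it can't be exactly reproduced. A minimal sketch with hypothetical names:

```python
# Hypothetical sketch: since identical prompts can yield different
# outputs, log everything needed to reconstruct *how* a result was
# produced, approximating the "show your work" expectation.

import hashlib, json, time

def audit_record(model: str, prompt: str, params: dict, output: str) -> dict:
    return {
        "timestamp": time.time(),
        "model": model,                       # exact model version
        "params": params,                     # temperature, seed, etc.
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "output_sha256": hashlib.sha256(output.encode()).hexdigest(),
        "output": output,
    }

record = audit_record(
    model="example-model-2025-01",
    prompt="Summarize the Q3 variance analysis.",
    params={"temperature": 0, "seed": 42},    # pin what can be pinned
    output="Q3 costs rose 4% on vendor repricing...",
)
print(json.dumps(record, indent=2))
```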
Treating AI evaluation like a final exam is a mistake. For critical enterprise systems, evaluations should be embedded at every step of an agent's workflow (e.g., after planning, before action). This is akin to unit testing in classic software development and is essential for building trustworthy, production-ready agents.
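A minimal sketch of what step-level evaluation can look like inside an agent loop, with one check after planning and one before each action executes (the agent, checks, and step names are all hypothetical):

```python
# Hypothetical sketch: evaluations embedded mid-workflow rather than
# run as a single "final exam" after the agent finishes.

ALLOWED_STEPS = {"fetch_report", "summarize", "send_email"}

def check_plan(plan: list[str]) -> bool:
    """Eval #1: run right after planning, before any tool call."""
    return len(plan) > 0 and all(step in ALLOWED_STEPS for step in plan)

def check_action(step: str, args: dict) -> bool:
    """Eval #2: run immediately before each action executes."""
    return not (step == "send_email" and args.get("external", False))

def run_agent(plan: list[str], args_for: dict) -> None:
    if not check_plan(plan):
        raise RuntimeError("plan failed evaluation; halting before action")
    for step in plan:
        if not check_action(step, args_for.get(step, {})):
            raise RuntimeError(f"action '{step}' blocked by pre-action eval")
        print(f"executing {step}")  # real tool call would go here

run_agent(["fetch_report", "summarize"], {"summarize": {}})
```

As with unit tests, each check is cheap and local, so a failure pinpoints which stage of the workflow broke rather than just flagging a bad final answer.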
Formal auditing for AI systems is nascent. Only a small fraction (<5%) of clients currently demand checks on AI accuracy. It will likely take 6-12 months for this demand to reach a critical mass that compels auditors to broadly incorporate AI-specific testing.
To accelerate enterprise AI adoption, vendors should pursue verifiable certifications like ISO 42001 (AI management systems). These standards provide a common language for procurement and security, shortening sales cycles by replacing abstract trust claims with concrete, auditable proof.
Given AI's rapid evolution, eliminating risk is unrealistic. The AI assurance ecosystem—including audits and certifications—should instead focus on a more pragmatic goal: creating shared information standards that allow organizations to effectively gauge, price, and potentially transfer AI-related risks.
For tasks where a simple right/wrong answer doesn't exist, verification is a major challenge. The solution is creating detailed rubrics with thousands of criteria, often developed with AI help. This provides a granular reward signal that allows models to climb the learning curve even in highly subjective domains.
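A minimal sketch of how a rubric becomes a scalar reward: each criterion is graded, weighted, and averaged (the criteria and the stand-in judge below are hypothetical; in practice the grading is often done by an LLM):

```python
# Hypothetical sketch: a rubric as a granular reward signal for a
# subjective task. Real rubrics may have thousands of criteria,
# often drafted with AI assistance; three suffice to show the shape.

RUBRIC = [
    ("Cites at least one concrete example", 1.0),
    ("Avoids unsupported absolute claims",  1.0),
    ("Stays under 200 words",               0.5),   # lower-weight criterion
]

def grade(criterion: str, answer: str) -> float:
    """Stand-in judge returning a score in [0, 1] per criterion;
    a real pipeline typically uses an LLM or trained grader here."""
    return 1.0 if "for example" in answer.lower() else 0.5

def rubric_reward(answer: str) -> float:
    total_weight = sum(w for _, w in RUBRIC)
    earned = sum(w * grade(c, answer) for c, w in RUBRIC)
    return earned / total_weight   # scalar reward in [0, 1]

print(rubric_reward("For example, GDPR Article 30 requires..."))
```

Because every criterion contributes partial credit, the model gets a dense gradient to climb instead of a single pass/fail verdict.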
AI's value in a compliance platform isn't in answering binary audit questions (e.g., "is X encrypted?"). Instead, it should automate the messy, non-deterministic work around them, like finding compliance obligations hidden in legal contracts, a task previously impossible to do at scale.
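A minimal sketch of that extraction pattern, assuming a generic prompt-in, string-out LLM client (the prompt, schema, and stub model below are hypothetical):

```python
# Hypothetical sketch: use an LLM to surface compliance obligations
# buried in contract text, then route them into a compliance platform.

import json

PROMPT = """Extract every compliance obligation from the contract below.
Return a JSON list of objects with keys: "obligation", "clause", "deadline".

Contract:
{contract_text}
"""

def extract_obligations(contract_text: str, llm_call) -> list[dict]:
    """llm_call is any function that maps a prompt string to a string."""
    raw = llm_call(PROMPT.format(contract_text=contract_text))
    return json.loads(raw)  # production code would validate the schema

# Stub model for demonstration; a real deployment would call an LLM API.
fake_llm = lambda _: ('[{"obligation": "Annual SOC 2 report to client", '
                      '"clause": "7.3", "deadline": "each Jan 31"}]')

for ob in extract_obligations("...full contract text...", fake_llm):
    print(f"Clause {ob['clause']}: {ob['obligation']} (due {ob['deadline']})")
```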
Faced with non-deterministic AI models, UL's approach to safety certification isn't to test a model's outputs. Instead, it audits the development process, focusing on over 200 criteria for how humans make decisions about data veracity, bias, transparency, and privacy.