A 99.3% uptime for an LLM translates to roughly 61 hours of downtime per year (0.7% of 8,760 hours). This is critically insufficient for the airline industry, where systems like Amadeus must maintain 99.99% uptime (under an hour of downtime annually) to avoid grounding planes and losing revenue.
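
Converting an uptime percentage into annual downtime is simple arithmetic, and it is worth checking directly. The sketch below assumes a 365-day year (8,760 hours); the function name is illustrative and does not come from the source.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored (assumption)


def annual_downtime_hours(uptime_percent: float) -> float:
    """Return the hours of downtime per year implied by an uptime percentage."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR


if __name__ == "__main__":
    # Compare a few availability levels side by side.
    for uptime in (99.0, 99.3, 99.9, 99.99):
        hours = annual_downtime_hours(uptime)
        print(f"{uptime}% uptime: {hours:.2f} hours ({hours * 60:.0f} minutes) per year")
```

Each additional "nine" cuts the allowed downtime by a factor of ten, which is why the gap between typical LLM availability and airline-grade availability is so large.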

Related Insights

While AI can attempt complex, hour-long tasks with 50% success, its reliability plummets for longer operations. For mission-critical enterprise use requiring 99.9% success, current AI can only reliably complete tasks taking about three seconds. This necessitates breaking large problems into many small, reliable micro-tasks.
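
One way to see why a 50% success rate on hour-long tasks implies only seconds of work at 99.9% reliability is to assume a constant failure rate per unit of time. That model is an assumption made here for illustration and is not stated in the source. Under it, success decays as `0.5 ** (t / T50)`, where `T50` is the task length with 50% success. The function name is hypothetical.

```python
import math


def max_task_seconds(half_success_seconds: float, target_success: float) -> float:
    """Longest task length that still meets target_success.

    Assumed model: success(t) = 0.5 ** (t / half_success_seconds),
    i.e. a constant failure rate per second of work.
    """
    if half_success_seconds <= 0:
        raise ValueError("half_success_seconds must be positive")
    if not 0.0 < target_success < 1.0:
        raise ValueError("target_success must be strictly between 0 and 1")
    # Solve 0.5 ** (t / T50) = target for t.
    return half_success_seconds * math.log(target_success) / math.log(0.5)


if __name__ == "__main__":
    one_hour = 3600.0
    for target in (0.5, 0.9, 0.99, 0.999):
        seconds = max_task_seconds(one_hour, target)
        print(f"{target:.1%} success: tasks up to {seconds:.1f} s")
```

With a one-hour 50% horizon, this simplified model gives a 99.9% horizon of a few seconds, the same order of magnitude as the three-second figure quoted above.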

Consumers can easily re-prompt a chatbot, but enterprises cannot afford mistakes like shutting down the wrong server. This high-stakes environment means AI agents won't be given autonomy for critical tasks until they can guarantee near-perfect precision and accuracy, creating a major barrier to adoption.

The current cost of using LLMs for inference is approximately 30 times higher than using a traditional, deterministic API for flight data. This significant cost disadvantage makes it economically unviable for AI-native challengers to replace the existing airline distribution business model.

While consumer AI tolerates some inaccuracy, enterprise systems like customer service chatbots require near-perfect reliability. Teams get frustrated because out-of-the-box RAG templates don't meet this high bar. Achieving business-acceptable accuracy requires deep, iterative engineering, not just a vanilla implementation.

AI product quality is highly dependent on infrastructure reliability, which is less stable than traditional cloud services. Jared Palmer's team at Vercel monitored key metrics like 'error-free sessions' in near real-time. This intense, data-driven approach is crucial for building a reliable agentic product, as inference providers frequently drop requests.

While businesses accept that employees make mistakes, their expectation for software is absolute reliability. This unforgiving standard creates a durable moat for enterprise platforms that provide deterministic outcomes, a key challenge for probabilistic AI models in critical workflows.

Unlike deterministic SaaS software that works consistently, AI is probabilistic and doesn't work perfectly out of the box. Achieving 'human-grade' performance (e.g., 99.9% reliability) requires continuous tuning and expert guidance, countering the hype that AI is an immediate, hands-off solution.

For critical enterprise functions like financial modeling, 99.9% accuracy from a probabilistic LLM is unacceptable. Platforms like Salesforce's Agentforce 360 address this by layering deterministic logic and guardrails on top of the AI, ensuring compliance and preventing costly errors where even a 0.1% failure rate is too high.
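
The general pattern of putting deterministic checks around a probabilistic model can be sketched as follows. This is a minimal illustration, not a description of how any particular vendor's platform works. Every name and policy value here (`ProposedAction`, `MAX_REFUND`, `ALLOWED_KINDS`) is hypothetical, and the model's output is represented by a plain data object rather than a real LLM call.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ProposedAction:
    """An action suggested by a probabilistic model (stand-in for LLM output)."""
    kind: str
    amount: float


# Hypothetical business policy, enforced deterministically.
ALLOWED_KINDS = {"refund", "credit"}
MAX_REFUND = 500.0


def check_guardrails(action: ProposedAction) -> Tuple[bool, str]:
    """Return (allowed, reason) using fixed rules only, with no model involved."""
    if action.kind not in ALLOWED_KINDS:
        return False, f"action kind '{action.kind}' is not permitted"
    if not 0.0 < action.amount <= MAX_REFUND:
        return False, f"amount {action.amount} is outside the allowed range"
    return True, "ok"


def execute_with_guardrails(action: ProposedAction) -> str:
    """Execute the action only if every deterministic check passes."""
    allowed, reason = check_guardrails(action)
    if not allowed:
        # Anything the rules reject goes to a person instead of being executed.
        return f"escalated to human review: {reason}"
    return f"executed {action.kind} of {action.amount:.2f}"


if __name__ == "__main__":
    print(execute_with_guardrails(ProposedAction("refund", 120.0)))
    print(execute_with_guardrails(ProposedAction("refund", 12000.0)))
    print(execute_with_guardrails(ProposedAction("delete_account", 0.0)))
```

The design choice is that the model may propose anything, but only proposals that pass fixed, auditable rules are ever executed, so the failure rate of the critical step depends on the rules rather than on the model.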

While many AI agents produce impressive demos, their real-world utility hinges on reliability. Amazon's Nova Act team argues that for production use cases like UI automation, an agent that works only 60% of the time is effectively useless for business. The critical threshold for value is achieving over 90% reliability, making it the core engineering challenge.

Amadeus provides core IT systems for airlines (Air IT) that are deterministic and mission-critical. A failure means planes don't fly, making airlines extremely risk-averse to switching to new, probabilistic AI-based systems and insulating Amadeus from disruption.