We scan new podcasts and send you the top 5 insights daily.
For critical enterprise functions like financial modeling, 99.9% accuracy from a probabilistic LLM is unacceptable. Platforms like Salesforce's Agent Force 360 solve this by layering deterministic logic and guardrails on top of the AI, ensuring compliance and preventing costly errors where even a 0.1% failure rate is too high.
Contrary to the vision of free-wheeling autonomous agents, most business automation relies on strict Standard Operating Procedures (SOPs). Products like OpenAI's Agent Builder succeed by providing deterministic, node-based workflows that enforce business logic, which is more valuable than pure autonomy.
Consumers can easily re-prompt a chatbot, but enterprises cannot afford mistakes like shutting down the wrong server. This high-stakes environment means AI agents won't be given autonomy for critical tasks until they can guarantee near-perfect precision and accuracy, creating a major barrier to adoption.
AI will not replace enterprise software because AI models are non-deterministic (probabilistic), while enterprise systems require deterministic (100% reliable) execution for critical functions. Enterprise software will act as the execution layer that harnesses AI's "thinking" capabilities within safe, predictable workflows.
Unlike deterministic SaaS software that works consistently, AI is probabilistic and doesn't work perfectly out of the box. Achieving 'human-grade' performance (e.g., 99.9% reliability) requires continuous tuning and expert guidance, countering the hype that AI is an immediate, hands-off solution.
Building reliable AI agents for finance, where accuracy is critical, requires moving beyond pure LLMs. Xero uses a hybrid system combining LLM-driven workflows with programmatic code and deep domain knowledge to ensure control and reliability that LLMs inherently lack.
Relying solely on natural language prompts like 'always do this' is unreliable for enterprise AI. LLMs struggle with deterministic logic. Salesforce developed 'AgentForce Script,' a dedicated language to enforce rules and ensure consistent, repeatable performance for critical business workflows, blending it with LLM reasoning.
Salesforce is reintroducing deterministic automation because its generative AI agents struggle with reliability, dropping instructions when given more than eight commands. This pullback signals current LLMs are not ready for high-stakes, consistent enterprise workflows.
To deploy LLMs in high-stakes environments like finance, combine them with deterministic checks. For example, use a traditional algorithm to calculate cash flow and only surface the LLM's answer if it falls within an acceptable range. This prevents hallucinations and ensures reliability.
Fully autonomous AI agents are not yet viable in enterprises. Alloy Automation builds "semi-deterministic" agents that combine AI's reasoning with deterministic workflows, escalating to a human when confidence is low to ensure safety and compliance.
Companies struggle with AI adoption not because of technology, but because of a lack of trust in probabilistic systems. Platforms like Jetstream are emerging to solve this by creating "AI blueprints"—an operational contract that defines what an AI workflow is supposed to do and flags any deviation, providing necessary control and observability.