Get your free personalized podcast brief

We scan new podcasts and send you the top 5 insights daily.

Building reliable AI agents for finance, where accuracy is critical, requires moving beyond pure LLMs. Xero uses a hybrid system combining LLM-driven workflows with programmatic code and deep domain knowledge to ensure control and reliability that LLMs inherently lack.

Related Insights

To avoid AI hallucinations, Square's AI tools translate merchant queries into deterministic actions. For example, a query about sales on rainy days prompts the AI to write and execute real SQL code against a data warehouse, ensuring grounded, accurate results.

Fully autonomous agents are not yet reliable for complex production use cases because accuracy collapses when chaining multiple probabilistic steps. Zapier's CEO recommends a hybrid "agentic workflow" approach: embed a single, decisive agent within an otherwise deterministic, structured workflow to ensure reliability while still leveraging LLM intelligence.

Purely agentic systems can be unpredictable. A hybrid approach, like OpenAI's Deep Research forcing a clarifying question, inserts a deterministic workflow step (a "speed bump") before unleashing the agent. This mitigates risk, reduces errors, and ensures alignment before costly computation.

To ensure reliability in healthcare, ZocDoc doesn't give LLMs free rein. It wraps them in a hybrid system where traditional, deterministic code orchestrates the AI's tasks, sets firm boundaries, and knows when to hand off to a human, preventing the 'praying for the best' approach common with direct LLM use.

The true building block of an AI feature is the "agent"—a combination of the model, system prompts, tool descriptions, and feedback loops. Swapping an LLM is not a simple drop-in replacement; it breaks the agent's behavior and requires re-engineering the entire system around it.

Relying solely on natural language prompts like 'always do this' is unreliable for enterprise AI. LLMs struggle with deterministic logic. Salesforce developed 'AgentForce Script,' a dedicated language to enforce rules and ensure consistent, repeatable performance for critical business workflows, blending it with LLM reasoning.

The most effective AI architecture for complex tasks involves a division of labor. An LLM handles high-level strategic reasoning and goal setting, providing its intent in natural language. Specialized, efficient algorithms then translate that strategic intent into concrete, tactical actions.

Companies like Ramp are developing financial AI agents using a tiered autonomy model akin to self-driving cars (L1-L5). By implementing robust guardrails and payment controls first, they can gradually increase an agent's decision-making power. This allows a progression from simple, supervised tasks to fully unsupervised financial operations, mirroring the evolution from highway assist to full self-driving.

Fully autonomous AI agents are not yet viable in enterprises. Alloy Automation builds "semi-deterministic" agents that combine AI's reasoning with deterministic workflows, escalating to a human when confidence is low to ensure safety and compliance.

Salesforce's Chief AI Scientist explains that a true enterprise agent comprises four key parts: Memory (RAG), a Brain (reasoning engine), Actuators (API calls), and an Interface. A simple LLM is insufficient for enterprise tasks; the surrounding infrastructure provides the real functionality.

High-Stakes Financial AI Agents Require Hybrid Systems, Not Just LLMs | RiffOn