For an LLM's output to be useful in a software system, it cannot be treated as free-form, ambiguous text. It must pass through a "hard boundary" (a strict schema or contract) that constrains, validates, and types the data, so that it is observable and safe for downstream services to trust and consume.
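
As a rough illustration of such a boundary, the sketch below parses raw model text into a typed object and rejects anything that does not match the contract. The `TicketTriage` shape, its field names, and the allowed priority values are assumptions made up for this example, not something the insight specifies.

```python
import json
from dataclasses import dataclass

# Hypothetical enum of allowed values, chosen only for illustration.
ALLOWED_PRIORITIES = {"low", "medium", "high"}


class ContractError(ValueError):
    """Raised when model output does not satisfy the contract."""


@dataclass(frozen=True)
class TicketTriage:
    """Hypothetical typed result that downstream services consume."""
    category: str
    priority: str
    confidence: float


def parse_triage(raw: str) -> TicketTriage:
    """Turn raw model text into a TicketTriage, or raise ContractError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ContractError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ContractError("top-level value must be a JSON object")

    expected = {"category", "priority", "confidence"}
    if set(data) != expected:
        raise ContractError(f"expected keys {sorted(expected)}, got {sorted(data)}")

    category = data["category"]
    priority = data["priority"]
    confidence = data["confidence"]

    if not isinstance(category, str) or not category.strip():
        raise ContractError("category must be a non-empty string")
    if not isinstance(priority, str) or priority not in ALLOWED_PRIORITIES:
        raise ContractError(f"priority must be one of {sorted(ALLOWED_PRIORITIES)}")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ContractError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ContractError("confidence must be between 0 and 1")

    return TicketTriage(category.strip(), priority, float(confidence))
```

Downstream code then only ever sees a `TicketTriage` or a `ContractError`, never raw model text, which is what makes failures countable and loggable at a single point.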

Related Insights

Don't give LLMs full control. Use deterministic code for core logic, validation, and enforcing rules. Delegate only tasks requiring flexibility or understanding of unstructured input to the LLM, treating it as a specialized component, not the entire system.
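
One way to picture this split is the minimal sketch below: the model is used only to pull an order id out of a free-text message, while the refund decision itself is plain code. The refund scenario, the 30-day rule, and the `extract_order_id` callable (a stand-in for a real model call) are all hypothetical.

```python
from typing import Callable

# Hypothetical business rule, enforced in code rather than in a prompt.
REFUND_WINDOW_DAYS = 30


def handle_refund_message(
    message: str,
    extract_order_id: Callable[[str], str],
    order_age_days: dict[str, int],
) -> str:
    """Decide a refund request.

    extract_order_id stands in for the LLM: it only reads unstructured text.
    order_age_days maps known order ids to days since purchase.
    """
    # Flexible step: delegate understanding of free text to the model.
    order_id = extract_order_id(message).strip()

    # Deterministic steps: validation and rules never go through the model.
    if order_id not in order_age_days:
        return "rejected: unknown order"
    if order_age_days[order_id] > REFUND_WINDOW_DAYS:
        return "rejected: outside refund window"
    return "approved"
```

Because the model's output is checked against known orders before any rule runs, a hallucinated order id results in a rejection rather than a refund.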

You can't just deploy a probabilistic model like an LLM in a high-stakes field like healthcare. The key is to build deterministic infrastructure (e.g., a rules engine encoding clinical guidelines) that governs the AI, so that it operates safely within predefined constraints.

Don't let LLMs make raw HTTP calls. Instead, provide a code execution tool with a statically typed SDK. The environment can run a type checker that immediately catches a hallucinated endpoint or parameter, and then return in-context documentation so the model can correct its mistake.
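
The sketch below approximates that feedback loop using only the Python standard library. It is a lightweight stand-in for a real type checker: it statically inspects model-written code with `ast`, checks method and keyword names (not argument types) against an SDK class, and returns error messages that include the real signature and docstring. `BillingSDK` and its methods are hypothetical.

```python
import ast
import inspect


class BillingSDK:
    """Hypothetical SDK exposed to the model instead of raw HTTP."""

    def get_invoice(self, invoice_id: str) -> dict:
        """Fetch one invoice by its id."""
        return {"id": invoice_id}

    def list_invoices(self, customer_id: str, limit: int = 10) -> list:
        """List up to `limit` invoices for a customer."""
        return []


def check_generated_code(source: str, sdk_name: str = "sdk") -> list[str]:
    """Return readable errors for calls on `sdk_name` that BillingSDK lacks."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg} (line {exc.lineno})"]

    public = [
        name
        for name, _ in inspect.getmembers(BillingSDK, inspect.isfunction)
        if not name.startswith("_")
    ]
    errors: list[str] = []
    for node in ast.walk(tree):
        # Only look at calls of the form `sdk.<method>(...)`.
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)):
            continue
        receiver = node.func.value
        if not (isinstance(receiver, ast.Name) and receiver.id == sdk_name):
            continue

        name = node.func.attr
        if name not in public:
            errors.append(f"line {node.lineno}: no method {name!r}; available: {public}")
            continue

        method = getattr(BillingSDK, name)
        signature = inspect.signature(method)
        allowed = set(signature.parameters) - {"self"}
        for kw in node.keywords:
            if kw.arg is not None and kw.arg not in allowed:
                errors.append(
                    f"line {node.lineno}: {name}{signature} has no parameter "
                    f"{kw.arg!r}. Doc: {inspect.getdoc(method)}"
                )
    return errors
```

The returned messages can be fed back to the model as the corrective, in-context documentation; the generated code is only executed once the list is empty.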

To ensure reliability in healthcare, ZocDoc doesn't give LLMs free rein. It wraps them in a hybrid system where traditional, deterministic code orchestrates the AI's tasks, sets firm boundaries, and knows when to hand off to a human, preventing the 'praying for the best' approach common with direct LLM use.

To solve for AI hallucinations in high-stakes decisions, advanced platforms use the LLM as an interpreter that writes code to query raw data. If data is unavailable, it returns an error instead of fabricating an answer, making every analysis fully auditable and grounded in verifiable data.
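
A simplified sketch of this pattern follows. Instead of executing arbitrary model-written code, it assumes the model emits a small structured query (a metric name and a year) that a deterministic executor runs against the raw rows. The row layout, `run_query`, and `DataUnavailableError` are assumptions for illustration only.

```python
from dataclasses import dataclass


class DataUnavailableError(LookupError):
    """Raised instead of returning a fabricated value."""


@dataclass(frozen=True)
class Answer:
    value: float
    source_rows: tuple  # ids of the rows used, kept as an audit trail


def run_query(rows: list[dict], metric: str, year: int) -> Answer:
    """Execute a model-proposed query against raw rows, deterministically."""
    matches = [
        row for row in rows
        if row.get("metric") == metric and row.get("year") == year
    ]
    if not matches:
        # No data means an explicit error, never a guessed number.
        raise DataUnavailableError(f"no rows for metric={metric!r}, year={year}")
    total = sum(row["value"] for row in matches)
    return Answer(value=total, source_rows=tuple(row["row_id"] for row in matches))
```

Every answer carries the ids of the rows it was computed from, so a reviewer can trace a figure back to the underlying data.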

Unlike LLMs, which can hallucinate and behave unpredictably in novel situations, energy-based models (EBMs) have an architecture designed to be constrained. A human can define a set of rules or constraints, and the EBM is forced to follow them, making it a more reliable choice for mission-critical systems like autonomous vehicles or financial trading.

For critical enterprise functions like financial modeling, even 99.9% accuracy from a probabilistic LLM is unacceptable, because a 0.1% failure rate is still too high. Platforms like Salesforce's Agentforce 360 address this by layering deterministic logic and guardrails on top of the AI to ensure compliance and prevent costly errors.

Relying solely on natural language prompts like 'always do this' is unreliable for enterprise AI, because LLMs struggle with deterministic logic. Salesforce developed 'Agentforce Script,' a dedicated language that enforces rules alongside LLM reasoning to ensure consistent, repeatable performance for critical business workflows.

To deploy LLMs in high-stakes environments like finance, combine them with deterministic checks. For example, use a traditional algorithm to calculate cash flow and only surface the LLM's answer if it falls within an acceptable range of that result. This keeps hallucinated figures from reaching the user.
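
The cash flow check described above can be sketched as follows. The function names and the 1% default tolerance are illustrative assumptions; an acceptable range would be set by the business in practice.

```python
from typing import Optional


def net_cash_flow(inflows: list[float], outflows: list[float]) -> float:
    """Deterministic reference calculation."""
    return sum(inflows) - sum(outflows)


def gated_answer(
    llm_value: float,
    reference: float,
    rel_tolerance: float = 0.01,  # hypothetical threshold: within 1%
    abs_tolerance: float = 0.0,
) -> Optional[float]:
    """Return the model's figure only if it is close to the reference.

    Returns None when the figure is out of range, so the caller can fall
    back to the reference value or escalate to a human.
    """
    allowed = max(abs(reference) * rel_tolerance, abs_tolerance)
    if abs(llm_value - reference) <= allowed:
        return llm_value
    return None
```

For instance, with inflows of 1000 and 500 and an outflow of 300, the reference is 1200, so a model answer of 1205 would pass the default 1% gate while 1500 would be withheld.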

The industry's critical need is for engineers who can build the entire support system for an LLM: contracts, validation, observability, cost controls, and failure handling. This "AI systems" skill set is more valuable than simply being able to craft a clever prompt for a single input.
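
To make that support system a little more concrete, here is a minimal sketch of a wrapper that combines several of those concerns: validation through a caller-supplied contract, logging for observability, a hard cap on attempts as a crude cost control, and an explicit fallback on failure. `call_model` and `validate` are hypothetical callables supplied by the caller, not part of any particular library.

```python
import logging
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger("llm_guard")


def call_with_guardrails(
    call_model: Callable[[str], str],
    validate: Callable[[str], T],
    prompt: str,
    max_attempts: int = 3,
) -> Optional[T]:
    """Call the model until its output validates or the budget runs out.

    validate must raise ValueError when the output breaks the contract.
    Returns None on failure so the caller can pick a deterministic fallback.
    """
    for attempt in range(1, max_attempts + 1):
        raw = call_model(prompt)
        try:
            result = validate(raw)
        except ValueError as exc:
            # Observability: every rejection is recorded with its reason.
            log.warning("attempt %d rejected: %s", attempt, exc)
            continue
        log.info("attempt %d accepted", attempt)
        return result
    # Cost control: never exceed max_attempts model calls per request.
    log.error("giving up after %d attempts", max_attempts)
    return None
```

A real system would likely add timeouts, token accounting, and metrics, but the shape stays the same: the model call sits inside deterministic code that decides what happens next.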

LLM Outputs Require a Hard Contract Before Integration into Deterministic Systems | RiffOn