While a general-purpose model like Llama can serve many businesses, each business's safety policies are unique. A company might want to block mentions of competitors or enforce industry-specific compliance—use cases model creators cannot pre-program. This highlights the need for a customizable safety layer separate from the base model.
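A minimal sketch of what such a layer might look like, assuming a simple rule-based policy object owned by the customer rather than the model creator. The policy fields and example rules below are illustrative, not any vendor's actual API:

```python
# Illustrative application-level safety layer, decoupled from the base model.
import re
from dataclasses import dataclass, field

@dataclass
class SafetyPolicy:
    blocked_terms: list[str] = field(default_factory=list)        # e.g., competitor names
    required_disclaimers: list[str] = field(default_factory=list)

def apply_policy(response: str, policy: SafetyPolicy) -> str:
    """Screen a model response against business-specific rules before it reaches the user."""
    for term in policy.blocked_terms:
        if re.search(rf"\b{re.escape(term)}\b", response, re.IGNORECASE):
            return "[Response withheld: mentions a restricted topic.]"
    for disclaimer in policy.required_disclaimers:
        if disclaimer not in response:
            response += f"\n\n{disclaimer}"
    return response

# A finance customer and a retail customer can each ship their own policy
# without retraining or otherwise modifying the underlying model.
finance_policy = SafetyPolicy(
    blocked_terms=["Acme Analytics"],                      # hypothetical competitor
    required_disclaimers=["This is not investment advice."],
)
```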
Recognizing there is no single "best" LLM, AlphaSense built a system to test and deploy various models for different tasks. This allows them to optimize for performance and even stylistic preferences, using different models for their buy-side finance clients versus their corporate users.
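A hedged sketch of what such routing can reduce to in code; the task names, customer segments, and model identifiers below are placeholders, not AlphaSense's actual configuration:

```python
# Route requests to different models by task and customer segment.
MODEL_ROUTES = {
    ("summarization", "buy_side"): "finance-tuned-llm-v2",
    ("summarization", "corporate"): "general-llm-large",
    ("entity_extraction", "buy_side"): "small-extraction-model",
}

def pick_model(task: str, segment: str, default: str = "general-llm-large") -> str:
    """Select a model per (task, segment); fall back to a general-purpose default."""
    return MODEL_ROUTES.get((task, segment), default)

assert pick_model("summarization", "buy_side") == "finance-tuned-llm-v2"
assert pick_model("summarization", "corporate") == "general-llm-large"
```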
The current industry approach to AI safety, which focuses on censoring a model's "latent space," is flawed and ineffective. True safety work should reorient around preventing real-world, "meatspace" harm (e.g., data breaches). Security vulnerabilities should be fixed at the system level, not by trying to "lobotomize" the model itself.
For specialized, high-stakes tasks like insurance underwriting, enterprises will favor smaller, on-prem models fine-tuned on proprietary data. These models can be faster, more accurate, and more secure than general-purpose frontier models, creating a lasting market for custom AI solutions.
Universal safety filters for "bad content" are insufficient. True AI safety requires defining permissible and non-permissible behaviors specific to the application's unique context, such as a banking use case versus a customer service setting. This moves beyond generic harm categories to business-specific rules.
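One way to express that idea is a declarative, per-application policy rather than a single global filter. The schema and behavior names below are assumptions for illustration, not a standard format:

```python
# Context-specific behavior policies: each application defines its own rules.
POLICIES = {
    "banking_assistant": {
        "allowed": ["explain_fees", "summarize_statements"],
        "denied": ["give_investment_advice", "discuss_specific_securities"],
    },
    "customer_service": {
        "allowed": ["issue_refund_up_to_50", "explain_return_policy"],
        "denied": ["share_internal_pricing", "discuss_other_customers"],
    },
}

def is_permitted(app: str, behavior: str) -> bool:
    """Check a requested behavior against the application's own policy, not a generic harm filter."""
    policy = POLICIES.get(app, {"allowed": [], "denied": []})
    return behavior in policy["allowed"] and behavior not in policy["denied"]
```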
A critical hurdle for enterprise AI is managing context and permissions. Just as people silo work friends from personal friends, AI systems must prevent sensitive information from one context (e.g., CEO chats) from leaking into another (e.g., company-wide queries). This complex data siloing is a core, unsolved product problem.
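A minimal sketch of the retrieval-side version of this siloing, assuming each document carries an access-control tag and filtering happens before the model ever sees the text (field names here are hypothetical; a real system would plug into existing identity and ACL services):

```python
# Permission-scoped retrieval: the model only sees what the asking user could open themselves.
from dataclasses import dataclass

@dataclass
class Doc:
    text: str
    acl: set[str]   # groups permitted to read this document

def retrieve_for_user(query: str, docs: list[Doc], user_groups: set[str]) -> list[Doc]:
    """Filter by permissions before generation; ranking is omitted for brevity."""
    return [d for d in docs if d.acl & user_groups]

docs = [
    Doc("Q3 reorganization plan discussed with the CEO", acl={"exec"}),
    Doc("Updated travel expense policy", acl={"all_staff"}),
]
# A company-wide query from a regular employee never surfaces the exec-only document.
print([d.text for d in retrieve_for_user("what's new?", docs, user_groups={"all_staff"})])
```

The hard product problem is less the filter itself than keeping those access tags accurate as content flows between contexts.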
Off-the-shelf AI models can only go so far. The true bottleneck for enterprise adoption is "digitizing judgment"—capturing the unique, context-specific expertise of a company's own employees. A document's meaning can change entirely from one company to another, requiring internal labeling.
The "agentic revolution" will be powered by small, specialized models. Businesses and public sector agencies don't need a cloud-based AI that can do 1,000 tasks; they need an on-premise model fine-tuned for 10-20 specific use cases, driven by cost, privacy, and control requirements.
Using a large language model to police another is computationally expensive, sometimes doubling inference costs and latency. Ali Khatri of Rinks likens this to "paying someone $1,000 to guard a $100 bill." This poor economic model, especially for video and audio, leads many companies to forgo robust safety measures, leaving them vulnerable.
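A back-of-the-envelope calculation makes the economics concrete; the per-token prices below are invented purely for arithmetic and will vary by provider and model:

```python
# Why LLM-on-LLM guardrails roughly double the bill when the guard model is a similar size.
PRIMARY_COST_PER_1K_TOKENS = 0.002   # hypothetical price
GUARD_COST_PER_1K_TOKENS = 0.002     # guard model of comparable size -> comparable cost

def request_cost(prompt_tokens: int, output_tokens: int, guarded: bool) -> float:
    base = (prompt_tokens + output_tokens) / 1000 * PRIMARY_COST_PER_1K_TOKENS
    if not guarded:
        return base
    # The guard re-reads the prompt plus the output, so its bill is about the same size again.
    guard = (prompt_tokens + output_tokens) / 1000 * GUARD_COST_PER_1K_TOKENS
    return base + guard

print(request_cost(800, 200, guarded=False))  # ~0.002
print(request_cost(800, 200, guarded=True))   # ~0.004, roughly double
```

The latency story is the same: a sequential guard pass adds a second full inference to every request.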
For enterprises, scaling AI content without built-in governance is reckless. Rather than manual policing, guardrails like brand rules, compliance checks, and audit trails must be integrated from the start. The principle is "AI drafts, people approve," ensuring speed without sacrificing safety.
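A rough sketch of the "AI drafts, people approve" loop with an audit trail attached, assuming simple rule-based compliance checks; the check rules and log fields are illustrative, not any specific product's schema:

```python
# AI drafts, people approve: nothing publishes without automated checks and a logged human decision.
from datetime import datetime, timezone

AUDIT_LOG = []

def compliance_issues(draft: str) -> list[str]:
    """Automated checks run on every draft before a human reviews it."""
    issues = []
    if "guaranteed returns" in draft.lower():
        issues.append("prohibited claim: guaranteed returns")
    return issues

def review(draft: str, reviewer: str, human_approves: bool) -> bool:
    """Record the checks, the reviewer, and the outcome so every decision is auditable."""
    issues = compliance_issues(draft)
    published = human_approves and not issues
    AUDIT_LOG.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "reviewer": reviewer,
        "issues": issues,
        "published": published,
    })
    return published
```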
Companies are becoming wary of feeding their unique data and customer queries into third-party LLMs like ChatGPT. The fear is that this trains a potential future competitor. The trend will shift towards running private, open-source models on their own cloud instances to maintain a competitive moat and ensure data privacy.