Hunt reveals their initial, hand-built models were like a small net that missed most signals. The probabilistic approach of modern LLMs allowed them to build a vastly more effective system, exceeding their 5-6x improvement estimate by orders of magnitude.
Recognizing there is no single "best" LLM, AlphaSense built a system to test and deploy various models for different tasks. This allows them to optimize for performance and even stylistic preferences, using different models for their buy-side finance clients versus their corporate users.
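A multi-model setup like this typically reduces to a routing table keyed on task and user segment. The sketch below is a minimal, hypothetical illustration of that idea; the model names, tasks, and segments are invented, not AlphaSense's actual configuration.

```python
# Hypothetical routing table: (task, user segment) -> model.
# All identifiers here are illustrative, not real model names.
ROUTING_TABLE = {
    ("summarize", "buy_side"):  "model-finance-tuned",
    ("summarize", "corporate"): "model-general-concise",
    ("extract",   "buy_side"):  "model-structured-v2",
}
DEFAULT_MODEL = "model-general"

def pick_model(task: str, segment: str) -> str:
    """Return the preferred model for this task/segment, else a default."""
    return ROUTING_TABLE.get((task, segment), DEFAULT_MODEL)
```

Keeping routing in data rather than code makes it cheap to A/B-test a new model for one segment without touching the rest of the system.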
The current limitation of LLMs is their stateless nature; they reset with each new chat. The next major advancement will be models that can learn from interactions and accumulate skills over time, evolving from a static tool into a continuously improving digital colleague.
Before generative AI, AlphaSense built its sentiment analysis model by employing a large team for years to manually tag financial statements. This highly specialized, narrow AI still surpasses the performance of today's more generalized Large Language Models for that specific task, proving the enduring value of focused training data.
The vast majority of Intercom Fin's resolution rate increase came from optimizing retrieval, re-ranking, and prompting. GPT-4 was already intelligent enough for the task; the real gains were unlocked by improving the surrounding architecture, not waiting for better foundation models.
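The "surrounding architecture" referred to here is essentially a retrieve → re-rank → prompt pipeline. The sketch below shows the shape of that pipeline under toy assumptions: keyword overlap stands in for embedding retrieval, and a length heuristic stands in for a trained cross-encoder re-ranker; the documents and functions are invented for illustration.

```python
# Toy retrieve -> re-rank -> prompt pipeline. The scoring functions are
# stand-ins; a production system would use embedding search and a trained
# re-ranking model in their place.

def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """First pass: cheap keyword-overlap score over the whole corpus."""
    terms = set(query.lower().split())
    scored = sorted(corpus, key=lambda doc: -len(terms & set(doc.lower().split())))
    return scored[:k]

def rerank(query: str, docs: list[str]) -> list[str]:
    """Second pass: here, prefer shorter (more focused) passages.
    In practice this is where a cross-encoder re-ranker would go."""
    return sorted(docs, key=len)

def build_prompt(query: str, docs: list[str]) -> str:
    """Assemble the final grounded prompt from the re-ranked context."""
    context = "\n".join(f"- {d}" for d in docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

corpus = [
    "Refunds are processed within 5 business days.",
    "Our refund policy covers purchases made in the last 30 days.",
    "The office is closed on public holidays.",
]
query = "refund policy"
docs = rerank(query, retrieve(query, corpus, k=2))
prompt = build_prompt(query, docs)
```

Each stage can be tuned independently, which is why gains of the kind described above can come from the pipeline while the underlying model stays fixed.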
An LLM's core function is predicting the next word. When it encounters text that its prediction assigned a low probability, that text carries high surprisal. This mechanism gives the model an innate ability to identify "interesting" or novel concepts within a body of text.
Google's Titans architecture for LLMs mimics human memory by applying Claude Shannon's information theory. It scans vast data streams and identifies "surprise"—statistically unexpected or rare information relative to its training data. This novel data is then prioritized for long-term memory, preventing clutter from irrelevant information.
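In Shannon's terms, "surprise" is just the information content of an event, -log2(p): the less probable the observation, the more bits it carries. The toy sketch below computes this for an invented next-token distribution and applies a hypothetical surprisal threshold in the spirit of the memory policy described above; the numbers and cutoff are illustrative, not taken from the Titans paper.

```python
import math

def surprisal_bits(p: float) -> float:
    """Shannon information content of an event with probability p, in bits."""
    return -math.log2(p)

# Toy next-token distribution a model might assign after "The capital of
# France is" (illustrative numbers, not from a real model).
next_token_probs = {"Paris": 0.90, "Lyon": 0.05, "pizza": 0.001}

surprisal = {t: surprisal_bits(p) for t, p in next_token_probs.items()}
# "Paris" is expected (under 1 bit); "pizza" is highly surprising (about 10 bits).

# A memory policy in this spirit keeps only high-surprisal observations:
THRESHOLD_BITS = 4.0  # hypothetical cutoff
memorable = [t for t, p in next_token_probs.items()
             if surprisal_bits(p) > THRESHOLD_BITS]
```

Thresholding on surprisal is what keeps routine, high-probability content out of long-term memory while rare observations are retained.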
The binary distinction between "reasoning" and "non-reasoning" models is becoming obsolete. The more critical metric is now "token efficiency"—a model's ability to use more tokens only when a task's difficulty requires it. This dynamic token usage is a key differentiator for cost and performance.
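One hypothetical way to quantify this: check how strongly a model's token spend tracks task difficulty. The data below is invented for illustration; an "efficient" model's usage correlates with difficulty, while an inefficient one burns a flat budget regardless.

```python
import math
import statistics

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

difficulty    = [1, 2, 3, 4, 5]                  # rated task difficulty
dynamic_model = [80, 150, 400, 900, 2000]        # spends tokens where needed
flat_model    = [1200, 1250, 1150, 1220, 1180]   # always "thinks" hard

dynamic_score = pearson(difficulty, dynamic_model)  # close to 1
flat_score    = pearson(difficulty, flat_model)     # close to 0
```

A high correlation (plus a low baseline spend on easy tasks) is one concrete signature of the dynamic token usage described above.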
AI development has evolved to where models can be directed using human-like language. Instead of complex prompt engineering or fine-tuning, developers can provide instructions, documentation, and context in plain English to guide the AI's behavior, democratizing access to sophisticated outcomes.
IBM's CEO explains that previous deep learning models were "bespoke and fragile," requiring massive, costly human labeling for single tasks. LLMs are an industrial-scale unlock because they eliminate this labeling step, making them vastly faster and cheaper to tune and deploy across many tasks.
Instead of running hundreds of brute-force experiments, machine learning models analyze historical data to predict which parameter combinations will succeed. This allows teams to focus on a few dozen targeted experiments to achieve the same process confidence, compressing months of work into weeks.
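The pattern described is surrogate-guided experiment selection: fit a cheap predictive model on historical runs, score every candidate parameter combination, and physically run only the top few. The sketch below uses inverse-distance weighting as a dependency-free stand-in for the surrogate; the parameters, history, and scaling factors are all invented for illustration.

```python
import itertools
import math

# Historical runs: (temperature, pressure) -> observed yield (invented data).
history = [
    ((200, 1.0), 0.61), ((220, 1.2), 0.74), ((240, 1.4), 0.83),
    ((260, 1.6), 0.79), ((280, 1.8), 0.65),
]

def predicted_yield(params: tuple[float, float]) -> float:
    """Inverse-distance-weighted average of historical outcomes.
    A stand-in for a trained surrogate (e.g. a Gaussian process)."""
    num = den = 0.0
    for (t, p), y in history:
        # Scale each axis so a 10-degree step and a 0.1-bar step count equally.
        d = math.hypot((params[0] - t) / 10, (params[1] - p) / 0.1) or 1e-9
        num += y / d
        den += 1 / d
    return num / den

# Full brute-force grid would be 45 experiments...
candidates = list(itertools.product(range(200, 290, 10), [1.0, 1.2, 1.4, 1.6, 1.8]))
# ...but we physically run only the 5 most promising combinations.
shortlist = sorted(candidates, key=predicted_yield, reverse=True)[:5]
```

Each new physical run can then be appended to `history`, sharpening the surrogate for the next round of selection.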