Companies with valuable proprietary data should not license it away. A better strategy to guide foundation model development is to keep the data private but release public benchmarks and evaluations based on it. This incentivizes LLM providers to train their models on the specific tasks you care about, improving their performance for your product.
The industry has already exhausted the public web data used to train foundation models, a point often summed up as "we've already run out of data." The next leap in AI capability and business value will come from harnessing the vast proprietary data currently locked behind corporate firewalls.
Public leaderboards like LM Arena are becoming unreliable proxies for model performance, because teams implicitly or explicitly game them by optimizing for the specific test sets. The better strategy is to develop against internal, proprietary evaluation metrics and use public benchmarks only as a final, confirmatory check, not as a primary development target.
Enterprise SaaS companies (the 'henhouse') should be cautious when partnering with foundation model providers (the 'fox'). While these providers offer powerful features, they have a core incentive to consume proprietary data for training, potentially compromising customer trust, data privacy, and the incumbent's long-term competitive moat.
When approached by large labs for licensing deals, GI's founder advises against simply selling the data. He argues the only way to accurately value a unique dataset is to model it yourself to understand its true capabilities. Without this, founders risk massively undervaluing their core asset, as its potential is unknown.
Instead of gating its valuable review data like traditional analyst firms, G2 strategically chose to syndicate it and make it available to LLMs. This ensures G2 remains a trusted, cited source within AI-generated answers, maintaining brand influence and relevance where buyers are now making decisions.
Ali Ghodsi argues that while public LLMs are a commodity, the true value for enterprises lies in applying AI to their private data. This is impossible without first building a modern data foundation that lets the AI securely and effectively access and reason over that information.
AI models are becoming commodities; the real, defensible value lies in proprietary data and user context. The correct strategy is for companies to use LLMs to enhance their existing business and data, rather than selling their valuable context to model providers for pennies on the dollar.
If a company and its competitor both ask a generic LLM for strategy, they'll get the same answer, erasing any edge. The only way to generate unique, defensible strategies is by building evolving models trained on a company's own private data.
Companies are becoming wary of feeding their unique data and customer queries into third-party LLMs like ChatGPT. The fear is that doing so trains a potential future competitor. Expect a shift toward companies running private, open-source models on their own cloud instances to maintain a competitive moat and ensure data privacy.
Instead of waiting for external reports, companies should develop their own AI model evaluations. By defining key tasks for specific roles and testing new models against them with standard prompts, businesses can create a relevant, internal benchmark.
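As a rough illustration of what such an internal benchmark could look like, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `EvalTask` structure, the `run_benchmark` helper, the sample roles and prompts, and the two stub models that stand in for real model API calls. A model is treated as any function that takes a prompt string and returns a response string.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical interface: a model is any function from prompt text to response text.
Model = Callable[[str], str]


@dataclass(frozen=True)
class EvalTask:
    role: str                      # the job role this task represents
    prompt: str                    # the standard prompt, identical for every model
    passes: Callable[[str], bool]  # role-specific check applied to the response


def run_benchmark(model: Model, tasks: List[EvalTask]) -> Dict[str, float]:
    """Return the fraction of tasks passed, grouped by role."""
    passed: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for task in tasks:
        total[task.role] = total.get(task.role, 0) + 1
        if task.passes(model(task.prompt)):
            passed[task.role] = passed.get(task.role, 0) + 1
    return {role: passed.get(role, 0) / count for role, count in total.items()}


# Illustrative tasks only; a real suite would come from the people doing each job.
TASKS = [
    EvalTask(
        role="support",
        prompt="A customer asks for a refund after 45 days; policy allows 30. Reply.",
        passes=lambda out: "30" in out and "refund" in out.lower(),
    ),
    EvalTask(
        role="support",
        prompt="Summarize this ticket: 'App crashes on login since v2.1'.",
        passes=lambda out: "login" in out.lower(),
    ),
    EvalTask(
        role="finance",
        prompt="What is 15% of 200? Answer with the number only.",
        passes=lambda out: out.strip() == "30",
    ),
]


def stub_model_a(prompt: str) -> str:
    # Stand-in for a real model call; replace with your provider's client.
    if "15%" in prompt:
        return "30"
    return "Our refund window is 30 days, so we cannot issue a refund."


def stub_model_b(prompt: str) -> str:
    # A second stand-in that fails every check.
    return "I am not sure."


if __name__ == "__main__":
    for name, model in [("model_a", stub_model_a), ("model_b", stub_model_b)]:
        print(name, run_benchmark(model, TASKS))
```

Keeping the prompts and pass checks fixed across runs is what makes scores comparable when a new model is released. Simple string checks like these are only a starting point; tasks with open-ended answers would likely need human review or a more careful grading rubric.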