The popular theory that the market for raw data would explode has not been borne out. The number of companies buying data has not grown significantly, and in some sectors, such as hedge funds, it has even shrunk. The boom in data-oriented roles has not translated into a boom in data purchasing.

Related Insights

The industry has already exhausted the public web data used to train foundational AI models, a point underscored by the phrase "we've already run out of data." The next leap in AI capability and business value will come from harnessing the vast, proprietary data currently locked behind corporate firewalls.

Hedge funds have a constant, daily need to make informed buy, sell, or hold decisions, creating a clear business problem that data solves. Corporations often lack this frequent, high-stakes decision-making cycle, making the value proposition of external data less immediate and harder to justify.

Public internet data has been largely exhausted for training AI models. The real competitive advantage, and the data source for next-generation, specialized AI, will be the vast, untapped reservoirs of proprietary data locked inside corporations, such as R&D data from pharmaceutical or semiconductor companies.

Unlike traditional B2B markets where only ~5% of customers are buying at any time, the AI boom has pushed nearly 100% of companies to seek solutions at once. This temporary gold rush warps perception of market size, creating a risk of over-investment similar to the COVID-era software bubble.

Despite a long-standing data-science-driven investment thesis, Foresite Capital founder Jim Tananbaum states that AI tools have not yet measurably improved investment returns. The technology is still maturing, highlighting a reality gap between the hype around AI in venture capital and its current practical impact.

Many leaders focus on data for backward-looking reporting, treating it like infrastructure. The real value comes from using data strategically for prediction and prescription. This requires foundational investment in technology, architecture, and machine learning capabilities to forecast what will happen and what actions to take.

For years, access to compute was the primary bottleneck in AI development. Now, as public web data is largely exhausted, the limiting factor is access to high-quality, proprietary data from enterprises and human experts. This shifts the focus from building massive infrastructure to forming data partnerships and expertise.

While data labeling companies show massive revenue growth, their customer base is often limited to a few frontier AI labs. This creates a lopsided market where providers have little leverage, compete on price, and are heavily dependent on a handful of clients, making the ecosystem potentially unstable.

While AI investment has exploded, US productivity has barely risen. Valuations are priced as if a societal transformation is complete, yet 95% of GenAI pilots fail to positively impact company P&Ls. This gap between market expectation and real-world economic benefit creates systemic risk.

The boom in tools for data teams faded because the Total Addressable Market (TAM) was overestimated. Investors and founders pattern-matched the data space to larger markets like cloud and dev tools, but the actual number of teams with the budget and need for sophisticated data tooling proved to be much smaller.