Novonesis began building structured data repositories in the 2000s, long before the AI boom, to manage high-throughput screening data. That decades-long data discipline is now a major competitive advantage, providing the clean foundation that effective machine learning and digital twins require.

Related Insights

The power of AI for Novonesis lies not in the algorithm itself but in its application to a massive, well-structured proprietary dataset. The company's organized library of 100,000 strains allows AI to rapidly predict protein shapes and accelerate R&D in ways competitors cannot match.

Public internet data has been largely exhausted for training AI models. The real competitive advantage and source for next-generation, specialized AI will be the vast, untapped reservoirs of proprietary data locked inside corporations, like R&D data from pharmaceutical or semiconductor companies.

The effectiveness of AI and machine learning models for predicting patient behavior hinges entirely on the quality of the underlying real-world data. Walgreens emphasizes its investment in data synthesis and validation as the non-negotiable prerequisite for generating actionable insights.

Instead of building AI models, a company can create immense value by being 'AI adjacent'. The strategy is to focus on enabling good AI by solving the foundational 'garbage in, garbage out' problem. Providing high-quality, complete, and well-understood data is a critical and defensible niche in the AI value chain.

Since LLMs are commodities, sustainable competitive advantage in AI comes from leveraging proprietary data and unique business processes that competitors cannot replicate. Companies must focus on building AI that understands their specific "secret sauce."

The winning strategy in the AI data market has evolved beyond simply finding smart people. Leading companies differentiate with research teams that anticipate the future data requirements of models, innovating on data types for reasoning and STEM before being asked.

The vague concept of a 'data network effect' is now a real defensibility strategy in AI. The key is having a *live*, constantly updating proprietary dataset (e.g., real-time health data). This allows a commodity model to deliver superior results compared to a state-of-the-art model without access to that live data.

A new 'Tech Bio' model inverts traditional biotech: companies first build a novel, highly structured database designed for AI analysis. Only after this computational foundation is in place do they use it to identify therapeutic targets, creating a data-first moat before any lab work begins.

As AI's bottleneck shifts from compute to data, the key advantage becomes low-cost data collection. Industrial incumbents have a built-in moat because they can source messy, multimodal data from existing operations, a feat startups cannot replicate without paying a steep marginal cost for each data point.

The ultimate value of AI will be its ability to act as a long-term corporate memory. By feeding it historical data (ICPs, past experiments, key decisions, and customer feedback), companies can create a queryable "brain" that dramatically accelerates onboarding and institutional knowledge transfer.

Novonesis's AI Edge Comes from a 20-Year Focus on Building Structured Data | RiffOn