Specialized AI for legacy industries must decode highly contextual, non-standardized data, such as handwritten field notes that use folk units of measurement like the time it takes to smoke a cigarette. Curating data of this kind effectively requires deep domain expertise.

Related Insights

LLMs have hit a wall after scraping nearly all available public data. The next phase of AI development and competitive differentiation will come from training models on high-quality, proprietary data generated by human experts. This creates a booming "data as a service" industry for companies like Micro One that recruit and manage these experts.

Data is only truly "AI-ready" when it is not just technically accurate but also consistent with the business context hidden in unstructured documents such as policies. This involves vectorizing business logic and verifying it against facts in data warehouses.

Generic tech companies can't easily dominate industrial AI. Training models requires proprietary operational data that isn't public, creating "data friction." Furthermore, solving problems in a refinery versus a hospital requires deep, sector-specific domain knowledge, preventing a one-size-fits-all approach.

AI can easily write code for system integrations, but the primary bottleneck isn't coding—it's context. The real work involves tracking down employees to understand what ambiguous, legacy data fields actually mean, a fundamentally human task of institutional knowledge discovery.

Off-the-shelf AI models can only go so far. The true bottleneck for enterprise adoption is "digitizing judgment"—capturing the unique, context-specific expertise of employees within that company. A document's meaning can change entirely from one company to another, requiring internal labeling.

When building data platforms for industries with legacy hardware like automotive, the real work is data normalization. Different product lines use inconsistent signal names and units (e.g., speed as MPH vs. radians/sec), requiring a complex 'decoder' layer to create usable, standardized data.
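The 'decoder' layer described above can be pictured as a lookup table keyed by product line and raw signal name, where each entry gives a canonical name and a unit conversion. The following is only a minimal sketch under stated assumptions: the product-line names, signal names, canonical name, and the 0.3 m wheel radius are hypothetical placeholders, not details from the source. Note that converting an angular rate (radians/sec) to a linear speed needs a wheel radius, which is exactly the kind of per-product context such a layer has to carry.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# 1 mile = 1609.344 m and 1 hour = 3600 s, so 1 mph = 0.44704 m/s.
MPH_TO_MPS = 0.44704


@dataclass(frozen=True)
class SignalSpec:
    """How to translate one raw signal into the canonical schema."""
    canonical_name: str
    to_canonical: Callable[[float], float]


def wheel_rate_to_mps(wheel_radius_m: float) -> Callable[[float], float]:
    """Build a converter from wheel angular rate (rad/s) to linear speed (m/s)."""
    return lambda omega: omega * wheel_radius_m


# Hypothetical registry: (product line, raw signal name) -> decoding rule.
# Every key and the wheel radius below are illustrative assumptions.
DECODERS: Dict[Tuple[str, str], SignalSpec] = {
    ("line_a", "VehSpd_MPH"): SignalSpec(
        "vehicle_speed_mps", lambda v: v * MPH_TO_MPS
    ),
    ("line_b", "whl_omega"): SignalSpec(
        "vehicle_speed_mps", wheel_rate_to_mps(0.3)
    ),
}


def decode(product_line: str, signal_name: str, value: float) -> Tuple[str, float]:
    """Return (canonical signal name, value in canonical units)."""
    key = (product_line, signal_name)
    if key not in DECODERS:
        # Fail loudly: an unmapped signal usually means missing domain knowledge.
        raise KeyError(f"no decoder registered for {product_line}/{signal_name}")
    spec = DECODERS[key]
    return spec.canonical_name, spec.to_canonical(value)


if __name__ == "__main__":
    print(decode("line_a", "VehSpd_MPH", 60.0))
    print(decode("line_b", "whl_omega", 100.0))
```

Keeping the mapping as data rather than scattered conditionals means a new product line can be supported by adding registry entries, and unmapped signals surface as explicit errors instead of silently passing through in the wrong units.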

A massive opportunity for AI lies in unearthing and recording experts' tacit, unwritten knowledge—the "knack" for doing things that is lost when they die. This "dark data," once fed into models, will unlock immense, currently inaccessible value.

As AI capabilities become commoditized, the key to superior output is the user's domain expertise. An expert with precise vocabulary can guide an AI to produce better results in one attempt than a novice can in many, because they can articulate the desired outcome more effectively.

AI tools like LLMs thrive on large, structured datasets. In manufacturing, critical information is often unstructured 'tribal knowledge' in workers' heads. Dirac’s strategy is to first build a software layer that captures and organizes this human expertise, creating the necessary context for AI to then analyze and add value.

Before complex modeling, the main challenge for AI in biomanufacturing is dealing with unstructured data like batch records, investigation reports, and operator notes. The initial critical task for AI is to read, summarize, and connect these sources to identify patterns and root causes, transforming raw information into actionable intelligence.