GitHub Found Training AI on More Enterprise Code Yields Marginal Gains, Except for Legacy Languages like COBOL

Related Insights

AI Model Progress Now Hinges on Unlocking Trapped Enterprise Data

The industry has already exhausted the public web data used to train foundational AI models, a point summed up in the quote "we've already run out of data." The next leap in AI capability and business value will come from harnessing the vast, proprietary data currently locked behind corporate firewalls.

AI Exchanges: The Role of Data

Exchanges·5 months ago

LinkedIn CPO Warns Off-the-Shelf AI Development Tools "Never Work" at Scale

Despite the hype, LinkedIn found that third-party AI tools for coding and design don't work out of the box on its complex, legacy stack. Success requires deep customization, re-architecting internal platforms for AI reasoning, and working in "alpha mode" with vendors to adapt their tools.

Why LinkedIn is turning PMs into AI-powered "full stack builders" | Tomer Cohen (LinkedIn CPO)

Lenny's Podcast: Product | Career | Growth·3 months ago

LLMs Fail at Low-Level GPU Programming Due to Scarce Data and Debugging Complexity

AI coding assistants struggle with deep kernel work (CUDA, PTX) because there's little public code to learn from. Furthermore, debugging AI-generated parallel code is extremely difficult because the developer never built the mental model behind it, which can make relying on the assistant less efficient than writing the code by hand.

How Zyphra went all-in on AMD + Why Devs feel faster with AI but are slower — with Quentin Anthony

Latent Space: The AI Engineer Podcast·4 months ago

Agentic AI's Most Valuable Use is Untangling Legacy Systems, Not Just Creating New Ones

Enterprises are trapped by decades of undocumented code. Rather than ripping and replacing, agentic AI can analyze and understand these complex systems. This enables redesign from the inside out and modernizes the core of the business, bridging the gap between business and IT.

#763: Pega CTO Don Schuerman on how AI can pay down tech debt and accelerate digital transformation

The Agile Brand with Greg Kihlström®: Expert Mode Marketing Technology, AI, & CX·3 months ago

GitHub's Copilot Felt Revolutionary Even When It Failed 80% of the Time

The initial magic of GitHub's Copilot wasn't its accuracy but its profound understanding of natural language. Early versions had a code completion acceptance rate of only 20%, yet the moments it correctly interpreted human intent were so powerful they signaled a fundamental technology shift.

Building AI-Powered Products at Scale with Mario Rodriguez, CPO of GitHub

Product Chats Podcast·4 months ago

LLM Improvements Offer Diminishing Returns For Consumer Apps But Not Enterprise

For consumer products like ChatGPT, models are already good enough for common queries. However, for complex enterprise tasks like coding, performance is far from solved. This gives model providers a durable path to sustained revenue growth through continued quality improvements aimed at professionals.

Anthropic, Glean & OpenRouter: How AI Moats Are Built with Deedy Das of Menlo Ventures

Latent Space: The AI Engineer Podcast·3 months ago

Enterprise Domain Adaptation Requires a Minimum of 10 Billion Tokens After Curation

Customizing a base model with proprietary data is only effective if a company possesses a massive corpus. At least 10 billion high-quality tokens are needed *after* aggressive deduplication and filtering. This high threshold means the strategy is only viable for the largest corporations, a much higher bar than most businesses realize.

Sovereign AI in Poland: Language Adaptation, Local Control & Cost Advantages with Marek Kozlowski

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis·2 months ago

Legacy Code Migration Is AI Coding's Biggest Enterprise ROI Driver Today

Enterprises are finding immediate, high return on investment by using AI to port legacy codebases (like COBOL) to modern languages. This mundane task offers a 2x speed-up over traditional methods, unlocking significant infrastructure savings and even driving new developer hiring.

The $3 Trillion AI Coding Opportunity

a16z Show·2 months ago

AI-Generated Code Creates a Hidden "Rework Tax" Inflating Productivity Metrics

While AI coding assistants appear to boost output, they introduce a "rework tax." A Stanford study found AI-generated code leads to significant downstream refactoring. A team might ship 40% more code, but if half of that increase is just fixing last week's AI-generated "slop," the real productivity gain is much lower than headlines suggest.

From Chaos to Code: HumanLayer’s Playbook for Agent-Driven Dev

The Lobster Talks Podcast by Lobster Capital·5 months ago

Reflection AI CEO: Enterprises Adopt Open Source AI to Cut Costs or Boost Niche Performance

Misha Laskin, CEO of Reflection AI, states that large enterprises turn to open source models for two key reasons: to dramatically reduce the cost of high-volume tasks, or to fine-tune performance on niche data where closed models are weak.

Sam Altman LIVE on Sora, Hollywood, & the Future of Ads | Bill Peebles, Dylan Patel, Elad Gil, Robby Stein, Morgan Housel, Misha Laskin

TBPN·4 months ago