We scan new podcasts and send you the top 5 insights daily.
Judgment Labs CEO Alex Shan argues that AI agents will first dominate domains with easily verifiable results, like coding, where a solution's correctness can be quickly checked. Progress will be slower in non-verifiable fields like law or complex drug discovery, where feedback loops are long and ambiguous.
Andrej Karpathy's 'Software 2.0' framework posits that AI automates tasks that are easily *verifiable*. This explains the 'jagged frontier' of AI progress: fields like math and code, where correctness is verifiable, advance rapidly. In contrast, creative and strategic tasks, where success is subjective and hard to verify, lag significantly behind.
Agentic AI is most advanced in software engineering because code provides a constrained, text-based, and verifiable environment. AI agents can now operate for hours, understanding codebases and fixing errors. This iterative reasoning process is a direct preview of how AI will eventually perform long-running, complex investment research tasks.
Software engineering is a prime target for AI because code provides instant feedback (it works or it doesn't). In contrast, fields like medicine have slow, expensive feedback loops (e.g., clinical trials), which throttles the pace of AI-driven iteration and adoption. This heuristic predicts where AI will make the fastest inroads.
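The "instant feedback" property of code can be made concrete with a small sketch: a candidate solution is checked against test cases in milliseconds, so an agent can iterate thousands of times before a clinical trial would produce a single data point. The function names here are illustrative, not from any specific framework.

```python
def candidate_add(a, b):
    """A candidate solution an AI agent might propose."""
    return a + b

def verify(fn, test_cases):
    """Run the candidate against test cases; pass/fail is instant."""
    return all(fn(*args) == expected for args, expected in test_cases)

tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
print(verify(candidate_add, tests))  # True: the feedback loop closes immediately
```

The whole verify step costs microseconds and is fully objective, which is exactly the loop that slow-feedback fields like medicine cannot offer.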
Unlike coding, where context is centralized (IDE, repo) and output is testable, general knowledge work has its context scattered across apps. AI struggles to synthesize this fragmented context, and it's hard to objectively verify the quality of its output (e.g., a strategy memo), limiting agent effectiveness.
While AI has mastered verifiable tasks with clear right answers, its future growth depends on human experts training models in subjective fields where 'good' is not easily defined. Companies are now sourcing professionals to act as 'verifiers' who teach AI nuanced, domain-specific judgment.
AI excels at solving problems with clear, verifiable answers, like advanced math, allowing for effective training. It struggles with complex societal issues like unemployment because there is no single, universally agreed-upon "correct" solution to train against, making it difficult to evaluate the AI's path.
Demis Hassabis identifies a key obstacle for AGI. Unlike in math or games where answers can be verified, the messy real world lacks clear success metrics. This makes it difficult for AI systems to use self-improvement loops, limiting their ability to learn and adapt outside of highly structured domains.
AI can generate vast amounts of content, but its value is limited by our ability to verify its accuracy. This is fast for visual outputs (images, UI) where our eyes instantly spot flaws, but slow and difficult for abstract domains like back-end code, math, or financial data, which require deep expertise to validate.
Agentic loops are not a universal solution. They are most effective in domains where success can be measured by a clear, objective score and where failed experiments are cheap and quick. This framework helps identify the best business processes to automate, starting with areas like code generation or ad testing, not subjective, slow-moving tasks like political negotiation.
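The two conditions above, an objective score and cheap failed attempts, can be sketched as a minimal loop. Here `propose()` and `score()` are hypothetical stand-ins for a generation step (e.g., an ad variant) and a cheap objective metric (e.g., predicted click-through rate); no real API is implied.

```python
import random

def propose(rng):
    """Stand-in for generating a candidate (e.g., an ad variant)."""
    return rng.random()

def score(candidate):
    """Stand-in for a cheap, objective metric in [0, 1]."""
    return candidate  # in this toy sketch the candidate is its own score

def agentic_loop(budget=100, target=0.95, seed=0):
    """Retry until the score clears the bar or the budget runs out.

    Failed attempts cost nothing but one more iteration, which is
    what makes the loop viable at all.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(budget):
        candidate = propose(rng)
        if best is None or score(candidate) > score(best):
            best = candidate
        if score(best) >= target:
            break  # objective threshold met: stop iterating
    return best
```

Swap in a score that takes months to compute, or one that two evaluators disagree on (a strategy memo, a negotiation outcome), and the loop stops converging, which is the boundary the insight describes.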
The tech industry mistakenly assumes AI's rapid success in coding will replicate across all knowledge work. Coding is an ideal use case: text-based, easily verifiable, and used by technical experts. Other fields lack this perfect setup, meaning widespread AI agent adoption will be much slower.