We scan new podcasts and send you the top 5 insights daily.
Instead of focusing solely on AI fallibility, a major application is using AI agents to audit human work. Perplexity's "Final Pass" feature analyzes documents for factual errors and internal inconsistencies, and has found glaring mistakes in documents such as Gartner's earnings press releases and the work of professional accountants.
A powerful, practical use of AI in investment research is to verify management's track record. By feeding all historical earnings call transcripts into a large language model, an analyst can quickly ask whether management's past promises and guidance materialized, automating a crucial but time-consuming due diligence step.
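The track-record check above can be sketched as a simple prompt-building step: stitch dated transcripts together in chronological order, then ask one question across all of them. This is an illustrative sketch, not any vendor's actual pipeline; `ask_llm` is a placeholder for whatever chat-completion API you use.

```python
def build_track_record_prompt(transcripts: dict[str, str], question: str) -> str:
    """Concatenate dated earnings-call transcripts into a single prompt."""
    history = "\n\n".join(
        f"--- Earnings call, {date} ---\n{text}"
        for date, text in sorted(transcripts.items())
    )
    return (
        "Below are historical earnings call transcripts in chronological order.\n\n"
        f"{history}\n\n"
        f"Question: {question}\n"
        "For each piece of guidance management gave, state whether later calls "
        "show it materialized, citing the call date for every claim."
    )

def check_management_track_record(transcripts: dict[str, str], ask_llm) -> str:
    """Run the due-diligence question over the full transcript history.

    `ask_llm` is an injected callable (prompt -> answer) so any LLM backend
    can be plugged in.
    """
    prompt = build_track_record_prompt(
        transcripts,
        "Did management's past promises and guidance materialize?",
    )
    return ask_llm(prompt)
```

Keeping the transcripts sorted by date matters: the model can only judge whether guidance materialized if later calls appear after the calls that made the promises.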
A fundamental divide exists between consumer and enterprise AI. While consumer products often reward novelty and creativity, enterprise applications are worthless without correctness. This requires building systems grounded in truth that can extract what is verifiably correct from complex organizations.
Journalist Casey Newton uses AI tools not to write his columns, but to fact-check them after they're written. He finds that feeding his completed text into an LLM is a surprisingly effective way to catch factual errors, reflecting a significant improvement in model capability over the past year.
After an initial analysis, use a "stress-testing" prompt that forces the LLM to verify its own findings, check for contradictions, and correct its mistakes. This verification step is crucial for building confidence in the AI's output and creating bulletproof insights.
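One way to picture the stress-testing step is as a second pass that feeds the model its own output alongside the source material. The prompt wording below is illustrative, not a prompt from the podcast; `ask_llm` stands in for any LLM API call.

```python
# Hypothetical verification prompt for the second pass.
STRESS_TEST_PROMPT = """Review your previous analysis below.
1. Re-verify every factual claim against the source material.
2. List any internal contradictions you find.
3. Output a corrected version, marking each change you made.

--- Previous analysis ---
{analysis}

--- Source material ---
{source}
"""

def stress_test(analysis: str, source: str, ask_llm) -> str:
    """Ask the model to verify, contradiction-check, and correct its own output."""
    return ask_llm(STRESS_TEST_PROMPT.format(analysis=analysis, source=source))
```

In practice you would run this after the initial analysis call and compare the two outputs; claims that survive the second pass unchanged are the ones to trust most.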
A powerful and simple method to ensure the accuracy of AI outputs, such as market research citations, is to prompt the AI to review and validate its own work. The AI will often identify its own hallucinations or errors, providing a crucial layer of quality control before data is used for decision-making.
AI models have an emergent "human laziness factor," often doing the minimum work necessary to provide an answer. To ensure correctness, Genesis builds harnesses that force agents to provide proof for their work, then uses a second AI to review and validate those outputs, preventing corner-cutting.
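The harness pattern described above can be sketched as a retry loop: a worker agent must return an answer together with evidence, and a separate reviewer agent accepts or rejects it. All names here are invented for illustration, not Genesis's actual API.

```python
def run_with_proof(task: str, worker, reviewer, max_attempts: int = 3) -> dict:
    """Retry the worker until the reviewer approves its answer and evidence.

    `worker(task)` is expected to return {"answer": ..., "evidence": [...]};
    `reviewer(task, answer, evidence)` returns "approved" or "rejected".
    Both are injected callables, so any agent backend can be plugged in.
    """
    for _ in range(max_attempts):
        result = worker(task)
        if not result.get("evidence"):
            # No proof supplied: reject immediately, forcing the lazy path closed.
            continue
        verdict = reviewer(task, result["answer"], result["evidence"])
        if verdict == "approved":
            return result
    raise RuntimeError("no verified answer within attempt budget")
```

The key design choice mirrors the insight in the summary: the harness makes evidence a hard requirement rather than a suggestion, and a second model, not the one that did the work, decides whether that evidence holds up.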
Unlike consumer chatbots, AlphaSense's AI is designed for verification in high-stakes environments. The UI makes it easy to see the source documents for every claim in a generated summary. This focus on traceable citations is crucial for building the user confidence required for multi-billion dollar decisions.
AI can generate vast amounts of content, but its value is limited by our ability to verify its accuracy. This is fast for visual outputs (images, UI) where our eyes instantly spot flaws, but slow and difficult for abstract domains like back-end code, math, or financial data, which require deep expertise to validate.
AI excels at generating code, making that task a commodity. The new high-value work for engineers is "verification": ensuring the AI's output is not just bug-free, but also valuable to customers, aligned with business goals, and strategically sound.
Flexport implemented an AI agent to audit 100% of customs entries, a task previously sampled by humans. This slashed their error rate from an industry-leading 1.8% to just 0.2%. The insight is that AI’s primary value can be achieving a superhuman level of quality and comprehensiveness, far beyond simple cost-cutting or efficiency gains.