To test Codex's capabilities, Abhi Muchhal built a web app that could ingest tax documents and output a completed 1040 form. When he compared its output to his professional accountant's work, Codex's version was more accurate, identifying a forgotten income source.
Even if you use a professional accountant, running your draft tax return through an LLM can serve as a valuable final check. The AI can identify potential errors, inconsistencies, or missed deductions that human experts might overlook, potentially leading to thousands of dollars in savings.
Michael Bolin, a tech lead on OpenAI's Codex, says models now generate 80-90% of his code. He reserves manual coding for critical, low-level tasks like security sandboxing. For most work, including debugging and refactoring, he relies on the AI agent to maximize his throughput.
Standalone sites like bankstatementconverter.com point to a major opportunity in the ChatGPT store: building apps that solve highly specific, painful business problems. An app that automatically finds all K-1 tax forms in a user's Gmail is a prime example of a simple tool with massive value for a specific audience.
For a growth PM at OpenAI, the key value of Codex was not viewing a single dashboard but building a unified web app that pulls from multiple scattered sources (Databricks, Tableau). The app pairs data synthesis with a TL;DR summary, which cuts down on cognitive overload.
Journalist Casey Newton uses AI tools not to write his columns, but to fact-check them after they're written. He finds that feeding his completed text into an LLM is a surprisingly effective way to catch factual errors, which reflects a significant improvement in model capability over the past year.
The podcast team used Claude Code to cross-check every number and chart in a 50+ page report against the source data, as well as proofread the text. This is a powerful use case for AI in tedious verification tasks where human attention wanes and errors can easily slip through.
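The cross-checking step can be illustrated with a minimal sketch. It is not the team's actual workflow: the function names, the sample sentence, and the source values below are all hypothetical, and the approach (pull every number out of the text, then flag any that has no match in the source data) is only one simple way to do this kind of check.

```python
import math
import re

# Matches integers and decimals, with optional thousands separators.
# The lookbehind skips digits glued to letters (e.g. the 3 in "Q3").
NUMBER = re.compile(r"(?<![A-Za-z\d.,])\d+(?:,\d{3})*(?:\.\d+)?")


def extract_numbers(text):
    """Return every numeric token in the text as a float."""
    return [float(token.replace(",", "")) for token in NUMBER.findall(text)]


def unverified_numbers(report_text, source_values, rel_tol=1e-9):
    """Return numbers in the report that match no value in the source data."""
    return [
        number
        for number in extract_numbers(report_text)
        if not any(math.isclose(number, value, rel_tol=rel_tol) for value in source_values)
    ]


# Hypothetical report sentence and source figures, for illustration only.
REPORT = "Revenue reached 1,250 in Q3, up 12.5% from 1,111."
SOURCE = [1250, 12.5, 1110]

if __name__ == "__main__":
    print(unverified_numbers(REPORT, SOURCE))
```

Here the figure 1,111 is flagged because the source data says 1110. A real check would also need to tie each number to the right row or chart, which this sketch does not attempt.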
Rather than focusing solely on AI fallibility, consider a major application that runs the other way: using AI agents to audit human work. Perplexity's "Final Pass" feature analyzes documents for factual errors and internal inconsistencies, finding glaring mistakes in things like Gartner's earnings press releases and work done by professional accountants.
OpenAI is combining Codex with ChatGPT, recognizing that the software "harness" enabling Codex's actions is more effective for all knowledge work tasks. This success stems from building the model and its action-taking software together in one team, a key lesson for developing capable AI agents.
An OpenAI team developed an internal application with one million lines of code, all generated by an AI agent. Engineers were forbidden from writing code directly, instead shifting their role to diagnosing AI failures and improving the underlying system to prevent repeat mistakes.
An ex-Google data analyst demonstrates using OpenAI's Codex to analyze a CSV file of customer data. She prompts the AI to perform a root cause and cohort analysis for a retention drop, then has it automatically generate a leadership presentation, condensing a multi-day task into a two-hour project.
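For readers unfamiliar with cohort analysis, here is a minimal sketch of the core calculation. It is not the code from the demonstration: the column names (customer_id, signup_month, active_month) and the sample rows are assumptions made for illustration. Customers are grouped by signup month, and for each later month the sketch computes the share of the cohort that was still active.

```python
import csv
import io
from collections import defaultdict


def month_index(year_month):
    """Convert 'YYYY-MM' to a running month count so months can be subtracted."""
    year, month = year_month.split("-")
    return int(year) * 12 + int(month)


def cohort_retention(csv_text):
    """Return {signup_month: {months_since_signup: retention_rate}}."""
    cohort_members = defaultdict(set)  # signup_month -> every customer in the cohort
    active = defaultdict(set)  # (signup_month, offset) -> customers active at that offset
    for row in csv.DictReader(io.StringIO(csv_text)):
        cohort = row["signup_month"]
        offset = month_index(row["active_month"]) - month_index(cohort)
        cohort_members[cohort].add(row["customer_id"])
        active[(cohort, offset)].add(row["customer_id"])

    table = defaultdict(dict)
    for (cohort, offset), customers in active.items():
        table[cohort][offset] = len(customers) / len(cohort_members[cohort])
    return {cohort: dict(sorted(rates.items())) for cohort, rates in sorted(table.items())}


# Hypothetical activity log, one row per customer per active month.
SAMPLE = """customer_id,signup_month,active_month
a,2024-01,2024-01
a,2024-01,2024-02
b,2024-01,2024-01
c,2024-02,2024-02
d,2024-02,2024-02
d,2024-02,2024-03
e,2024-02,2024-02
"""

if __name__ == "__main__":
    for cohort, rates in cohort_retention(SAMPLE).items():
        print(cohort, rates)
```

In this sample, half of the January cohort is still active one month after signup, compared with a third of the February cohort. A gap like that between cohorts is the kind of signal a root cause analysis would then try to explain.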