A new Codex feature lets the AI learn and automate tasks by observing a user's on-screen actions. This is a breakthrough for enterprises because it enables automation of workflows built on legacy software that lacks modern APIs, a common and significant barrier to AI integration.

Related Insights

Originally a code-writing assistant, OpenAI's Codex is being merged into ChatGPT and expanded into a versatile work agent. Through new plugins for tools like Salesforce and Figma, it can now automate complex tasks in data analysis, sales preparation, and marketing asset creation, not just programming.

The next major leap for AI agents isn't just better models, but deeply integrated, stateful browsers like OpenAI's Atlas within Codex. When an AI can operate within a browser that remembers logins and context, it removes a major barrier to automating almost any web-based task.

Counterintuitively, the path to full automation isn't just analyzing conversation transcripts. Cresta's CEO found that you must first observe and instrument what human agents are doing on their desktops, such as navigating legacy systems and UIs, to truly understand and automate the complete workflow.

Claude's ability to control the user's screen, mouse, and keyboard is a breakthrough for enterprises. It allows the AI to operate legacy or custom-built applications that lack modern APIs. This circumvents a major roadblock to AI adoption, breathing new life into older, business-critical software systems.

Scientists won't adopt automation if they have to write code or use clunky visual programming tools. The breakthrough is using AI models to translate natural-language protocols into robot commands. This removes the primary usability barrier and prevents common user errors, enabling adoption.

A new wave of AI automation is being driven by non-technical staff using agent-based platforms. These knowledge workers are building custom AI solutions for complex business processes, bypassing the need for new software purchases or dedicated engineering resources.

To begin automating work with AI, record yourself performing a task on video (e.g., using Loom) while narrating the process. An AI can then analyze the transcript to identify the repeatable steps and logic, which forms the basis for building a custom, automated "skill" that mirrors your workflow.

Instead of pre-designing a complex AI system, first achieve your desired output through a manual, iterative conversation. Then, instruct the AI to review the entire session and convert that successful workflow into a reusable "skill." This reverse-engineers a perfect system from a proven process.

Features like Codex's Chronicle, which passively watches a user's screen, represent the next frontier in AI productivity. The agent gains context without explicit instruction, which cuts down on repetitive explanations but requires users to trade some privacy for significant gains in workflow efficiency.

Future AI models will learn complex, multi-step tasks by watching screen recordings. Companies should begin capturing video of their key internal workflows now. This data, which is currently discarded, will become a valuable proprietary asset for training AI agents to automate bespoke business processes.