A practical framework for AI delegation suggests that models are roughly 30% effective on a new task in one generation, about 70-80% effective in the next, and above 90% two generations later. Meaningful delegation, with a human verifying the output, should begin only once a model crosses the 70% capability threshold for that specific task.
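As a rough illustration, the gating rule can be expressed as a check on a model's measured success rate for one task. This is a minimal sketch, assuming you already score a model's attempts on a specific task as pass/fail. `TaskEval`, `ready_for_delegation`, and treating exactly 70% as "crossing" the threshold are hypothetical choices, not part of the framework itself.

```python
from dataclasses import dataclass

# The ~70% capability threshold described above (assumed inclusive here).
DELEGATION_THRESHOLD = 0.70


@dataclass
class TaskEval:
    """Hypothetical record of how one model did on one specific task."""
    task: str
    passed: int
    attempts: int

    @property
    def success_rate(self) -> float:
        # Guard against tasks that have not been evaluated yet.
        return self.passed / self.attempts if self.attempts else 0.0


def ready_for_delegation(ev: TaskEval, threshold: float = DELEGATION_THRESHOLD) -> bool:
    """True once the measured success rate on this task reaches the threshold.

    Reaching it means delegation *with* human verification, not full autonomy.
    """
    return ev.success_rate >= threshold
```

A model that passes 3 of 10 attempts stays with the human; one that passes 7 of 10 becomes a candidate for delegation with verification. Keeping the rate per task reflects the point that the threshold applies to a specific task, not to the model overall.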

Related Insights

Use a two-axis framework, AI competence versus task stakes, to decide whether a human needs to be in the loop. If the AI is highly competent and the task is low-stakes (e.g., internal competitor tracking), full autonomy is fine. For high-stakes tasks (e.g., customer emails), human review is essential even when the AI performs well.
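The two axes map onto a small decision function. This is a minimal sketch, assuming each axis is reduced to a yes/no judgment; the names are hypothetical. The text does not say what to do with low-stakes tasks the AI is not yet good at, so the code defaults that case to human review as a labeled assumption.

```python
from enum import Enum


class Oversight(Enum):
    FULL_AUTONOMY = "full autonomy"
    HUMAN_REVIEW = "human review"


def required_oversight(ai_competent: bool, high_stakes: bool) -> Oversight:
    """Decide whether a human must review, using the competence/stakes axes."""
    if high_stakes:
        # e.g. customer emails: review is essential even if the AI is good.
        return Oversight.HUMAN_REVIEW
    if ai_competent:
        # e.g. internal competitor tracking: competent AI, low stakes.
        return Oversight.FULL_AUTONOMY
    # Assumption (not stated in the framework): low stakes but low competence
    # still defaults to human review.
    return Oversight.HUMAN_REVIEW
```

Checking stakes first encodes the rule that high stakes override competence.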

Engineers should define an "agent line": the boundary below which an AI agent can reliably handle a task. By continuously re-evaluating which tasks fall below the agent line and delegating them, senior engineers can free up significant time for strategic, high-level work and creative problem-solving.

Contrary to the belief that humans should always be "in the loop," strategic disengagement is key. By handing well-defined "middle" tasks entirely to AI, humans can conserve cognitive energy for the high-leverage activities at either end, initial problem-framing and final quality assurance, where their input is most valuable.

Your mental model for AI must evolve from "chatbot" to "agent manager." Systematically test specialized agents against base LLMs on standardized tasks to learn what can be reliably delegated versus what requires oversight. This is a critical skill for managing future workflows.

Don't wait for AI to be perfect. The correct strategy is to apply current AI models, which are roughly 60-80% accurate, to business processes where that level of performance is good enough for a human reviewer to bring the result to 100%. Chasing perfection in-house wastes resources given the pace of model improvement.

Leading LLMs can now complete software engineering tasks that take a human about two hours, succeeding roughly 50% of the time. The length of task they can handle at that success rate has been doubling about every seven months, signaling an urgent need for organizations to adapt their data infrastructure, security, and governance to take advantage of this exponential growth.
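A fixed doubling period is compound growth, so the trend can be projected as horizon = 2 h × 2^(months / 7). This is a minimal sketch, assuming the seven-month doubling simply continues, which is an extrapolation rather than a guarantee; `projected_horizon_hours` is a hypothetical helper.

```python
def projected_horizon_hours(months_ahead: float,
                            current_hours: float = 2.0,
                            doubling_months: float = 7.0) -> float:
    """Task length (in human hours) completable at ~50% success after `months_ahead`.

    Assumes the doubling trend holds unchanged; this is an extrapolation.
    """
    return current_hours * 2 ** (months_ahead / doubling_months)
```

Under that assumption, the two-hour horizon would reach about 4 hours after 7 months and about 8 hours after 14 months.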

Avoid deploying AI directly into a fully autonomous role for critical applications. Instead, begin with a human-in-the-loop, advisory function. Only after the system has proven its reliability in a real-world environment should its autonomy be gradually increased, moving from supervised to unsupervised operation.
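One way to make "prove reliability, then increase autonomy gradually" concrete is a promotion rule that moves at most one level at a time. This is a minimal sketch: the three levels follow the text (advisory, supervised, unsupervised), while `min_runs`, `min_accept_rate`, and counting human-accepted outputs as the reliability signal are hypothetical placeholders you would set for your own application.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    ADVISORY = 0      # AI suggests; a human decides and acts (human in the loop)
    SUPERVISED = 1    # AI acts; a human reviews its work
    UNSUPERVISED = 2  # AI acts on its own


def next_level(current: Autonomy, runs: int, accepted: int,
               min_runs: int = 200, min_accept_rate: float = 0.95) -> Autonomy:
    """Promote by at most one level, and only with enough real-world evidence.

    `runs` is how many real tasks the system handled at its current level and
    `accepted` how many of its outputs a human accepted. Both thresholds are
    hypothetical defaults.
    """
    if current == Autonomy.UNSUPERVISED:
        return current  # already at the top level
    if runs > 0 and runs >= min_runs and accepted / runs >= min_accept_rate:
        return Autonomy(current + 1)
    return current
```

Promoting one step at a time means a system never jumps from advisory straight to unsupervised, which is the failure mode the paragraph warns against.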

A simple framework for AI adoption: If you enjoy a task and are good at it, do it yourself. If you enjoy it but are unskilled, use AI as a coach. If you dislike it but are good, let AI draft and you review. If you dislike it and are unskilled, let AI draft but have a human expert review.
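The four cases form a 2×2 lookup keyed on enjoyment and skill. This is a minimal sketch with hypothetical names; the returned strings paraphrase the framework above.

```python
def ai_role(enjoy: bool, skilled: bool) -> str:
    """Map an (enjoy, skilled) quadrant to how AI should be used on the task."""
    roles = {
        (True, True): "do it yourself",
        (True, False): "use AI as a coach",
        (False, True): "AI drafts, you review",
        (False, False): "AI drafts, a human expert reviews",
    }
    return roles[(enjoy, skilled)]
```

A table keyed on both answers keeps all four quadrants visible in one place, which mirrors how the framework is usually drawn.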

To determine the boundary between human and AI tasks, ask: "Would I feel comfortable telling my CEO or a customer that an AI made this decision?" If the answer is no, the task involves too much context, consequence, or trust to be fully delegated and should remain under human control.

To stay on the cutting edge, maintain a list of complex tasks that current AI models can't perform well. Whenever a new model is released, run it against this suite. This practice provides an intuitive feel for the model's leap in capability and helps you identify when a previously impossible workflow becomes feasible.
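The practice amounts to a small regression suite run against each new model. This is a minimal sketch, assuming a model can be called as a function from prompt to text and that each hard task comes with a grader you write yourself. `HardTask`, `run_suite`, and `newly_feasible` are hypothetical names, not a real library's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Model = Callable[[str], str]  # assumed interface: prompt in, text out


@dataclass
class HardTask:
    name: str
    prompt: str
    check: Callable[[str], bool]  # hypothetical grader: True if the output is acceptable


def run_suite(model: Model, tasks: List[HardTask]) -> Dict[str, bool]:
    """Run every hard task against one model and record pass/fail by name."""
    return {t.name: t.check(model(t.prompt)) for t in tasks}


def newly_feasible(prev: Dict[str, bool], curr: Dict[str, bool]) -> List[str]:
    """Tasks the new model passes that the previous model did not."""
    return [name for name, ok in curr.items() if ok and not prev.get(name, False)]
```

Storing each model's results and diffing them against the next release is what surfaces the moment a previously impossible workflow becomes feasible.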