Even for complex, multi-hour tasks requiring millions of tokens, current AI agents are at least an order of magnitude cheaper than paying a human with relevant expertise. This significant cost advantage suggests that economic viability will not be a near-term bottleneck for deploying AI on increasingly sophisticated tasks.
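The arithmetic behind this comparison can be sketched roughly as follows. Every number here (token counts, per-million-token prices, the expert's hourly rate, the task length) is an illustrative assumption and not a figure from the text. The function names are hypothetical as well.

```python
# Illustrative sketch only: all prices, rates, and token counts are assumptions.

def agent_cost_usd(input_tokens: int, output_tokens: int,
                   usd_per_million_input: float,
                   usd_per_million_output: float) -> float:
    """Cost of one agent run from token counts and per-million-token prices."""
    return ((input_tokens / 1_000_000) * usd_per_million_input
            + (output_tokens / 1_000_000) * usd_per_million_output)


def human_cost_usd(hours: float, usd_per_hour: float) -> float:
    """Cost of paying a human expert for the same task."""
    return hours * usd_per_hour


# Assumed scenario: a 4-hour task that consumes several million tokens.
agent = agent_cost_usd(input_tokens=4_000_000, output_tokens=500_000,
                       usd_per_million_input=3.0, usd_per_million_output=15.0)
human = human_cost_usd(hours=4, usd_per_hour=100.0)
ratio = human / agent

print(f"agent: ${agent:.2f}, human: ${human:.2f}, human/agent ratio: {ratio:.1f}x")
```

Under these assumed inputs the human costs more than ten times what the agent does. The ratio is sensitive to the token prices and the hourly rate chosen, so it is worth rerunning with figures from an actual deployment.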
Early AI training involved simple preference tasks. Now, training frontier models requires PhDs and top professionals to perform complex, hours-long tasks like building entire websites or explaining nuanced cancer topics. The demand is for deep, specialized expertise, not just generalist labor.
AI's ability to generate ideas and initial drafts for a few dollars removes the high cost of entry for new projects. This "ideation" phase, once proven successful, often justifies hiring human experts for full execution, creating net-new work that was previously unaffordable.
Models like Gemini 3 Flash show a key trend: frontier intelligence is becoming faster, cheaper, and more efficient. On this trajectory, today's state-of-the-art capability becomes 10x cheaper within a year, enabling widespread, low-latency, and on-device deployment.
OpenAI's new GDPval evaluation measures AI performance on real-world knowledge work. It found that frontier models produce work rated equal to or better than that of human experts nearly 50% of the time, while being 100 times faster and cheaper. This provides a direct measure of impending economic transformation.
The narrative of AI destroying jobs misses a key point: AI allows companies to 'hire software for a dollar' for tasks that were never economical to assign to humans. This will unlock new services and expand the economy, creating demand in areas that previously didn't exist.
Flexport uses AI agents for tasks that were previously skipped because they were too costly for human employees, like calling warehouses to confirm addresses. This shows that AI's value isn't just in replacing existing work, but in performing new, marginally valuable tasks at a scale that is finally economical.
Contrary to the idea that technology always gets cheaper, building on AI may be cheaper now than it will be later. The current phase is characterized by abundant venture capital and intense competition among AI tool providers, both of which subsidize costs for developers. As the market consolidates, these costs will rise.
A major challenge for the 'time horizon' metric is its cost. As AI capabilities improve, the tasks needed to benchmark them grow from hours to weeks or months. The cost of paying human experts for these long durations to establish a baseline becomes extremely high, threatening the long-term viability of this evaluation method.
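How quickly the baselining bill grows can be seen in a minimal sketch. The hourly rate, the number of human baseliners per task, the number of tasks in the suite, and the working-hours conversions are all assumptions chosen for illustration. None of them come from the text or from any published benchmark budget.

```python
# Illustrative sketch only: every default value below is an assumption.

# Assumed working-hour equivalents for each task length.
TASK_HOURS = {"1 hour": 1, "1 week": 40, "1 month": 160}


def baseline_cost_usd(task_hours: float,
                      usd_per_hour: float = 100.0,
                      baseliners_per_task: int = 3,
                      num_tasks: int = 50) -> float:
    """Cost of paying human experts to complete every task in a benchmark suite."""
    return task_hours * usd_per_hour * baseliners_per_task * num_tasks


costs = {label: baseline_cost_usd(hours) for label, hours in TASK_HOURS.items()}

for label, cost in costs.items():
    print(f"{label:>8} tasks: ${cost:,.0f}")
```

Cost scales linearly with task length in this model, so moving from hour-long to month-long tasks multiplies the baselining budget by the same factor of 160, before accounting for recruiting or for attempts that are abandoned partway through.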
The true commercial impact of AI will likely come from small, specialized "micro models" solving boring, high-volume business tasks. While highly valuable, these models are cheap to run and cannot economically justify the current massive capital expenditure on AGI-focused data centers.
As AI systems become more capable and scalable with few practical limits, humans will become the weakest link in any cognitive team. Because humans are prone to error and to drawing incorrect conclusions, human cognitive input will, from a purely economic perspective, eventually detract from value creation rather than add to it.