The cost to achieve a specific performance benchmark dropped from $60 per million tokens with GPT-3 in 2021 to just $0.06 with Llama 3.2 3B in 2024, a 1,000x reduction in three years. This dramatic cost reduction makes sophisticated AI economically viable for a far wider range of enterprise applications and, because small models of this class can run on commodity hardware, shifts attention toward on-premise solutions.
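As a sanity check, the two quoted prices imply roughly a 10x price drop per year; the short calculation below derives that rate (the endpoint prices come from above, the annual rate is computed, not sourced):

```python
# Back-of-the-envelope: implied annual rate of the GPT-3 -> Llama 3.2 3B price drop.
gpt3_2021 = 60.00    # $ per million tokens, GPT-3 (2021)
llama_2024 = 0.06    # $ per million tokens, Llama 3.2 3B (2024)

total_drop = gpt3_2021 / llama_2024    # 1000x over three years
annual_rate = total_drop ** (1 / 3)    # ~10x cheaper per year

print(f"Total reduction: {total_drop:.0f}x, implied rate: ~{annual_rate:.0f}x/year")
```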
The cost for a given level of AI performance halves roughly every 3.5 months, a pace about 10 times faster than Moore's Law. This exponential improvement means entrepreneurs should pursue ideas that seem financially or computationally infeasible today, as they will likely become practical within 12 to 24 months.
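Converting "halves every 3.5 months" into an annual factor shows this is consistent with the ~10x/year rate implied above (the Moore's Law pace is assumed here as 2x every 24 months):

```python
# Convert a 3.5-month cost halving time into an annual reduction factor.
annual_factor = 2 ** (12 / 3.5)    # ~10.8x cheaper per year
moore_annual = 2 ** (12 / 24)      # Moore's Law: ~1.41x per year (2x per 24 months)

print(f"AI cost decline: ~{annual_factor:.1f}x/year vs Moore's Law ~{moore_annual:.2f}x/year")
```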
Models like Gemini 3 Flash show a key trend: making frontier intelligence faster, cheaper, and more efficient. The trajectory is for today's state-of-the-art models to become 10x cheaper within a year, enabling widespread, low-latency, and on-device deployment.
Instead of relying solely on massive, expensive, general-purpose LLMs, the trend is toward creating smaller, focused models trained on specific business data. These "niche" models are more cost-effective to run, less likely to hallucinate, and far more effective at performing specific, defined tasks for the enterprise.
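A minimal sketch of what such a niche-model workflow can look like, assuming the Hugging Face transformers/peft/datasets stack; the model name, example record, and hyperparameters are illustrative placeholders, not a prescribed recipe:

```python
# Minimal sketch: LoRA fine-tuning a small open model on in-house data.
# Assumes the Hugging Face transformers/peft/datasets stack; the model name,
# example record, and hyperparameters are illustrative placeholders.
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments)

model_name = "meta-llama/Llama-3.2-3B"  # any small causal LM works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA updates only a small set of adapter weights, keeping tuning cheap.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))

# Stand-in for proprietary business text (support tickets, contracts, ...).
records = ["Q: How do I reset the card reader? A: Hold the side button for 10s."]

def tokenize(batch):
    enc = tokenizer(batch["text"], truncation=True, max_length=512)
    enc["labels"] = enc["input_ids"].copy()  # causal LM: predict the input itself
    return enc

train_data = Dataset.from_dict({"text": records}).map(tokenize, batched=True)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="niche-model", num_train_epochs=1),
    train_dataset=train_data,
).train()
```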
The cost for a given level of AI capability has decreased by a factor of 100 in just one year. This radical deflation in the price of intelligence requires a complete rethinking of business models and future strategies, as intelligence becomes an abundant, cheap commodity.
A paradox is emerging: the cost of a fixed level of AI capability (e.g., GPT-4-level performance) has dropped 100-1000x, yet overall enterprise spend is increasing. Applications now use frontier models with massive contexts and multi-step agentic workflows, creating huge multipliers on token usage that drive up total costs.
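A toy cost model makes the paradox concrete; every number below is an illustrative assumption, not a measured figure:

```python
# Toy cost model: per-token price falls ~100x, but usage multipliers grow faster.
price_2023 = 30.0    # $ per million tokens, frontier model (assumed)
price_2025 = 0.30    # $ per million tokens after a ~100x price drop (assumed)

# 2023-style app: a single call with a modest prompt.
tokens_2023 = 4_000

# 2025-style agent: large context, reasoning tokens, multi-step workflow.
context = 100_000         # tokens of retrieved context per step (assumed)
reasoning_multiplier = 5  # reasoning tokens vs. plain completion (assumed)
agent_steps = 20          # model calls per task in an agentic loop (assumed)
tokens_2025 = context * reasoning_multiplier * agent_steps  # 10M tokens

cost_2023 = tokens_2023 / 1e6 * price_2023   # ~$0.12 per task
cost_2025 = tokens_2025 / 1e6 * price_2025   # ~$3.00 per task

print(f"2023 task: ${cost_2023:.2f}")
print(f"2025 task: ${cost_2025:.2f} (25x more, despite 100x cheaper tokens)")
```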
Even for complex, multi-hour tasks requiring millions of tokens, current AI agents are at least an order of magnitude cheaper than paying a human with relevant expertise. This significant cost advantage suggests that economic viability will not be a near-term bottleneck for deploying AI on increasingly sophisticated tasks.
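The claim can be checked with rough numbers; the token count, token price, and hourly rate below are assumptions chosen for illustration:

```python
# Rough comparison: agent token cost vs. human expert cost for a multi-hour task.
tokens_used = 5_000_000    # tokens consumed by a complex multi-hour task (assumed)
price_per_million = 5.0    # $ per million tokens, frontier model (assumed)
agent_cost = tokens_used / 1e6 * price_per_million   # $25

human_rate = 150.0         # $ per hour for relevant expertise (assumed)
task_hours = 4
human_cost = human_rate * task_hours                 # $600

print(f"Agent: ${agent_cost:.0f}, human: ${human_cost:.0f}, "
      f"ratio: {human_cost / agent_cost:.0f}x")      # ~24x cheaper
```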
IBM's CEO explains that previous deep learning models were "bespoke and fragile," requiring massive, costly human labeling for single tasks. LLMs are an industrial-scale unlock because they eliminate this labeling step, making them vastly faster and cheaper to tune and deploy across many tasks.
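The "no labeling" point is easy to see in code: where a bespoke classifier once needed thousands of hand-labeled examples, a general LLM handles a new task from a plain-language instruction. The sketch below uses the OpenAI Python SDK; the model name and prompt are illustrative:

```python
# The same base LLM takes on a new classification task with zero labeled data.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
ticket = "My invoice shows a charge I don't recognize."

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": "Classify this support ticket as BILLING, TECHNICAL, "
                   f"or OTHER. Reply with one word.\n\nTicket: {ticket}",
    }],
)
print(resp.choices[0].message.content)  # expected: BILLING
```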
Arvind Krishna forecasts a 1000x drop in AI compute costs over five years. This won't just come from better chips (a 10x gain). It will be compounded by new processor architectures (another 10x) and major software optimizations like model compression and quantization (a final 10x).
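The forecast is three compounding 10x gains; spread over five years, that works out to roughly a 4x cost drop per year:

```python
# Krishna's 1000x = three compounding 10x gains; implied annual rate over 5 years.
chips, architectures, software = 10, 10, 10
total = chips * architectures * software    # 1000x
annual = total ** (1 / 5)                   # ~4.0x cheaper per year

print(f"{chips}x * {architectures}x * {software}x = {total}x, ~{annual:.1f}x/year")
```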
Countering the narrative of insurmountable training costs, Jensen Huang argues that architectural, algorithmic, and computing stack innovations are driving down AI costs far faster than Moore's Law. He predicts a billion-fold cost reduction for token generation within a decade.
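For scale, a billion-fold reduction in ten years implies roughly an 8x cost drop every year, far beyond the ~1.4x annual pace of Moore's Law (assumed here as 2x every two years):

```python
# Implied annual rate of a billion-fold cost reduction over a decade.
total, years = 1e9, 10
annual = total ** (1 / years)    # ~7.9x cheaper per year
moore = 2 ** (1 / 2)             # Moore's Law: ~1.41x per year

print(f"Implied: ~{annual:.1f}x/year vs Moore's Law ~{moore:.2f}x/year")
```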
While the cost for GPT-4 level intelligence has dropped over 100x, total enterprise AI spend is rising. This is driven by multipliers: using larger frontier models for harder tasks, reasoning-heavy workflows that consume more tokens, and complex, multi-turn agentic systems.