Many AI startups are "wrappers" whose service cost is tied to an upstream LLM. Since LLM prices fluctuate, these startups risk having their unit economics go underwater. Stripe's token billing API lets them track and price their service against real-time inference costs, protecting their margins from upstream volatility.
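The margin-protection idea can be sketched in a few lines. This is a hypothetical pricing helper, not Stripe's API: it re-derives the customer price from the provider's current rate so a target gross margin holds even when the upstream model is repriced.

```python
# Hypothetical sketch: price tokens at a target gross margin over a live
# upstream rate, so margins hold even if the LLM provider reprices.
# `upstream_cost_per_1k` would come from the provider's current price sheet.

def price_per_1k_tokens(upstream_cost_per_1k: float, target_margin: float) -> float:
    """Customer price per 1k tokens that yields `target_margin`.

    margin = (price - cost) / price  =>  price = cost / (1 - margin)
    """
    if not 0 <= target_margin < 1:
        raise ValueError("target_margin must be in [0, 1)")
    return upstream_cost_per_1k / (1.0 - target_margin)

# If the upstream model costs $0.01 per 1k tokens and we want a 60% margin:
print(round(price_per_1k_tokens(0.01, 0.60), 4))  # 0.025
```

Recomputing the price on each billing cycle (or each metered event) is what decouples the startup's margin from the provider's pricing decisions.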

Related Insights

The "AI wrapper" concern is mitigated by a multi-model strategy. A startup can integrate the best models from various providers for different tasks, creating a superior product. A platform like OpenAI is incentivized to use only its own models, which creates a durable advantage for the startup.

The notion of building a business as a 'thin wrapper' around a foundational model like GPT is flawed. Truly defensible AI products, like Cursor, build numerous specific, fine-tuned models to deeply understand a user's domain. This creates a data and performance moat that a generic model cannot easily replicate, much like Salesforce was more than just a 'thin wrapper' on a database.

Historically, a developer's primary cost was salary. Now, constant use of powerful AI coding assistants introduces a new, variable infrastructure expense for LLM tokens. This changes the economic model of software development, adding potentially several dollars per hour in token costs per engineer.
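A back-of-the-envelope calculation makes the "dollars per hour" claim concrete. All the numbers below are assumed for illustration, not measured figures:

```python
# Assumed figures: token spend added per engineer-hour by a heavily
# used coding assistant.

def assistant_cost_per_hour(requests_per_hour: int,
                            tokens_per_request: int,
                            cost_per_million_tokens: float) -> float:
    tokens = requests_per_hour * tokens_per_request
    return tokens / 1_000_000 * cost_per_million_tokens

# e.g. 60 requests/hour at ~8k tokens each, $5 per million tokens:
print(round(assistant_cost_per_hour(60, 8_000, 5.0), 2))  # 2.4
```

At those assumptions an engineer adds roughly $2-3/hour in token spend, i.e. a marginal cost that scales with usage rather than headcount.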

Standard SaaS pricing fails for agentic products because high usage becomes a cost center. Avoid the trap of profiting from non-use. Instead, implement a hybrid model with a fixed base and usage-based overages, or, ideally, tie pricing directly to measurable outcomes generated by the AI.
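The hybrid model described above can be sketched as a simple invoice function. The specific numbers (base fee, included allowance, overage price) are illustrative assumptions:

```python
# Sketch of a hybrid pricing model: a fixed platform fee covering an
# included usage allowance, plus metered overages beyond it.

def monthly_invoice(base_fee: float,
                    included_units: int,
                    overage_price: float,
                    units_used: int) -> float:
    overage = max(0, units_used - included_units)
    return base_fee + overage * overage_price

# $500/month base with 10k agent actions included, $0.05 per extra action:
print(monthly_invoice(500.0, 10_000, 0.05, 14_000))  # 700.0
```

Unlike flat per-seat SaaS pricing, revenue here rises with usage, so a heavy user is a better customer rather than a cost center.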

AI companies operate under the assumption that LLM prices will trend towards zero. This strategic bet means they intentionally de-prioritize heavy investment in cost optimization today, focusing instead on capturing the market and building features, confident that future, cheaper models will solve their margin problems for them.

Pega's CTO advises using the powerful reasoning of LLMs to design processes and marketing offers. However, at runtime, switch to faster, cheaper, and more consistent predictive models. This avoids the unpredictability, cost, and risk of calling expensive LLMs for every live customer interaction.

OpenPipe's initial value was clear: GPT-4 was powerful but prohibitively expensive for production. They offered a managed flow to distill expensive workflows into cheaper, smaller models, resonating with early customers facing massive OpenAI bills and helping them reach $1M ARR in eight months.

Unlike the cloud market with high switching costs, LLM workloads can be moved between providers with a single line of code. This creates insane market dynamics where millions in spend can shift overnight based on model performance or cost, posing a huge risk to the LLM providers themselves.
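The "single line of code" point is roughly this: when providers expose compatible APIs, switching vendors reduces to changing one config entry. The endpoints and model names below are placeholders, not verified values:

```python
# Illustrative sketch: with API-compatible providers, "switching" is a
# one-line config change. URLs and model names are placeholders.

PROVIDERS = {
    "vendor_a": {"base_url": "https://api.vendor-a.example/v1", "model": "model-a"},
    "vendor_b": {"base_url": "https://api.vendor-b.example/v1", "model": "model-b"},
}

def client_config(provider: str) -> dict:
    # The one-line switch: change `provider` and the same request code
    # targets a different vendor's models.
    return PROVIDERS[provider]

print(client_config("vendor_a")["model"])  # model-a
```

Because the request code downstream of `client_config` never changes, millions in spend can be re-pointed as fast as a deploy, which is exactly the dynamic the paragraph describes.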

Traditional SaaS metrics like 80%+ gross margins are misleading for AI companies. High inference costs lower margins, but if the absolute gross profit per customer is multiples higher than a SaaS equivalent, it's a superior business. The focus should shift from margin percentages to absolute gross profit dollars and multiples.
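A worked example shows why margin percentage alone misleads. The revenue and margin figures are assumed for illustration:

```python
# Assumed figures: a lower-margin AI product can still out-earn a
# high-margin SaaS product in absolute gross profit per customer.

def gross_profit(revenue_per_customer: float, gross_margin: float) -> float:
    return revenue_per_customer * gross_margin

saas = gross_profit(1_000.0, 0.80)   # 80% margin on $1k/customer
ai   = gross_profit(10_000.0, 0.50)  # 50% margin on $10k/customer

print(saas, ai)  # 800.0 5000.0
```

Here the AI product's margin is 30 points worse, yet it generates over 6x the gross profit dollars per customer, which is the comparison the paragraph argues should drive the evaluation.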

The AI value chain flows from hardware (NVIDIA) to apps, with LLM providers currently capturing most of the margin. The long-term viability of app-layer businesses depends on a competitive model layer. This competition drives down API costs, preventing model providers from having excessive pricing power and allowing apps to build sustainable businesses.