AI Agent Web Search is Painfully Expensive Due to Redundant URL Visits

Related Insights

The High Cost of Vector Search Creates an Economic Bottleneck for AI Products

AI's hunger for context is making search a critical but expensive component. As illustrated by Turbopuffer's origin, a single recommendation feature using vector embeddings can cost tens of thousands of dollars per month, forcing companies to find cheaper solutions to make AI features economically viable at scale.

Sora 2 Launch Reactions, DoorDash CEO Live in The Ultradome | Tony Xu, Simon Eskildsen, Patrick O’Shaughnessy, Zach Abrams, Andrew Feldman, Brandon Millman, Stanley Tang, Alex Albert, Arthur Querou

TBPN·10 months ago

AI Agents Use Long, Multi-Word Queries, Forcing a Rethink of Search Engine Design

Unlike humans, who typically type two or three words, LLMs generate long, sentence-like queries (e.g., eight words or more) to gather comprehensive context. This shift from human to AI users requires search engines to be optimized for detailed, descriptive inputs.

AI in the AM: 99% off search, GPT-5.5 is "clean", model welfare analysis, & efficient analog compute

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis·3 months ago

Anthropic's 'Agent Teams' Feature Drives Massive Token Usage, Aligning Product with Business Model

The new multi-agent architecture in Opus 4.6, while powerful, dramatically increases token consumption. Each agent runs its own process, multiplying token usage for a single prompt. This is a savvy business strategy, as the model's most advanced feature is also its most lucrative for Anthropic.

Claude Opus 4.6 vs GPT-5.3 Codex: Live Build, Clear Winner

The Startup Ideas Podcast·6 months ago

AI Agents Require Comprehensive Search, Not the 10 Blue Links Humans Prefer

AI agents, unlike humans, need complete and exhaustive information (thousands of results) and use complex, controllable queries. A search engine built for human keyword simplicity and limited results will fail to serve them effectively.

Building Search for AI Agents with Exa CEO Will Bryk

The a16z Show·2 months ago

AI Context Windows Have Plateaued Due to Prohibitive User Costs, Not Just Technical Limits

The growth of LLM context windows has stalled not primarily due to technical barriers, but because multi-million token requests can cost users several dollars per query, leading to low demand. The industry is shifting focus to "smart context" techniques like compaction and retrieval to provide relevant information without the prohibitive cost of massive context.

The Model Eats the Scaffolding: DeepMind's Logan Kilpatrick & Tulsee Doshi on 3.5 Flash, Omni & More

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis·2 months ago

Offload Raw Tool Call Data to a File System to Drastically Cut Agent Token Costs

Don't pass the full, token-heavy output of every tool call back into an agent's message history. Instead, save the raw data to an external system (like a file system or agent state) and only provide the agent with a summary or pointer.

Context Engineering for Agents - Lance Martin, LangChain

Latent Space: The AI Engineer Podcast·a year ago
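
The offloading idea in the insight above can be sketched roughly as follows. This is a minimal illustration, not the approach from the episode: the function names `offload_tool_output` and `load_tool_output`, the pointer fields, and the use of a truncated preview in place of a real summary are all assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def offload_tool_output(raw: str, store_dir: Path, preview_chars: int = 200) -> dict:
    """Write raw tool output to disk and return a small pointer record.

    Only the returned record would go into the agent's message history.
    (Hypothetical helper; the preview stands in for a proper summary.)
    """
    store_dir.mkdir(parents=True, exist_ok=True)
    # Content-addressed file name, so identical outputs are stored once.
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]
    path = store_dir / f"{digest}.txt"
    if not path.exists():
        path.write_text(raw, encoding="utf-8")
    return {"path": str(path), "chars": len(raw), "preview": raw[:preview_chars]}


def load_tool_output(pointer: dict) -> str:
    """Read the full output back only when the agent actually needs it."""
    return Path(pointer["path"]).read_text(encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        page = "lorem ipsum " * 5000  # stand-in for a large fetched page
        pointer = offload_tool_output(page, Path(tmp))
        print(pointer["chars"], len(pointer["preview"]))
```

Because the file name is derived from the content hash, a page fetched twice is written once, which also blunts some of the cost of redundant URL visits.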

Delegate AI Tasks to Sub-Agents to Preserve Your Main Context Window

When an AI assistant performs a task like web research, it consumes a large amount of context. Instructing it to use a sub-agent offloads this work, keeping the main chat session lean and focused by only returning the final result, dramatically conserving your context window.

How to Turn Claude Code into an Operating System with Carl Vellotti

The Growth Podcast·4 months ago

Google's AI Search Uses "Query Fanout" to Run Dozens of Background Searches for a Single Prompt

Unlike chatbots that rely solely on their training data, Google's AI acts as a live researcher. For a single user query, the model executes a 'query fanout'—running multiple, targeted background searches to gather, synthesize, and cite fresh information from across the web in real-time.

Inside Google's AI turnaround: The rise of AI Mode, strategy behind AI Overviews, and their vision for AI-powered search | Robby Stein (VP of Product, Google Search)

Lenny's Podcast: Product | Career | Growth·10 months ago

Naive Agent Loops Rack Up Huge Costs and Hit Context Limits from Excessive Tool Call Data

The simple "tool calling in a loop" model for agents is deceptive. Without managing context, token-heavy tool calls quickly accumulate, leading to high costs ($1-2 per run), hitting context limits, and performance degradation known as "context rot."

Context Engineering for Agents - Lance Martin, LangChain

Latent Space: The AI Engineer Podcast·a year ago

AI Agents Are Shifting RAG Workloads to Massive Parallel Searches

The nature of Retrieval-Augmented Generation (RAG) is evolving. Instead of a single search to populate an initial context window, AI agents are now performing numerous concurrent queries in a single turn. This allows them to explore diverse information paths simultaneously, driving new database requirements.

Retrieval After RAG: Hybrid Search, Agents, and Database Design — Simon Hørup Eskildsen of Turbopuffer

Latent Space: The AI Engineer Podcast·5 months ago
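
The "many concurrent queries in a single turn" pattern described above might look something like the sketch below. Everything here is assumed for illustration: `FAKE_INDEX` is a hypothetical in-memory stand-in for a real search backend, and `search` and `fan_out` are made-up names. URLs are deduplicated after the fan-out so that a page returned by several queries would only be visited once.

```python
import asyncio

# Hypothetical in-memory index standing in for a real search backend.
FAKE_INDEX = {
    "vector search cost": ["https://example.com/a", "https://example.com/b"],
    "agent query length": ["https://example.com/b", "https://example.com/c"],
    "context compaction": ["https://example.com/c", "https://example.com/d"],
}


async def search(query: str) -> list[str]:
    """Pretend to call a search API and return result URLs."""
    await asyncio.sleep(0.01)  # simulate network latency
    return FAKE_INDEX.get(query, [])


async def fan_out(queries: list[str]) -> list[str]:
    """Run all queries concurrently, then merge results without duplicates."""
    # gather() returns results in the same order as the queries were passed.
    results = await asyncio.gather(*(search(q) for q in queries))
    seen: set[str] = set()
    unique: list[str] = []
    for urls in results:
        for url in urls:
            if url not in seen:
                seen.add(url)
                unique.append(url)
    return unique


if __name__ == "__main__":
    print(asyncio.run(fan_out(list(FAKE_INDEX))))
```

With real backends the queries would overlap in wall-clock time, so total latency is closer to that of the slowest query than to the sum of all of them.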
