We scan new podcasts and send you the top 5 insights daily.
Criteo has just milliseconds to respond to an ad request. This extreme speed requirement dictates their AI architecture, forcing them to pre-compute and cache user and product embeddings. Real-time inference is limited to fast operations, applying only a marginal update for the user's latest action.
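A minimal sketch of that pattern: embeddings are computed offline and cached, and the request path does only a lookup, one vector blend for the latest action, and dot products. All names and the blend factor are illustrative assumptions, not Criteo's actual API.

```python
import numpy as np

DIM = 64
rng = np.random.default_rng(0)

# Offline: batch jobs populate these caches ahead of time.
user_cache = {u: rng.standard_normal(DIM) for u in ["u1", "u2"]}
item_cache = {i: rng.standard_normal(DIM) for i in ["shoes", "laptop", "mug"]}

def score_request(user_id, candidate_ids, last_clicked=None, blend=0.1):
    """Millisecond-budget path: cache lookups, one blend, dot products.
    No model forward pass happens at request time."""
    u = user_cache[user_id]
    if last_clicked is not None:
        # Marginal update: nudge the cached user vector toward the
        # embedding of the user's most recent action.
        u = (1 - blend) * u + blend * item_cache[last_clicked]
    # Rank candidates by similarity to the (lightly updated) user vector.
    return sorted(candidate_ids, key=lambda i: -float(u @ item_cache[i]))

ranking = score_request("u1", ["shoes", "laptop", "mug"], last_clicked="shoes")
```

The design choice this illustrates: all expensive learning happens offline, so the online path stays within a strict latency budget.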
The future of personalization may involve a two-step process. A centralized AI (like Criteo's) will provide strong recommendations. Then, a smaller, privacy-centric model running locally on the user's device (e.g., in their glasses) will perform the final, hyper-personalized adjustments, keeping the most sensitive data private.
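The two-step flow could look roughly like this: a server-side model returns broadly good candidates with scores, and a small on-device step re-ranks them using a private signal that never leaves the device. Every name and score here is a hypothetical stand-in.

```python
def server_recommend():
    # Centralized model output: (item, server_score) pairs.
    return [("running_shoes", 0.9), ("dress_shoes", 0.8), ("sandals", 0.7)]

def local_rerank(candidates, private_affinity):
    """On-device adjustment; private_affinity stays local to the device."""
    adjusted = [
        (item, score + private_affinity.get(item, 0.0))
        for item, score in candidates
    ]
    return [item for item, _ in sorted(adjusted, key=lambda t: -t[1])]

# Sensitive signal known only on-device (e.g., observed via the glasses).
private = {"sandals": 0.5}
final = local_rerank(server_recommend(), private)
# sandals (0.7 + 0.5 = 1.2) now outranks the server's top pick (0.9).
```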
As frontier AI models reach a plateau of perceived intelligence, the key differentiator is shifting to user experience. Low-latency, reliable performance is becoming more critical than marginal gains on benchmarks, making speed the next major competitive vector for AI products like ChatGPT.
Criteo’s strategy with OpenAI is to create a hybrid system. LLMs provide general reasoning and conversational ability, but their knowledge quickly becomes stale for dynamic commerce data like pricing and stock. Criteo provides the real-time data layer to ensure accuracy and avoid bad user experiences.
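One way to picture the hybrid split, as a hedged sketch: the LLM supplies reasoning and phrasing, while a real-time lookup supplies the volatile facts (price, stock) at answer time. The catalog dict and function names are assumptions standing in for a live commerce data layer.

```python
REALTIME_CATALOG = {  # stand-in for a live commerce data layer
    "espresso_machine": {"price": 129.99, "in_stock": True},
}

def fetch_live_facts(product_id):
    """Hypothetical real-time lookup; in production this would hit
    a low-latency service, not a local dict."""
    return REALTIME_CATALOG[product_id]

def answer_with_fresh_data(product_id, llm_draft):
    """Fill the model's draft with facts fetched at request time,
    so stale training-data prices never reach the user."""
    facts = fetch_live_facts(product_id)
    stock = "in stock" if facts["in_stock"] else "out of stock"
    return llm_draft.format(price=facts["price"], stock=stock)

draft = "It's a solid choice at ${price} and currently {stock}."
reply = answer_with_fresh_data("espresso_machine", draft)
```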
For low-latency applications, start with a small model to iterate rapidly on data quality. Then fine-tune a large, high-quality model on the cleaned data. Finally, distill the capabilities of this large, specialized model back into a small, fast model for production deployment.
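The distillation stage can be sketched minimally with toy linear "models" standing in for the real ones: the small student is fit to the large teacher's outputs rather than to raw labels. The squared-error distillation objective and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 8))   # the cleaned training data

# Stage-2 stand-in: a large, well-tuned teacher (here a fixed linear map).
w_teacher = rng.standard_normal(8)
teacher_out = X @ w_teacher         # soft targets for distillation

# Stage 3: distill the teacher into a small, fast student by fitting
# the student to the teacher's outputs (least squares here).
w_student, *_ = np.linalg.lstsq(X, teacher_out, rcond=None)

# The student now mimics the teacher at a fraction of the serving cost.
max_err = float(np.max(np.abs(X @ w_student - teacher_out)))
```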
Breaking from transformer dominance, Shopify leverages Liquid AI's state-space-like models for high-value tasks. For search query understanding, they run a 300M parameter Liquid model with an impressive 30ms end-to-end latency, a feat difficult to achieve with traditional architectures.
Criteo's models moved from manually crafted, high-dimensional sparse vectors (e.g., 2^12 features) fed to linear models, to dense vectors of a few hundred dimensions computed automatically by deep learning. This shift eliminated manual feature engineering and improved model adaptability.
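The two representations can be contrasted in a few lines. This is a generic illustration, not Criteo's pipeline: the sparse side uses the standard hashing trick into a 2^12-dimensional one-hot space, and the dense side pools learned embedding rows of "a few hundred" dimensions.

```python
import numpy as np

N_BUCKETS = 2 ** 12   # sparse feature space for the linear-model era
DENSE_DIM = 256       # learned dense representation

def sparse_features(tokens):
    """Hand-crafted era: hash each raw feature into a big sparse vector."""
    v = np.zeros(N_BUCKETS)
    for t in tokens:
        v[hash(t) % N_BUCKETS] += 1.0
    return v

rng = np.random.default_rng(0)
embedding_table = rng.standard_normal((N_BUCKETS, DENSE_DIM))

def dense_features(tokens):
    """Deep-learning era: look up and pool learned dense embeddings
    (here randomly initialized; in practice learned end to end)."""
    idx = [hash(t) % N_BUCKETS for t in tokens]
    return embedding_table[idx].mean(axis=0)

s = sparse_features(["brand=acme", "category=shoes"])
d = dense_features(["brand=acme", "category=shoes"])
```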
Criteo builds multiple, specialized foundation models (for products, user timelines, etc.) rather than a single monolithic one. The embeddings from these models are made available across the company, serving as a "warm start" to accelerate the development and improve the performance of new AI products.
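The "warm start" idea can be sketched as follows: a new product reuses frozen, company-wide foundation embeddings as its input features, so only a small task-specific head needs training. The synthetic task, labels, and training loop are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Shared artifact published by a (hypothetical) product foundation model.
shared_product_embeddings = {pid: rng.standard_normal(32) for pid in range(100)}

def features(pid):
    # Warm start: reuse precomputed embeddings, don't relearn features.
    return shared_product_embeddings[pid]

# New AI product: train only a small logistic-regression head on top.
w_head = np.zeros(32)
lr = 0.1
for _ in range(50):
    for pid in range(100):
        x = features(pid)
        y = 1.0 if x[0] > 0 else 0.0          # synthetic label for the demo
        p = 1.0 / (1.0 + np.exp(-w_head @ x)) # sigmoid prediction
        w_head += lr * (y - p) * x            # SGD step on the head only

preds = [1.0 if 1.0 / (1.0 + np.exp(-w_head @ features(pid))) > 0.5 else 0.0
         for pid in range(100)]
labels = [1.0 if features(pid)[0] > 0 else 0.0 for pid in range(100)]
acc = sum(p == y for p, y in zip(preds, labels)) / 100
```

Because the features arrive pretrained, the new product needs far less data and training time than learning representations from scratch.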
Previously, the biggest constraint in AI was compute for training next-gen models. Now, the critical bottleneck is providing enough compute for *inference*—the real-time processing of queries from a rapidly growing user base.
While training has been the focus, user experience and revenue happen at inference. OpenAI's massive deal with chip startup Cerebras is for faster inference, showing that response time is a critical competitive vector that determines whether AI becomes utility infrastructure or remains a novelty.
As AI models become commodities, the underlying hardware's speed and efficiency for inference is the true differentiator. The company that powers the fastest AI experiences will win, similar to how Google won with fast search, because there is no market for slow AI.