We scan new podcasts and send you the top 5 insights daily.
Criteo has just milliseconds to respond to an ad request. This extreme speed requirement dictates their AI architecture, forcing them to pre-compute and cache user and product embeddings. Real-time inference is limited to fast operations, applying only a marginal update for the user's latest action.
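A minimal sketch of that pattern: embeddings are computed offline and cached, and the request path does only a lookup, one vector blend for the latest action, and dot products. All names and the blend factor are illustrative assumptions, not Criteo's actual API.

```python
import numpy as np

DIM = 64
rng = np.random.default_rng(0)

# Offline: batch jobs populate these caches ahead of time.
user_cache = {u: rng.standard_normal(DIM) for u in ["u1", "u2"]}
item_cache = {i: rng.standard_normal(DIM) for i in ["shoes", "laptop", "mug"]}

def score_request(user_id, candidate_ids, last_clicked=None, blend=0.1):
    """Millisecond-budget path: cache lookups, one blend, dot products.
    No model forward pass happens at request time."""
    u = user_cache[user_id]
    if last_clicked is not None:
        # Marginal update: nudge the cached user vector toward the
        # embedding of the user's most recent action.
        u = (1 - blend) * u + blend * item_cache[last_clicked]
    # Rank candidates by similarity to the (lightly updated) user vector.
    return sorted(candidate_ids, key=lambda i: -float(u @ item_cache[i]))

ranking = score_request("u1", ["shoes", "laptop", "mug"], last_clicked="shoes")
```

The design choice this illustrates: all expensive learning happens offline, so the online path stays within a strict latency budget.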
The future of personalization may involve a two-step process. A centralized AI (like Criteo's) will provide strong recommendations. Then, a smaller, privacy-centric model running locally on the user's device (e.g., in their glasses) will perform the final, hyper-personalized adjustments, keeping the most sensitive data private.
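The two-step flow could look roughly like this: a server-side model returns broadly good candidates with scores, and a small on-device step re-ranks them using a private signal that never leaves the device. Every name and score here is a hypothetical stand-in.

```python
def server_recommend():
    # Centralized model output: (item, server_score) pairs.
    return [("running_shoes", 0.9), ("dress_shoes", 0.8), ("sandals", 0.7)]

def local_rerank(candidates, private_affinity):
    """On-device adjustment; private_affinity stays local to the device."""
    adjusted = [
        (item, score + private_affinity.get(item, 0.0))
        for item, score in candidates
    ]
    return [item for item, _ in sorted(adjusted, key=lambda t: -t[1])]

# Sensitive signal known only on-device (e.g., observed via the glasses).
private = {"sandals": 0.5}
final = local_rerank(server_recommend(), private)
# sandals (0.7 + 0.5 = 1.2) now outranks the server's top pick (0.9).
```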
As frontier AI models reach a plateau of perceived intelligence, the key differentiator is shifting to user experience. Low-latency, reliable performance is becoming more critical than marginal gains on benchmarks, making speed the next major competitive vector for AI products like ChatGPT.
Criteo’s strategy with OpenAI is to create a hybrid system. LLMs provide general reasoning and conversational ability, but their knowledge quickly becomes stale for dynamic commerce data like pricing and stock. Criteo provides the real-time data layer to ensure accuracy and avoid bad user experiences.
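One way to picture the hybrid split, as a hedged sketch: the LLM supplies reasoning and phrasing, while a real-time lookup supplies the volatile facts (price, stock) at answer time. The catalog dict and function names are assumptions standing in for a live commerce data layer.

```python
REALTIME_CATALOG = {  # stand-in for a live commerce data layer
    "espresso_machine": {"price": 129.99, "in_stock": True},
}

def fetch_live_facts(product_id):
    """Hypothetical real-time lookup; in production this would hit
    a low-latency service, not a local dict."""
    return REALTIME_CATALOG[product_id]

def answer_with_fresh_data(product_id, llm_draft):
    """Fill the model's draft with facts fetched at request time,
    so stale training-data prices never reach the user."""
    facts = fetch_live_facts(product_id)
    stock = "in stock" if facts["in_stock"] else "out of stock"
    return llm_draft.format(price=facts["price"], stock=stock)

draft = "It's a solid choice at ${price} and currently {stock}."
reply = answer_with_fresh_data("espresso_machine", draft)
```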
For low-latency applications, start with a small model to iterate rapidly on data quality. Then fine-tune a large, high-quality model on the cleaned data. Finally, distill the capabilities of this large, specialized model back into a small, fast model for production deployment.
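The distillation stage can be sketched minimally with toy linear "models" standing in for the real ones: the small student is fit to the large teacher's outputs rather than to raw labels. The squared-error distillation objective and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 8))   # the cleaned training data

# Stage-2 stand-in: a large, well-tuned teacher (here a fixed linear map).
w_teacher = rng.standard_normal(8)
teacher_out = X @ w_teacher         # soft targets for distillation

# Stage 3: distill the teacher into a small, fast student by fitting
# the student to the teacher's outputs (least squares here).
w_student, *_ = np.linalg.lstsq(X, teacher_out, rcond=None)

# The student now mimics the teacher at a fraction of the serving cost.
max_err = float(np.max(np.abs(X @ w_student - teacher_out)))
```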
Breaking from transformer dominance, Shopify leverages Liquid AI's state-space-like models for high-value tasks. For search query understanding, they run a 300M parameter Liquid model with an impressive 30ms end-to-end latency, a feat difficult to achieve with traditional architectures.
Criteo's models moved from manually crafted, high-dimensional sparse vectors (e.g., 2^12 features) fed to linear models, to dense vectors of a few hundred dimensions computed automatically by deep learning. This shift eliminated manual feature engineering and improved model adaptability.
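The two representations can be contrasted in a few lines. This is a generic illustration, not Criteo's pipeline: the sparse side uses the standard hashing trick into a 2^12-dimensional one-hot space, and the dense side pools learned embedding rows of "a few hundred" dimensions.

```python
import numpy as np

N_BUCKETS = 2 ** 12   # sparse feature space for the linear-model era
DENSE_DIM = 256       # learned dense representation

def sparse_features(tokens):
    """Hand-crafted era: hash each raw feature into a big sparse vector."""
    v = np.zeros(N_BUCKETS)
    for t in tokens:
        v[hash(t) % N_BUCKETS] += 1.0
    return v

rng = np.random.default_rng(0)
embedding_table = rng.standard_normal((N_BUCKETS, DENSE_DIM))

def dense_features(tokens):
    """Deep-learning era: look up and pool learned dense embeddings
    (here randomly initialized; in practice learned end to end)."""
    idx = [hash(t) % N_BUCKETS for t in tokens]
    return embedding_table[idx].mean(axis=0)

s = sparse_features(["brand=acme", "category=shoes"])
d = dense_features(["brand=acme", "category=shoes"])
```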
Criteo builds multiple, specialized foundation models (for products, user timelines, etc.) rather than a single monolithic one. The embeddings from these models are made available across the company, serving as a "warm start" to accelerate the development and improve the performance of new AI products.
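The "warm start" idea can be sketched as follows: a new product reuses frozen, company-wide foundation embeddings as its input features, so only a small task-specific head needs training. The synthetic task, labels, and training loop are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Shared artifact published by a (hypothetical) product foundation model.
shared_product_embeddings = {pid: rng.standard_normal(32) for pid in range(100)}

def features(pid):
    # Warm start: reuse precomputed embeddings, don't relearn features.
    return shared_product_embeddings[pid]

# New AI product: train only a small logistic-regression head on top.
w_head = np.zeros(32)
lr = 0.1
for _ in range(50):
    for pid in range(100):
        x = features(pid)
        y = 1.0 if x[0] > 0 else 0.0          # synthetic label for the demo
        p = 1.0 / (1.0 + np.exp(-w_head @ x)) # sigmoid prediction
        w_head += lr * (y - p) * x            # SGD step on the head only

preds = [1.0 if 1.0 / (1.0 + np.exp(-w_head @ features(pid))) > 0.5 else 0.0
         for pid in range(100)]
labels = [1.0 if features(pid)[0] > 0 else 0.0 for pid in range(100)]
acc = sum(p == y for p, y in zip(preds, labels)) / 100
```

Because the features arrive pretrained, the new product needs far less data and training time than learning representations from scratch.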
Previously, the biggest constraint in AI was compute for training next-gen models. Now, the critical bottleneck is providing enough compute for *inference*—the real-time processing of queries from a rapidly growing user base.
While training has been the focus, user experience and revenue happen at inference. OpenAI's massive deal with chip startup Cerebras is for faster inference, showing that response time is a critical competitive vector that determines whether AI becomes utility infrastructure or remains a novelty.
As AI models become commodities, the underlying hardware's speed and efficiency for inference is the true differentiator. The company that powers the fastest AI experiences will win, similar to how Google won with fast search, because there is no market for slow AI.