The sheer volume of AI-generated code is causing Shopify's CI/CD pipelines to "start creaking." This bottleneck suggests that the entire paradigm of pull requests and Git, which was designed for human-scale development, may be obsolete in an "agentic world" and in need of a complete redesign.
Analysis of Shopify's internal AI usage reveals a significant trend: the top percentile of users are increasing their token consumption much faster than others. The CTO finds this skew "not ideal," fearing it could lead to extreme imbalances in resource utilization.
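One way to quantify this kind of skew is to track the share of all tokens consumed by the top 1% of users over time. If that share keeps rising, the heaviest users are growing faster than everyone else. The minimal sketch below assumes per-user token counts are available as a plain list. The function name `top_share`, its rounding rule, and the toy numbers are illustrative choices, not Shopify's actual metric.

```python
def top_share(tokens_per_user: list[float], top_fraction: float = 0.01) -> float:
    """Fraction of all tokens consumed by the heaviest `top_fraction` of users.

    Hypothetical metric for illustration; not Shopify's internal definition.
    """
    if not tokens_per_user:
        raise ValueError("need at least one user")
    if not 0.0 < top_fraction <= 1.0:
        raise ValueError("top_fraction must be in (0, 1]")
    ranked = sorted(tokens_per_user, reverse=True)
    # Size of the top group, rounded to the nearest whole user and never below one.
    k = max(1, round(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total > 0 else 0.0


# Toy snapshots: one heavy user among 100, before and after a month of growth.
before = [100.0] + [10.0] * 99
after = [400.0] + [12.0] * 99
print(top_share(before), top_share(after))
```

Comparing the two snapshots shows whether the skew is widening. The membership of the top group can change between snapshots, so a rising share is a signal rather than proof that the same individuals are accelerating.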
Shopify built "Tangent," an auto-research system that runs experiments, analyzes the results, and modifies pipelines to maximize a target metric. This has democratized ML development: a product manager became the tool's top user, effectively cutting the ML engineer out of many optimization tasks.
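Tangent's internals are not described here, but the run, analyze, and modify cycle it automates can be illustrated as a greedy search loop: propose a change to a pipeline configuration, run an experiment, and keep the change only if the goal metric improves. In this minimal sketch, `auto_research`, `perturb`, and `toy_experiment` are hypothetical stand-ins for real pipeline edits and training runs.

```python
import random
from typing import Callable

Config = dict[str, float]


def perturb(cfg: Config, rng: random.Random) -> Config:
    """Hypothetical 'modify the pipeline' step: nudge one setting by small Gaussian noise."""
    key = rng.choice(sorted(cfg))
    candidate = dict(cfg)
    candidate[key] += rng.gauss(0.0, 0.05)
    return candidate


def auto_research(
    config: Config,
    run_experiment: Callable[[Config], float],
    propose_change: Callable[[Config, random.Random], Config] = perturb,
    budget: int = 200,
    seed: int = 0,
) -> tuple[Config, float]:
    """Greedy loop: propose a change, run it, keep it only if the goal metric goes up."""
    rng = random.Random(seed)
    best_config, best_score = dict(config), run_experiment(config)
    for _ in range(budget):
        candidate = propose_change(best_config, rng)  # modify the pipeline
        score = run_experiment(candidate)             # run the experiment
        if score > best_score:                        # analyze the result
            best_config, best_score = candidate, score
    return best_config, best_score


# Toy stand-in for an experiment: the metric peaks at lr=0.3, dropout=0.1.
def toy_experiment(cfg: Config) -> float:
    return -((cfg["lr"] - 0.3) ** 2) - (cfg["dropout"] - 0.1) ** 2


start = {"lr": 1.0, "dropout": 0.5}
best, score = auto_research(start, toy_experiment)
print(best, score)
```

A real system would replace the random perturbation with model-generated proposals and the toy objective with actual experiment results. The greedy accept rule is simply the most basic way to "maximize a goal."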
While powerful, Shopify's auto-research tool has limitations. It excels at performing tasks that are "obvious" but tedious for humans, like finding derivative datasets or suboptimal code. However, it's not yet capable of generating completely out-of-the-box solutions that require deep, multi-day thinking.
Breaking from transformer dominance, Shopify uses Liquid AI's state-space-like models for high-value tasks. For search query understanding, they run a 300M-parameter Liquid model with an end-to-end latency of just 30 ms, a feat that is difficult to achieve with traditional architectures.
Shopify's CTO, who led the Bing team, reveals that Sydney's controversial personality was intentionally crafted. Drawing on experience from Yandex's "Alice" assistant, the team spent significant effort on "personality shaping" to create a character that was polite but "a little bit on edge" to increase user engagement.
The explosion in AI-generated code creates a new quality assurance bottleneck. Shopify's CTO insists that pull request reviews must use the largest, most expensive models to maintain quality and prevent a surge in bugs, noting that smaller, faster models are insufficient for the task.
Shopify's SimGym successfully simulates customer behavior because it's trained on a decade of historical data linking store changes to sales outcomes. The CTO emphasizes that without this vast, proprietary dataset, any similar simulation would fail, as the AI agents would merely act out their prompts.
Shopify encourages widespread AI adoption by providing an unlimited token budget for all employees. To ensure quality, they implement bottom-up control, discouraging the use of models less capable than top-tier ones like Claude 3 Opus, setting a high performance floor for tooling.
Shopify's CTO reveals that employee use of AI tools surged around December, reaching nearly 100% daily active usage. Notably, command-line interface (CLI) tools are growing faster than traditional integrated development environment (IDE) tools like GitHub Copilot.
Shopify's CTO clarifies that Liquid AI models don't compete with frontier models like GPT-4. Instead, their key advantage is serving as a highly effective target for knowledge distillation. This allows Shopify to compress a huge model's capabilities into a smaller, faster, cheaper Liquid AI model for specific tasks.
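Knowledge distillation is commonly implemented by training the small "student" model to match the large "teacher" model's softened output distribution. The sketch below shows the generic soft-target loss: KL divergence at temperature T, scaled by T² and mixed with ordinary cross-entropy on the true label. The temperature, the mixing weight `alpha`, and the toy logits are illustrative assumptions, and nothing here reflects Shopify's or Liquid AI's actual training recipe.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax over logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: list[float],
    teacher_logits: list[float],
    label: int,
    temperature: float = 2.0,
    alpha: float = 0.5,
) -> float:
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * cross-entropy(student, label)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    # T^2 scaling keeps the soft-target gradient magnitude roughly independent of T.
    soft = (temperature ** 2) * kl
    # Ordinary cross-entropy on the true label, at temperature 1.
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1.0 - alpha) * hard


teacher = [1.0, 2.0, 3.5]  # logits from the large frontier model (toy values)
student = [1.0, 2.0, 3.0]  # logits from the small model being trained (toy values)
print(distillation_loss(student, teacher, label=2))
```

Minimizing this loss over many task-specific examples pushes the small model toward the large model's behavior on that task, which is the "compression" the CTO describes.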
Shopify's CTO argues against running many AI agents in parallel. A more effective, higher-quality method is a "critique loop," where one agent (ideally using a different model) reviews and suggests improvements to another's work. Though slower, this process significantly boosts code quality.
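The critique loop can be sketched as a sequential alternation between an author agent and a critic agent, ideally backed by different models. In this minimal sketch, `critique_loop`, the `Author`/`Critic` call signatures, and the toy agents are hypothetical. In practice each callable would wrap a call to a model, and the critic returning `None` stands in for "no further suggestions."

```python
from typing import Callable, Optional

# (task, previous draft or None, feedback or None) -> new draft
Author = Callable[[str, Optional[str], Optional[str]], str]
# (task, draft) -> feedback, or None to approve
Critic = Callable[[str, str], Optional[str]]


def critique_loop(task: str, author: Author, critic: Critic, max_rounds: int = 3) -> tuple[str, int]:
    """Alternate drafting and review until the critic approves or the round budget runs out.

    Returns the final draft and the number of review rounds used. If the budget
    runs out, the last revision has not been reviewed.
    """
    draft = author(task, None, None)
    for round_no in range(1, max_rounds + 1):
        feedback = critic(task, draft)
        if feedback is None:                   # critic approves: stop early
            return draft, round_no
        draft = author(task, draft, feedback)  # author revises using the critique
    return draft, max_rounds


# Toy agents; a real author model would act on the text of `feedback`.
def toy_author(task: str, previous: Optional[str], feedback: Optional[str]) -> str:
    if previous is None:
        return "def add(a, b):\n    return a + b"
    return previous + "\nassert add(2, 3) == 5"


def toy_critic(task: str, draft: str) -> Optional[str]:
    return None if "assert" in draft else "add a unit test"


final, rounds = critique_loop("write add()", toy_author, toy_critic)
print(rounds)
```

Each round waits for the previous one to finish, which is why this approach is slower than fanning out many agents in parallel; the trade-off is that every revision is grounded in a concrete review.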
