IA2's preprocessing creates a rich workload model for its deep reinforcement learning task. This model doesn't just analyze queries; it integrates query plans, current indexes, database metadata, and tokenized queries. This holistic state representation is key to its ability to generalize across diverse database workloads, providing a more accurate view of the system's state.
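As a rough illustration of what such a combined state could look like, the sketch below concatenates four feature groups into one flat vector. The function name `build_state`, its parameters, and the bag-of-tokens encoding are assumptions made for this example; they are not taken from IA2 itself.

```python
def build_state(plan_costs, candidate_indexes, existing_indexes,
                table_row_counts, query_tokens, vocab):
    """Concatenate plan, index, metadata, and query-token features (hypothetical layout)."""
    # One bit per candidate index: 1.0 if it is currently built, else 0.0.
    index_bitmap = [1.0 if c in existing_indexes else 0.0 for c in candidate_indexes]

    # Bag-of-tokens counts over a fixed vocabulary; unknown tokens are ignored.
    token_counts = [0.0] * len(vocab)
    for token in query_tokens:
        if token in vocab:
            token_counts[vocab[token]] += 1.0

    return list(plan_costs) + index_bitmap + list(table_row_counts) + token_counts


state = build_state(
    plan_costs=[120.5, 30.0],                       # estimated cost per query plan
    candidate_indexes=["orders(customer_id)", "orders(order_date)"],
    existing_indexes={"orders(order_date)"},
    table_row_counts=[1_000_000.0],                 # database metadata
    query_tokens=["select", "orders", "where", "orders"],
    vocab={"select": 0, "orders": 1, "where": 2},
)
```

A real system would likely normalize these values and use learned embeddings rather than raw counts; the point here is only that all four sources end up in a single observation.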
To avoid AI hallucinations, Square's AI tools translate merchant queries into deterministic actions. For example, a query about sales on rainy days prompts the AI to write and execute real SQL code against a data warehouse, ensuring grounded, accurate results.
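A minimal sketch of the pattern, using an in-memory SQLite database as a stand-in for the data warehouse. The table names, columns, sample rows, and the `generated_sql` string (which plays the role of the model's output) are all invented for illustration and do not reflect Square's actual schema or tooling.

```python
import sqlite3

# Stand-in warehouse with hypothetical tables and sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (day TEXT, amount REAL);
CREATE TABLE weather (day TEXT, weather_type TEXT);
INSERT INTO sales VALUES ('2024-01-01', 100.0), ('2024-01-02', 250.0), ('2024-01-03', 75.0);
INSERT INTO weather VALUES ('2024-01-01', 'rain'), ('2024-01-02', 'sun'), ('2024-01-03', 'rain');
""")

# In a real tool this string would come from the model; here it is hard-coded.
generated_sql = """
SELECT SUM(s.amount)
FROM sales AS s
JOIN weather AS w ON w.day = s.day
WHERE w.weather_type = 'rain'
"""

# The answer comes from executing the query, not from the model's own recall.
rainy_total = conn.execute(generated_sql).fetchone()[0]
print(rainy_total)
```

The model only decides which query to run; the number shown to the merchant is whatever the database returns.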
Pre-training on internet text data is hitting a wall. The next major advancements will come from reinforcement learning (RL), where models learn by interacting with simulated environments (like games or fake e-commerce sites). This post-training phase is in its infancy but will soon consume the majority of compute.
To build a multi-billion dollar database company, you need two things: a new, widespread workload (like AI needing data) and a fundamentally new storage architecture that incumbents can't easily adopt. This framework helps identify truly disruptive infrastructure opportunities.
Training AI agents to execute multi-step business workflows demands a new data paradigm. Companies create reinforcement learning (RL) environments, which are mini world models of business processes, where agents learn by attempting tasks and being scored on the outcome. This goes beyond training on static prompt-completion pairs, as in supervised fine-tuning (SFT) or RLHF.
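To make the idea concrete, here is a toy environment for a hypothetical three-step refund workflow. The class name, the step names, and the reward values are assumptions chosen for the example; real environments would model far richer state.

```python
class RefundWorkflowEnv:
    """Toy RL environment: the agent must perform the workflow steps in order."""

    STEPS = ["lookup_order", "check_policy", "issue_refund"]

    def reset(self):
        self.progress = 0          # number of steps completed so far
        return self.progress

    def step(self, action):
        """Return (observation, reward, done). Call reset() after an episode ends."""
        if action == self.STEPS[self.progress]:
            self.progress += 1
            done = self.progress == len(self.STEPS)
            # Reward is only paid when the whole workflow succeeds.
            return self.progress, (1.0 if done else 0.0), done
        # A wrong action ends the episode with a penalty.
        return self.progress, -1.0, True


env = RefundWorkflowEnv()
env.reset()
total_reward = 0.0
for action in ["lookup_order", "check_policy", "issue_refund"]:
    _, reward, done = env.step(action)
    total_reward += reward
```

The agent is graded on whether the task was completed, not on whether its output matches a reference completion.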
To enable AI tools like Cursor to write accurate SQL queries with minimal prompting, data teams must build a "semantic layer." This layer, often a structured JSON file, acts as a translation layer that defines business logic, tables, and metrics, dramatically improving the AI's zero-shot query generation.
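A small sketch of what such a layer might contain, written as a Python dict that serializes to JSON. The schema (`tables`, `metrics`, `sql`, `filter`) and the helper `metric_to_sql` are hypothetical; there is no single standard format, and teams define their own.

```python
import json

# Hypothetical semantic layer: table descriptions plus named business metrics.
semantic_layer = {
    "tables": {
        "orders": {
            "description": "One row per customer order",
            "columns": {
                "order_id": "Primary key",
                "amount_usd": "Order total in US dollars",
                "status": "One of 'completed', 'refunded', 'pending'",
            },
        }
    },
    "metrics": {
        "net_revenue": {
            "table": "orders",
            "sql": "SUM(amount_usd)",
            "filter": "status = 'completed'",
        }
    },
}


def metric_to_sql(layer, metric_name):
    """Expand a named metric into the SQL that its definition implies."""
    metric = layer["metrics"][metric_name]
    sql = f"SELECT {metric['sql']} AS {metric_name} FROM {metric['table']}"
    if metric.get("filter"):
        sql += f" WHERE {metric['filter']}"
    return sql


# The serialized layer is what would be placed in the AI tool's context.
prompt_context = json.dumps(semantic_layer, indent=2)
query = metric_to_sql(semantic_layer, "net_revenue")
```

With the definition of "net revenue" written down once, the AI no longer has to guess which column or filter the business means.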
Unlike pre-training's simpler data pipeline, RL involves many "moving parts" because each task can have a unique grading setup and infrastructure. This complexity, not just the algorithm itself, is the primary challenge for researchers managing live training runs at scale.
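One way to picture the "moving parts" is a registry that routes each task to its own grader. The sketch below is an assumption about how that might be organized; the grader names and task fields are invented, and production setups would also involve sandboxes, timeouts, and per-task infrastructure.

```python
def grade_exact(output, expected):
    """Full credit only for an exact match, ignoring surrounding whitespace."""
    return 1.0 if output.strip() == expected.strip() else 0.0


def grade_numeric(output, expected, tol=1e-6):
    """Full credit when both values parse as numbers and agree within tol."""
    try:
        return 1.0 if abs(float(output) - float(expected)) <= tol else 0.0
    except ValueError:
        return 0.0  # unparseable output earns no reward


# Each task kind has its own grading logic (hypothetical registry).
GRADERS = {"text": grade_exact, "math": grade_numeric}


def score(task):
    """Look up the grader for this task kind and compute its reward."""
    return GRADERS[task["kind"]](task["output"], task["expected"])


rewards = [
    score({"kind": "math", "output": "3.0", "expected": "3"}),
    score({"kind": "text", "output": " ok", "expected": "ok"}),
    score({"kind": "math", "output": "abc", "expected": "3"}),
]
```

Every new task kind adds another grader that can fail in its own way during a live run, which is where much of the operational complexity comes from.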
The transition from supervised learning (copying internet text) to reinforcement learning (rewarding a model for achieving a goal) marks a fundamental breakthrough. This method, used in Anthropic's Opus 3 model, allows AI to develop novel problem-solving capabilities beyond simple data emulation.
When determining what data an RL model should consider, resist including every available feature. Instead, observe how experienced human decision-makers reason about the problem. Their simplified mental models reveal the core signals that truly drive outcomes, leading to more stable, faster-learning, and more interpretable AI systems.
To make agents useful over long periods, Tasklet engineers an "illusion" of infinite memory. Instead of feeding the model the full chat history, they use advanced context engineering: LLM-based compaction, scoping context for sub-agents, and having the LLM manage its own state in a SQL database so it can recall relevant information efficiently.
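The state-in-a-database part can be sketched with SQLite. The `AgentMemory` class, its table layout, and the topic-based lookup are assumptions for this example and are not a description of Tasklet's implementation; in practice the LLM would issue these reads and writes through tool calls.

```python
import sqlite3


class AgentMemory:
    """Hypothetical long-term store that an agent reads and writes via tools."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE memory (topic TEXT, fact TEXT)")

    def remember(self, topic, fact):
        # Persist a fact outside the context window.
        self.conn.execute("INSERT INTO memory VALUES (?, ?)", (topic, fact))

    def recall(self, topic):
        # Pull back only the facts relevant to the current step, oldest first.
        rows = self.conn.execute(
            "SELECT fact FROM memory WHERE topic = ? ORDER BY rowid", (topic,)
        ).fetchall()
        return [row[0] for row in rows]


memory = AgentMemory()
memory.remember("billing", "Customer prefers annual invoices")
memory.remember("support", "Last ticket was about login errors")
memory.remember("billing", "Invoice contact is the finance team")

billing_context = memory.recall("billing")
```

Only the recalled rows enter the prompt, so the context stays small no matter how long the agent has been running.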
Instead of exhaustively listing all possible database indexes, the IA2 system uses a smarter approach. It employs validation rules, permutations, and heuristics to generate a refined set of high-potential index candidates. This creates a more focused and relevant "action space" for the reinforcement learning agent to explore, leading to more efficient training and better index selection.
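A simplified sketch of this kind of candidate generation follows. It assumes one heuristic (only columns that appear in query predicates are considered), one validation rule (skip indexes that already exist), and a cap on index width; the function name and these specific rules are illustrative choices, not IA2's actual rule set.

```python
from itertools import permutations


def generate_candidates(predicate_columns, existing, max_width=2):
    """Build candidate indexes as (table, ordered_columns) tuples.

    predicate_columns: table name -> set of columns seen in WHERE/JOIN clauses.
    existing: set of (table, ordered_columns) tuples that are already built.
    """
    candidates = []
    for table in sorted(predicate_columns):
        columns = sorted(predicate_columns[table])
        for width in range(1, max_width + 1):
            # Column order matters for multi-column indexes, hence permutations.
            for ordering in permutations(columns, width):
                candidate = (table, ordering)
                if candidate not in existing:   # validation: no duplicates
                    candidates.append(candidate)
    return candidates


predicate_columns = {
    "orders": {"customer_id", "order_date"},
    "customers": {"region"},
}
existing = {("orders", ("customer_id",))}

action_space = generate_candidates(predicate_columns, existing)
```

Here the agent chooses among four candidates instead of every column ordering across every table, which is the pruning effect the paragraph describes.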