Despite significant investment and hype in robotics, the path to value creation is slowed by challenges in unit economics and reliability. In contrast, LLM agents are already delivering tangible value, suggesting a much faster and larger market trajectory for software agents than for robots.
Current LLMs are intelligent enough for many tasks but fail because they lack access to complete context: emails, Slack messages, past data. The next step is building products that ingest this real-world context and make it available for the model to act upon.
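One minimal sketch of what "ingesting real-world context" could look like in practice: gather items from several sources, rank by recency, and pack as many as fit into a fixed context budget before a model call. The `ContextItem` type, recency ranking, and character budget are all illustrative assumptions, not details from the episode.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    source: str       # e.g. "email" or "slack" (hypothetical source labels)
    timestamp: float  # newer items are assumed more relevant
    text: str


def pack_context(items: list[ContextItem], budget_chars: int) -> str:
    """Pack the most recent items that fit into a fixed context budget."""
    ordered = sorted(items, key=lambda i: i.timestamp, reverse=True)
    chunks, used = [], 0
    for item in ordered:
        entry = f"[{item.source}] {item.text}"
        if used + len(entry) > budget_chars:
            continue  # skip items that would overflow the budget
        chunks.append(entry)
        used += len(entry)
    return "\n".join(chunks)
```

A real product would swap the character budget for token counting and the recency sort for relevance ranking, but the shape is the same: the hard part is collecting and filtering the context, not the model call itself.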
The media portrays AI development as volatile, with huge breakthroughs and sudden plateaus. The reality inside labs like OpenAI is a steady, continuous process of experimentation, stacking small wins, and consistent scaling. The internal experience is one of "chugging along."
Experience in robotics, where systems often fail, cultivates resilience and a deep focus on analyzing data to debug problems. This "gritty" skill set is highly transferable and valuable in the world of large language models, where perseverance and data intuition are key.
OpenAI's pivot to specialized models is heavily influenced by organizational realities: different teams possess different datasets and goals, making a unified model difficult. This tendency to "ship the org chart" can be mistaken for a fundamental scientific conclusion.
Superhuman performance on specific benchmarks like competitive coding does not translate into solving real-world problems. Because we implicitly optimize for the benchmark itself, the result is "peaky" performance on the measured task rather than broad, generalizable intelligence.
Much RL research from 2015-2022 has not proven useful in practice because academia rewards complex, math-heavy ideas. Such ideas provide implicit "knobs" for overfitting benchmarks, while simpler, more generalizable approaches are passed over because they lack intellectual novelty.
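The overfitting-via-knobs effect can be shown with a toy simulation (entirely illustrative, not from the episode): if 100 method variants all have the same true skill and differ only in benchmark noise, selecting the best one on a benchmark makes it look well above its true ability, and the gain vanishes on a fresh task.

```python
import random

TRUE_SKILL = 0.50  # every candidate variant has the same real ability


def benchmark_score(method_id: int, benchmark_id: int) -> float:
    """Observed score = true skill + benchmark-specific noise (deterministic seed)."""
    rng = random.Random(method_id * 10_007 + benchmark_id)
    return TRUE_SKILL + rng.gauss(0, 0.05)


# Try 100 knob settings and keep the one that looks best on benchmark 0.
best = max(range(100), key=lambda m: benchmark_score(m, benchmark_id=0))

on_benchmark = benchmark_score(best, benchmark_id=0)  # inflated by selection
on_new_task = benchmark_score(best, benchmark_id=1)   # just noise around true skill
print(f"selected on benchmark: {on_benchmark:.3f}")
print(f"same method, new task: {on_new_task:.3f}")
```

The selected method beats its true skill on the benchmark purely by construction: with enough knobs, something always looks good on the metric being optimized.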
Large labs often suffer from organizational friction between product and research. A small, focused startup like Cursor can co-design its product and model in a tight loop, enabling rapid innovations like near-real-time policy updates that are organizationally difficult for incumbents.
Previously, labs like OpenAI would use models like GPT-4 internally long before public release. Now, the competitive landscape forces them to release new capabilities almost immediately, reducing the internal-to-external lead time from many months to just one or two.
![[State of RL/Reasoning] IMO/IOI Gold, OpenAI o3/GPT-5, and Cursor Composer — Ashvin Nair, Cursor](https://assets.flightcast.com/V2Uploads/nvaja2542wefzb8rjg5f519m/01K4D8FB4MNA071BM5ZDSMH34N/square.jpg)