The jump from 2D to 3D gaming created a new user interface problem: how to move independently of where you are looking. A single stick couldn't manage this complex interaction. This led to the dual-thumbstick controller, with one stick for movement ('legs') and one for camera/viewpoint ('eyes'), a design standard that persists today.
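The 'legs and eyes' split can be sketched as a per-frame update loop, with the left stick translating the player and the right stick rotating the camera, each applied independently. This is a minimal illustration, not any particular engine's code; all names and constants are assumptions.

```python
import math

def update(pos, yaw, left_stick, right_stick, dt, move_speed=5.0, turn_speed=2.0):
    """Advance player position and camera yaw by one frame of stick input."""
    lx, ly = left_stick          # movement input ("legs")
    rx, _ = right_stick          # camera input ("eyes"); only yaw used here
    yaw += rx * turn_speed * dt  # the right stick turns the view, never the body
    # Movement is expressed relative to where the camera currently faces.
    forward = (math.cos(yaw), math.sin(yaw))
    strafe = (-math.sin(yaw), math.cos(yaw))
    x, y = pos
    x += (forward[0] * ly + strafe[0] * lx) * move_speed * dt
    y += (forward[1] * ly + strafe[1] * lx) * move_speed * dt
    return (x, y), yaw
```

Because each stick feeds a separate term, the player can walk one way while looking another, which is exactly the interaction a single stick cannot express.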
Instead of developing proprietary systems, the military adopts video game controllers because gaming companies have already invested billions perfecting an intuitive, easy-to-learn interface. This strategy leverages decades of private-sector R&D, providing troops with a familiar, optimized tool for complex, high-stakes operations.
The Wright brothers' first plane required a 'full-body activity' to fly, with hip movements controlling wing tilt and a lever for pitch—a system compared to 'patting your head and rubbing your stomach.' The invention of the single joystick radically simplified this complex, non-intuitive interface, consolidating multi-axis control into one hand.
When the gaming industry pivoted from 2D pixel art to 3D graphics, it wasn't just a technological change. For developers like Tez Okano, it was an emotional collapse. He described it as "watching the collapse of an empire," where the meticulous craft of pixel art was suddenly deemed obsolete, forcing artists to adapt or face unemployment.
GI is not trying to solve robotics in general. Their strategy is to focus on robots whose actions can be mapped to a game controller. This constraint dramatically simplifies the problem, allowing their foundation models trained on gaming data to be directly applicable, shifting the burden for robotics companies from expensive pre-training to more manageable fine-tuning.
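The constraint described above can be sketched as a thin adapter layer: the robot exposes its action space in gamepad terms, and a per-robot function translates that into actuator commands. Everything here is hypothetical and illustrative, not GI's actual API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GamepadAction:
    left_stick: tuple[float, float]   # e.g. base translation
    right_stick: tuple[float, float]  # e.g. camera or arm orientation
    buttons: frozenset                # discrete actions like "grip"

def to_wheel_speeds(action: GamepadAction, max_speed: float = 1.0):
    """Map left-stick input to differential-drive wheel speeds (one example adapter)."""
    x, y = action.left_stick
    left = max(-max_speed, min(max_speed, y + x))
    right = max(-max_speed, min(max_speed, y - x))
    return left, right
```

A foundation model trained on gaming data only ever emits `GamepadAction`-shaped outputs; each robotics company supplies its own small adapter like `to_wheel_speeds`, which is the fine-tuning-sized problem rather than the pre-training-sized one.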
The ultimate goal of interface design, exemplified by the joystick, is for the tool to 'disappear': the user is conscious not of the controller but of their intention. This concept, known as 'affordance,' creates a seamless connection between thought and action, making the machine feel like an extension of the self.
A "frontier interface" is one where the interaction model is completely unknown. Historically, from light pens to cursors to multi-touch, the physical input mechanism has dictated the entire scope of what a computer can do. Brain-computer interfaces represent the next fundamental shift, moving beyond physical manipulation.
Products like a joystick possess strong "affordance"—their design inherently communicates how they should be used. This intuitive quality, where a user can just "grok" it, is a key principle of effective design often missing in modern interfaces like touchscreens, which require learned behavior.
Current multimodal models shoehorn visual data into a 1D text-based sequence. True spatial intelligence is different. It requires a native 3D/4D representation to understand a world governed by physics, not just human-generated language. This is a foundational architectural shift, not an extension of LLMs.
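The '1D shoehorning' can be made concrete with a ViT-style patchify step, in which an image grid is cut into patches and flattened into a single token sequence, so the model must recover spatial adjacency from position embeddings rather than receiving it natively. This is a generic sketch of the standard technique, with illustrative shapes.

```python
import numpy as np

def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Flatten an (H, W, C) image into a (num_patches, patch*patch*C) 1D token sequence."""
    h, w, c = image.shape
    rows, cols = h // patch, w // patch
    tokens = (image[:rows * patch, :cols * patch]
              .reshape(rows, patch, cols, patch, c)
              .swapaxes(1, 2)                      # group pixels patch by patch
              .reshape(rows * cols, patch * patch * c))
    return tokens  # the 2D grid is now just an ordered list of tokens

img = np.zeros((224, 224, 3))
print(patchify(img, 16).shape)  # (196, 768): a 14x14 grid flattened to 1D
```

The 196 tokens arrive in raster order with no inherent notion of 'above', 'behind', or 'occluding', which is the gap a native 3D/4D representation is meant to close.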
The dual-stick controller design has been functionally stable for nearly three decades, suggesting it is a 'peak interface' for 3D navigation. This reliability and widespread familiarity are precisely what allowed for its adoption in high-stakes fields like remote surgery and military operations, as the interface itself was a solved problem.
A joystick has 'perceived affordance'—its physical form communicates how to use it. In contrast, a touchscreen is a 'flat piece of glass' with zero inherent usability. Its function is entirely defined by software, making it versatile but less intuitive and physically disconnected compared to tactile hardware controls.