The cost of generating code with AI is trivial, which shifts the primary expense to maintaining, validating, and deploying that code. This inverts the traditional software engineering model, in which human code production was the main bottleneck, and turns code complexity into a liability.
At OpenAI, a product manager wrote a Product Requirements Document (PRD) in Markdown, which an AI agent then used to produce a fully functional, production-ready feature within a week. This was achieved without any engineers writing code or translating requirements.
In an AI-first world, an engineer's role shifts from writing feature code to building leverage. They become akin to staff engineers for AI agents, creating the systems, documentation, and automated tests (the "harness") that empower AI to produce high-quality work autonomously.
To prevent a "ball of mud" codebase, OpenAI's system defines strict architectural layers using package boundaries and folder structures. By convention and tooling, different roles are restricted to specific layers (designers to the UI, PMs to business logic), which keeps the codebase modular and prevents architectural decay.
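One way tooling could enforce layer boundaries like these is an import check that runs in CI. The following is a minimal sketch, not OpenAI's actual tooling: the layer names (`ui`, `logic`, `data`), the `ALLOWED_IMPORTS` table, and both function names are assumptions made for illustration.

```python
import ast

# Hypothetical layers and the layers each one may import from (illustrative only).
ALLOWED_IMPORTS = {
    "ui": {"ui", "logic"},
    "logic": {"logic", "data"},
    "data": {"data"},
}


def imported_layers(source: str) -> set[str]:
    """Return the top-level package names imported by a module's source text."""
    layers = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            layers.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            layers.add(node.module.split(".")[0])
    return layers


def boundary_violations(layer: str, source: str) -> list[str]:
    """List explicit messages for imports that cross a forbidden layer boundary."""
    allowed = ALLOWED_IMPORTS[layer]
    return sorted(
        f"{layer} must not import {target}"
        for target in imported_layers(source)
        # Only known layers are policed; third-party and stdlib imports pass.
        if target in ALLOWED_IMPORTS and target not in allowed
    )
```

For example, `boundary_violations("ui", "from data.store import load")` reports that the UI layer reached past the business logic into the data layer, while standard-library imports are ignored.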
High token consumption is framed as a key metric of AI leverage rather than as a cost. Aiming to raise it forces teams to find ways to delegate more complex, long-running, and parallel tasks to AI agents, maximizing the intelligence and autonomous work extracted from the models.
An OpenAI team developed an internal application with one million lines of code, all generated by an AI agent. Engineers were forbidden from writing code directly, instead shifting their role to diagnosing AI failures and improving the underlying system to prevent repeat mistakes.
OpenAI structures its repositories to be a complete, self-contained knowledge base for AI agents. All project artifacts—design docs, historical implementation plans, and even text versions of external library documentation—are checked in, allowing the agent to find any needed context via simple search.
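The "simple search" an agent relies on can be as plain as a case-insensitive text scan over the checked-in files. The sketch below assumes a hypothetical `search_repo` helper and an assumed set of text file suffixes; it illustrates the idea and does not describe the agent's real tooling.

```python
from pathlib import Path

# Assumed set of file types worth searching; adjust for a real repository.
TEXT_SUFFIXES = {".md", ".txt", ".py"}


def search_repo(root: Path, term: str) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, line text) for every line containing term."""
    needle = term.lower()
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in TEXT_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                hits.append((path.relative_to(root).as_posix(), number, line.strip()))
    return hits
```

Because design docs, past implementation plans, and library documentation all live in the same tree, a single call such as `search_repo(Path("."), "retry policy")` can surface context from any of them.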
Instead of relying on prompts, OpenAI embeds team standards into the test suite. When an agent violates a rule (e.g., incorrect typography), a test fails with an explicit error message. Because agents are trained to make failing tests pass, the failure message serves as just-in-time context and drives the agent to correct itself.
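A standard encoded as a test might look like the sketch below. The two typography rules and the function names are hypothetical, chosen only to show how a failure message can tell the agent exactly what to fix.

```python
def typography_errors(label: str) -> list[str]:
    """Return explicit, actionable messages for each rule the label breaks."""
    errors = []
    # Hypothetical rule 1: use the single ellipsis character, not three periods.
    if "..." in label:
        errors.append(f"{label!r}: replace '...' with the ellipsis character '…'.")
    # Hypothetical rule 2: no consecutive spaces in UI copy.
    if "  " in label:
        errors.append(f"{label!r}: collapse consecutive spaces into one.")
    return errors


def assert_typography(labels: list[str]) -> None:
    """Fail with every violation listed, so the message doubles as instructions."""
    errors = [error for label in labels for error in typography_errors(label)]
    assert not errors, "Typography standard violated:\n" + "\n".join(errors)
```

Calling `assert_typography` from a test over the UI strings means a violation surfaces as a failing test whose message names the offending string and the fix, rather than as a guideline the agent has to remember from a prompt.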
To validate user interaction patterns without premature backend complexity, OpenAI designers build fully interactive UI prototypes directly in the codebase. These connect to a non-functional "painted door" backend, allowing the team to gather real usage data before committing engineering resources to full implementation.
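A "painted door" backend can be sketched as a stub that records which features users try and returns a placeholder instead of doing real work. The class name, method, and response fields below are assumptions for illustration, not the team's actual interface.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class PaintedDoorBackend:
    """Hypothetical stub backend: counts attempted actions, performs none of them."""

    usage: Counter = field(default_factory=Counter)

    def handle(self, feature: str, payload: dict) -> dict:
        # Record the attempt so the team can see which features draw real interest.
        self.usage[feature] += 1
        # Return a placeholder; the payload is deliberately ignored.
        return {
            "status": "not_implemented",
            "feature": feature,
            "message": "This feature is not available yet.",
        }
```

After a trial period, `backend.usage.most_common()` gives a ranked list of the features users actually reached for, which is the usage data the prototype exists to collect.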
