The power of tools like Codex extends beyond writing software: they are becoming general 'computer use agents' that leverage the command line to automate personal tasks, such as organizing messy file directories, managing desktop files, or sorting email, reclaiming the terminal for everyday automation.
Standard benchmarks fall short for multi-turn AI agents. A newer approach is the 'job interview eval': the agent is given an underspecified problem and graded not just on its solution, but on its ability to ask clarifying questions and handle changing requirements, mirroring how a human developer is evaluated.
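To make the idea concrete, here is a minimal sketch of such an eval loop. `run_agent_turn` and `grade_solution` are hypothetical stand-ins for whatever agent harness and grader a team already has, and the rubric weights and heuristics are purely illustrative.

```python
# A minimal sketch of a "job interview" style eval, under assumed interfaces:
# run_agent_turn(message, transcript) sends one user turn to the agent and
# returns its reply; grade_solution(transcript) scores the final work (0.0-1.0).
# Both are hypothetical stand-ins, not a real OpenAI eval API.

UNDERSPECIFIED_TASK = "Add caching to the user service."  # scope, backend, TTL all unstated
REQUIREMENT_CHANGE = "Change of plans: invalidation must be manual, not TTL-based."

def job_interview_eval(run_agent_turn, grade_solution):
    transcript = []

    # Turn 1: hand the agent an underspecified problem.
    reply = run_agent_turn(UNDERSPECIFIED_TASK, transcript)
    # Crude heuristic: did the agent ask a clarifying question before diving in?
    asked_clarifying = "?" in reply

    # Turn 2: change the requirements mid-task and check whether the agent adapts.
    reply = run_agent_turn(REQUIREMENT_CHANGE, transcript)
    adapted = "manual" in reply.lower()

    # Final grade weights process (questions, adaptation) alongside the solution itself.
    solution_score = grade_solution(transcript)
    return 0.25 * asked_clarifying + 0.25 * adapted + 0.5 * solution_score
```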
To increase developer adoption, OpenAI intentionally trained its models for specific behavioral characteristics, not just coding accuracy. These 'personality' traits include communication (explaining their steps), planning, and self-checking, mirroring the best practices of human software engineers to make the AI a more trustworthy pair programmer.
OpenAI recommends a bifurcated approach. Startups building bleeding-edge, code-focused agents should use the specialized Codex model line, which is highly opinionated and optimized for its tool harness. Applications requiring more general capabilities and steerability across various tools should use the mainline GPT model instead.
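In practice this split can be as small as a model-selection switch in the API call. The sketch below uses the OpenAI Python SDK's Responses API; the model identifiers "gpt-5-codex" and "gpt-5" are assumptions for illustration and should be checked against the current model list.

```python
# Sketch of routing between the Codex model line and mainline GPT, assuming the
# OpenAI Python SDK. The model names below are placeholders illustrating the split;
# verify them against the current documentation.
from openai import OpenAI

client = OpenAI()

def pick_model(code_focused_agent: bool) -> str:
    # Codex line: opinionated, tuned for its own coding harness and tools.
    # Mainline GPT: broader capability and steerability across arbitrary tools.
    return "gpt-5-codex" if code_focused_agent else "gpt-5"

response = client.responses.create(
    model=pick_model(code_focused_agent=True),
    input="Refactor the payment module and explain each step as you go.",
)
print(response.output_text)
```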
A major trend in AI development is the shift away from optimizing for individual model releases. Instead, developers can integrate higher-level, pre-packaged agents like Codex. This allows teams to build on a stable agentic layer without needing to constantly adapt to underlying model changes, API updates, and sandboxing requirements.
AI models develop strong 'habits' from training data, leading to unexpected performance quirks. The Codex model is so accustomed to the command-line tool ripgrep (invoked as 'rg') that its performance improves significantly when developers name their custom search tool 'rg' as well, revealing a surprising lack of generalization.
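One practical way to lean into that habit is to register a custom code-search tool under the name "rg" with ripgrep-flavored arguments, rather than a bespoke name. The sketch below uses the standard Chat Completions function-calling schema; the exact parameter set is an illustrative assumption.

```python
# Sketch of a function-tool definition that reuses the ripgrep name "rg" instead
# of a bespoke name like "search_codebase". The parameters are illustrative; the
# point is only that the tool's name matches the binary the model already expects.
rg_tool = {
    "type": "function",
    "function": {
        "name": "rg",  # same name as the ripgrep binary
        "description": "Search the repository for a regex pattern, ripgrep-style.",
        "parameters": {
            "type": "object",
            "properties": {
                "pattern": {"type": "string", "description": "Regex to search for."},
                "path": {"type": "string", "description": "File or directory to search."},
                "ignore_case": {"type": "boolean", "description": "Case-insensitive match."},
            },
            "required": ["pattern"],
        },
    },
}
```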
