
A consistent flaw in both GPT-5.4 and 5.3 Instant is over-verbosity. Rather than helping, the excessively long, multi-list responses place a cognitive burden on the user, forcing them to sift through noise and slowing the creative process. This is a hidden cost of the model's new capabilities.

Related Insights

The problem with bad AI-generated work ('slop') isn't just poor writing. It's that subtle inaccuracies or context loss can derail meetings and create long, energy-wasting debates. This cognitive overload makes it difficult for teams to make sense of the material, and it ultimately costs more in human time than it saves.

Using AI to generate content without adding human context simply transfers the intellectual effort to the recipient. This creates rework, confusion, and can damage professional relationships, explaining the low ROI seen in many AI initiatives.

OpenAI found that significant upgrades to model intelligence, particularly for complex reasoning, did not improve user engagement. Users overwhelmingly prefer faster, simpler answers over more accurate but time-consuming responses, a disconnect that benefited competitors like Google.

Models that generate "chain-of-thought" text before providing an answer are powerful but slow and computationally expensive. In tightly tuned business workflows, the latency of waiting for these extra reasoning tokens is a major, often overlooked drawback that degrades user experience and increases costs.
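The latency penalty is easy to see with back-of-the-envelope arithmetic. A minimal sketch, where the function name, token counts, and generation rate are all illustrative assumptions rather than figures from any provider:

```python
# Hypothetical illustration of latency added by chain-of-thought tokens.
# All numbers below are assumptions for the sketch, not measured values.

def response_latency(reasoning_tokens: int, answer_tokens: int,
                     tokens_per_second: float = 50.0) -> float:
    """Seconds until the full response arrives, assuming a constant
    generation rate and that reasoning tokens are produced first."""
    return (reasoning_tokens + answer_tokens) / tokens_per_second

# A 200-token answer with no visible reasoning step:
fast = response_latency(0, 200)       # 4.0 seconds
# The same answer preceded by 1,500 hidden reasoning tokens:
slow = response_latency(1500, 200)    # 34.0 seconds
```

Under these assumed numbers, the reasoning tokens multiply the wait by more than 8x before the user sees a single word of the answer, and every reasoning token is also billed compute.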

While precise instruction-following is often a feature, the GPT-5.x Codex family can be too literal for creative work. It blindly implements prompts without nuance, overfitting to the most recent instruction. For example, when asked to add a section on integrations, it can make the entire page about integrations.

Sam Altman acknowledged that models are becoming "spiky," with capabilities improving unevenly. OpenAI intentionally prioritized making GPT-5.2 excel at reasoning and coding, which led to a degradation in its creative writing and prose. This highlights the trade-offs inherent in current model training.

Sam Altman admitted OpenAI intentionally neglected the model's writing style, which became unwieldy, to focus limited resources on enhancing its core intelligence and engineering capabilities. This reveals a strategy of prioritizing foundational model improvements over user-facing polish during development cycles.

Companies like OpenAI and Anthropic are intentionally shrinking their flagship models (e.g., GPT-4o is smaller than GPT-4). The biggest constraint isn't creating more powerful models, but serving them at a speed users will tolerate. Slow models kill adoption, regardless of their intelligence.

The speed of the new Codex model created an unexpected UX problem: it generated code too fast for a human to follow. The team had to artificially slow down the text rendering in the app to make the stream of information comprehensible and less overwhelming.
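Pacing a too-fast stream is a simple rate-limiting problem. A minimal sketch of the idea, assuming a character-level throttle on an iterable of model output chunks; the function name and the readability cap are illustrative, not the Codex app's actual implementation:

```python
import time
from typing import Iterable, Iterator

def throttle(chunks: Iterable[str], chars_per_second: float = 120.0) -> Iterator[str]:
    """Re-emit model output one character at a time, sleeping between
    characters so the visible stream never outpaces a human reader.

    The model may finish generating almost instantly; the UI simply
    drains this iterator at a human-comprehensible pace."""
    delay = 1.0 / chars_per_second
    for chunk in chunks:
        for ch in chunk:
            yield ch
            time.sleep(delay)

# Usage: feed the raw stream through the throttle before rendering.
for ch in throttle(["def add(a, b):\n", "    return a + b\n"]):
    print(ch, end="", flush=True)
```

The design choice worth noting is that the throttle lives entirely in the rendering layer: the full response is still available immediately for copying or saving, and only its on-screen reveal is slowed.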

While AI development tools can improve backend efficiency by up to 90%, they often create user interface challenges. AI tends to generate very verbose text that takes up too much space and can break the UX layout, requiring significant time and manual effort to get the presentation right.