Despite its advancements, the model shows a surprising tendency to hallucinate. When investigating bugs or validating information, it confidently presents hypotheses as facts without grounding them in data. This is a significant reliability issue, especially for a model marketed as "more honest."
The model performs impressively on one-shot, greenfield projects but struggles with the critical final details and edge cases. When pushed to refine or iterate on a task, it begins to introduce bugs and lose consistency, revealing a significant weakness in handling sustained complexity.
When prompted for ambitious, "state-of-the-art" agentic coding projects, the model produces uninspired, safe outputs. It delivers serviceable code but fails to push creative boundaries or think expansively, falling short of its "10x agentic coding" potential.
The model has "narrow vision," latching onto specific data or code points and treating them as definitive truth without broader context. This leads to flawed conclusions in both strategic analysis and coding, as it fails to contextualize information or zoom out to see the bigger picture.
In a direct comparison, the older Opus 4.7 model proved superior for business strategy. It produced structured, data-anchored analysis, whereas Opus 4.8 was "handwavy," struggled to find relevant data, and over-rotated on minor data points, leading to weaker strategic recommendations.
