The author reports a "subjective feeling" that older versions of commercial AI models begin to perform worse ("get dumber") shortly before a new version launches. If that impression is accurate, model performance is not static and may shift with the provider's release cycle, which makes results unpredictable for developers.
In one experiment, the more powerful Claude Opus model produced a "beautiful" but non-functional code structure. The project's tests passed only because the older, pre-existing code was still the code being executed. This highlights a risk of AI-driven over-engineering: new code that looks complete but is never wired into the running system.
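The failure mode described here, tests staying green because the old code path is still the one being run, can be illustrated with a minimal Python sketch. Every name in it (`legacy_pipeline`, `StreamPipeline`, `handle_request`) is hypothetical and invented for illustration; none of them comes from the project in the experiment. The sketch shows an output-only test passing while a spy placed on the new class shows it is never called.

```python
from unittest import mock


def legacy_pipeline(chunks):
    """Hypothetical pre-existing implementation, still wired into the entry point."""
    return "".join(chunks)


class StreamPipeline:
    """Hypothetical new implementation that nothing calls yet."""

    def run(self, chunks):
        return "".join(chunks)


def handle_request(chunks):
    # The entry point still dispatches to the legacy function,
    # so the new class is dead code from the application's point of view.
    return legacy_pipeline(chunks)


def output_test_passes():
    # An output-only check: it cannot tell which implementation produced the result.
    return handle_request(["a", "b"]) == "ab"


def new_path_is_exercised():
    # Replace StreamPipeline.run with a spy and see whether the entry point reaches it.
    with mock.patch.object(StreamPipeline, "run", autospec=True) as spy:
        handle_request(["a", "b"])
    return spy.called
```

Under these assumptions, `output_test_passes()` returns `True` while `new_path_is_exercised()` returns `False`. A check of the second kind, or a coverage report on the new module, is one way to catch code that was written but never integrated.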
When given autonomy, the more focused Codex model successfully implemented features and fixed bugs. The more powerful Claude Opus model, however, drifted into creating architecturally elegant but non-functional code. This suggests a trade-off between an AI's abstract reasoning ability and its practical execution skills in uncontrolled environments.
Although one AI model failed to correctly implement a core streaming pipeline, it identified and built several valuable adjacent features on its own initiative. These included stream timeouts with fallbacks, history restoration on restart, and dead-link checking. The episode shows that an AI can add useful work opportunistically even when it misunderstands the primary objective.
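As a rough illustration of what a stream timeout with a fallback can look like, here is a minimal Python sketch built on `asyncio`. It is not the code the model produced; the function name `stream_with_timeout`, its parameters, and the fallback message are all assumptions made for this example. The idea is to wait a bounded time for each chunk and emit a fallback value if the source stalls.

```python
import asyncio
from typing import AsyncIterator


async def stream_with_timeout(
    source: AsyncIterator[str],
    chunk_timeout: float,
    fallback: str,
) -> AsyncIterator[str]:
    """Yield chunks from source; if a chunk takes too long, yield fallback and stop."""
    iterator = source.__aiter__()
    while True:
        try:
            # Bound the wait for each individual chunk, not for the whole stream.
            chunk = await asyncio.wait_for(iterator.__anext__(), timeout=chunk_timeout)
        except StopAsyncIteration:
            return  # the source finished normally
        except asyncio.TimeoutError:
            yield fallback  # the source stalled: emit the fallback and end the stream
            return
        yield chunk


async def slow_source() -> AsyncIterator[str]:
    # Hypothetical source: one quick chunk, then a stall longer than the timeout.
    yield "hello"
    await asyncio.sleep(1.0)
    yield "never delivered"


async def demo() -> list:
    return [
        chunk
        async for chunk in stream_with_timeout(slow_source(), 0.05, "[stream timed out]")
    ]
```

Running `asyncio.run(demo())` returns `["hello", "[stream timed out]"]`: the first chunk arrives in time, and the stalled second chunk is replaced by the fallback. A per-chunk timeout is only one possible design; a deadline for the whole stream would be a reasonable alternative.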
