LLMs shine when acting as a 'knowledge extruder'—shaping well-documented, 'in-distribution' concepts into specific code. They fail when the core task is novel problem-solving where deep thinking, not code generation, is the bottleneck. In these cases, the code is the easy part.

Related Insights

A core debate in AI is whether LLMs, which are text prediction engines, can achieve true intelligence. Critics argue they cannot because they lack a model of the real world. This prevents them from making meaningful, context-aware predictions about future events—a limitation that more data alone may not solve.

Making an API usable for an LLM is a novel design challenge, analogous to creating an ergonomic SDK for a human developer. It's not just about technical implementation; it requires a deep understanding of how the model "thinks," which is a difficult new research area.
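To make this concrete, here is a minimal sketch of the same hypothetical `search_orders` tool described two ways: the first schema is terse, the second spells out units, defaults, and failure behavior that a human developer would look up in the docs but an LLM has to be told inline. Every name and field here is illustrative, not a real API.

```python
# Hypothetical tool schemas for illustration only; no real service is assumed.

terse_tool = {
    "name": "search_orders",
    "description": "Search orders.",
    "parameters": {
        "q": {"type": "string"},
        "n": {"type": "integer"},
    },
}

llm_friendly_tool = {
    "name": "search_orders",
    "description": (
        "Search the order database with a free-text query. "
        "Returns at most `limit` orders, newest first. "
        "If nothing matches, returns an empty list rather than an error."
    ),
    "parameters": {
        "query": {
            "type": "string",
            "description": "Free-text search over customer name and order ID.",
        },
        "limit": {
            "type": "integer",
            "description": "Maximum number of results (1-50). Defaults to 10.",
        },
    },
}
```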

The "generative" label on AI is misleading. Its true power for daily knowledge work lies not in creating artifacts, but in its superhuman ability to read, comprehend, and synthesize vast amounts of information—a far more frequent and fundamental task than writing.

High productivity isn't about using AI for everything. It's a disciplined workflow: breaking a task into sub-problems, using an LLM for high-leverage parts like scaffolding and tests, and reserving human focus for the core implementation. This avoids the sunk cost of forcing AI on unsuitable tasks.
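As a small sketch of that split, assuming a hypothetical `slugify` helper: the test scaffolding below is the kind of boilerplate an LLM produces cheaply and well, while the core implementation is deliberately left as the human-owned part.

```python
import unittest


def slugify(title: str) -> str:
    """Turn a title into a URL slug. The core logic is the human-owned part."""
    raise NotImplementedError("reserved for focused human implementation")


class TestSlugify(unittest.TestCase):
    # LLM-generated edge cases: cheap to produce, quick to review.
    def test_lowercases_and_hyphenates(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_strips_punctuation(self):
        self.assertEqual(slugify("Hello, World!"), "hello-world")

    def test_collapses_whitespace(self):
        self.assertEqual(slugify("  Hello   World  "), "hello-world")


if __name__ == "__main__":
    unittest.main()
```

The tests fail until the implementation exists, which is the point: the machine handles the scaffolding, the person handles the part that requires thought.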

Current AI models resemble a student who grinds 10,000 hours on a narrow task. They achieve superhuman performance on benchmarks but lack the broad, adaptable intelligence of someone with less specific training but better general reasoning. This explains the gap between eval scores and real-world utility.

The true danger of LLMs in the workplace isn't just sloppy output, but the erosion of deep thinking. The arduous process of writing forces structured, first-principles reasoning. By making it easy to generate plausible text from bullet points, LLMs allow users to bypass this critical thinking process, leading to shallower insights.

Instead of asking an AI to directly build something, the more effective approach is to instruct it on *how* to solve the problem: gather references, identify best-in-class libraries, and create a framework before implementation. This means working one level of abstraction higher than the code itself.
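A hedged illustration of the difference, with invented prompts (the task and constraints are placeholders): the first asks for an artifact, the second asks for the plan one level up.

```python
# Two prompting styles for the same hypothetical task.
direct_prompt = "Write me a rate limiter for our API."

plan_first_prompt = """\
Before writing any code:
1. List two or three well-regarded Python rate-limiting libraries and their trade-offs.
2. Recommend one and explain why it fits a small web service.
3. Sketch the module layout and interfaces we would need.
Only after I approve the plan, implement it."""
```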

Anthropic suggests that LLMs, trained on text about AI, respond to field-specific terms. Using phrases like 'Think step by step' or 'Critique your own response' acts as a cheat code, activating more sophisticated, accurate, and self-correcting operational modes in the model.
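A minimal sketch of putting those phrases to work; the review task is invented, and the only parts taken from the passage above are the two trigger phrases themselves.

```python
# {migration_sql} is a placeholder to be filled in before sending the prompt.
base_task = "Review this SQL migration for problems:\n\n{migration_sql}"

prompt = (
    base_task
    + "\n\nThink step by step through each statement before concluding."
    + "\nThen critique your own response and list anything you may have missed."
)
```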

The process of struggling with and solving hard problems is what builds engineering skill. Constantly available AI assistants act like a "slot machine for answers," removing this productive struggle. This encourages "vibe coding" and may prevent engineers from developing deep problem-solving expertise.

Developing LLM applications means juggling three variables, each with an effectively unbounded search space: how information is represented, which tools the model can access, and the prompt itself. This makes the process less like engineering and more like an art, where intuition guides you toward a local maximum rather than a single optimal solution.
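One way to picture those three axes, as a sketch with made-up fields: each attribute below has an unbounded space of reasonable values, so tuning an LLM application means searching that space rather than solving for it.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LLMAppConfig:
    # Axis 1: how information is represented (chunking, formatting, ordering, ...)
    format_context: Callable[[list[str]], str]
    # Axis 2: which tools the model can access
    tools: list[dict] = field(default_factory=list)
    # Axis 3: the prompt itself
    system_prompt: str = "You are a helpful assistant."


# Two of infinitely many configurations; neither is "correct", one just
# happens to score better on whatever evaluation you run.
config_a = LLMAppConfig(format_context=lambda chunks: "\n\n".join(chunks))
config_b = LLMAppConfig(
    format_context=lambda chunks: "\n".join(f"- {c}" for c in chunks),
    system_prompt="Answer using only the provided context.",
)
```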