LLMs may use available packages in a project's environment without properly declaring them in configuration files like `package.json`. This leads to fragile builds that work locally but break on fresh installations. Developers must manually verify and instruct the LLM to add all required dependencies.
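A quick way to catch this is a script that cross-checks source imports against the manifest. The sketch below is a rough assumption-laden example for a Node project with sources under `src/`; it is no substitute for a dedicated tool such as `depcheck`:

```python
import json
import re
from pathlib import Path

# Minimal sketch: flag bare imports in JS/TS sources that are not declared
# in package.json. Paths and the regex are illustrative assumptions.
pkg = json.loads(Path("package.json").read_text())
declared = set(pkg.get("dependencies", {})) | set(pkg.get("devDependencies", {}))

# Matches `from 'pkg'` and `require('pkg')`, skipping relative imports.
import_re = re.compile(r"""(?:from|require\()\s*['"]([^'"./][^'"]*)['"]""")

undeclared = set()
for src in Path("src").rglob("*.[jt]s"):
    for name in import_re.findall(src.read_text(errors="ignore")):
        # Scoped packages look like "@scope/pkg"; others use the first path segment.
        root = "/".join(name.split("/")[:2]) if name.startswith("@") else name.split("/")[0]
        if root not in declared:
            undeclared.add(root)

if undeclared:
    print("Undeclared dependencies:", ", ".join(sorted(undeclared)))
```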
A practical hack to improve AI agent reliability is to avoid built-in tool-calling functions. LLMs have more training data on writing code than on specific tool-use APIs. Prompting the agent to write and execute the code that calls a tool leverages its core strength and produces better outcomes.
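A minimal sketch of the pattern, assuming a hypothetical `call_llm` helper that stands in for whatever chat client the project already uses:

```python
import subprocess
import tempfile
from typing import Callable

def run_via_generated_code(task: str, call_llm: Callable[[str], str]) -> str:
    """Ask the model to write a script for the task, then execute that script.

    `call_llm` is an assumed stand-in: it takes a prompt and returns the
    model's text response. This sketch also assumes the model returns bare
    code (in practice, strip markdown fences first).
    """
    script = call_llm(
        "Write a standalone Python script that performs the task below and "
        f"prints its result to stdout. Output only the code.\n\nTask: {task}"
    )
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(script)
        path = f.name
    # A production setup would execute this in a container or jail, not directly.
    result = subprocess.run(
        ["python", path], capture_output=True, text=True, timeout=60
    )
    return result.stdout
```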
When asked to modify or rewrite functionality, LLMs often attempt to preserve compatibility with previous versions, even on greenfield projects. This defensive behavior can lead to overly complex code and technical debt. Developers must explicitly state that backward compatibility is not a requirement.
As AI generates code faster than humans can review it, validation becomes the bottleneck. The solution is to give agents dedicated, sandboxed environments where they can run tests and verify functionality before a human ever sees the code, shifting review from process to outcome.
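One way to wire up such a gate is to run the suite in a disposable container and only surface the diff if it passes. The sketch below assumes a prebuilt image (here called `my-project-ci`) with the project's dependencies already installed, so the sandbox needs no network access:

```python
import subprocess

def verify_in_sandbox(repo_dir: str, image: str = "my-project-ci:latest") -> bool:
    """Run the test suite in a throwaway container before a human reviews the diff.

    The image name is an assumption; adapt the command to your stack.
    """
    cmd = [
        "docker", "run", "--rm",
        "--network", "none",           # no outbound access from the sandbox
        "-v", f"{repo_dir}:/workspace",
        "-w", "/workspace",
        image,
        "pytest", "-q",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0
```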
Models like Gemini Flash sometimes create temporary utility files (e.g., code analyzers) and then delete them on the assumption that they are no longer needed, forcing costly regeneration later. To prevent this, explicitly instruct the LLM to save such scripts in a specific directory for future use.
Features like custom commands and sub-agents can look like reliable, deterministic workflows. Because they are built on non-deterministic LLMs, however, they fail unpredictably. This misleads users into trusting a fragile abstraction and ultimately results in a poor experience.
AI can generate code that passes initial tests and QA but contains subtle, critical flaws like inverted boolean checks. This creates 'trust debt,' where the system seems reliable but harbors hidden failures. These latent bugs are costly and time-consuming to debug post-launch, eroding confidence in the codebase.
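An illustrative, hypothetical example of how such a flaw slips through: a negated ownership check that a shallow, admin-only test never exercises.

```python
from types import SimpleNamespace

def can_delete(user) -> bool:
    # Intended rule: admins OR the record's owner may delete.
    # The generated code negated the ownership check, so owners are blocked
    # and unrelated users are allowed -- yet the test below still passes.
    return user.is_admin or not user.is_owner   # bug: should be `user.is_owner`

def test_admin_can_delete():
    admin = SimpleNamespace(is_admin=True, is_owner=False)
    assert can_delete(admin)   # the only case QA checked; the bug ships
```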
AI coding tools dramatically accelerate development, but that same speed multiplies how quickly technical debt accumulates. A small team can now generate a massive, fragile codebase with inconsistent patterns and sparse documentation, creating maintenance burdens previously seen only in large, legacy organizations.
Instead of treating a complex AI system like an LLM as a single black box, build it in a componentized way by separating functions like retrieval, analysis, and output. This allows for isolated testing of each part, limiting the surface area for bias and simplifying debugging.
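A minimal sketch of that separation, with all names illustrative: each stage is an injectable function, so any one of them can be tested with the others stubbed out.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Pipeline:
    retrieve: Callable[[str], List[str]]       # query -> supporting documents
    analyze: Callable[[str, List[str]], str]   # query + docs -> draft answer (the LLM call)
    render: Callable[[str], str]               # draft -> user-facing output

    def run(self, query: str) -> str:
        docs = self.retrieve(query)
        draft = self.analyze(query, docs)
        return self.render(draft)

# Each stage gets its own test with the others replaced by stubs, e.g.:
def test_render_adds_sources_section():
    pipe = Pipeline(
        retrieve=lambda q: [],
        analyze=lambda q, docs: "answer",
        render=lambda draft: f"{draft}\n\nSources: none",
    )
    assert "Sources:" in pipe.run("any query")
```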
When given ambiguous instructions, LLMs will choose the most common technology stack from their training data (e.g., React with Tailwind), even if it contradicts the project's goals. Developers must provide explicit constraints to avoid this unwanted default behavior.
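In practice this can be as simple as a constraints block prepended to every request; the specific choices below are placeholders, not recommendations:

```python
# Illustrative only: explicit stack constraints injected ahead of each task so
# ambiguous requests don't fall back to the model's statistical defaults.
PROJECT_CONSTRAINTS = """
Stack constraints (non-negotiable):
- Frontend: SvelteKit with plain CSS. Do NOT introduce React or Tailwind.
- Backend: FastAPI with PostgreSQL.
- If a request is ambiguous, ask which option fits these constraints
  instead of choosing a default.
"""

def build_prompt(task: str) -> str:
    return f"{PROJECT_CONSTRAINTS}\n\nTask: {task}"
```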
Developing LLM applications requires solving for three variables with effectively unbounded option spaces: how information is represented, which tools the model can access, and the prompt itself. This makes the process less like engineering and more like an art, where intuition guides you to a local maximum rather than a single optimal solution.