The term 'harness' implies constraining a wild animal. A better mental model for agent infrastructure is a 'mecha suit' that empowers the LLM, giving it new capabilities like storage, compute, and API access. The goal is to broaden what the model can do, not just narrow its focus.
While newer models reliably outperform older ones, the value of a good harness is multiplicative: it amplifies whichever model it wraps. It provides crucial commercial benefits like lower cost, higher reliability, speed, and oversight. For established, automated workflows, these factors matter more than marginal gains in model intelligence.
To manage context costs, Tasklet summarizes agent history with decreasing granularity over time. Recent interactions are sent verbatim, while older conversations have tool calls, thinking steps, and messages truncated or summarized. This is done in cache-aware buckets to minimize cost.
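The bucketing idea can be sketched roughly as follows. This is a minimal illustration, not Tasklet's implementation: the names `Message`, `summarize`, `build_context`, `BUCKET_SIZE`, and `KEEP_VERBATIM` are all hypothetical, the "summary" is plain truncation rather than an LLM call, and only two tiers of granularity (verbatim and summarized) are shown. The point is that older history is compacted only at fixed bucket boundaries, so the summarized prefix stays byte-identical between calls and can be served from a prompt cache.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Message:
    role: str
    text: str

BUCKET_SIZE = 4    # hypothetical: messages compacted together as one unit
KEEP_VERBATIM = 4  # hypothetical: minimum number of recent messages sent as-is

def summarize(bucket):
    """Placeholder summary: truncate each message (a real system might call an LLM)."""
    parts = [f"{m.role}: {m.text[:20]}" for m in bucket]
    return Message("summary", " | ".join(parts))

def build_context(history):
    """Return summarized old buckets followed by recent messages verbatim."""
    # Messages eligible for compaction, rounded down to a whole bucket so that
    # existing summaries never change when one more message arrives.
    cutoff = max(0, len(history) - KEEP_VERBATIM)
    cutoff -= cutoff % BUCKET_SIZE
    context = [
        summarize(history[start:start + BUCKET_SIZE])
        for start in range(0, cutoff, BUCKET_SIZE)
    ]
    context.extend(history[cutoff:])
    return context
```

With these settings, a 10-message history yields one summary plus six verbatim messages; growing it to 12 messages adds a second summary while leaving the first one unchanged.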
To replace systems like Salesforce, agent platforms must guard against accidental data loss caused by unreliable agents. Features like versioned file systems, state rollback, human-in-the-loop approvals, and testable migration scripts are crucial harness-level capabilities for building enterprise trust.
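A toy sketch of how versioning, rollback, and an approval gate might fit together is shown below. Everything here is an assumption made for illustration: `VersionedStore`, its methods, and the rule that deletions require a named approver are hypothetical and not drawn from any real platform.

```python
class VersionedStore:
    """In-memory key/value store that keeps every committed version."""

    def __init__(self):
        self._versions = [{}]  # version 0 is the empty store

    @property
    def current(self):
        return dict(self._versions[-1])

    def commit(self, changes, approved_by=None):
        """Apply changes as a new version; a value of None means 'delete this key'."""
        deletions = [key for key, value in changes.items() if value is None]
        # Human-in-the-loop gate (hypothetical policy): destructive changes
        # are refused unless a named approver is supplied.
        if deletions and approved_by is None:
            raise PermissionError(f"deleting {deletions} requires approval")
        snapshot = dict(self._versions[-1])
        for key, value in changes.items():
            if value is None:
                snapshot.pop(key, None)
            else:
                snapshot[key] = value
        self._versions.append(snapshot)
        return len(self._versions) - 1

    def rollback(self, version):
        """Restore an earlier version by appending it, so no history is lost."""
        self._versions.append(dict(self._versions[version]))
        return len(self._versions) - 1
```

Recording a rollback as a new version, rather than truncating history, means a mistaken rollback can itself be undone.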
Tasklet's winning strategy was to first bet heavily on the single best model (Claude) to achieve critical capabilities. Once multiple models reached that threshold, the company pivoted to a neutral, horizontal platform that abstracts the model layer, offering customers choice and reducing Tasklet's own dependency on a single supplier.
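One common way to abstract a model layer is a small interface that every provider adapter implements. The sketch below assumes that shape; `ModelBackend`, `EchoBackend`, and `Agent` are hypothetical names, and the stand-in backend only echoes its input instead of calling a real provider.

```python
from typing import Optional, Protocol

class ModelBackend(Protocol):
    """Hypothetical interface that each provider adapter would implement."""
    name: str

    def complete(self, prompt: str) -> str: ...

class EchoBackend:
    """Stand-in backend for illustration; a real adapter would call a provider API."""

    def __init__(self, name: str):
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"

class Agent:
    """Agent logic depends only on the interface, so backends are swappable."""

    def __init__(self, backends: dict, default: str):
        self._backends = backends
        self._default = default

    def run(self, prompt: str, model: Optional[str] = None) -> str:
        backend: ModelBackend = self._backends[model or self._default]
        return backend.complete(prompt)
```

Under this design, letting a customer choose a model is a matter of passing a different key to `run`, and the rest of the agent code is untouched.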
Initially an add-on, computer use (running shell commands, interacting with file systems and databases) is now the absolute core of Tasklet's platform. This architectural shift reflects a move towards more fundamental, general-purpose agent capabilities over relying on pre-built API integrations.
Tasklet completely re-architected its agent, moving from feeding chat history into the LLM to treating the file system as the primary context. The agent now receives hints and pointers to relevant files and reads them on demand, which lets it work with arbitrarily long histories and with more context than fits in the token window.
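A rough sketch of the "hints and pointers" pattern follows. It assumes a workspace directory of text files; the functions `build_pointer_context` and `read_file` are hypothetical, and a real agent would expose something like `read_file` as a tool call. The prompt carries only a path, a size, and the first line of each file, while the full content stays on disk until requested.

```python
from pathlib import Path

def build_pointer_context(workspace: Path, max_hint_chars: int = 60) -> str:
    """List each file with its size and first line instead of its full content."""
    lines = ["Files available (read on demand):"]
    for path in sorted(workspace.rglob("*")):
        if not path.is_file():
            continue
        # Read only the first line as a hint; the rest stays out of the prompt.
        with path.open(encoding="utf-8", errors="replace") as handle:
            hint = handle.readline().strip()[:max_hint_chars]
        relative = path.relative_to(workspace).as_posix()
        lines.append(f"- {relative} ({path.stat().st_size} bytes): {hint}")
    return "\n".join(lines)

def read_file(workspace: Path, relative_path: str, start: int = 0, length: int = 2000) -> str:
    """Tool the agent could call to pull a slice of one file into context."""
    root = workspace.resolve()
    target = (root / relative_path).resolve()
    # Refuse paths that escape the workspace (Path.is_relative_to needs Python 3.9+).
    if not target.is_relative_to(root):
        raise ValueError(f"{relative_path} is outside the workspace")
    text = target.read_text(encoding="utf-8", errors="replace")
    return text[start:start + length]
```

Because the listing grows with the number of files rather than their total size, history can keep accumulating on disk without inflating every prompt.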
Andrew Lee observes that top models like GPT and Claude are converging in capability because the labs are in a tight feedback loop. For example, Claude became more 'Codex-like' for coding, while GPT improved at agentic tool use, an area where Claude previously excelled.
Tasklet's new feature generates UIs on the fly from a single prompt, validating the fear that originally led the team to pivot away from a specialized AI email client. This capability suggests that general-purpose agents will soon be able to replicate and replace most specialized SaaS application interfaces.
As a proxy for how deeply AI is integrated into its own operations, Tasklet tracks internal token spend relative to payroll. This ratio, currently at 5-10%, reflects their use of tools like Claude, Codex, and their own platform to automate work, serving as a key metric for AI-driven productivity.
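The metric itself is a simple ratio. The snippet below is only a worked example: the function name and the dollar figures are invented for illustration and are not Tasklet's actual numbers.

```python
def token_spend_ratio(monthly_token_spend: float, monthly_payroll: float) -> float:
    """Token spend as a fraction of payroll over the same period."""
    if monthly_payroll <= 0:
        raise ValueError("payroll must be positive")
    return monthly_token_spend / monthly_payroll

# Hypothetical figures: $15,000 in tokens against $200,000 in payroll.
ratio = token_spend_ratio(15_000, 200_000)
print(f"{ratio:.1%}")  # prints "7.5%"
```

A result of 7.5% would fall inside the 5-10% band described above.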
According to Andrew Lee, Tasklet's primary competitor is also its critical supplier, Anthropic. 80% of Tasklet's churned users go to Anthropic's first-party products, which give direct customers an estimated five times as many tokens per dollar as Tasklet gets through the API. This dynamic forces a strategic pivot.
Andrew Lee predicts that most SaaS businesses will become obsolete. Only three types will survive: 1) a few horizontal agent platforms (Tasklet's goal), 2) headless, API-first companies with deep moats, like Stripe, and 3) solutions companies that sell outcomes rather than software, such as a law firm using AI.
