Tasklet's CEO argues that while traditional workflow automation seems safer, agentic systems that let the model plan and execute will ultimately prove more robust. They can handle the unexpected errors and nuance that break rigid, pre-defined workflows, a bet that depends on models continuing to improve.
For complex, multi-turn agentic workflows, Tasklet prioritizes a model's iterative performance over standard benchmarks. Anthropic's models are chosen for a qualitative "vibe" of performing better across long sequences of tool calls, a nuance that quantitative evaluations often miss.
Tasklet's CEO reports that when AI agents fail at using a computer GUI, it's rarely due to a lack of intelligence. The real bottlenecks are the high cost and slow speed of the screenshot-and-reason process, which causes agents to hit usage or budget limits before completing complex tasks.
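A minimal sketch of that screenshot-and-reason loop with explicit budgets; the helper functions are stubs and the cost and time numbers are illustrative assumptions, not Tasklet's actual implementation:

```python
import time
from dataclasses import dataclass

# Illustrative budgets; real numbers depend on model pricing and plan limits.
MAX_DOLLARS = 5.00     # assumed cost ceiling for one task run
MAX_SECONDS = 15 * 60  # assumed wall-clock ceiling before abandoning the run
COST_PER_STEP = 0.04   # assumed rough cost of one screenshot + reasoning call

@dataclass
class Action:
    kind: str          # e.g. "click", "type", "done"
    result: str = ""

def take_screenshot() -> bytes:
    return b""  # stub: capture the current GUI state as an image

def ask_model(goal: str, screenshot: bytes) -> Action:
    return Action(kind="done", result="stub")  # stub: vision + reasoning call

def perform_action(action: Action) -> None:
    pass  # stub: click / type / scroll in the GUI

def run_gui_task(goal: str) -> str:
    spent, started = 0.0, time.monotonic()
    while True:
        # Per the point above, runs tend to end here, at the budget check,
        # rather than because the model picked a wrong action.
        if spent >= MAX_DOLLARS or time.monotonic() - started >= MAX_SECONDS:
            return "aborted: hit the cost or time budget before finishing"
        action = ask_model(goal, take_screenshot())
        spent += COST_PER_STEP
        if action.kind == "done":
            return action.result
        perform_action(action)
```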
To avoid confusing agents with contradictory goals, Tasklet plans to shift from pre-generated, static instructions to dynamically generating them just-in-time for each task run. This ensures the agent always operates on the most current user feedback, preventing errors from conflicting historical directives.
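A minimal sketch of just-in-time instruction generation under that plan, assuming the agent stores a base goal plus a timestamped feedback log; the llm() stub, data shapes, and feedback examples are hypothetical, not Tasklet's API:

```python
def llm(prompt: str) -> str:
    return "stub: synthesized instructions"  # stand-in for a real model call

def build_instructions(goal: str, feedback_log: list[dict]) -> str:
    # Rebuilt at dispatch time for every run: the model is asked to resolve
    # conflicts in favor of the newest feedback, so stale directives never
    # reach the agent alongside the ones that superseded them.
    history = "\n".join(f"{f['at']}: {f['note']}" for f in feedback_log)
    return llm(
        "Write concise operating instructions for an agent.\n"
        f"Goal: {goal}\n"
        f"User feedback, oldest first (newer entries override older ones):\n{history}"
    )

feedback_log = [
    {"at": "2024-05-01", "note": "CC the finance alias on every invoice email."},
    {"at": "2024-06-12", "note": "Stop CCing finance; send a weekly digest instead."},
]
instructions = build_instructions("Reconcile invoices each Friday", feedback_log)
```

The contrast with static instructions is that the June note wins over the contradictory May note at every future run, instead of both sitting side by side in a prompt generated once at setup time.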
Tasklet, a platform for automating recurring tasks, found a surprising user behavior: most messages are ad-hoc, one-off requests. Users invest time creating a highly contextualized agent for automation, then leverage that same smart agent for immediate, chat-based assistance, making chat the dominant interaction model.
To make agents useful over long periods, Tasklet engineers an "illusion" of infinite memory. Instead of feeding the model an ever-growing chat history, they rely on context engineering: LLM-based compaction, scoping the context handed to sub-agents, and having the LLM manage its own state in a SQL database so it can recall relevant information efficiently.
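A minimal sketch of one leg of that approach, the model managing its own state in SQL, where the two functions are exposed to the LLM as tools; the schema and tool names are illustrative assumptions, not Tasklet's actual design:

```python
import sqlite3

db = sqlite3.connect("agent_memory.db")
db.execute(
    "CREATE TABLE IF NOT EXISTS memory ("
    "  key TEXT PRIMARY KEY,"
    "  value TEXT,"
    "  updated_at TEXT DEFAULT CURRENT_TIMESTAMP)"
)

def remember(key: str, value: str) -> str:
    """Tool: persist a fact the model decides is worth keeping across runs."""
    db.execute(
        "INSERT INTO memory (key, value) VALUES (?, ?) "
        "ON CONFLICT(key) DO UPDATE SET value = excluded.value, "
        "updated_at = CURRENT_TIMESTAMP",
        (key, value),
    )
    db.commit()
    return f"stored {key}"

def recall(pattern: str) -> list[tuple[str, str]]:
    """Tool: fetch only rows relevant to the current task, never the full history."""
    return db.execute(
        "SELECT key, value FROM memory WHERE key LIKE ?", (f"%{pattern}%",)
    ).fetchall()

remember("vendor.acme.contact", "billing@acme.example")
print(recall("vendor.acme"))  # -> [('vendor.acme.contact', 'billing@acme.example')]
```

Because the model chooses what to write and what to query, only a handful of relevant rows enter the context on any given run, which is what makes the memory feel infinite.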
Tasklet's experience shows AI agents can be more effective calling HTTP APIs directly, guided by scraped documentation, than working through MCP (Model Context Protocol) servers. This "direct API" approach is reliable enough that users prefer it over official MCP integrations, challenging the assumption that structured protocols are superior.
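A minimal sketch of the "direct API" pattern: one generic HTTP tool plus scraped API documentation in the prompt, instead of a per-service MCP server. The tool shape is an assumption for illustration, and the endpoint in the comment is a placeholder, not a real API:

```python
import json
import urllib.request

def http_request(method: str, url: str, headers: dict | None = None,
                 body: dict | None = None) -> dict:
    """Tool exposed to the model: make an arbitrary HTTP call, return status + body."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        url,
        data=data,
        method=method.upper(),
        headers={"Content-Type": "application/json", **(headers or {})},
    )
    with urllib.request.urlopen(req) as resp:
        return {"status": resp.status, "body": resp.read().decode()}

# The model reads scraped docs like "POST /v1/invoices with {customer, amount}"
# from its context and composes the call itself, e.g. (placeholder endpoint/token):
# http_request("POST", "https://api.example.com/v1/invoices",
#              headers={"Authorization": "Bearer <token>"},
#              body={"customer": "Acme", "amount": 1200})
```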
Tasklet's CEO points to pricing as the ultimate proof of an LLM's value. Even with cheaper options like GPT-4o available, Anthropic's Sonnet commands a higher price, indicating customers pay a premium for its superior performance on multi-turn agentic tasks, a value not fully captured by benchmarks.
Contrary to the trend toward multi-agent systems, Tasklet finds that one powerful agent with access to all context and tools is superior for a single user's goals. Splitting tasks among specialized agents is less effective than giving one generalist agent everything, since a single frontier model is already a capable generalist across domains.
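A minimal sketch of that single-agent setup: one agent object holding every tool and the full user context rather than a router dispatching to specialists. The Agent class, tool stubs, and model identifier are illustrative assumptions only:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    model: str
    system_prompt: str
    tools: list[Callable] = field(default_factory=list)

# Stubs standing in for the kinds of tools a Tasklet-style agent might hold.
def search_email(query: str) -> str: ...
def http_request(method: str, url: str) -> str: ...
def recall(pattern: str) -> str: ...

# One generalist agent gets the whole goal, the whole memory, and every tool,
# so nothing is lost handing sub-tasks off between narrower specialists.
agent = Agent(
    model="claude-sonnet",  # assumed model identifier, for illustration
    system_prompt="Full user context, goals, and accumulated feedback go here.",
    tools=[search_email, http_request, recall],
)
```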
