To stay on the cutting edge, maintain a list of complex tasks that current AI models can't perform well. Whenever a new model is released, run it against this suite. This practice provides an intuitive feel for the model's leap in capability and helps you identify when a previously impossible workflow becomes feasible.
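As a rough illustration, such a suite can be as simple as a list of hard prompts and a small script that replays them against whatever model just shipped. The sketch below assumes the OpenAI Python SDK and uses placeholder tasks and a placeholder model name; the point is the practice, not this specific API.

```python
# Minimal sketch of a personal "frontier" suite, assuming the OpenAI Python SDK.
# The tasks and model name are placeholders; swap in your own hard problems.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Tasks that current models handle poorly, described in your own words.
HARD_TASKS = [
    "Refactor this 400-line legacy module without changing its public behavior: ...",
    "Summarize this 60-page contract and flag every clause that shifts liability: ...",
    "Plan a data migration for this schema, including rollback steps: ...",
]

def run_suite(model: str) -> None:
    """Replay every hard task against a newly released model and print the output for manual review."""
    for i, task in enumerate(HARD_TASKS, start=1):
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": task}],
        )
        print(f"--- Task {i} ({model}) ---")
        print(response.choices[0].message.content)

if __name__ == "__main__":
    run_suite("gpt-4o")  # placeholder: replace with the model you want to probe
```

Reading the raw outputs side by side, release after release, is what builds the intuitive feel for how far the frontier has moved.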
AI's capabilities are inconsistent; it excels at some tasks and fails surprisingly at others. This is the 'jagged frontier.' You can only discover where AI is useful and where it's useless by applying it directly to your own work, as you are the only one who can accurately judge its performance in your domain.
Building an AI-native product requires betting on the trajectory of model improvement, much like developers once bet on Moore's Law. Instead of designing for today's LLM constraints, assume rapid progress and build for the capabilities that will exist tomorrow. This prevents creating an application that is quickly outdated.
Users frequently write off an AI's ability to perform a task after a single failure. However, with models improving dramatically every few months, what was impossible yesterday may be trivial today. This "capability blindness" prevents users from unlocking new value.
The goal of testing multiple AI models isn't to crown a universal winner, but to build your own subjective "rule of thumb" for which model works best for the specific tasks you frequently perform. This personal topography is more valuable than any generic benchmark.
The essential skill for AI PMs is deep intuition, which can only be built through hands-on experimentation. This means actively using every new language, image, and video model as it is released to understand its capabilities, limitations, and trajectory firsthand, rather than relying on second-hand analysis.
The primary bottleneck in improving AI is no longer data or compute, but the creation of 'evals': tests that measure a model's capabilities. These evals act as product requirements documents (PRDs) for researchers, defining what success looks like and guiding the training process.
When developing AI-powered tools, don't be constrained by current model limitations. Given the exponential improvement curve, design your product for the capabilities you anticipate models will have in six months. That way, your product is ready the moment the underlying technology catches up.
Instead of guessing where AI can help, use AI itself as a consultant. Detail your daily workflows, tasks, and existing tools in a prompt, and ask it to generate an "opportunity map." This meta-approach lets the AI identify the highest-impact places to apply itself.
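One way to make this concrete is to serialize your recurring work into a structured prompt before asking for the opportunity map. The snippet below is only a sketch of that prompt-building step; the workflow entries and the wording of the ask are hypothetical, not a prescribed format.

```python
# Sketch: turn a structured description of your work into an "opportunity map" prompt.
# The workflow entries below are illustrative placeholders.
workflows = [
    {"task": "Weekly competitive analysis", "tools": ["Sheets", "Notion"], "hours_per_week": 4},
    {"task": "Drafting PRDs from customer interviews", "tools": ["Docs"], "hours_per_week": 6},
    {"task": "Triaging support tickets into themes", "tools": ["Zendesk"], "hours_per_week": 3},
]

lines = [
    f"- {w['task']} (tools: {', '.join(w['tools'])}; ~{w['hours_per_week']} h/week)"
    for w in workflows
]

prompt = (
    "Here are my recurring workflows, the tools I use, and the time each takes:\n"
    + "\n".join(lines)
    + "\n\nActing as a consultant, produce an opportunity map: rank the workflows "
    "where AI could save the most time, and explain what I should try first for each."
)
print(prompt)  # paste into the model of your choice, or send via its API
```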
A significant source of competitive advantage ("alpha") comes from systematically testing various AI models on different tasks. This creates a personal map of which tools are best for specific use cases, so you consistently reach for the strongest option for each job.
Instead of waiting for external reports, companies should develop their own AI model evaluations. By defining key tasks for specific roles and testing new models against them with standard prompts, businesses can create a relevant, internal benchmark.
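A hedged sketch of what such an internal benchmark definition might look like: role-keyed tasks, each with a standard prompt and a plain-language success criterion that a human reviewer scores. The roles, prompts, and criteria below are illustrative examples, not a recommended taxonomy.

```python
# Sketch of an internal, role-specific benchmark: standard prompts plus human-scored criteria.
# Every role, task, and criterion here is a hypothetical example.
INTERNAL_BENCHMARK = {
    "support_engineer": [
        {
            "task": "Diagnose a failed deployment from raw logs",
            "prompt": "Given the following deployment logs, identify the root cause and the fix: ...",
            "success_criterion": "Names the actual root cause and a workable fix without inventing log lines.",
        },
    ],
    "product_manager": [
        {
            "task": "Compress ten customer interviews into a decision memo",
            "prompt": "Summarize these interview notes into a one-page memo with a clear recommendation: ...",
            "success_criterion": "The recommendation is supported by quotes that actually appear in the notes.",
        },
    ],
}

def score_sheet(model_name: str) -> list[dict]:
    """Produce a blank scoring sheet for one model run, to be filled in by a reviewer."""
    return [
        {"role": role, "task": item["task"], "model": model_name, "pass": None}
        for role, items in INTERNAL_BENCHMARK.items()
        for item in items
    ]
```

Keeping the prompts fixed across model releases is what makes the scores comparable over time, even though the grading itself stays manual.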