Use Cheap AI Models for Granular Analysis and Powerful Models for High-Level Synthesis

To analyze video cost-effectively, Tim McLear uses a cheap, fast model to generate captions for individual frames sampled every five seconds. He then packages these low-level descriptions together with the audio transcript and sends them to a powerful reasoning model, whose job is to synthesize everything into a high-level summary of the video.
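
A minimal sketch of this two-tier pipeline, assuming OpenCV for frame sampling and the OpenAI Python SDK; the model names, prompts, and sampling details are illustrative stand-ins, not McLear's actual stack:

```python
import base64
import cv2
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def sample_frames(video_path: str, every_seconds: int = 5) -> list[bytes]:
    """Grab one JPEG-encoded frame every `every_seconds` seconds."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30
    step = int(fps * every_seconds)
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            ok_jpg, jpg = cv2.imencode(".jpg", frame)
            if ok_jpg:
                frames.append(jpg.tobytes())
        index += 1
    cap.release()
    return frames


def caption_frame(jpg: bytes) -> str:
    """Cheap, fast model: one short caption per sampled frame."""
    b64 = base64.b64encode(jpg).decode()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # stand-in for any cheap vision model
        messages=[{"role": "user", "content": [
            {"type": "text", "text": "Describe this video frame in one sentence."},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ]}],
    )
    return resp.choices[0].message.content


def summarize(captions: list[str], transcript: str) -> str:
    """Powerful reasoning model: synthesize captions + transcript into a summary."""
    bundle = "\n".join(f"[{i * 5:>4}s] {c}" for i, c in enumerate(captions))
    resp = client.chat.completions.create(
        model="o3",  # stand-in for any strong reasoning model
        messages=[{"role": "user", "content":
            f"Frame captions:\n{bundle}\n\nTranscript:\n{transcript}\n\n"
            "Write a high-level summary of the video."}],
    )
    return resp.choices[0].message.content
```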

Related Insights

To overcome AI's tendency toward generic descriptions of archival images, Tim McLear's scripts first extract embedded metadata (location, date). This data is then included in the prompt, acting as a "source of truth" that guides the AI to produce specific, verifiable outputs instead of just guessing based on visual content.
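
A minimal sketch of this metadata-grounding step, assuming Pillow for EXIF extraction; the tag handling and prompt wording are illustrative, since the actual scripts are not described:

```python
from PIL import Image, ExifTags


def extract_metadata(path: str) -> dict:
    """Pull whatever embedded EXIF data the archival image carries."""
    exif = Image.open(path).getexif()
    named = {ExifTags.TAGS.get(tag, tag): value for tag, value in exif.items()}
    return {
        "date": named.get("DateTime"),
        "gps": exif.get_ifd(ExifTags.IFD.GPSInfo) or None,
    }


def build_prompt(path: str) -> str:
    """Embed the extracted metadata in the prompt as the source of truth."""
    meta = extract_metadata(path)
    return (
        "Describe this archival image. Treat the following embedded metadata "
        "as the source of truth and do not contradict it:\n"
        f"Date: {meta['date']}\nGPS: {meta['gps']}\n"
        "Be specific; do not speculate beyond what the image and metadata support."
    )
```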

To move beyond keyword search in their media archive, Tim McLear's system generates two vector embeddings for each asset: one from the image thumbnail and another from its AI-generated text description. Fusing these enables a powerful semantic search that understands visual similarity and conceptual relationships, not just exact text matches.
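
A minimal sketch of the dual-embedding idea, assuming sentence-transformers with a CLIP model for thumbnails and a separate text model for descriptions, fused by simple concatenation; the real system's models and fusion strategy are not specified:

```python
import numpy as np
from PIL import Image
from sentence_transformers import SentenceTransformer

clip = SentenceTransformer("clip-ViT-B-32")           # image (and short-text) encoder
text_model = SentenceTransformer("all-MiniLM-L6-v2")  # text-description encoder


def embed_asset(thumbnail_path: str, description: str) -> np.ndarray:
    """Fuse an image embedding and a text embedding into one search vector."""
    img_vec = clip.encode(Image.open(thumbnail_path), normalize_embeddings=True)
    txt_vec = text_model.encode(description, normalize_embeddings=True)
    return np.concatenate([img_vec, txt_vec])  # simple late fusion by concatenation


def search(query: str, index: list[tuple[str, np.ndarray]], top_k: int = 5):
    """Embed the query on both sides and rank assets by cosine similarity."""
    q = np.concatenate([
        clip.encode(query, normalize_embeddings=True),        # matches the visual side
        text_model.encode(query, normalize_embeddings=True),  # matches the text side
    ])
    q = q / np.linalg.norm(q)
    scored = [(name, float(np.dot(q, v / np.linalg.norm(v)))) for name, v in index]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:top_k]
```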

While generative video gets the hype, producer Tim McLear finds AI's most practical use is automating tedious post-production tasks like data management and metadata logging. This frees up researchers and editors to focus on higher-value creative work, like finding more archival material, rather than being bogged down by manual data entry.

To automate trend analysis, the speaker built a system of chained AI models. The first model analyzes expert newsletters and synthesizes the trends it finds; a second model then validates that output, producing a more robust and reliable final result than a single model could on its own.
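
A minimal sketch of the analyze-then-validate chain, assuming the OpenAI Python SDK; the speaker's actual models and prompts are not described in the source:

```python
from openai import OpenAI

client = OpenAI()


def chat(prompt: str, model: str = "gpt-4o") -> str:
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content


def analyze_trends(newsletters: list[str]) -> str:
    """First model: synthesize trends from expert newsletters."""
    joined = "\n\n---\n\n".join(newsletters)
    return chat(f"Identify and synthesize the key trends in these newsletters:\n{joined}")


def validate(trend_report: str, newsletters: list[str]) -> str:
    """Second model: check the first model's output against the sources."""
    joined = "\n\n---\n\n".join(newsletters)
    return chat(
        "Review this trend report against the source newsletters. Flag any claim "
        f"that is unsupported and return a corrected version.\n\nReport:\n{trend_report}"
        f"\n\nSources:\n{joined}"
    )
```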

AI can now analyze video ads frame by frame, identifying the most compelling moments and justifying its choices with sophisticated creative principles like color theory and narrative juxtaposition. This allows for deep qualitative analysis of creative effectiveness at scale, surpassing simple A/B testing.

While today's focus is on text-based LLMs, the true, defensible AI battleground will be in complex modalities like video. Generating video requires multiple interacting models and unique architectures, creating far greater potential for differentiation and a wider competitive moat than text-based interfaces, which will become commoditized.

Gemini 3 can analyze hour-long videos and provide detailed, actionable feedback on performance. This moves AI from a content summarizer to a sophisticated coach for presenters, podcasters, and sales professionals, one that can catch nuanced issues such as moments that alienate audio-only audiences.
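
A hedged sketch of this kind of video coaching, assuming the google-genai Python SDK and its File API; the model name and prompt are placeholders rather than a verified Gemini 3 setup:

```python
import time
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Upload the recording; long videos take a while to ingest.
video = client.files.upload(file="talk_recording.mp4")
while video.state.name == "PROCESSING":
    time.sleep(10)
    video = client.files.get(name=video.name)

response = client.models.generate_content(
    model="gemini-2.5-pro",  # stand-in; swap in the current flagship model
    contents=[
        video,
        "Act as a presentation coach. Give timestamped, actionable feedback on "
        "delivery, pacing, and clarity, including issues that would only affect "
        "an audio-only audience.",
    ],
)
print(response.text)
```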

Traditional video models process an entire clip at once, causing delays. Decart's Mirage model is autoregressive, predicting only the next frame based on the input stream and previously generated frames. This LLM-like approach is what enables its real-time, low-latency performance.
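
A conceptual sketch of the autoregressive pattern, not the Mirage architecture itself; `model.predict_next` is a hypothetical interface used only to show the per-frame loop:

```python
from collections import deque


def run_realtime(model, input_stream, context_len: int = 16):
    """Predict one frame at a time instead of waiting for a whole clip."""
    history = deque(maxlen=context_len)   # previously generated frames
    for input_frame in input_stream:      # arrives frame by frame, not as a clip
        next_frame = model.predict_next(input_frame, list(history))
        history.append(next_frame)
        yield next_frame                  # emitted with per-frame latency
```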

Maximize the ROI of video content with a specific three-tool workflow. Use Opus Pro to auto-generate social clips. Use Get Recall to pull a clean transcript. Then, feed that transcript into Claude to write multiple, targeted articles based on different themes from the video.
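
A minimal sketch of the final step only, assuming the Anthropic Python SDK and a transcript already exported from the upstream tools; the model name, themes, and prompts are illustrative placeholders:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def draft_article(transcript: str, theme: str) -> str:
    """Turn one transcript into one themed article."""
    message = client.messages.create(
        model="claude-sonnet-4-20250514",  # stand-in for any current Claude model
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": (
                f"Using this video transcript, write a focused article on the theme "
                f"'{theme}'. Keep only material relevant to that theme.\n\n{transcript}"
            ),
        }],
    )
    return message.content[0].text


transcript = open("transcript.txt").read()  # exported from the transcription tool
articles = {t: draft_article(transcript, t)
            for t in ["key takeaways", "how-to steps", "industry trends"]}
```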

When analyzing video, new generative models can create entirely new images that illustrate a described scene, rather than just pulling a direct screenshot. This allows AI to generate its own 'B-roll' or conceptual art that captures the essence of the source material.
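
A minimal sketch of generating such conceptual "B-roll" from a described scene, assuming the OpenAI Images API; the model name and prompt are placeholders:

```python
from openai import OpenAI

client = OpenAI()

# e.g. produced by an earlier video-analysis step
scene_description = "A 1970s newsroom at night, rows of typewriters, a single desk lamp lit"

result = client.images.generate(
    model="dall-e-3",
    prompt=f"Conceptual B-roll still capturing the essence of this scene: {scene_description}",
    size="1792x1024",
    n=1,
)
print(result.data[0].url)  # URL of the generated image
```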
