While photorealism is a common goal, the first fully AI-generated films will likely be animated or fantasy. This is because traditional filmmaking is already cheap and effective at capturing reality. AI's true economic and creative advantage lies in generating complex, non-photorealistic visuals that are currently expensive to produce.
A common pattern for developers building with generative media is to use two tiers of model. A cheaper, lower-quality 'workhorse' model handles high-volume tasks like prototyping, while an expensive, state-of-the-art 'hero' model is reserved for the final, high-quality output. The split keeps iteration cheap without compromising the quality of the finished result.
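The two-tier pattern can be sketched as a small routing function. This is a minimal illustration, not any platform's actual API: the model names and per-call prices below are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    cost_per_call: float  # hypothetical price units, not real pricing


# Hypothetical model names and prices, for illustration only.
WORKHORSE = ModelTier("workhorse-video-small", 0.05)
HERO = ModelTier("hero-video-large", 2.00)


def pick_model(is_final_render: bool) -> ModelTier:
    """Route drafts to the cheap model and final renders to the expensive one."""
    return HERO if is_final_render else WORKHORSE


def estimate_cost(num_drafts: int, num_finals: int) -> float:
    """Total spend for a job made of many drafts and a few final renders."""
    return (num_drafts * pick_model(False).cost_per_call
            + num_finals * pick_model(True).cost_per_call)


if __name__ == "__main__":
    # Many cheap iterations, one expensive final output.
    print(estimate_cost(num_drafts=40, num_finals=1))
```

With these placeholder prices, a single hero call costs as much as forty workhorse calls, which is why the expensive model is held back until the prompt and composition are settled.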
The generative video space is evolving so rapidly that a model ranked in the top five has a half-life of just 30 days. This extreme churn makes it impractical for developers to bet on a single model, driving them towards aggregator platforms that offer access to a constantly updated portfolio.
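To make the half-life figure concrete, the sketch below assumes it can be read as simple exponential decay, so that the chance a model is still in the top five halves every 30 days. That interpretation is an assumption; the source does not say how the figure was measured.

```python
HALF_LIFE_DAYS = 30.0  # figure quoted in the text


def fraction_still_top_five(days: float, half_life: float = HALF_LIFE_DAYS) -> float:
    """Expected fraction of today's top-five models still ranked after `days`.

    Assumes exponential decay, which is an illustrative simplification.
    """
    return 0.5 ** (days / half_life)


if __name__ == "__main__":
    for days in (30, 90, 180):
        print(days, round(fraction_still_top_five(days), 4))
```

Under this assumption, only about one in eight of today's top-five models would still hold a top-five spot after 90 days.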
The primary performance bottleneck for LLMs is memory bandwidth: inference time is dominated by moving large weights rather than by arithmetic, which makes them memory-bound. In contrast, diffusion-based video models are compute-bound, as they saturate the GPU's processing power by simultaneously denoising tens of thousands of tokens. The two workloads therefore call for fundamentally different optimization strategies.
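A rough way to see the difference is arithmetic intensity: the number of floating-point operations performed per byte moved. The sketch below models a single square weight matrix applied to a batch of tokens and compares the result against a hardware "ridge point", in the style of a roofline analysis. The hidden size, token counts, and hardware figures are hypothetical round numbers chosen for illustration, not measurements of any particular model or GPU.

```python
def arithmetic_intensity(num_tokens: int, hidden_dim: int, bytes_per_value: int = 2) -> float:
    """FLOPs per byte for one (hidden_dim x hidden_dim) matmul over num_tokens tokens."""
    flops = 2 * num_tokens * hidden_dim * hidden_dim                   # multiply + add
    weight_bytes = hidden_dim * hidden_dim * bytes_per_value           # read the weights once
    activation_bytes = 2 * num_tokens * hidden_dim * bytes_per_value   # read inputs, write outputs
    return flops / (weight_bytes + activation_bytes)


def bound_type(intensity: float, peak_flops: float, mem_bandwidth: float) -> str:
    """Classify a workload against the hardware ridge point (FLOPs per byte)."""
    ridge = peak_flops / mem_bandwidth
    return "compute-bound" if intensity > ridge else "memory-bound"


if __name__ == "__main__":
    # Hypothetical accelerator: 1e15 FLOP/s peak, 3e12 bytes/s memory bandwidth.
    peak, bandwidth = 1e15, 3e12
    hidden = 8192  # hypothetical hidden size

    llm_decode = arithmetic_intensity(num_tokens=1, hidden_dim=hidden)
    video_denoise = arithmetic_intensity(num_tokens=30_000, hidden_dim=hidden)

    print("LLM decode step:", bound_type(llm_decode, peak, bandwidth))
    print("Video denoise step:", bound_type(video_denoise, peak, bandwidth))
```

When only one token is processed per step, every weight is loaded to do very little arithmetic, so memory traffic dominates. When tens of thousands of tokens share the same weights, the arithmetic grows with the token count while the weight traffic stays fixed, and the workload crosses over to being limited by compute.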
The visual domain is more fertile for open-source contributions because small tweaks, like fine-tuning an aesthetic, produce tangible, distinct results. In contrast, fine-tuned LLMs often feel monolithic, with fewer perceptible differences between variants, which leads to a less diverse open-source community.
Former DreamWorks CEO Jeffrey Katzenberg compares the current backlash against AI in creative fields to the initial revolt of traditional animators against computer graphics. He argues that, like computer animation, AI adoption is an unstoppable technological shift: creators will either join it or be left behind.
The computational requirements for generative media scale dramatically across modalities. If a 200-token LLM prompt costs 1 unit of compute, a single image costs roughly 100x that, and a 5-second video costs another 100x on top of the image, for a total of about 10,000x the text baseline. 4K video adds a further 10x multiplier, bringing the total to roughly 100,000x.
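The multipliers compound, which is easy to lose track of in prose. The short sketch below simply chains the ratios quoted above; the constant names are illustrative, and the figures are the text's rough orders of magnitude rather than benchmarks.

```python
# Ratios quoted in the text, relative to a 200-token LLM prompt.
TEXT_UNITS = 1
IMAGE_OVER_TEXT = 100    # one image vs. the text baseline
VIDEO_OVER_IMAGE = 100   # 5-second video vs. one image
UHD_OVER_VIDEO = 10      # 4K video vs. the standard 5-second video


def relative_compute() -> dict[str, int]:
    """Compute cost of each modality in units of the text baseline."""
    text = TEXT_UNITS
    image = text * IMAGE_OVER_TEXT
    video = image * VIDEO_OVER_IMAGE
    video_4k = video * UHD_OVER_VIDEO
    return {"text": text, "image": image, "video_5s": video, "video_5s_4k": video_4k}


if __name__ == "__main__":
    for modality, units in relative_compute().items():
        print(f"{modality:>12}: {units:>7,}x")
```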
Advanced generative media workflows are not simple text-to-video prompts. Top customers chain an average of 14 different models for tasks like image generation, upscaling, and image-to-video transitions. This multi-model complexity is a key reason developers prefer open-source for its granular control over each step.
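A chained workflow of this kind can be pictured as a list of stages, each consuming the previous stage's output. The sketch below uses placeholder functions that only wrap strings; the stage names are hypothetical stand-ins for real model calls, and a production chain would have many more steps than the three shown.

```python
from functools import reduce
from typing import Callable

Stage = Callable[[str], str]


# Placeholder stages: each stands in for a call to a separate model.
def text_to_image(prompt: str) -> str:
    return f"image({prompt})"


def upscale(asset: str) -> str:
    return f"upscaled({asset})"


def image_to_video(asset: str) -> str:
    return f"video({asset})"


def run_pipeline(stages: list[Stage], prompt: str) -> str:
    """Feed the prompt through each stage in order, passing results along."""
    return reduce(lambda asset, stage: stage(asset), stages, prompt)


if __name__ == "__main__":
    pipeline = [text_to_image, upscale, image_to_video]
    print(run_pipeline(pipeline, "a fox in the snow"))
```

Keeping each stage as an independent, swappable function mirrors the granular control the text describes: any single step can be replaced or tuned without touching the rest of the chain.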
