Distilled models like SDXL Lightning, hyped for real-time demos, failed to retain users. The assumption that they would be used for 'drafting' proved wrong: users consistently prefer waiting for the highest-quality output, making speed secondary to the final result.
Many teams wrongly focus on the latest models and frameworks. True improvement comes from classic product development: talking to users, preparing better data, optimizing workflows, and writing better prompts.
Simply offering the latest model is no longer a competitive advantage. True value is created in the system built around the model—the system prompts, tools, and overall scaffolding. This 'harness' is what optimizes a model's performance for specific tasks and delivers a superior user experience.
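A minimal sketch of what such a harness might look like, assuming an OpenAI-style chat client; the system prompt and the lookup_order tool are illustrative placeholders, not a prescribed design:

```python
# The base model is a commodity; the differentiation lives in the scaffolding:
# the system prompt, the tools, and the control flow around the call.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are a support agent for Acme. Always look up the order "
    "before answering; never guess shipping dates."
)

TOOLS = [{
    "type": "function",
    "function": {
        "name": "lookup_order",  # hypothetical tool for this sketch
        "description": "Fetch order status by order ID.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

def run_harness(user_message: str):
    # Same model everyone else can call; the harness is the product.
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
        tools=TOOLS,
    )
```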
Users mistakenly evaluate AI tools based on the quality of the first output. However, since 90% of the work is iterative, the superior tool is the one that handles a high volume of refinement prompts most effectively, not the one with the best initial result.
A 'GenAI solves everything' mindset is flawed. High-latency models are unsuitable for real-time operational needs, like optimizing a warehouse worker's scanning path, which requires millisecond responses. The key is to apply the right tool—be it an optimizer, machine learning, or GenAI—to the specific business problem.
The primary driver for fine-tuning isn't cost but necessity. When applications like real-time voice demand low latency, developers are forced to use smaller models. These models often lack quality for specific tasks, making fine-tuning a necessary step to achieve production-level performance.
The primary bottleneck in improving AI is no longer data or compute, but the creation of 'evals'—tests that measure a model's capabilities. These evals act as product requirement documents (PRDs) for researchers, defining what success looks like and guiding the training process.
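As a rough illustration of the eval-as-PRD idea, here is a minimal sketch; the cases and the pass criterion are invented for the example:

```python
# Each case encodes a product requirement the model must satisfy, so the
# eval doubles as a PRD: a failing score tells researchers exactly which
# capability to train for.

EVAL_CASES = [
    {"prompt": "Refund policy for damaged items?",
     "must_include": ["30 days", "photo proof"]},
    {"prompt": "Convert 5 km to miles.",
     "must_include": ["3.1"]},
]

def grade(model_fn) -> float:
    """Return the fraction of cases the model passes."""
    passed = 0
    for case in EVAL_CASES:
        answer = model_fn(case["prompt"]).lower()
        if all(req.lower() in answer for req in case["must_include"]):
            passed += 1
    return passed / len(EVAL_CASES)

# Usage: grade(lambda p: my_model.generate(p)) -> e.g. 0.5
```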
For consumer products like ChatGPT, models are already good enough for common queries. However, for complex enterprise tasks like coding, performance is far from solved. This gives model providers a durable path to sustained revenue growth through continued quality improvements aimed at professionals.
For marketing, resist the allure of all-in-one AI platforms. The best results currently come from a specialized stack of hyper-focused tools, each excelling at a single task like image generation or presentation creation. Combine their outputs for superior quality.
Despite base models improving, they achieve only ~90% accuracy for specific subjects. Enterprises require the 99% pixel-perfect accuracy that LoRAs provide for brand and character consistency, making LoRAs an essential long-term feature, not a stopgap solution.
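A minimal sketch of layering a brand LoRA over a base model with Hugging Face diffusers; the LoRA repository and trigger word are hypothetical stand-ins for weights trained on your own brand or character data:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# Base model alone gets the subject roughly right; the LoRA closes the
# gap to brand-exact output.
pipe.load_lora_weights("your-org/acme-mascot-lora")  # hypothetical weights

image = pipe(
    "acme_mascot waving in front of the flagship store",  # illustrative trigger token
    num_inference_steps=30,
).images[0]
image.save("mascot.png")
```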
Unlike streaming text from LLMs, image generation forces users to wait. An A/B test by one of Fal's customers proved that increased latency directly harms user engagement and the number of images created, much like slow page loads hurt e-commerce sales.