Because music is subjective, AI music models can't be trained on "right" answers like chess or code. Instead of aiming for peak performance in one genre, Suno's team focuses on identifying and improving areas where the model underperforms, or has "anti-spikes."
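One way to picture the search for "anti-spikes" is as a per-genre comparison of listener ratings. The sketch below is illustrative only and is not Suno's method: the function name `find_anti_spikes`, the 1-to-5 rating scale, the `gap` threshold, and the sample data are all assumptions.

```python
from statistics import mean

def find_anti_spikes(ratings_by_genre, gap=0.5):
    """Return genres whose mean rating trails the cross-genre average by at least `gap`.

    Hypothetical helper: `ratings_by_genre` maps a genre name to a list of
    listener ratings (assumed here to be on a 1-5 scale).
    """
    # Mean rating per genre, skipping genres with no ratings.
    genre_means = {g: mean(r) for g, r in ratings_by_genre.items() if r}
    # Average of the per-genre means, so large genres do not dominate.
    overall = mean(list(genre_means.values()))
    return sorted(g for g, m in genre_means.items() if overall - m >= gap)

# Made-up ratings for illustration.
ratings = {
    "pop": [4, 5, 4],
    "jazz": [4, 4, 5],
    "metal": [2, 3, 2],
}
weak_genres = find_anti_spikes(ratings)
print(weak_genres)  # ['metal']
```

Averaging the per-genre means rather than pooling every rating keeps a heavily rated genre from hiding a weak one.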
Descript's AI audio tool got worse after the team trained it on extremely bad audio (e.g., vacuum cleaners). The team learned that the model that best fixes terrible audio is different from the one that best improves merely "okay" audio, which is the more common user scenario. Train for your primary user's reality, not the worst possible edge case.
Despite public industry skepticism, AI music tools are becoming indispensable creative co-pilots for professional songwriters and producers. The CEO of Suno reveals that while many pros use the platform extensively for ideation, they are reluctant to admit it publicly.
AI performs poorly in areas where expertise is based on unwritten "taste" or intuition rather than documented knowledge. If the correct approach doesn't exist in training data or isn't explicitly provided by human trainers, models will inevitably struggle with that particular problem.
Suno's breakthrough came from rejecting established musical concepts like the 12-tone scale. By training their model on raw, continuous sound waves, they created a generic, unconstrained music machine capable of generating novel sounds and genre blends beyond human convention.
Creating AI that can reliably judge aesthetics is a frontier problem. Unlike tasks with clear right or wrong answers, aesthetics is subjective. Without a clear, objective benchmark, standard model improvement techniques are hard to apply, so the problem is a better fit for Reinforcement Learning from Human Feedback (RLHF).
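In RLHF, human judgments usually enter as pairwise preferences: a rater picks the better of two outputs, and a reward model is trained to score the chosen one higher. A minimal sketch of the commonly used Bradley-Terry preference loss follows. The function name `preference_loss` and the example scores are assumptions for illustration, not anything described in the episode.

```python
import math

def preference_loss(score_chosen, score_rejected):
    """Negative log-likelihood that the chosen output beats the rejected one.

    Under the Bradley-Terry model, P(chosen wins) = sigmoid(score_chosen - score_rejected),
    so the loss is -log(sigmoid(margin)).
    """
    margin = score_chosen - score_rejected
    # Numerically stable form of log(1 + exp(-margin)) for either sign of margin.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# The loss is small when the reward model agrees with the rater and large when it disagrees.
agree = preference_loss(2.0, 0.0)
disagree = preference_loss(0.0, 2.0)
print(agree < disagree)  # True
```

No absolute "correct" score is needed, only the ordering supplied by the human rater, which is why this setup suits subjective tasks.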
To teach AI subjective skills like poetry, a group of experts with some disagreement is better than one with full consensus. This approach captures diverse tastes and edge cases, which is more valuable for creating a robust model than achieving perfect agreement.
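One simple way to keep expert disagreement instead of discarding it is to store each item's label as a distribution over the experts' votes rather than a single majority answer. This is a sketch under that assumption. The helper `soft_label` and the "good"/"bad" classes are hypothetical.

```python
from collections import Counter

def soft_label(votes, classes):
    """Turn a list of expert votes into a probability distribution over classes."""
    if not votes:
        raise ValueError("need at least one vote")
    counts = Counter(votes)  # missing classes count as 0
    total = len(votes)
    return {c: counts[c] / total for c in classes}

# Four experts rate one poem; one dissents.
votes = ["good", "good", "bad", "good"]
label = soft_label(votes, ["good", "bad"])
print(label)  # {'good': 0.75, 'bad': 0.25}
```

A majority vote would record this poem as simply "good". The soft label keeps the signal that one expert in four disagreed.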
Fine-tuning an AI model is most effective when you use high-signal data. The best source for this is the set of difficult examples where your system consistently fails. The processes of error analysis and evaluation naturally curate this valuable dataset, making fine-tuning a logical and powerful next step after prompt engineering.
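The curation step can be sketched as a filter over repeated evaluation runs: keep the examples that fail most of the time. The code below is a minimal illustration. The function name `curate_hard_examples`, the thresholds, and the run format (a dict of example id to pass/fail) are assumptions, not a prescribed workflow.

```python
from collections import defaultdict

def curate_hard_examples(eval_runs, min_fail_rate=0.8, min_runs=3):
    """Return sorted ids of examples that fail in at least `min_fail_rate` of runs.

    `eval_runs` is a list of dicts mapping example id -> True (passed) / False (failed).
    Examples seen in fewer than `min_runs` runs are ignored as too noisy.
    """
    fails = defaultdict(int)
    totals = defaultdict(int)
    for run in eval_runs:
        for example_id, passed in run.items():
            totals[example_id] += 1
            if not passed:
                fails[example_id] += 1
    return sorted(
        ex for ex, n in totals.items()
        if n >= min_runs and fails[ex] / n >= min_fail_rate
    )

# Three made-up evaluation runs over the same three examples.
runs = [
    {"q1": True, "q2": False, "q3": False},
    {"q1": True, "q2": False, "q3": True},
    {"q1": False, "q2": False, "q3": False},
]
hard = curate_hard_examples(runs, min_fail_rate=0.6)
print(hard)  # ['q2', 'q3']
```

The returned ids point at the examples worth pairing with corrected outputs for a fine-tuning set. An example that fails only occasionally, like `q1`, is left out.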
The best AI models are trained on data that reflects deep, subjective qualities—not just simple criteria. This "taste" is a key differentiator, influencing everything from code generation to creative writing, and is shaped by the values of the frontier lab.
AI tools enable "vibe coding," where you describe a desired outcome or feeling (e.g., "make the crowd go wild") rather than technical specifications. This decouples taste (what you want) from skill (how to make it), opening creative fields to non-experts.
Suno's CEO asserts that music AI is not a scale problem like LLMs. Because music lacks objective benchmarks, smaller models aligned via massive amounts of human preference data are more effective. This preference data not only aligns the model but also fuels novel research breakthroughs, creating a virtuous cycle.