ElevenLabs unveils Music v2, introducing seamless genre transitions and legally safe AI audio
With seamless genre transitions and legally cleared audio, the upgraded model brings professional control to AI music production.
May 28, 2026

Voice artificial intelligence pioneer ElevenLabs has unveiled Music v2, a major upgrade to its generative music capabilities that fundamentally changes how creators interact with AI audio[1]. While early iterations of generative music tools often produced static, unpredictable tracks that behaved like closed audio files, Music v2 is engineered with an emphasis on compositional malleability and structural coherence[2][3]. The headline capability of this new model is its ability to manage extreme mid-track genre transitions[4][3]. A single generated song can pivot seamlessly from a grand operatic arrangement to aggressive heavy metal, and then shift into a rhythmic rap verse, all without losing the underlying musical thread or breaking the temporal flow of the piece[5][1]. This technical achievement addresses a long-standing limitation in generative audio, where complex prompts or sudden stylistic changes typically cause neural networks to output chaotic, disconnected noise[3]. By preserving melodic and rhythmic coherence across wildly disparate musical styles, the model transforms generative audio from a casual novelty into a highly functional tool for professional sound design and musical experimentation[2][3].
Beyond its capacity for stylistic versatility, the technical architecture of the new model introduces unprecedented levels of control for audio editors[4][2]. One of the most significant upgrades is the introduction of a feature called inpainting, which allows creators to isolate specific portions of a track and regenerate them using targeted text prompts without affecting the surrounding composition[4][1]. If a producer wants to alter a bridge or refine a vocal line, they can now do so selectively, preserving the existing chorus and instrumental backing[5][2]. Additionally, rather than relying on a simple, long-form prompt that attempts to generate an entire song in one go, users can now construct compositions section by section[1]. Creators can meticulously build out a track's structural blueprint, starting with a distinct intro, moving through structured verses and choruses, and wrapping up with a cohesive outro[5][6]. The vocal synthesis capabilities have also reached a new level of sophistication, demonstrating an aptitude for dense lyrical arrangements, rapid-fire rap sequences, and multilingual performances where vocals and arrangements naturally conform to the linguistic nuances of the input text[5][4]. Furthermore, the model can embed non-musical sound effects directly into the soundscape, allowing for highly atmospheric and immersive production[4][1].
To support this sophisticated generation model, ElevenLabs is rolling out its music technology across three distinct, workflow-oriented platforms designed to target different segments of the media and technology industries[7][2]. The first is ElevenMusic, a dedicated space where creators can listen to, remix, and build full-length tracks[4][6]. This platform offers deep editing tools, including the ability to download individual instrument and vocal stems, giving human producers the freedom to bring AI-generated assets into traditional digital audio workstations for further refinement[8][6]. It also supports custom music finetunes, a feature that allows artists to train the underlying model on their own original audio to establish a highly personalized, consistent sonic identity[6]. The second platform, ElevenCreative, is tailored specifically for marketing, branding, and video production teams[4][1]. This tool provides a streamlined interface for generating and downloading high-fidelity background tracks, promotional music, and audio beds optimized for advertisements and corporate video content[5][4]. Finally, the company is preparing to launch ElevenAPI, an integration pipeline that will allow third-party developers to embed these advanced music generation capabilities directly into external software applications, games, and digital platforms[4][7].
The release of Music v2 also represents a major strategic shift in how AI developers approach the complex and highly contentious legal landscape of generative media[2]. A primary hurdle for the adoption of AI music in corporate and professional environments has been the risk of copyright infringement claims[1][3]. High-profile lawsuits from major record labels and industry groups like the Recording Industry Association of America have targeted prominent AI music startups, alleging that their models were trained on copyrighted materials without authorization[1][3]. ElevenLabs has taken a radically different approach by emphasizing that its new music model is trained entirely on licensed, rights-cleared data[5][1]. Because every track generated by the system is legally cleared for commercial use, businesses and content creators can use the audio in public-facing campaigns, video streaming, and commercial products without the looming fear of litigation[5][1]. This emphasis on compliance positions the company as an attractive partner for enterprise clients who require legal guarantees and strict compliance standards before integrating generative tools into their content pipelines[9][2].
The arrival of these capabilities highlights an intensifying arms race in the generative audio sector[10][3]. This launch coincides with other major developments in the space, such as Stability AI's release of its Stable Audio update, which emphasizes open-weights models and, like Music v2, licensed training datasets[10][3]. This collective push toward licensed training and granular editing interfaces indicates that the industry is shifting away from simple, end-to-end prompt-to-song generators[2][3]. Instead, AI developers are actively competing to build comprehensive audio pipelines that integrate seamlessly with professional human workflows[8][2]. While massive, consumer-focused platforms like Suno still capture significant user attention due to their low barrier to entry, the introduction of features like stem exporting, precise inpainting, and API access is bridging the gap between casual AI experimentation and professional-grade music production[1][10][8]. As these tools evolve, they are redefining the role of the modern music producer, shifting the creative process from manual synthesis and arrangement to a more collaborative, curatorial experience where AI acts as a rapid prototyping assistant[11][8].
The rapid technological progression of ElevenLabs is backed by massive capital and a soaring market valuation, which have positioned the company as one of the most formidable players in the voice and audio AI ecosystem[3]. Following a highly successful Series D funding round earlier this year that pushed its valuation to eleven billion dollars, the startup has aggressively expanded its research and development efforts beyond voice cloning and text-to-speech[3]. By successfully entering the music generation space with a model that prioritizes both structural control and legal compliance, the company is solidifying its footprint across the broader digital media landscape[9][2]. As the boundaries between music creation, video editing, and automated software integration continue to blur, the structural and stylistic coherence offered by Music v2 serves as a preview of a future where high-fidelity, adaptive soundtracks can be generated on the fly, tailoring themselves in real time to the shifting narrative demands of games, films, and interactive media[4][8].