
Move Over, Taylor Swift: AI Can Now Create Full Songs from Text Prompts

An image of an AI robot playing a piano to illustrate Stable Audio

If you’ve ever wanted to be a superstar musician but lack the requisite skills, talent, or dedication to become one, AI now has you covered.

Stability AI, the company behind the popular Stable Diffusion technology, has launched Stable Audio, a text-to-audio generation programme powered by artificial intelligence.

Much like Stable Diffusion, which allows users to create AI images from written instructions, Stable Audio creates short audio clips from whatever written input the user gives it.

“Stable Audio is a first-of-its-kind product that uses the latest generative AI techniques to deliver faster, higher-quality music and sound effects via an easy-to-use web interface,” the company wrote in a press release.

“Our hope is that Stable Audio will empower music enthusiasts and creative professionals to generate new content with the help of AI, and we look forward to the endless innovations it will inspire,” said Emad Mostaque, CEO of Stability AI.

An image showing the text prompt function of Stable Audio
Image: Stability AI

The audio generated by the programme ranges from sound effects, like a door slamming or an explosion, through to fully-fledged songs of any genre imaginable.

However, from our testing, it doesn’t appear to be able to recreate human vocals very well. The tracks it can produce are music-only and seem better suited for backing music or the kind of generic music you hear over TikTok videos.

Listening to samples shared online, the ‘diffusion’ element in the audio is clear. Much like how Stable Diffusion images are often a nightmarish mashup of distorted objects melted together in an approximation of, say, a photo taken on a hunting trip, Stable Audio kind of sounds like real songs. What’s apparent is a complete lack of theoretical understanding, producing merely an echo of actual music. Really terrible music at times too.

Stability AI envisions that Stable Audio will be useful for creatives, like podcasters or videographers, who need bespoke sound effects or music for their work but can’t afford to license them.

The audio model relies on the same kind of AI technique that Stable Diffusion uses to create images, something called diffusion. A diffusion model is trained on a particular dataset, in this case audio, and then instructed to create something new that sounds similar to what it has learned.
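For the curious, here is a rough illustration of the idea in Python. It is a toy sketch, not Stability AI's code: it only shows the noising half of diffusion, in which a clean clip is progressively blended with random noise. A real model is then trained to undo that noise, which is what lets it generate new audio from scratch. The sine-wave "clip" and the function names are made up for this example.

```python
import math
import random


def make_tone(n_samples=1000, freq_hz=440.0, sample_rate=8000):
    """A clean sine wave standing in for one training clip (hypothetical data)."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n_samples)]


def add_noise(clean, keep, rng):
    """Blend the clip with Gaussian noise.

    keep=1.0 leaves the clip untouched; keep=0.0 replaces it with pure noise.
    """
    return [
        math.sqrt(keep) * x + math.sqrt(1.0 - keep) * rng.gauss(0.0, 1.0)
        for x in clean
    ]


def mean_squared_error(a, b):
    """Average squared difference between two equal-length clips."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)  # fixed seed so the run is repeatable
tone = make_tone()
lightly_noised = add_noise(tone, keep=0.99, rng=rng)
heavily_noised = add_noise(tone, keep=0.01, rng=rng)

# The heavily noised clip ends up much further from the original tone.
print("light noise error:", mean_squared_error(tone, lightly_noised))
print("heavy noise error:", mean_squared_error(tone, heavily_noised))
```

The reverse step, where the model removes noise bit by bit until something song-like emerges, needs a trained neural network and is not shown here.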

Stable Audio was built in partnership with the music library AudioSparx, using 800,000 pieces of their licensed music. The key here is that, rather than generating repetitive MIDI tones, the model outputs rich and complex raw audio, complete with metadata.

What’s more, Stable Audio can even be used to create ‘stems’ – an isolated drum loop, guitar melody, or synth sample – which can then be pulled together and manipulated by music creators.

Interestingly, according to the terms and conditions, Stability AI grants ownership over the end result to the user, meaning by writing ‘futuristic techno, 145bpm, Berlin, distortion’, you are essentially the creator of whatever Stable Audio spits out.

The programme is free, offering 20 generations of 45 seconds each per month. A premium service is also available for US$12 per month, giving you 500 generations of one minute 30 seconds each per month.
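To put those allowances in perspective, the quick Python sketch below adds up how much audio each tier works out to per month. The numbers come from the pricing quoted above; the tier labels and the assumption that every generation runs to its full length are ours.

```python
# Plan figures as quoted in this article; "free" and "premium" are labels
# chosen for this sketch, and every clip is assumed to run its full length.
PLANS = {
    "free": {"generations": 20, "seconds_each": 45, "usd_per_month": 0},
    "premium": {"generations": 500, "seconds_each": 90, "usd_per_month": 12},
}


def monthly_minutes(plan):
    """Total minutes of audio a plan allows per month."""
    return plan["generations"] * plan["seconds_each"] / 60


def usd_per_minute(plan):
    """Cost per minute of generated audio (0 for the free tier)."""
    return plan["usd_per_month"] / monthly_minutes(plan)


for name, plan in PLANS.items():
    print(name, monthly_minutes(plan), "minutes,", usd_per_minute(plan), "USD per minute")
```

On those assumptions, the free tier comes to 15 minutes of audio a month, while the premium tier comes to 750 minutes, or 12.5 hours, at under two US cents per minute.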

It’s not the first time that computers have generated music. Jukedeck, the music-generating technology now used by TikTok, was created in 2011.

However, with the increasing use of AI in music creation, the industry is struggling to harmonise technology with the rights of creators and licensees. There are growing fears that AI will replace the jobs of many in the field, and the latest development from Stability is likely to be another step towards this concerning future.
