Press Clipping
05/24/2023
Article
How voice cloning is changing the music industry

Camille Sojit Pejcha

Illustration by
Cindy Kim

Posted
May 24, 2023
Universal Music Group just inked a deal with Endel AI—further cementing a revolution in modern listening habits and intellectual property laws alike

“We have seen what happens when technology outpaces regulation. The unbridled exploitation of personal data. The proliferation of disinformation. And the deepening of societal inequalities.”

This is how Senator Richard Blumenthal opened last week’s Senate hearing—before revealing that the voice delivering these remarks wasn’t actually his; it was a computer-generated clone, produced by one of the many deepfake audio applications currently on the market.

Such technologies aren’t new, but in recent years, their capabilities have grown by leaps and bounds. Apple’s forthcoming voice-cloning feature can learn to speak in your voice in as little as fifteen minutes—while others, like Microsoft’s VALL-E, require only three seconds of audio to synthesize the speaker’s unique timbre and tone. Deepfake music generators are now widely available, leading to the creation of viral tracks like Ghostwriter’s “Heart on My Sleeve”: a since-deleted, AI-powered collaboration featuring the voices of Drake and The Weeknd. The song racked up millions of views, prompting Universal Music Group—an industry titan that controls one-third of the global music market—to take a stand against AI. “[The rise of AI-generated music] begs the question as to which side of history all stakeholders in the music ecosystem want to be on,” a spokesperson for the company said in April, urging streaming platforms to align themselves with “artists, fans and human creative expression,” not “deepfakes [and] fraud.”

A few short weeks later, UMG is encouraging its own artists to adopt voice-cloning technology, having signed a deal with Endel, an AI startup focused on functional music. Described as a “first of its kind strategic relationship,” this collaboration will allow the company’s roster of artists to create AI-powered soundscapes to promote relaxation: music that, as Endel founder Oleg Stavitsky puts it, “is not designed for conscious listening.”

Functional audio is one of the music industry’s fastest-growing sectors, accounting for an estimated 120 billion streams in 2022, about 15 times as many as Taylor Swift’s music drew. With an increasing number of users seeking pre-curated playlists to soundtrack daily activities like sleep, meditation, or studying, functional audio has become a lucrative opportunity for industry stakeholders to cash in on plays from streaming platforms’ least-discerning listeners: those who quite literally want the music to fade into the background while they drift off to dreamland.

This also makes the genre ripe for exploitation: “Fake artists,” for instance, have garnered controversy since before AI entered the scene, with major labels and streaming platforms allegedly dodging royalty payouts by hiring lesser-known musicians to compose ambient tracks for a one-time fee, playlisting them, and pocketing the full profits.

More recently, a slew of AI-generated songs have been recommended to Spotify users by the platform’s “Discovered On” algorithm—and because they’re often served up to idle listeners, such tracks manage to rack up thousands of streams. Take, for instance, the 47-second, AI-generated song “Martin Luther King Conservatory,” which—despite sounding like a clumsy, hacked-together version of several different jazz performances—accrued hundreds of listeners, having been recommended to Spotify users seeking classical music for babies. As the music writer Jaime Brooks put it, “By targeting a style of music that often appears in sleep-oriented playlists, [these songs] gain access to the least discerning users on the entire platform. Unconscious adults are very unlikely to notice they’re not listening to Mozart anymore… Babies don’t even have the option!”

These literal sleeper hits may sound like a bleak indication of our modern listening environment, and perhaps they are. But for UMG’s artists, the popularity of ambient soundscapes could be a welcome opportunity to profit from the use of AI. In fact, some of the company’s signed artists collaborated with Endel before the partnership: in 2020, Grimes teamed up with the Berlin-based company to release an AI-powered sleepscape. She has also been one of the few musicians to embrace the technology, offering to split royalties with fans who create music using her voice.

Thus far, much of the AI discourse has centered on artists-versus-machines, with fear of replacement looming large in the collective consciousness. Many music industry heavyweights have disavowed the use of AI (Ice Cube recently called it “demonic,” encouraging Drake to sue the creators of “Heart on My Sleeve”), while others believe that, just as sampling brought about the birth of new musical genres, AI will simply become another tool for artists. Most agree that such technologies necessitate the invention of new rules and norms to protect creators from exploitation.

At present, AI voice models lie in a legal gray area: neither human nor synthetic voices can be copyrighted, nor can artistic styles. But vocal impersonation cases were making their way through the courts long before the proliferation of AI models. For instance, when an ad agency working for the auto company Ford hired one of Bette Midler’s former backup singers to imitate her recording of “Do You Want to Dance” in a mid-1980s commercial, Midler sued and ultimately won, not for copyright infringement but for misappropriation of her distinctive voice. Then, there was Waits v. Frito-Lay, in which Tom Waits sued the snack company for hiring an impersonator to emulate his voice in a radio ad.

On a legal level, rulings around vocal likeness often rest on the rights of artists to control the commercial use of their name, image, and other unmistakable identifiers. For instance, Tom Waits is known for his gravelly voice, so emulating this quality in an advertisement was seen as an infringement on his artistic persona. In winning the case (which, unlike Midler’s, did not involve the use of his own song), Waits set a precedent for the legal protection of distinctive vocal timbres, establishing that some stylistic aspects can be defined and defended. But, as the musician Holly Herndon observes, “his voice also sounds like a pastiche of a lot of Black American music,” which further complicates the debate about who should benefit from the use of a specific style, inflection, or dialect. The voice, Herndon says, is “a communal instrument,” and because humans learn traits like vocal style and delivery from other people, it’s difficult to parse which qualities are the property of the individual speaker.

An early adopter of AI voice models, Herndon has been raising awareness of their creative potential—and hazard potential—for years. In 2021, she created her own AI voice clone, Holly+, and released it to the public: both encouraging users to experiment with the technology for free, and also licensing it out for commercial use by those who wished to profit from their creations. “Holly+ was intended as a kind of thought experiment that people could run with,” Herndon says, adding that the project was never a money-maker—rather, it was intended to highlight the potential dangers the technology poses to artists, and inspire others to come up with their own solutions to the issue of creator compensation in the AI era.

In Herndon’s view, among the greatest risks to artists is a lack of understanding about how this technology could be used—which is why she has openly cautioned musicians not to sign any AI-related contracts before legal infrastructure is developed to guard against exploitation. “There is going to be a frenzy of companies trying to lock in rights to artist voices,” she states in a recent Twitter thread. “Some may be very well-intentioned. I don’t think art education is anywhere close to where it needs to be for people to tell the difference.”

As the threat of AI hovers over creative industries, new organizations are taking shape to address these challenges. Just last week, a group of music industry stakeholders launched the Human Artistry Campaign, a coalition that aims to ensure artificial intelligence technologies are developed and used in ways that support human culture and artistry rather than replace them. Among the foremost concerns of the campaign, which has garnered support from over 40 music organizations, is the enforcement of best practices for the use of AI, such as respect for the work and personas of artists, transparency about the use of machine learning in music, and adherence to existing law, including copyright and intellectual property.

The organization doesn’t oppose the use of AI, pointing out that musicians have long used technology, from piano rolls to guitar pedals and synthesizers, to express their creative vision. Just as sampling opened up new opportunities for artists, it’s likely that the use of other people’s voices as new, AI-powered instruments will become an accepted facet of the music industry, governed by new standards for use. So, rather than curbing the development of such technology, the Human Artistry Campaign aims to chart a healthy path for its use. “Art cannot exist independent of human culture,” the organization states, arguing that even in the machine learning era, “human-created works will continue to play an essential role in our lives,” at least as long as these new norms aren’t established while we’re asleep at the wheel, lulled by the sound of AI-generated lullabies.