By Stephen Nellis
(Reuters) – Nvidia on Monday unveiled a new artificial intelligence model for generating music and audio that can modify voices to produce novel sounds. This technology is aimed at music, movie, and video game producers.
Nvidia, the world’s largest supplier of chips and software used to build AI systems, said it has no immediate plans to publicly release the technology, which it calls Fugatto, short for Foundational Generative Audio Transformer Opus 1.
It joins other technologies demonstrated by startups such as Runway and larger players such as Meta Platforms that can generate audio or video from a text prompt.
Santa Clara, California-based Nvidia’s version generates sound effects and music from text descriptions, including novel sounds such as a dog making trumpet noises.
What sets it apart from other AI technologies is its ability to take existing audio and modify it. For example, it can convert a line played on a piano into a line sung by a human voice, or take a recording of spoken words and change the accent used and the mood expressed.
“If you think about synthesized audio over the last 50 years, music sounds different now because of computers and synthesizers,” said Bryan Catanzaro, vice president of applied deep learning research at Nvidia. “I think generative AI will bring new capabilities to music, video games, and just people in general who want to create things.”
Companies such as OpenAI are negotiating with Hollywood studios over whether and how AI could be used in the entertainment industry, but relations between the technology sector and Hollywood have become strained, particularly after Hollywood star Scarlett Johansson accused OpenAI of imitating her voice.
Nvidia’s new model is trained on open source data, and the company said it is still evaluating whether and how to make it available to the public.
“There’s always some risk with generative technology, because people can use it to generate things that we don’t want,” Catanzaro said. “We have to be careful about that, so we don’t have any plans to release this right away.”
Creators of generative AI models have yet to determine how to prevent users from misusing the technology, for example by generating misinformation or infringing copyrights by generating copyrighted characters.
OpenAI and Meta similarly have not said when they plan to make their audio- and video-generating models publicly available.
(Reporting by Stephen Nellis in San Francisco; Editing by Will Dunham)