File Photo: AI chip maker Nvidia collects videos from YouTube and other sources for AI training. Photo credit: Reuters
AI chipmaker Nvidia is collecting videos from YouTube and other sources for AI training, 404 Media reported. According to documents and chats reviewed by the outlet, employees at the company were asked to collect videos from Netflix, YouTube and other sources to build datasets for AI models for its Omniverse 3D world generator, self-driving car system and digital human products. The related project, named Cosmos, has not yet been made public, according to the report.
The conversations revealed that when employees asked about potential copyright issues, they were told that Nvidia was fully compliant with “the spirit of copyright law” and had permission from the highest levels of the company.
Emails reviewed by the outlet showed project managers discussing using 20 to 30 Amazon Web Services virtual machines to download 80 years’ worth of video every day.
Nvidia also used MovieNet, a movie trailer database; an internal library of video game footage; GitHub video datasets; and InternVid-10M, which contains 10 million YouTube video IDs.
Earlier, in April, YouTube CEO Neal Mohan said scraping data from YouTube to train AI models was a “clear violation” of the company’s terms.