Nvidia is getting into world models, AI models inspired by the mental models of the world that humans naturally develop.
At CES 2025 in Las Vegas, the company announced that it is rolling out a family of world models that can predict and generate “physics-aware” videos. Nvidia calls this family the Cosmos World Foundation Model (Cosmos WFM for short).
The models, which can be fine-tuned for specific applications, are available through Nvidia’s API and NGC catalogs, GitHub, and the AI development platform Hugging Face.
“NVIDIA is making available the first wave of Cosmos WFM for physically-based simulation and synthetic data generation,” the company said in a blog post provided to TechCrunch. “Researchers and developers, regardless of company size, are free to use Cosmos models under Nvidia’s permissive open model license, which allows commercial use.”
The Cosmos WFM family includes a number of models, divided into three categories: Nano for low-latency and real-time applications, Super for “high-performance baseline” models, and Ultra for the highest quality and fidelity output.
Model sizes range from 4 billion to 14 billion parameters, with Nano being the smallest and Ultra being the largest. Parameters roughly correspond to the model’s problem-solving skills, and models with more parameters generally perform better than models with fewer parameters.
As part of Cosmos WFM, Nvidia is also releasing an “up-sampling model,” a video decoder optimized for augmented reality, a guardrail model to ensure responsible use, and models fine-tuned for applications such as generating sensor data for self-driving car development. These models, like the other Cosmos WFM models, were trained on 9,000 trillion tokens from 20 million hours of real-world human interaction, environmental, industrial, robotics, and driving data, Nvidia says. (In AI, a “token” represents a bit of raw data, in this case video footage.)
Nvidia has not disclosed the source of this training data, but at least one report and one lawsuit allege that the company trained on copyrighted YouTube videos without permission.
When asked for comment, an Nvidia spokesperson told TechCrunch that Cosmos is “not designed to copy or infringe on protected works.”
“Cosmos learns in the same way humans learn,” the spokesperson said. “To help Cosmos learn, we collect data from a variety of public and private sources and are confident that our use of the data is consistent with both the letter and spirit of the law. The facts about how the world works, that is, what the Cosmos models learn, are not copyrightable or subject to the control of individual authors or companies.”
Aside from the fact that models like Cosmos don’t actually learn the way humans learn, copyright experts say claims like Nvidia’s, which draw support from fair use doctrine, may not hold up under judicial scrutiny. Whether these companies win will depend largely on how courts decide whether fair use, which allows copyrighted material to be used to create something new as long as it is transformative, applies to AI training.
Nvidia claims that, given text or video frames, Cosmos WFM models can generate “controllable, high-quality” synthetic data to bootstrap the training of models for applications such as robotics and driverless cars.
“Nvidia Cosmos’ suite of open models means developers can customize WFM with datasets such as video recordings of self-driving cars driving or robots moving through warehouses,” Nvidia said in a press release. “Cosmos WFM is purpose-built for physical AI research and development and can generate physically-based video from a combination of inputs such as text, images, video, robot sensor and motion data.”
According to Nvidia, companies including Waabi, Wayve, Foretellix, and Uber are already working on piloting Cosmos WFM for a variety of use cases, from video search and curation to building AI models for self-driving cars.
“Generative AI is driving the future of mobility and requires both rich data and extremely powerful computing,” Uber CEO Dara Khosrowshahi said in a statement. “We believe that by working with Nvidia, we can significantly accelerate the timeline for safe, scalable autonomous driving solutions for the industry.”
An important thing to note is that Nvidia’s world models are not “open source” in the strict sense of the term. To adhere to one widely accepted definition of “open source” AI, an AI model must provide enough information about its design to allow a person to “substantially” recreate it, and must disclose relevant details about its training data, including the data’s provenance and how it can be obtained or licensed.
Nvidia has not released details of its Cosmos WFM training data and has not made available all the tools needed to recreate the models from scratch. Perhaps that’s why the tech giant calls the models “open” rather than open source.
“We hope [Cosmos] will do the same thing for the world of robotics and industrial AI as Llama did for the enterprise,” Nvidia CEO Jensen Huang said on stage at a press event Monday.