Robotics developers can significantly accelerate the development of AI-enabled robots, including humanoids, with new AI and simulation tools and workflows that NVIDIA announced this week at the Conference on Robot Learning (CoRL) in Munich, Germany.
This lineup includes the general availability of the NVIDIA Isaac Lab robot learning framework; six new humanoid robot learning workflows for Project GR00T, an effort to accelerate humanoid robot development; and new world model development tools for curating and processing video data, including the NVIDIA Cosmos tokenizer and NVIDIA NeMo Curator for video processing.
The open source Cosmos tokenizer gives robot developers superior visual tokenization by breaking images and videos down into high-quality tokens at very high compression rates. It runs up to 12x faster than current tokenizers, while NeMo Curator delivers video processing and curation up to 7x faster than unoptimized pipelines.
In conjunction with CoRL, NVIDIA also presented 23 papers and nine workshops related to robot learning and released training and workflow guides for developers. Additionally, Hugging Face and NVIDIA announced that they are collaborating to accelerate open source robotics research for the developer community with LeRobot, NVIDIA Isaac Lab, and NVIDIA Jetson.
Accelerate robot development with Isaac Lab
NVIDIA Isaac Lab is an open source robot learning framework built on NVIDIA Omniverse, a platform for developing OpenUSD applications for industrial digitization and physical AI simulation.
Developers can use Isaac Lab to train robot policies at scale. This open source, unified robot learning framework applies to any embodiment, from humanoids to quadrupeds to collaborative robots (cobots), helping robots handle increasingly complex movements and interactions.
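Isaac Lab's own API is not shown here. As a rough, framework-free illustration of what training "at scale" means, the sketch below evaluates one policy across many environments at once and improves it with simple random search. Everything in it (the `BatchedReachEnv` class, the 1-D reach task, the linear policy, and the hill-climbing update) is a hypothetical stand-in chosen for brevity, not Isaac Lab code or its training algorithms.

```python
import random


class BatchedReachEnv:
    """Hypothetical toy: N independent 1-D point masses that should move to x = 0.

    All environments are stepped together, loosely mimicking how GPU-parallel
    simulators advance many environments per call.
    """

    def __init__(self, num_envs: int, seed: int = 0):
        self.num_envs = num_envs
        self.rng = random.Random(seed)
        self.pos: list[float] = []

    def reset(self) -> list[float]:
        # Fresh random start positions for every environment.
        self.pos = [self.rng.uniform(-1.0, 1.0) for _ in range(self.num_envs)]
        return list(self.pos)

    def step(self, actions: list[float]) -> tuple[list[float], list[float]]:
        # Each environment moves by its own action; reward is -|distance to goal|.
        self.pos = [p + a for p, a in zip(self.pos, actions)]
        rewards = [-abs(p) for p in self.pos]
        return list(self.pos), rewards


def evaluate(gain: float, env: BatchedReachEnv, horizon: int = 20) -> float:
    """Mean episode return of the linear policy a = -gain * x across all envs."""
    obs = env.reset()
    total = 0.0
    for _ in range(horizon):
        obs, rewards = env.step([-gain * x for x in obs])
        total += sum(rewards)
    return total / env.num_envs


def random_search(num_envs: int = 256, iters: int = 100, seed: int = 0) -> tuple[float, float]:
    """Hill-climb the policy gain, keeping a perturbation only if it scores better."""
    rng = random.Random(seed)
    env = BatchedReachEnv(num_envs, seed)
    best_gain, best_return = 0.0, evaluate(0.0, env)
    for _ in range(iters):
        candidate = best_gain + rng.gauss(0.0, 0.2)
        candidate_return = evaluate(candidate, env)
        if candidate_return > best_return:
            best_gain, best_return = candidate, candidate_return
    return best_gain, best_return


gain, ret = random_search()
print(f"learned gain {gain:.3f}, mean return {ret:.4f}")
```

In this toy the best gain is 1.0, since it moves each mass to the goal in a single step, so the search should settle near that value. A real framework would replace the toy physics with GPU simulation and the hill climbing with reinforcement or imitation learning; the sketch only illustrates the idea of scoring every policy update over many environments at once.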
Leading commercial robot manufacturers, robot application developers, and robotics research institutes around the world rely on Isaac Lab. These include 1X, Agility Robotics, The AI Institute, Berkeley Humanoid, Boston Dynamics, Field AI, Fourier, Galbot, Mentee Robotics, Skild AI, Swiss-Mile, Unitree Robotics, and XPENG Robotics.
Project GR00T: Foundations for general-purpose humanoid robots
Building advanced humanoids is extremely difficult. It requires a multilayered technical and interdisciplinary approach so that robots can effectively perceive, navigate, and learn the skills needed for human-robot and robot-environment interaction.
Project GR00T is an effort to develop accelerated libraries, foundation models, and data pipelines that speed up the global humanoid robot developer ecosystem.
Six new Project GR00T workflows give humanoid developers blueprints for realizing the most challenging humanoid robot capabilities. They include:
GR00T-Gen, for building generative AI-powered, OpenUSD-based 3D environments
GR00T-Mimic, for generating robot motions and trajectories
GR00T-Dexterity, for fine-grained, dexterous robot manipulation
GR00T-Control, for whole-body control
GR00T-Mobility, for robot locomotion and navigation
GR00T-Perception, for multimodal sensing
“Humanoid robots are the next wave of embodied AI,” said Jim Fan, senior research manager of embodied AI at NVIDIA. “NVIDIA research and engineering teams are collaborating across the company and our developer ecosystem to build Project GR00T to help advance the progress and development of humanoid robot developers worldwide.”
New development tools for world model builders
Today, robot developers are building world models: AI representations of the world that can predict how objects and environments will respond to a robot's actions. Building these world models is extremely compute- and data-intensive, with models requiring thousands of hours of curated real-world image or video data.
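The core contract of a world model can be shown with a deliberately tiny example: given the current state and an action, predict the next state, using a model fitted from logged transitions. The sketch below assumes a 1-D state with linear dynamics and fits it by least squares. Real world models learn from video with large neural networks; the dynamics, noise level, and every function name here are hypothetical.

```python
import random


def collect_transitions(n: int, a_true: float = 0.9, b_true: float = 0.5,
                        noise: float = 0.01, seed: int = 0):
    """Log (state, action, next_state) triples from a hidden 1-D system.

    The 'true' dynamics s' = a*s + b*u + noise are an assumption of this toy.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        s, u = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        data.append((s, u, a_true * s + b_true * u + rng.gauss(0.0, noise)))
    return data


def fit_world_model(data):
    """Least-squares fit of s' ~ a*s + b*u by solving the 2x2 normal equations."""
    s_s = sum(s * s for s, _, _ in data)
    u_u = sum(u * u for _, u, _ in data)
    s_u = sum(s * u for s, u, _ in data)
    s_y = sum(s * y for s, _, y in data)
    u_y = sum(u * y for _, u, y in data)
    det = s_s * u_u - s_u * s_u
    a = (s_y * u_u - u_y * s_u) / det
    b = (u_y * s_s - s_y * s_u) / det
    return a, b


def imagine(model, s0: float, actions):
    """Roll the learned model forward to predict how the state responds to actions."""
    a, b = model
    states = [s0]
    for u in actions:
        states.append(a * states[-1] + b * u)
    return states


model = fit_world_model(collect_transitions(1000))
print(model)                              # close to (0.9, 0.5)
print(imagine(model, 1.0, [0.0, 0.5, -0.5]))
```

Once fitted, the model can "imagine" the outcome of a candidate action sequence without touching the real system, which is the property that makes world models useful for planning and for generating training data.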
NVIDIA Cosmos tokenizers simplify the development of these world models by providing efficient, high-quality encoding and decoding. They set new standards in minimizing distortion and temporal instability, enabling high-quality video and image reconstructions.
Providing high-quality compression and up to 12x faster visual reconstruction, the Cosmos tokenizer paves the way for scalable, robust, and efficient generative application development across a wide range of visual domains.
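As a loose illustration of the encode/decode round trip and of measuring distortion, the sketch below implements a toy image tokenizer: it averages each 2x2 patch, quantizes the mean into one of 16 discrete token ids, decodes by expanding each token back over its patch, and scores the reconstruction with PSNR. It is not the Cosmos architecture, which uses learned neural encoders and decoders; the functions, patch size, and number of levels are assumptions, and the input is assumed to be a grayscale image with values in [0, 1] and sides divisible by the patch size.

```python
import math


def encode(image, patch=2, levels=16):
    """Toy tokenizer: average each patch x patch block, then quantize the
    mean (0..1) into one of `levels` integer token ids."""
    h, w = len(image), len(image[0])
    tokens = []
    for i in range(0, h, patch):
        row = []
        for j in range(0, w, patch):
            block = [image[y][x] for y in range(i, i + patch) for x in range(j, j + patch)]
            mean = sum(block) / len(block)
            row.append(min(levels - 1, int(mean * levels)))
        tokens.append(row)
    return tokens


def decode(tokens, patch=2, levels=16):
    """Map each token id back to its bin center and repeat it over the patch."""
    image = []
    for row in tokens:
        values = [(t + 0.5) / levels for t in row]
        expanded = [v for v in values for _ in range(patch)]
        image.extend(list(expanded) for _ in range(patch))
    return image


def psnr(original, reconstructed):
    """Peak signal-to-noise ratio in dB for images with peak value 1.0."""
    n = len(original) * len(original[0])
    mse = sum((x - y) ** 2 for ra, rb in zip(original, reconstructed)
              for x, y in zip(ra, rb)) / n
    return float("inf") if mse == 0 else 10 * math.log10(1.0 / mse)


img = [[0.5] * 4 for _ in range(4)]
tokens = encode(img)                  # [[8, 8], [8, 8]]
print(tokens, round(psnr(img, decode(tokens)), 2))
```

Here 16 pixels become 4 tokens, and the quantization error shows up directly as finite PSNR. Fewer tokens or fewer levels raise compression but lower PSNR; the claim above is that learned tokenizers push this trade-off much further than a fixed scheme like this one.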
Humanoid robot company 1X has updated its 1X World Model Challenge dataset to use the Cosmos tokenizer.
“NVIDIA Cosmos tokenizers provide extremely high temporal and spatial compression of data while maintaining visual fidelity,” said Eric Jang, vice president of AI at 1X Technologies. “This allows us to train world models with long-horizon video generation in an even more compute-efficient way.”
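To put "temporal and spatial compression" in numbers, the sketch below counts how many tokens a clip becomes after tokenization. It assumes a hypothetical tokenizer that downsamples time by 8x and each spatial axis by 16x, emitting one token per latent position; these factors and the helper names are illustrative assumptions, not the documented Cosmos tokenizer interface.

```python
def token_count(frames: int, height: int, width: int, t_factor: int, s_factor: int) -> int:
    """Tokens left after downsampling time by t_factor and each spatial axis by s_factor.

    Assumes one token per latent position and dimensions divisible by the factors.
    """
    return (frames // t_factor) * (height // s_factor) * (width // s_factor)


def compression_ratio(frames: int, height: int, width: int, t_factor: int, s_factor: int) -> float:
    """Pixel positions per token (ignores color channels and bits per token)."""
    pixels = frames * height * width
    return pixels / token_count(frames, height, width, t_factor, s_factor)


# 64 frames of 480x640 video with the assumed 8x temporal, 16x spatial factors.
print(token_count(64, 480, 640, t_factor=8, s_factor=16))        # 9600
print(compression_ratio(64, 480, 640, t_factor=8, s_factor=16))  # 2048.0
```

Under these assumptions, about 19.7 million pixel positions become 9,600 tokens, a 2,048x reduction (8 × 16 × 16). Shorter token sequences per clip are what let a world model attend over longer stretches of video for the same compute.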
Other humanoid and general purpose robot developers, such as XPENG Robotics and Hillbot, are developing with NVIDIA Cosmos tokenizers to manage high-resolution images and video.
NeMo Curator now includes a video processing pipeline. This allows robot developers to process large-scale text, image, and video data to improve the accuracy of world models.
Curating video data is challenging because of its sheer size: it requires scalable pipelines and efficient orchestration to balance load across GPUs. In addition, the models used for filtering, captioning, and embedding must be optimized to maximize throughput.
NeMo Curator addresses these challenges by streamlining data curation with automatic pipeline orchestration, significantly reducing processing time. It supports linear scaling across multi-node, multi-GPU systems, so it can efficiently process more than 100 petabytes of data. This simplifies AI development, lowers costs, and shortens time to market.
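The orchestration problem described above can be sketched in a few lines: a filtering stage drops unusable clips, and a load balancer spreads the rest across GPUs so that no device sits idle while another is overloaded. The sketch uses longest-processing-time-first (LPT) greedy scheduling with clip duration as a proxy for processing cost. The `Clip` type, thresholds, and function names are hypothetical; this is not NeMo Curator's API or its actual scheduler.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Clip:
    clip_id: str
    seconds: float  # duration, used here as a stand-in for processing cost


def filter_clips(clips, min_seconds=2.0, max_seconds=60.0):
    """Hypothetical filtering stage: drop clips too short or too long to be useful."""
    return [c for c in clips if min_seconds <= c.seconds <= max_seconds]


def balance_across_gpus(clips, num_gpus):
    """Longest-processing-time-first (LPT) greedy scheduling.

    Clips are sorted by cost, largest first, and each one goes to the GPU
    with the smallest accumulated load (tracked in a min-heap).
    """
    heap = [(0.0, gpu) for gpu in range(num_gpus)]
    heapq.heapify(heap)
    assignment = {gpu: [] for gpu in range(num_gpus)}
    loads = {gpu: 0.0 for gpu in range(num_gpus)}
    for clip in sorted(clips, key=lambda c: c.seconds, reverse=True):
        load, gpu = heapq.heappop(heap)
        assignment[gpu].append(clip.clip_id)
        loads[gpu] = load + clip.seconds
        heapq.heappush(heap, (loads[gpu], gpu))
    return assignment, loads


clips = [Clip("a", 30), Clip("b", 20), Clip("c", 20), Clip("d", 10),
         Clip("e", 10), Clip("f", 10), Clip("g", 1), Clip("h", 90)]
assignment, loads = balance_across_gpus(filter_clips(clips), num_gpus=2)
print(assignment)  # {0: ['a', 'd', 'e'], 1: ['b', 'c', 'f']}
print(loads)       # {0: 50.0, 1: 50.0}
```

With these six surviving clips, LPT splits 100 seconds of work evenly, 50 seconds per GPU. LPT is a simple heuristic with a known worst-case bound rather than an optimal schedule, which is usually acceptable when clips are numerous and small relative to the total workload.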
Advancing robot learning communities at CoRL
The nearly two dozen research papers that the NVIDIA robotics team released with CoRL highlight breakthroughs in integrating vision language models for improved environmental understanding and task execution, temporal robot navigation, developing long-horizon planning strategies for complex multistep tasks, and using human demonstrations for skill acquisition.
Groundbreaking papers on humanoid robot control and synthetic data generation include SkillGen, a system based on synthetic data generation for training robots with minimal human demonstrations, and HOVER, a robot foundation model for humanoid robot control.
NVIDIA researchers will also participate in nine workshops at the conference.
Availability
NVIDIA Isaac Lab 1.2 is available now and open source on GitHub. NVIDIA Cosmos tokenizer is available now on GitHub and Hugging Face. NeMo Curator for video processing will be available later this month.
New NVIDIA Project GR00T workflows are coming soon to help robotics companies more easily build humanoid robot capabilities. Visit the NVIDIA Technical Blog to learn more about the workflows.
Researchers and developers learning how to use Isaac Lab now have access to developer guides and tutorials, including a migration guide from Isaac Gym to Isaac Lab.
Catch the latest in robot learning and simulation at the OpenUSD Insider Livestream on Robot Simulation and Learning on November 13th. You can also attend NVIDIA Isaac Lab office hours for hands-on support and insight.
Developers can apply to join the NVIDIA Humanoid Robot Developer Program.