At CES 2025, Nvidia unveiled a prototype AI avatar that lives on your PC’s desktop. The assistant, called R2X, looks like a video game character and helps you navigate apps on your computer.
R2X avatars are rendered and animated with Nvidia’s AI models, and users can power their avatar with a popular LLM of their choice, such as OpenAI’s GPT-4o or xAI’s Grok. Users can talk to R2X by text or voice, upload files for it to process, and let the assistant watch what’s happening live on their screen and camera.
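Nvidia hasn’t detailed how R2X swaps between model providers under the hood, but conceptually it amounts to routing the same chat request to whichever backend the user picks. Here’s a minimal sketch in Python, assuming both providers expose OpenAI-compatible chat endpoints; the xAI base URL and model name below are assumptions, not confirmed R2X details:

```python
# Sketch of swapping LLM backends behind one chat interface.
# Assumption: both providers accept OpenAI-compatible chat requests.
import os
from openai import OpenAI  # pip install openai

BACKENDS = {
    "gpt-4o": {"base_url": None, "model": "gpt-4o", "key_env": "OPENAI_API_KEY"},
    "grok": {"base_url": "https://api.x.ai/v1", "model": "grok-beta", "key_env": "XAI_API_KEY"},
}

def ask_avatar(backend: str, user_text: str) -> str:
    """Send one user message to the chosen backend and return the reply."""
    cfg = BACKENDS[backend]
    client = OpenAI(api_key=os.environ[cfg["key_env"]], base_url=cfg["base_url"])
    response = client.chat.completions.create(
        model=cfg["model"],
        messages=[
            {"role": "system", "content": "You are a desktop assistant avatar."},
            {"role": "user", "content": user_text},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(ask_avatar("gpt-4o", "Which app should I use to crop a screenshot?"))
```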
Technology companies are building a lot of AI avatars these days, not just for video games but also for businesses and consumers. Although the early demos are strange, some believe these avatars are a promising user interface for AI assistants. With R2X, Nvidia is combining its generative video game technology with cutting-edge LLMs to create an AI assistant that looks and feels human-like.
The company plans to open source these avatars in the first half of 2025. Nvidia pitches R2X as a new user interface for developers to build on, one that lets users plug in their favorite AI products and even run the avatar locally.
Similar to Microsoft’s Recall feature (which was delayed over privacy concerns), R2X can continuously take screenshots of your screen and run them through an AI model, though the feature is turned off by default. When it’s on, the avatar can offer feedback about the applications running on your computer and help with complex coding tasks, for example.
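Nvidia hasn’t published how this screen-awareness loop works internally, but the basic pattern, capturing a screenshot and handing it to a vision-capable model, is easy to sketch. The example below is an illustration only, using OpenAI’s GPT-4o vision API as a stand-in for whatever model R2X actually runs:

```python
# Illustrative screen-awareness loop: grab a screenshot, encode it,
# and ask a vision-capable model what is happening on screen.
# This is not Nvidia's implementation.
import base64
import io
import os

from PIL import ImageGrab  # pip install pillow
from openai import OpenAI  # pip install openai

def describe_screen() -> str:
    # Capture the full desktop and encode it as a base64 PNG data URL.
    shot = ImageGrab.grab()
    buf = io.BytesIO()
    shot.save(buf, format="PNG")
    data_url = "data:image/png;base64," + base64.b64encode(buf.getvalue()).decode()

    # Ask the vision model to describe the visible applications.
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Which applications are visible, and what is the user doing?"},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(describe_screen())
```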
R2X is still a prototype, and even Nvidia admits there are bugs to work out. In TechCrunch’s demo, the avatar had an uncanny valley feel, with its face sometimes stuck in odd positions and its tone occasionally coming off as a bit aggressive. And in general, it feels a little weird to have a humanoid avatar staring at me while I work.
R2X mostly gave helpful instructions and accurately described what was on the screen. At one point, though, the avatar gave incorrect instructions, and afterward it couldn’t see the screen at all. That could be a problem with the underlying AI model (GPT-4o, in this case), but the episode illustrates the limits of this early technology.
In one demo, Nvidia product leads showed how R2X can see apps on the screen and walk users through them. Specifically, R2X helped us use Adobe Photoshop’s Generative Fill feature on a photo of Nvidia CEO Jensen Huang standing in an Asian restaurant with two of its employees. The avatar hallucinated, giving incorrect directions on where to find Generative Fill, and then lost sight of the screen entirely. Only after switching the underlying model to xAI’s Grok could the avatar see the screen again.
In another demo, R2X pulled a PDF off the desktop and answered questions about it. The process is powered by local retrieval-augmented generation (RAG), which lets the avatar fetch relevant passages from documents and process them with the underlying LLM.
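Nvidia hasn’t shared R2X’s RAG internals either, but the general recipe is well known: extract the document’s text, retrieve the chunks most relevant to the question, and pass only those chunks to the LLM. Here’s a rough local sketch; the TF-IDF retrieval is a simple stand-in for whatever embedding search the product actually uses:

```python
# Rough local RAG sketch: extract PDF text, retrieve the most relevant
# chunks for a question, and hand only those chunks to the LLM.
import os

from pypdf import PdfReader  # pip install pypdf
from sklearn.feature_extraction.text import TfidfVectorizer  # pip install scikit-learn
from sklearn.metrics.pairwise import cosine_similarity
from openai import OpenAI  # pip install openai

def load_chunks(pdf_path: str, chunk_size: int = 1000) -> list[str]:
    """Extract the PDF's text and split it into fixed-size chunks."""
    text = "\n".join(page.extract_text() or "" for page in PdfReader(pdf_path).pages)
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def answer_from_pdf(pdf_path: str, question: str, top_k: int = 3) -> str:
    chunks = load_chunks(pdf_path)

    # Retrieve: rank chunks by TF-IDF cosine similarity to the question.
    vectorizer = TfidfVectorizer().fit(chunks + [question])
    scores = cosine_similarity(vectorizer.transform([question]), vectorizer.transform(chunks))[0]
    context = "\n---\n".join(chunks[i] for i in scores.argsort()[::-1][:top_k])

    # Generate: have the LLM answer using only the retrieved excerpts.
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer using only the provided document excerpts."},
            {"role": "user", "content": f"Excerpts:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(answer_from_pdf("report.pdf", "What are the key findings?"))
```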
Nvidia uses several AI models from its video game division to render these avatars. To generate the avatar itself, Nvidia uses its RTX Neural Faces algorithm; to animate the face, lips, and tongue, it uses a newer model called Audio2Face-3D. That model seemed to stall at several points, freezing the avatar’s face in awkward positions.
The company also says these R2X avatars can join Microsoft Teams meetings as personal assistants.
Nvidia’s product chief said the company also plans to give these AI avatars agentic capabilities, so R2X could one day take actions on the desktop. Those capabilities are a long way off, and they will likely require partnerships with software makers such as Microsoft and Adobe, which are looking to build similar agent systems in-house.
It’s not immediately clear how Nvidia is producing the audio for these avatars. When R2X runs on GPT-4o, its voice sounds different from any of ChatGPT’s preset voices, and xAI’s Grok chatbot doesn’t have a voice mode at all.