Targeted at processor and system architects from industry and academia, the Deep Technology Conference has become a key forum for the trillion-dollar data center computing market.
Next week at Hot Chips 2024, NVIDIA senior engineers will present the latest advancements powering the NVIDIA Blackwell platform, as well as research on liquid cooling for data centers and AI agents for chip design.
Here's a preview of what they'll share:

- NVIDIA Blackwell brings together multiple chips, systems and NVIDIA CUDA software to power next-generation AI across use cases, industries and countries.
- NVIDIA GB200 NVL72 is a multi-node, liquid-cooled, rack-scale solution that connects 72 Blackwell GPUs and 36 Grace CPUs, raising the bar for AI system design.
- NVLink interconnect technology provides communication among all GPUs, enabling record-high throughput and low-latency inference for generative AI.
- The NVIDIA Quasar Quantization System pushes the boundaries of physics to accelerate AI computing.
- NVIDIA researchers are building AI models that help build processors for AI.
The NVIDIA Blackwell talk on Monday, August 26, will provide further details about the new architecture and also include examples of generative AI models running on Blackwell silicon.
Leading up to it, three tutorials on Sunday, August 25, will explain how hybrid liquid-cooling solutions can help data centers transition to more energy-efficient infrastructure, and how AI models, including large language model (LLM)-powered agents, can help engineers design next-generation processors.
Together, these presentations will showcase how NVIDIA engineers are innovating across every area of data center computing and design to deliver unprecedented performance and efficiency.
Get ready for Blackwell
NVIDIA Blackwell is the ultimate full-stack computing challenge, comprising multiple NVIDIA chips, including Blackwell GPUs, Grace CPUs, BlueField data processing units, ConnectX network interface cards, NVLink switches, Spectrum Ethernet switches and Quantum InfiniBand switches.
NVIDIA Architecture Directors Ajay Tirumala and Raymond Wong will give a first look at the platform and explain how these technologies work together to enable new standards in AI, faster computing performance and improved energy efficiency.
The multi-node NVIDIA GB200 NVL72 solution is a prime example. LLM inference requires low-latency, high-throughput token generation. Working as a unified system, the GB200 NVL72 can accelerate inference for LLM workloads by up to 30x, enabling trillion-parameter models to run in real time.
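The latency/throughput tension in LLM serving can be illustrated with a back-of-the-envelope model (the numbers below are made up for illustration, not measured Blackwell figures):

```python
# Toy model of LLM decode serving: each decode step emits one token per
# sequence in the batch. Bigger batches raise aggregate throughput, but each
# step takes longer, so per-user latency grows -- the core serving trade-off.
def serving_metrics(batch_size, step_time_s):
    """Return (aggregate tokens/sec, per-user ms per token) for one decode step."""
    tokens_per_sec = batch_size / step_time_s   # throughput across all users
    ms_per_token = step_time_s * 1000           # latency seen by each user
    return tokens_per_sec, ms_per_token

# Hypothetical step times: a larger batch slows each step somewhat,
# yet still multiplies total throughput.
print(serving_metrics(batch_size=1,  step_time_s=0.010))   # low latency, low throughput
print(serving_metrics(batch_size=64, step_time_s=0.025))   # higher latency, far higher throughput
```

A unified system like a tightly interconnected GPU rack aims to push this frontier outward: more throughput without giving up interactive latency.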
Tirumala and Wong will also explain how the NVIDIA Quasar Quantization System, which combines algorithmic innovations, NVIDIA software libraries and tools, and Blackwell’s second-generation Transformer Engine, supports high accuracy with low-precision models, and present examples using LLM and visual generative AI.
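The details of the Quasar Quantization System and Blackwell's low-precision formats aren't spelled out here, but the general idea behind scale-based quantization can be sketched with a generic symmetric int8 scheme (an illustrative stand-in, not NVIDIA's method):

```python
import numpy as np

# Generic per-tensor symmetric quantization: store low-precision integers
# plus one float scale, trading a small accuracy loss for memory and speed.
def quantize(x, bits=8):
    qmax = 2 ** (bits - 1) - 1                     # e.g. 127 for int8
    scale = np.abs(x).max() / qmax                 # map the largest value to qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=1024).astype(np.float32)
q, s = quantize(w)
err = np.abs(w - dequantize(q, s)).max()
print(f"max abs reconstruction error: {err:.4f}")  # small relative to the weight range
```

Production systems layer algorithmic refinements on top of this basic recipe (finer-grained scaling, calibration, format choices such as FP8) to preserve accuracy at low precision.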
Cooling the Data Center
The traditional noise of air-cooled data centers may become a thing of the past as researchers develop more efficient and sustainable solutions using hybrid cooling, a combination of air and liquid cooling.
Liquid cooling technology moves heat away from systems more efficiently than air, helping to keep computing systems cooler even when they’re handling heavy workloads. And because liquid cooling equipment takes up less space and consumes less power than air-cooling systems, data centers can add more computing power by adding more server racks within their facilities.
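The efficiency claim follows from the sensible-heat relation Q = flow x density x c_p x deltaT: water's volumetric heat capacity is roughly 3,500 times air's, so it needs a tiny fraction of the flow to carry the same heat load. A quick sketch with approximate room-temperature properties and a hypothetical 100 kW rack:

```python
# Coolant flow needed to carry a heat load Q with a given temperature rise:
#   Q = flow * density * c_p * delta_T  =>  flow = Q / (density * c_p * delta_T)
def flow_needed_m3_per_s(q_watts, density_kg_m3, c_p_j_kgk, delta_t_k):
    """Volumetric coolant flow (m^3/s) required to absorb q_watts."""
    return q_watts / (density_kg_m3 * c_p_j_kgk * delta_t_k)

Q = 100_000   # hypothetical 100 kW rack
dT = 10       # 10 K coolant temperature rise

air   = flow_needed_m3_per_s(Q, density_kg_m3=1.2, c_p_j_kgk=1005, delta_t_k=dT)
water = flow_needed_m3_per_s(Q, density_kg_m3=998, c_p_j_kgk=4186, delta_t_k=dT)
print(f"air:   {air:.2f} m^3/s")
print(f"water: {water:.5f} m^3/s")   # thousands of times less volume flow
```

That gap is why liquid loops can be quieter, more compact and less power-hungry than the fans needed to push equivalent air volumes.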
Ali Heydari, director of data center cooling and infrastructure at NVIDIA, will present several hybrid-cooled data center designs.
Some designs involve retrofitting existing air-cooled data centers with liquid cooling units, providing a quick and easy solution to add liquid cooling capabilities to existing racks. Other designs require installing piping to deliver liquid cooling directly to the chips, using cooling distribution units or immersing the entire server in an immersion cooling tank. These options require a large upfront investment, but can provide significant savings in both energy consumption and operational costs.
Heydari will also share his team’s work as part of COOLERCHIPS, a U.S. Department of Energy program to develop advanced data center cooling technologies. As part of this project, the team is using the NVIDIA Omniverse platform to create a physics-based digital twin to model energy consumption and cooling efficiency to optimize data center design.
AI agents enter processor design
Designing semiconductors is an enormous challenge on a microscopic scale: Engineers developing cutting-edge processors pack as much computing power as possible onto slabs of silicon just a few inches wide, testing the limits of what is physically possible.
AI models support engineers by improving design quality and productivity, streamlining manual processes, and automating time-consuming tasks. Models include prediction and optimization tools that help engineers quickly analyze and improve designs, and LLMs that help engineers answer questions, generate code, and debug design problems.
Mark Ren, director of design automation research at NVIDIA, will provide an overview of these models and their uses in one tutorial session; a second session will focus on agent-based AI systems for chip design.
AI agents powered by LLMs can be directed to complete tasks autonomously, enabling a wide range of applications across industries. In microprocessor design, NVIDIA researchers are developing agent-based systems that can reason and act using customized circuit design tools, interact with experienced designers, and learn from databases of human and agent experience.
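The reason-and-act pattern behind such agents can be sketched as a simple tool-calling loop. Everything below is a hypothetical stand-in (the tool, the stubbed LLM policy, and the task), since NVIDIA's internal agents and circuit-design tools aren't public:

```python
# Minimal agent loop: an LLM-like policy observes results, chooses a tool
# call, and repeats until it decides the task is done.
def run_timing_check(design):
    """Hypothetical circuit-design tool returning worst timing slack."""
    return {"worst_slack_ps": -12 if "unfixed" in design else 35}

TOOLS = {"run_timing_check": run_timing_check}

def llm_decide(observation):
    """Stub standing in for an LLM: pick the next action from the last result."""
    if observation is None:
        return ("call", "run_timing_check", "unfixed_design")
    if observation["worst_slack_ps"] < 0:
        # Negative slack: 'repair' the design, then re-check it.
        return ("call", "run_timing_check", "fixed_design")
    return ("finish", "timing met", None)

def agent_loop(max_steps=5):
    obs = None
    for _ in range(max_steps):
        action, name, arg = llm_decide(obs)
        if action == "finish":
            return name
        obs = TOOLS[name](arg)           # act with a tool, observe the result
    return "step budget exhausted"

print(agent_loop())   # → timing met
```

Real agent systems replace the stub with an actual LLM, add many more tools, and keep a memory of past attempts, but the observe-decide-act loop is the same.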
NVIDIA experts aren't just building this technology; they're using it in practice. Ren will share examples of how engineers can use AI agents for timing report analysis, cell cluster optimization and code generation. The cell cluster optimization work recently won the Best Paper Award at the first IEEE International Workshop on LLM-Aided Design.
Register for Hot Chips, taking place August 25-27 at Stanford University and online.