Elon Musk and the team behind xAI have accomplished an engineering marvel by setting up a supercluster of 100,000 Nvidia H200 GPUs in just 19 days. Nvidia CEO Jensen Huang talked about Musk’s feat in a conversation with members of Tesla Owners Silicon Valley that was shared on X.
Huang described Musk’s 19-day effort with awe and respect, calling it “superhuman.” The xAI team is said to have gone from the concept stage to full compatibility with Nvidia’s gear within three weeks, including running xAI’s first AI training run on the newly built supercluster.
Elon Musk is a superhuman. What would have taken anyone else a year took him only 19 days. pic.twitter.com/q51sM48lsu (October 13, 2024)
From start to finish, the process included building a massive X factory to house the GPUs and equipping the entire facility with the liquid cooling and power needed to run all 100,000 GPUs. It also required coordination between Nvidia’s and Elon Musk’s engineering teams to ensure all the hardware and infrastructure was shipped and installed in a precise, coordinated manner.
Huang says it would take the average data center four years to do what Elon Musk and his team were able to do in 19 days. Three of those years are spent planning, and the last year is spent shipping the equipment, installing it, and getting everything up and running.
Huang also details how complex the networking of Nvidia’s hardware is. He explains that networking Nvidia equipment is different from networking traditional data center servers: “The number of wires that go through one node…the back of the computer is all wires.”
Elon Musk’s integration of 100,000 H200 GPUs “has never been done before,” according to Jensen Huang, and probably won’t be duplicated by another company for a very long time.