Mark Zuckerberg said Meta’s Llama 4 AI models are being trained on the industry’s largest GPU cluster. During Meta’s earnings call, he said the cluster is “bigger than 100,000 H100s.” Elon Musk has said xAI uses 100,000 Nvidia H100 GPUs to train its Grok chatbot.
Elon Musk has talked up his AI startup’s vast inventory of in-demand Nvidia chips. Now it’s Mark Zuckerberg’s turn to flex.
Zuckerberg said Meta has put more computing power into training its upcoming Llama 4 artificial intelligence model than any of its competitors.
During Meta’s third-quarter earnings call on Wednesday, the CEO said Llama 4 is “well ahead in development” and is being trained on a larger cluster of graphics processing units than its rivals use.
“We are training Llama 4 models on clusters larger than 100,000 H100s, or larger than what others have reported doing,” he said.
The 100,000 figure may be a reference to Musk’s AI startup, xAI, which brought its Colossus supercomputer online this summer. The Tesla CEO called it “the world’s most powerful AI training system” and said xAI uses 100,000 Nvidia H100 GPUs to train the Grok chatbot.
Nvidia’s H100 chip, built on the company’s Hopper architecture, is in high demand among tech giants and AI startups for the computing power it provides for training large language models. Each chip is estimated to cost $30,000 to $40,000.
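For a rough sense of scale, counting the chips alone and not networking, power, or facilities: at $30,000 to $40,000 apiece, a 100,000-GPU cluster represents roughly $3 billion to $4 billion in H100s.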
The number of H100s a company has amassed has become a factor in recruiting top AI talent. Perplexity CEO Aravind Srinivas said in a podcast interview that the topic came up when he was trying to poach a researcher from Meta.
“I tried to hire senior researchers at Meta, and you know what they said? ‘When you get 10,000 H100 GPUs, come back to me,’” Srinivas said in March.
Meta released Llama 3 models in April and July. On Wednesday’s earnings call, Zuckerberg added that the Llama 4 models will have “new modalities, capabilities, and stronger reasoning” and will be “much faster.” A smaller model will likely be ready to launch first, perhaps in early 2025, he said.
Asked about Meta’s heavy AI spending, Zuckerberg said the company is building out its AI infrastructure faster than expected, acknowledging that the higher costs are “maybe not what investors want to hear” in the near term but that he is happy with how well the team is executing.
Meta expects capital spending to continue increasing next year as AI infrastructure expands.
The Meta CEO declined to say exactly how large the company’s H100 chip cluster is. Meanwhile, Musk said on X earlier this week that xAI will double its cluster to 200,000 H100 and H200 chips in the coming months.