Nvidia’s challengers are seizing on a new opportunity to crack its dominance of artificial intelligence chips, after Chinese startup DeepSeek accelerated a shift in AI computing requirements.
So-called “reasoning” models, such as DeepSeek’s R1, OpenAI’s o3 and Anthropic’s Claude 3.7, consume more computing resources than previous AI systems at the point when a user makes a request, a process known as “inference.”
This has flipped the focus of demand for AI computing, which until recently centred on training, or creating models. Inference is expected to become a larger part of the technology’s needs as demand grows among individuals and businesses for applications that go beyond today’s popular chatbots, such as ChatGPT and xAI’s Grok.
That is where Nvidia’s competitors are focusing their efforts to disrupt the world’s most valuable semiconductor company, from AI chipmaker start-ups such as Cerebras and Groq to custom accelerator processors from big tech companies including Google, Amazon, Microsoft and Meta.
“Training makes AI and inference uses AI,” said Andrew Feldman, CEO of Cerebras. “And the use of AI has gone through the roof. We have a much bigger opportunity to make chips that are far better at inference than at training.”
Nvidia dominates the market for huge computing clusters, including Elon Musk’s xAI facility in Memphis and OpenAI’s Stargate project with SoftBank. But investors are looking for reassurance that it can also beat its rivals in the much smaller data centres now being built with a focus on inference.
Vipul Ved Prakash, CEO and co-founder of Together AI, an AI-focused cloud provider that was valued at $3.3 billion last month in a round led by General Catalyst, said inference is a “big focus” for his business. “I believe that running inference at scale will be the biggest workload on the internet at some point,” he said.
Analysts at Morgan Stanley estimate that over 75% of the power and computational demand in US data centers will be for inference in the coming years, though they have warned of “significant uncertainty” about exactly how the transition will play out.
Still, that means hundreds of billions of dollars of investment could flow towards inference facilities over the next few years, if the use of AI continues to grow at its current pace.
Barclays analysts forecast that capital spending on inference computing for “frontier AI” will surpass spending on training over the next two years, rising from $122.6 billion in 2025 to $208.2 billion in 2026.

Barclays predicts that Nvidia, which has “essentially 100% market share” in frontier AI training, will supply only 50% of inference computing “in the long term.” That would leave nearly $200 billion of chip spending up for grabs for the company’s rivals by 2028.
“There’s a big pull towards better, faster, more efficient (chips),” said Walter Goodwin, founder of UK-based chip startup Fractile. Cloud computing providers are keen to break their over-dependence on Nvidia, he added.
Nvidia’s CEO Jensen Huang has argued that his company’s chips are just as powerful for inference as for training, as he eyes a vast new market opportunity.
The US company’s latest Blackwell chips are designed to handle inference better, and many of the earliest customers for these products are using them to serve AI systems rather than to train them. The popularity among AI developers of software based on its proprietary Cuda architecture also presents a formidable barrier to competitors.
Huang said on an earnings call last month that the amount of inference computation required is already “more than 100 times” what it was when large language models first emerged. “And that’s just the beginning,” he said.
The cost of serving up responses from LLMs has dropped rapidly over the past two years, driven by a combination of more powerful chips, more efficient AI systems and fierce competition between AI developers such as Google, OpenAI and Anthropic.
“The cost to use a given level of AI falls about 10 times every 12 months, and lower prices lead to much more use,” OpenAI CEO Sam Altman said in a blog post last month.
DeepSeek’s V3 and R1 models, which triggered a stock market panic in January, have helped to push inference costs down further, thanks to the Chinese startup’s architectural innovations and coding efficiencies.
At the same time, the kind of processing that inference tasks require (which can include much larger memory requirements to answer longer and more complex queries) opens the door to alternatives to Nvidia’s graphics processing units, whose biggest advantage lies in handling very large volumes of similar calculations.
“Inference performance in hardware comes down to how fast you can move (data) in and out of memory,” said Feldman of Cerebras.
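A rough, illustrative back-of-envelope sketch shows why memory movement, rather than raw arithmetic, tends to cap how fast a chip can generate responses: to produce each new token, a model typically has to stream roughly all of its weights through the processor. The model size and bandwidth figures below are assumptions chosen for illustration, not figures from Cerebras, Nvidia or any vendor.

```python
# Illustrative estimate only (assumed figures, not vendor data):
# generating one token requires reading roughly all model weights once,
# so memory bandwidth, not arithmetic throughput, sets the ceiling.

def max_tokens_per_second(params_billions: float,
                          bytes_per_param: float,
                          memory_bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if every token reads all weights once."""
    bytes_per_token = params_billions * 1e9 * bytes_per_param
    bandwidth_bytes_per_s = memory_bandwidth_gb_s * 1e9
    return bandwidth_bytes_per_s / bytes_per_token

# Example: a hypothetical 70bn-parameter model stored in 16-bit precision
# on hardware with roughly 3,350 GB/s of memory bandwidth (an HBM-class figure).
print(round(max_tokens_per_second(70, 2, 3350), 1), "tokens/sec ceiling per user")
```

Under those assumed numbers the ceiling works out to roughly 24 tokens per second for a single user, which is why chip designers chasing inference workloads emphasise memory bandwidth as much as raw compute.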
Speed is essential for attracting users, Feldman said. “One of the things Google (search) showed 25 years ago is that even microseconds (of delay) reduce the attention of the viewer,” he said. “We’re producing answers in Le Chat in sometimes a second, when (OpenAI’s) o1 would take 40.”
Nvidia argues that its chips are as powerful for inference as for training, pointing to a 200-fold improvement in its inference performance over the past two years. It says hundreds of millions of users access AI products today through millions of its GPUs.
“Our architecture is fungible and easy to use in all of these different ways,” Huang said last month, whether companies are building large models or serving AI applications in new ways.
Prakash, who counts Nvidia as an investor, said his company uses the same Nvidia chips for inference and training today, which is “very convenient.”
Unlike Nvidia’s general-purpose GPUs, inference accelerators work best when tuned to a particular type of AI model. In a fast-moving industry, that could prove a problem for chip startups that bet on the wrong AI architecture.
“I think one of the advantages of general-purpose computing is that it’s more flexible as model architectures change,” Prakash said. “My feeling is that there will be a complex mix of silicon over the next few years.”
Additional reporting by Michael Acton in San Francisco