A Chinese-built large language model called DeepSeek-R1 is thrilling scientists as an affordable and open rival to “reasoning” models such as OpenAI’s o1.
These models generate responses step by step, in a process analogous to human reasoning. That makes them more adept than earlier language models at solving scientific problems, and potentially useful in research. Initial tests of R1, released on 20 January, show that its performance on certain tasks in chemistry, mathematics and coding is on a par with o1.
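To get a feel for what that step-by-step output looks like, here is a minimal sketch of querying R1 through DeepSeek’s OpenAI-compatible API. The base URL, the model name “deepseek-reasoner” and the `reasoning_content` field follow DeepSeek’s public documentation, but treat them as assumptions that may change; you will need your own API key.

```python
# Minimal sketch: querying a hosted reasoning model via DeepSeek's
# OpenAI-compatible API. The base URL, model name and the reasoning_content
# field follow DeepSeek's public docs; treat them as assumptions.
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-reasoner",  # the hosted R1 model
    messages=[{"role": "user", "content": "How many prime numbers are below 30?"}],
)

message = response.choices[0].message
# The intermediate "thinking" steps, exposed separately from the final answer.
print(getattr(message, "reasoning_content", None))
print(message.content)  # the final answer
```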
“This is wild and completely unexpected,” Elvis Saravia, an AI researcher and co-founder of the UK-based AI consultancy DAIR.AI, wrote on X.
R1 stands out for another reason. DeepSeek, the Hangzhou start-up that built the model, released it as “open weight”, meaning that researchers can study and build on the algorithm. Published under an MIT licence, the model can be freely reused, but it is not considered fully open source because its training data have not been made available.
“DeepSeek’s openness is quite surprising,” says Mario Krenn, who leads the Artificial Scientist Lab at the Max Planck Institute for the Science of Light in Erlangen, Germany. By comparison, o1 and other models built by OpenAI in San Francisco, California, including its latest effort, o3, are “essentially black boxes”, he says.
DeepSeek has not released the full cost of training R1, but it charges users roughly one-thirtieth of what o1 costs to run. The firm has also created mini “distilled” versions of R1 to allow researchers with limited computing power to play with the model. “An experiment that cost more than £300 with o1 cost less than $10 with R1,” says Krenn. “This is a dramatic difference which will certainly play a role in its future adoption.”
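For researchers who want to try one of those distilled variants on modest hardware, the sketch below loads a small checkpoint with the Hugging Face transformers library. The checkpoint name is assumed from DeepSeek’s published distilled models, and the generation settings are for illustration only, not the company’s reference code.

```python
# Sketch: running a small distilled R1 variant locally with Hugging Face
# transformers. The checkpoint name is an assumption based on DeepSeek's
# published distilled models; adjust to whichever variant fits your hardware.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Wrap the question in the model's chat format and generate a reply.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Solve step by step: what is 17 * 23?"}],
    add_generation_prompt=True,
    return_tensors="pt",
)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```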
Challenger model
R1 is part of a boom in Chinese large language models (LLMs). Spun off from a hedge fund, DeepSeek emerged from relative obscurity last month when it released a chatbot called V3. Experts estimate that it cost around US$6 million to rent the hardware needed to train the model, compared with upwards of $60 million for Meta’s Llama 3.1 405B.
Part of the buzz around DeepSeek is that it succeeded in making R1 despite US export controls that limit Chinese firms’ access to the best computer chips designed for AI processing. “The fact that it comes out of China shows that being efficient with your resources matters more than compute scale alone,” says AI researcher François Chollet.
DeepSeek’s advances suggest that “the perceived lead we once had has narrowed significantly”, says Alvin Wang Graylin, a technology expert in Bellevue, Washington, who works at HTC, a Taiwanese immersive-technology firm. “Countries should pursue a collaborative approach to building advanced AI, rather than continuing with the current arms-race approach.”
Chain of thought
LLMs train on billions of text samples, which are chopped into word parts called “tokens”, and learn patterns in those data. These associations allow the model to predict subsequent tokens in a sentence. But LLMs are prone to inventing facts, a phenomenon known as “hallucination”, and often struggle to reason through problems.
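As a toy illustration of that token-prediction loop, the sketch below tokenizes a sentence and asks a small public model for its single most likely next token. GPT-2 is used here only because it is tiny and freely available; it is not R1.

```python
# Toy illustration of next-token prediction: chop text into tokens, then ask
# the model which token it scores as most likely to come next.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "Language models predict the next"
ids = tokenizer(text, return_tensors="pt").input_ids
print(tokenizer.convert_ids_to_tokens(ids[0]))  # the word parts ("tokens")

with torch.no_grad():
    logits = model(ids).logits         # a score for every possible next token
next_id = int(logits[0, -1].argmax())  # greedy choice: highest-scoring token
print(tokenizer.decode([next_id]))
```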