Can science be fully automated? A team of machine learning researchers is setting out to try.
Developed by a team at the Tokyo firm Sakana AI together with academic labs in Canada and the UK, the “AI Scientist” carries out the full cycle of research: reading the existing literature on a problem, generating hypotheses for new developments, testing solutions and writing a paper. It even does some of the work of a peer reviewer, evaluating its own results.
AI Scientist joins a growing number of efforts to create AI agents that automate at least parts of the scientific process. “To my knowledge, no one has done the entire scientific process end to end in one system,” says AI Scientist co-creator Cong Lu, a machine-learning researcher at the University of British Columbia in Vancouver, Canada. The results1 were posted this month on the arXiv preprint server.
“It’s amazing that they did this from start to finish,” says computational social scientist Jevin West of the University of Washington in Seattle, “and I think we should try these ideas out, because they could help science.”
So far, the results haven’t been amazing, and the system can only conduct research in the field of machine learning itself. In particular, AI Scientist lacks the ability to do laboratory work, which most scientists consider a key part of scientific research. “There’s still a lot of work to do to go from an AI making hypotheses to having a robot scientist implement them,” says Gerbrand Ceder, a materials scientist at Lawrence Berkeley National Laboratory and the University of California, Berkeley. Still, Ceder adds, “As we look to the future, I have no doubt that a lot of science will move in this direction.”
Automated experiments
AI Scientist is based on large language models (LLMs). It begins with a paper describing a machine-learning algorithm, which it uses as a template, and searches the literature for similar work. The system then employs a technique inspired by evolutionary computation, which mimics the mutations and natural selection of Darwinian evolution: it proceeds in small steps, applying random changes to the algorithm and keeping those that yield efficiency gains.
To do this, AI Scientist conducts its own “experiments”, running each version of the algorithm and measuring its performance. Finally, it writes up a paper and evaluates it in a sort of automated peer review. Having thus “added to the literature”, the system can begin the cycle again, building on its own results.
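For a concrete picture of that loop, here is a minimal sketch in Python of the mutate-evaluate-select cycle described above. The `mutate` and `evaluate` functions are hypothetical stand-ins, not Sakana AI’s code: in the real system, a proposed change is an edit to actual training code, and evaluation means running the experiment, not scoring a toy objective.

```python
import random

def mutate(params: dict) -> dict:
    """Apply one small random change to the algorithm's settings."""
    child = dict(params)
    key = random.choice(list(child))
    child[key] *= random.uniform(0.8, 1.2)  # small random perturbation
    return child

def evaluate(params: dict) -> float:
    """Stand-in for running an experiment; higher is better.
    Here: a toy objective peaked at lr=0.1, momentum=0.9."""
    return -((params["lr"] - 0.1) ** 2 + (params["momentum"] - 0.9) ** 2)

best = {"lr": 0.5, "momentum": 0.5}
best_score = evaluate(best)

for generation in range(200):
    candidate = mutate(best)      # apply a small random change
    score = evaluate(candidate)   # run the "experiment"
    if score > best_score:        # keep only changes that improve results
        best, best_score = candidate, score

print(best, best_score)
```

The structure mirrors the article’s description: random variation, an empirical test, and selection of whatever performs best, repeated over many generations.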
The authors acknowledge that the papers produced by AI Scientist contain only incremental developments. Other researchers posted scathing comments on social media. “As a journal editor, I would probably desk-reject the paper. As a peer reviewer, I would reject it,” wrote one commenter on the website Hacker News.
West also says the authors take an oversimplified view of how researchers learn about the current state of their field. Much of what scientists know comes through other forms of communication, such as attending conferences or chatting with colleagues at the water cooler. “Science is not just a pile of papers,” West says. “Five minutes of conversation can beat five hours of reading the literature.”
West’s colleague Shahan Memon agrees, but both praise the authors for fully opening up their code and results. That openness has allowed the pair to analyze AI Scientist’s output: they found, for example, that its choice of references is “popularity biased”, skewed toward papers with high citation counts. Memon and West say they are also looking at whether AI Scientist picked the most relevant previous work to cite.
Repetitive tasks
Of course, AI Scientist is not the first attempt to automate various parts of a researcher’s job. The dream of automating scientific discovery is as old as artificial intelligence itself, dating back to the 1950s, says Tom Hope, a computer scientist at the Jerusalem-based Allen Institute for AI. Already a decade ago, for example, the Automatic Statistician2 could analyze data sets and write its own papers. Ceder and his colleagues are also automating some bench work: a “robot chemist” unveiled last year can synthesize new materials and run experiments on them3.
Hope says that current LLMs are “unable to formulate novel, useful scientific directions beyond basic, superficial combinations of buzzwords”. Still, Ceder says that even if AI cannot yet do the more creative parts of the job, it could automate many of the more repetitive aspects of research. “At the low level, you’re trying to analyze what something is, how something reacts. That’s not the creative part of science, but it’s 90 percent of what we do.” Lu says he has had similar feedback from many other researchers. “People will say, I have 100 ideas that I don’t have time for. Let’s get AI Scientist to do them.”
Lu says that extending AI Scientist’s capabilities to abstract fields outside machine learning, such as pure mathematics, might require incorporating techniques beyond language models. For example, recent results from Google DeepMind on solving mathematics problems show the power of combining LLMs with “symbolic” AI techniques, which build logical rules into a system rather than relying solely on statistical patterns learned from data. But the current iteration is just the beginning, he says. “We really see this as the GPT-1 of AI science,” he says, referring to an early large language model built by OpenAI in San Francisco, California.
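To make that division of labour concrete, here is a toy sketch in Python, using the sympy library. A statistical model proposes candidate mathematical identities (hard-coded here as stand-ins for LLM output), and a rule-based symbolic engine verifies or rejects each one exactly. This illustrates only the general neuro-symbolic idea; it is not how DeepMind’s systems actually work.

```python
import sympy as sp

x = sp.symbols("x")

# Candidate identities, standing in for proposals from a language model.
candidates = [
    sp.Eq(sp.sin(x) ** 2 + sp.cos(x) ** 2, 1),        # true identity
    sp.Eq(sp.sin(2 * x), 2 * sp.sin(x) * sp.cos(x)),  # true identity
    sp.Eq(sp.cos(2 * x), 2 * sp.cos(x)),              # false "hallucination"
]

for eq in candidates:
    # simplify(lhs - rhs) == 0 is an exact, rule-based check,
    # independent of any statistical pattern-matching.
    verified = sp.simplify(eq.lhs - eq.rhs) == 0
    print(eq, "->", "verified" if verified else "rejected")
```

The point of the pairing is that the symbolic check is exact: a fluent but wrong proposal, like the third candidate above, is caught by logic rather than by more statistics.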
West says the findings feed into a conversation that has been on many researchers’ minds lately: “My colleagues across the scientific disciplines are all trying to figure out where AI fits into our research. It forces us to think about what science in the 21st century could be: what it is, and what it is not,” he says.