
Two new AI tools check research papers for errors, including in calculations, methodology and referencing. Credit: Jose A. Bernat Bacete/Getty
Late last year, media worldwide warned that black plastic cookware contained worrying levels of cancer-linked flame retardants. The risks turned out to be overstated: a mathematical error in the underlying research suggested that a key chemical exceeded its safe limit when in fact it was nearly ten times below it. Keen-eyed researchers quickly showed that an artificial intelligence (AI) model could have spotted the error in seconds.
The incident has spurred two projects that use AI to find errors in the scientific literature. The Black Spatula Project is an open-source AI tool that has so far analyzed around 500 papers for errors. The group, which has around eight active developers and hundreds of volunteer advisers, has yet to make the errors public. Instead, it is approaching the affected authors directly, says Joaquin Gulloso, an independent AI researcher based in Cartagena, Colombia, who helps to coordinate the project. “Already, it’s catching a lot of errors,” says Gulloso. “It’s a huge list. It’s just crazy.”
The other effort, called YesNoError, was inspired by the Black Spatula Project, says its founder, AI entrepreneur Matt Schlicht. Funded by its own dedicated cryptocurrency, the initiative sets its sights even higher. “Why don’t we do it on, like, all of the papers?” Schlicht says. He says the team’s AI tool has analyzed more than 37,000 papers in two months. Its website flags papers in which it has found supposed flaws. Many of these have not yet been verified by humans, although Schlicht says YesNoError plans to do this at scale.
With both projects, the hope is that researchers will use the tools before submitting their work to a journal, and that journals will use them before publishing, to weed out mistakes and to stop fraud and errors from entering the scientific literature.
The projects have received tentative support from academic sleuths who work on research integrity. But there are also concerns about potential risks. Michèle Nuijten, a researcher in metascience at Tilburg University in the Netherlands, says the tools must be good at finding errors and must make clear whether their claims have been verified. “If you start pointing fingers at people and then it turns out that there was no mistake, there might be reputational damage,” she says.
Others add that, although the projects carry risks and should be careful about the claims they make, their goal is the right one. It is much easier to flag problem papers before publication than to retract them afterwards, says James Heathers, a forensic metascientist at Linnaeus University in Växjö, Sweden. As a first step, AI could be used to triage papers for further scrutiny, says Heathers, who has acted as a consultant for the Black Spatula Project. “It’s early days, but I’m supportive,” he adds.
AI sleuths
Many researchers have made careers out of spotting integrity concerns in published papers, and tools already exist that check specific aspects of a paper. But proponents hope that AI will be able to carry out a wider range of checks in one go, and to process many more papers.
Both the Black Spatula Project and YesNoError use large language models (LLMs) to spot a range of errors in papers, including in calculations, methodology and referencing.
The systems first extract information from a paper, including its tables and images. They then craft an elaborate set of instructions, known as a prompt, that tells a “reasoning” model (a specialized type of LLM) what to look for. The model might scan for different types of error several times, or analyze the paper multiple times and cross-check the results. The cost of analyzing each paper ranges from 15 cents to a few dollars, depending on the paper’s length and the series of prompts used.
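Neither project has published its pipeline, but the steps described above can be sketched roughly as follows. This is a minimal, hypothetical illustration: the function names, the error categories, the stand-in `fake_model` and the agreement-across-passes rule are all assumptions, not the projects' actual code.

```python
# Hypothetical sketch of a multi-pass error-checking pipeline.
# All names and logic here are illustrative assumptions.

ERROR_TYPES = ["calculation", "methodology", "referencing"]

def build_prompt(paper_text: str, error_type: str) -> str:
    """Craft a prompt telling a reasoning model to scan the extracted
    paper text for one specific class of error."""
    return (
        f"You are checking a scientific paper for {error_type} errors.\n"
        "List each suspected error with the passage it occurs in.\n\n"
        f"PAPER:\n{paper_text}"
    )

def cross_check(paper_text: str, run_model, passes: int = 3) -> dict:
    """Run the model several times per error type and keep only findings
    reported in every pass, one simple way to cut false positives."""
    confirmed = {}
    for error_type in ERROR_TYPES:
        prompt = build_prompt(paper_text, error_type)
        results = [set(run_model(prompt)) for _ in range(passes)]
        agreed = set.intersection(*results)  # kept only if all passes agree
        if agreed:
            confirmed[error_type] = sorted(agreed)
    return confirmed

def fake_model(prompt: str):
    """Stand-in for a real LLM call: flags one made-up arithmetic slip."""
    if "calculation" in prompt and "2 + 2 = 5" in prompt:
        return ["'2 + 2 = 5' is arithmetically wrong"]
    return []

findings = cross_check("We conclude that 2 + 2 = 5.", fake_model)
print(findings)
```

In practice the `run_model` callable would wrap a paid API call to a reasoning model, which is where the per-paper cost of 15 cents to a few dollars arises; repeating passes to cross-check multiplies that cost.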
The rate of false positives, in which the AI claims an error where there is none, is a major hurdle. At the moment, the Black Spatula Project’s system is wrong about an error around 10% of the time, says Gulloso. Each alleged error must be checked with an expert in the subject, and finding those experts is the project’s biggest bottleneck, says Steve Newman, the software engineer and entrepreneur who founded the Black Spatula Project.
So far, Schlicht’s YesNoError team has quantified false positives only for the roughly 100 mathematical errors that the AI found in an initial batch of 10,000 papers. Of the roughly 90% of authors who responded, all but one agreed that the detected error was valid, Schlicht says. Eventually, YesNoError plans to work with ResearchHub, a platform that pays PhD scientists in cryptocurrency to conduct peer review. When the AI checks a paper, YesNoError will trigger a request on the platform to verify its results, although this has not yet started.