From scientists doctoring figures to paper mills mass-producing fake papers, questionable manuscripts have long plagued the scientific literature. Scientific detectives work tirelessly to uncover this fraud and set the record straight. But their job is becoming ever harder, thanks to the arrival of a powerful new tool for fraudsters: generative artificial intelligence (AI).
“Generative AI is evolving very rapidly,” says Jana Christopher, an image-integrity analyst at FEBS Press in Heidelberg, Germany. “People in my field, which is image integrity and publishing ethics, are increasingly worried about the possibilities that it offers.”
The ease with which generative-AI tools can produce text, images and data has raised fears that the scientific literature will become increasingly unreliable, flooded with fake figures, manuscripts and conclusions that are difficult for humans to spot. An arms race is already under way, as integrity specialists, publishers and technology companies rush to develop AI tools that can rapidly detect deceptive AI-generated elements in papers.
“It’s a scary development,” Christopher says. “But there are also smart people and good structural changes being proposed.”
Research-integrity specialists say that, although AI-generated text is already permitted by many journals under some circumstances, using such tools to create images or other data is much less likely to be considered acceptable. “In the near future, it may be okay to use AI-generated text,” says Elisabeth Bik, an image-forensics specialist and consultant in San Francisco, California. “But I draw the line at generating data.”
Bik, Christopher and colleagues suspect that data, including images, fabricated using generative AI are already widespread in the literature, and that paper mills may be using AI tools to mass-produce manuscripts (see “Quiz: Can you spot a fake AI?”).
Under the radar
Accurately identifying AI-generated images poses a serious challenge: they are often almost impossible to distinguish from real images, at least to the naked eye. “I feel like we encounter AI-generated images every day,” Christopher says. “But unless you can prove it, there’s really not much you can do.”
There are some obvious examples of generative AI being used in scientific images, such as the now-infamous drawing of a rat with absurdly large genitals and nonsensical labels, created with the image tool Midjourney. The illustration, published in February in the journal Frontiers in Cell and Developmental Biology, sparked a social-media storm and was retracted days later.
Most cases are not so obvious. Before the rise of generative AI, figures manipulated in Adobe Photoshop or similar tools, particularly in molecular and cell biology, often contained telltale signs that detectives could spot, such as identical backgrounds or an unusual absence of smudges and blemishes. Figures created with AI often lack these signs. “I see a lot of papers where I think these western blots don’t look real, but there’s no definitive answer,” says Bik. “All you can say is that they just look weird, and of course that’s not enough evidence to write a letter to the editor.”
However, there are signs that AI-generated figures are appearing in published manuscripts. Text written with tools such as ChatGPT is turning up more and more often in papers, betrayed by stock chatbot phrases that authors forget to delete and by telltale words that AI models tend to overuse. “So we have to assume that the same thing is happening with data and images,” Bik says.
Another clue that fraudsters are using sophisticated image tools is that most of the problems detectives currently find are in papers that are several years old. “Image problems have become less and less common over the past few years,” says Bik. “I think most people who got caught manipulating images have moved on to creating cleaner ones.”
Making fake images
Creating convincing images with generative AI is not difficult. Kevin Patrick, a scientific-image detective known as Cheshire on social media, has demonstrated how easy it is and posted his results on X. Using Photoshop’s AI tool Generative Fill, Patrick created realistic images of tumors, cell cultures, western blots and more that could plausibly appear in scientific papers. Most images took less than a minute to create (see “Generating fake science”).
“If I can do this, surely people who are paid to generate fake data are doing it too,” Patrick says. “There’s probably a lot more data out there that was generated with tools like this.”
Some publishers say they have found evidence of AI-generated content in published studies. Among them is PLoS, which has been alerted to suspect content and, through internal investigations, has found evidence of AI-generated text and data in papers and submissions, says Renée Hoch, managing editor of the PLoS publication-ethics team in San Francisco, California. (Hoch notes that AI use is not banned in PLoS journals and that its AI policy focuses on author accountability and transparent disclosure.)
Other tools could also offer opportunities to people who want to create fake content. Last month, researchers published a generative-AI model for creating high-resolution microscopy images1, and some integrity specialists have raised concerns about the work. “This technology could easily be used by people with bad intentions to generate hundreds or thousands of fake images in an instant,” Bik says.
The tool’s creator, Yoav Shechtman of the Technion-Israel Institute of Technology in Haifa, says the tool is useful for generating training data for models, because high-resolution microscopy images are difficult to obtain. But, he adds, users have little control over the output, which makes it unhelpful for producing fakes. Existing image-editing software, such as Photoshop, is more convenient for manipulating figures, he suggests.
Eliminating fakes
The human eye might not be able to pick out AI-generated images, but AI might (see “AI images are hard to spot”).
The makers of tools such as Imagetwin and Proofig, which use AI to detect integrity issues in scientific figures, are expanding their software to pick out images created by generative AI. Because such images are so hard to detect, both companies are building their own databases of generative-AI images to train their algorithms.
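Neither company has published its method, but the broad approach described here, training a classifier on a labelled database of real and AI-generated figures, can be sketched in code. Below is a minimal, hypothetical illustration in PyTorch: the toy network, the random tensors standing in for a curated image database, and every name and parameter are assumptions for illustration, not anything Imagetwin or Proofig actually uses.

```python
# Minimal sketch of the general approach: a binary classifier trained on
# labelled examples of real vs. AI-generated figures. The tiny CNN and the
# random tensors standing in for a curated image database are illustrative
# only; production tools would use far larger models and real data.
import torch
import torch.nn as nn

class FigureClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Linear(32 * 16 * 16, 1)  # logit: P(image is AI-generated)

    def forward(self, x):
        x = self.features(x)
        return self.head(x.flatten(1))

# Stand-ins for a labelled training database (64x64 RGB figures).
images = torch.rand(32, 3, 64, 64)              # 32 example figures
labels = torch.randint(0, 2, (32, 1)).float()   # 1 = AI-generated, 0 = real

model = FigureClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for _ in range(5):  # a few passes over the toy data
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()

# At screening time, a probability above some threshold flags the figure
# for human review rather than automatically declaring it fake.
with torch.no_grad():
    flagged = torch.sigmoid(model(images)) > 0.5
```

In a set-up like this, the hard part is the database rather than the network: a classifier can only learn the statistical fingerprints present in the generated examples it is trained on, which is presumably why both companies are curating their own collections.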
Proofig has already released a feature in its tool for detecting AI-generated microscopy images. Dror Kolodkin-Gal, the company’s co-founder in Rehovot, Israel, says that in tests on thousands of AI-generated images and real images from papers, the algorithm identified AI images 98% of the time, with a false-positive rate of 0.02%. The team is now working to understand what exactly the algorithm detects, he adds.
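Taken at face value, those two figures are strikingly good, and a back-of-envelope Bayes’-rule calculation shows what they would mean in practice. The prevalence values below are pure assumptions for illustration; how common AI-generated images are among screened submissions is not known.

```python
# Back-of-envelope check of what Proofig's reported rates would imply.
# Sensitivity and false-positive rate are from the article; the prevalence
# values are assumptions for illustration, not published figures.
sensitivity = 0.98     # AI-generated images correctly flagged
fpr = 0.0002           # real images wrongly flagged (0.02%)

prevalence = 0.01      # assumed: 1% of screened images are AI-generated
# Bayes' rule: probability an image flagged by the tool really is AI-generated.
ppv = (sensitivity * prevalence) / (
    sensitivity * prevalence + fpr * (1 - prevalence)
)
print(f"P(AI-generated | flagged) = {ppv:.3f}")  # ~0.980 at 1% prevalence

prevalence = 0.0005    # assumed: 1 in 2,000
ppv = (sensitivity * prevalence) / (
    sensitivity * prevalence + fpr * (1 - prevalence)
)
print(f"P(AI-generated | flagged) = {ppv:.3f}")  # ~0.710
```

Even at the lower assumed prevalence most flags would be genuine, but the drop also illustrates why flagged images warrant expert review rather than automatic rejection.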
“I have high hopes for these tools,” Christopher says. But she points out that their output always needs to be assessed by experts who can verify the problems they flag. Christopher has yet to see evidence that AI image-detection software is reliable (Proofig’s internal evaluation has not been published). These tools are “certainly very useful, although limited, because they mean we can expand our efforts to screen submissions,” she adds.
Several publishers and research institutions already use Proofig and Imagetwin. The Science family of journals, for example, uses Proofig to scan for image-integrity issues. Meagan Phelan, communications director for Science in Washington DC, says the tool has not yet found any AI-generated images.
Springer Nature, which publishes Nature, is developing its own text- and image-detection tools, called Geppetto and SnapShot, which flag anomalies for assessment by humans. (Nature’s news team is editorially independent of its publisher.)
Scammers, beware
Publishing organizations are also taking steps to address AI-generated images. A spokesperson for STM, the International Association of Scientific, Technical and Medical Publishers in Oxford, UK, says the organization is taking the issue “very seriously” and points to initiatives such as United2Act and the STM Integrity Hub, which tackle paper mills and other scientific-integrity problems.
Christopher, who chairs the STM working group on image alterations and duplications, says there is growing recognition that methods for validating raw data need to be developed, such as labelling images taken on microscopes with invisible watermarks, analogous to those being used for AI-generated text. That could be the way forward, she says, but it will require new technology and new standards for equipment manufacturers.
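No such watermarking standard exists yet, and any real scheme would have to survive cropping, rescaling and compression. Purely as a toy illustration of the underlying idea, here is a classic least-significant-bit watermark in NumPy; the instrument tag, the bit-plane choice and the helper names are all hypothetical.

```python
# Toy illustration of invisible watermarking: hide a payload in the least
# significant bit of an image's pixels. Real instrument-level watermarks
# would need to be robust to cropping, rescaling and compression; this
# LSB scheme is fragile and purely illustrative.
import numpy as np

def embed(image: np.ndarray, payload: str) -> np.ndarray:
    bits = np.unpackbits(np.frombuffer(payload.encode(), dtype=np.uint8))
    flat = image.flatten()  # works on a copy of the pixel data
    if bits.size > flat.size:
        raise ValueError("payload too large for image")
    flat[: bits.size] = (flat[: bits.size] & 0xFE) | bits  # overwrite LSBs
    return flat.reshape(image.shape)

def extract(image: np.ndarray, n_chars: int) -> str:
    bits = image.flatten()[: n_chars * 8] & 1
    return np.packbits(bits).tobytes().decode()

# A stand-in for a raw microscope capture (8-bit grayscale).
capture = np.random.randint(0, 256, (512, 512), dtype=np.uint8)
tag = "SCOPE-1234 2024-06-01"   # hypothetical instrument ID + timestamp
watermarked = embed(capture, tag)

assert extract(watermarked, len(tag)) == tag
# The marked image is visually identical: each pixel differs by at most 1.
assert np.max(np.abs(watermarked.astype(int) - capture.astype(int))) <= 1
```

A production standard would more likely use a spread-spectrum or frequency-domain mark keyed to the instrument, which is exactly why new technology and manufacturer standards would be needed.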
Patrick and others worry that publishers are not acting fast enough to address the threat. “We’re concerned that the literature is gaining a new generation of problems that won’t be addressed until it’s too late,” he says.
Still, some are optimistic that the AI-generated content published in papers today will be discovered in the future.
“I’m confident that the technology will improve to the point where it can detect the fakes being made today, because at some point they will be considered relatively crude,” Patrick says. “Cheaters shouldn’t sleep well at night. They might fool today’s process, but I don’t think they’ll be able to fool the process forever.”