SAN FRANCISCO (AP) — Tech giant OpenAI touted its artificial intelligence-powered transcription tool Whisper as approaching “human-level robustness and accuracy.”
However, Whisper has a major flaw: according to interviews with more than a dozen software engineers, developers, and academic researchers, it is prone to making up chunks of text and even entire sentences. These experts said some of the invented text, known in the industry as hallucinations, can include racial commentary, violent rhetoric, and even imagined medical treatments.
Experts said the fabrications are problematic because Whisper is used in many industries around the world to translate and transcribe interviews, generate text in popular consumer technologies, and create subtitles for videos.
More worrying, they said, is the rush by medical centers to use Whisper-based tools to transcribe patients’ consultations with doctors, despite OpenAI’s warnings that the tool should not be used in “high-risk domains.”
Although it is difficult to pinpoint the full extent of the problem, researchers and engineers said they frequently encountered Whisper hallucinations while on the job. For example, researchers at the University of Michigan studying public meetings said they found hallucinations in eight out of 10 audio transcripts they examined before they began refining their model.
One machine learning engineer said he first discovered hallucinations in about half of the more than 100 hours of Whisper transcriptions he analyzed. A third developer said he found hallucinations in nearly all of the 26,000 transcripts he created with Whisper.
This problem still occurs even with short, well-recorded audio samples. A recent study by computer scientists found 187 hallucinations from over 13,000 clear audio fragments examined.
Researchers say this trend could result in tens of thousands of incorrect transcriptions of millions of recordings.
Alondra Nelson, who led the White House Office of Science and Technology Policy in the Biden administration until last year, said such mistakes can have “very serious consequences,” especially in hospital settings.
“No one wants to be misdiagnosed,” says Nelson, a professor at the Institute for Advanced Study in Princeton, New Jersey. “There should be a higher bar.”
Whisper is also used to create closed captions for the deaf and hard of hearing, a population at particular risk from faulty transcriptions. That is because deaf and hard-of-hearing people have no way of identifying fabrications “hidden among all this other text,” said Christian Vogler, who is deaf and directs Gallaudet University’s Technology Access Program.
OpenAI seeks to address the problem
The prevalence of such hallucinations has led experts, advocates, and former OpenAI employees to call on the federal government to consider AI regulations. At a minimum, they said, OpenAI needs to address the flaw.
“This seems solvable if the company is willing to prioritize it,” said William Saunders, a San Francisco-based research engineer who quit OpenAI in February over concerns about the company’s direction. “It’s problematic if you put this out there and people are overconfident about what it can do and integrate it into all these other systems.”
An OpenAI spokesperson said the company continually researches ways to reduce hallucinations and appreciated the researchers’ findings, adding that OpenAI incorporates feedback into model updates.
While most developers assume that transcription tools will misspell words or make other errors, engineers and researchers said they had never seen another AI-powered transcription tool hallucinate as much as Whisper.
Whisper hallucinations
The tool is integrated into some versions of ChatGPT, OpenAI’s flagship chatbot, and is built into cloud computing platforms from Oracle and Microsoft that serve thousands of businesses around the world. It is also used to transcribe and translate text into multiple languages.
Last month alone, one recent version of Whisper was downloaded more than 4.2 million times from the open-source AI platform HuggingFace. Sanchit Gandhi, a machine learning engineer there, said Whisper is the most popular open-source speech recognition model and is built into everything from call centers to voice assistants.
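For readers unfamiliar with how developers use the open-source model, the sketch below shows a typical way to load a Whisper checkpoint through HuggingFace’s transformers library and transcribe a recording. It is a minimal illustration only; the checkpoint name and audio file are examples, not drawn from any system described in this article.

# Minimal sketch: transcribing audio with an open-source Whisper checkpoint
# from HuggingFace. The checkpoint name and file path are illustrative only.
from transformers import pipeline

# Load a Whisper checkpoint as an automatic-speech-recognition pipeline.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")

# Transcribe a local audio file; the pipeline returns a dict with a "text" key.
result = asr("recording.wav")
print(result["text"])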
Professors Allison Koenecke of Cornell University and Mona Sloane of the University of Virginia examined thousands of short snippets obtained from TalkBank, a research repository hosted at Carnegie Mellon University. They determined that nearly 40% of the hallucinations were harmful or concerning because the speaker could be misinterpreted or misrepresented.
In an example they uncovered, a speaker said, “He, the boy, was going to, I’m not exactly sure, take the umbrella.”
But the transcription software added: “He took a big piece of a cross, a teeny, small piece … I’m sure he didn’t have a terror knife so he killed a number of people.”
A speaker in another recording described “two other girls and one lady.” Whisper invented additional commentary on race, adding: “two other girls and one lady, um, which were Black.”
In a third transcript, Whisper invented a non-existent medication called “hyperactivated antibiotics.”
Researchers aren’t certain why Whisper and similar tools hallucinate, but software developers said the fabrications tend to occur amid pauses, background sounds, or music.
In its online disclosures, OpenAI has recommended against using Whisper in “decision-making contexts, where flaws in accuracy can lead to pronounced flaws in outcomes.”
Transcribing doctor appointments
That warning hasn’t stopped hospitals and medical centers from using speech-to-text models such as Whisper to transcribe what is said during doctors’ visits, freeing up providers to spend less time on note-taking and report writing.
More than 30,000 clinicians and 40 health systems, including the Mankato Clinic in Minnesota and Children’s Hospital Los Angeles, have started using the Whisper-based tool built by Nabla, which has offices in France and the United States.
Martin Raison, Nabla’s chief technology officer, said the tool was fine-tuned for medical language to transcribe and summarize patient interactions.
Company officials said they are aware that Whisper can cause hallucinations and are working to mitigate the problem.
Nabla’s tools erase the original audio for “data security reasons,” so it’s impossible to compare the transcripts generated by Nabla’s AI to the original recordings, Raison said.
Nabla said the tool has been used to record an estimated 7 million medical visits.
Saunders, the former OpenAI engineer, said erasing the original audio could be worrisome if transcripts aren’t double-checked or clinicians can’t access the recordings to verify they are accurate.
“If you take away the ground truth, you can’t find errors,” he said.
Nabla said no model is perfect and that its tool currently requires medical providers to quickly edit and approve transcribed notes, but that could change.
Privacy concerns
Because patient-doctor interviews are confidential, it is difficult to know how AI-generated records are impacting patients.
California state lawmaker Rebecca Bauer-Kahan said she took one of her children to the doctor earlier this year and refused to sign a form provided by the health network that sought her permission to share the consultation audio with vendors including Microsoft Azure, the cloud computing system run by OpenAI’s largest investor. Bauer-Kahan did not want such intimate medical conversations shared with technology companies, she said.
“The release was very specific that for-profit companies would have the right to have this,” said Bauer-Kahan, a Democrat who represents parts of the San Francisco suburbs in the state Legislature. “I was like, ‘absolutely not.’”
John Muir Health spokesman Ben Drew said the health system complies with state and federal privacy laws.
___
Schellmann reported from New York.
___
This article was produced in partnership with the Pulitzer Center’s AI Accountability Network, which also partially supported the academic Whisper study.
___
The Associated Press receives funding from the Omidyar Network to support our reporting on artificial intelligence and its impact on society. AP is solely responsible for all content. Learn about AP’s standards for working with philanthropy, a list of supporters, and funded areas at AP.org.
___
The Associated Press and OpenAI have a license and technology agreement that gives OpenAI access to some of the AP’s text archives.