SAN FRANCISCO (AP) — Tech giant OpenAI touted its artificial intelligence-powered transcription tool Whisper as approaching “human-level robustness and accuracy.”
But Whisper has a major flaw: it is prone to making up chunks of text, and even entire sentences, according to interviews with more than a dozen software engineers, developers and academic researchers. Those experts said some of the invented text (known in the industry as hallucinations) can include racial commentary, violent rhetoric and even imagined cures.
Experts said such fabrications are problematic because Whisper is used in a range of industries worldwide to translate and transcribe interviews, generate text in popular consumer technologies and create subtitles for videos.
Even more concerning, they said, is a rush by medical centers to use Whisper-based tools to transcribe patients' discussions with doctors, despite OpenAI's warning that the tool should not be used in "high-risk domains."
The full extent of the problem is difficult to pinpoint, but researchers and engineers said they frequently encountered Whisper's hallucinations in their work. A University of Michigan researcher conducting a study of public meetings, for example, said he found hallucinations in eight out of every 10 audio transcriptions he inspected, before he started trying to improve the model.
One machine learning engineer said he first discovered hallucinations in about half of the more than 100 hours of Whisper transcriptions he analyzed. A third developer said he found hallucinations in nearly all of the 26,000 transcripts he created with Whisper.
The problem persists even in short, well-recorded audio samples. A recent study by computer scientists uncovered 187 hallucinations in more than 13,000 clear audio snippets they examined.
Researchers say this trend could result in tens of thousands of incorrect transcriptions of millions of recordings.
___
This article was produced in partnership with the Pulitzer Center’s AI Accountability Network. This network also partially supports academic Whisper research. The AP also receives funding from the Omidyar Network to support reporting on artificial intelligence and its impact on society.
___
Such mistakes could have "really serious consequences," particularly in hospital settings, said Alondra Nelson, who led the White House Office of Science and Technology Policy for the Biden administration until last year.
“No one wants to be misdiagnosed,” says Nelson, a professor at the Institute for Advanced Study in Princeton, New Jersey. “There should be a higher bar.”
Whisper is also used to create closed captions for the deaf and hard of hearing, a population at particular risk of faulty transcriptions. That's because deaf and hard-of-hearing people have no way of identifying fabrications "hidden among other texts," said Christian Vogler, who is deaf and directs Gallaudet University's Technology Access Program.
OpenAI urged to address the problem
The prevalence of such hallucinations has led experts, advocates and former OpenAI employees to call on the federal government to consider AI regulations. At a minimum, they said, OpenAI needs to address the flaw.
“I think this problem is solvable if the company is willing to make it a priority,” said William Saunders, a San Francisco-based research engineer who left OpenAI in February over concerns about the company’s direction. “The problem is that when you put this out there, people become overconfident about what it can do and how it can be integrated into all the other systems.”
An OpenAI spokesperson said the company continually studies how to reduce hallucinations and appreciated the researchers' findings, adding that OpenAI incorporates feedback into model updates.
Most developers assume that transcription tools will misspell words or make other errors, but engineers and researchers said they had never seen another AI-powered transcription tool hallucinate as much as Whisper.
Whisper hallucinations
The tool is integrated into some versions of ChatGPT, OpenAI's flagship chatbot, and is built into cloud computing platforms from Oracle and Microsoft that serve thousands of businesses around the world. It is also used to transcribe text and translate it into multiple languages.
In the last month alone, one recent version of Whisper was downloaded more than 4.2 million times from the open-source AI platform HuggingFace. Sanchit Gandhi, a machine learning engineer there, said Whisper is the most popular open-source speech recognition model and is built into everything from call centers to voice assistants.
Professors Allison Koenecke of Cornell University and Mona Sloane of the University of Virginia examined thousands of short snippets obtained from TalkBank, a research repository hosted by Carnegie Mellon University. They determined that nearly 40% of the hallucinations were harmful or concerning because the speaker could be misunderstood or misrepresented.
In one example they uncovered, a speaker said, “He, the boy, I don’t know exactly, was trying to hold an umbrella.”
However, the transcription software added: “He took the big pieces of the cross and the little pieces of the cross…He must have killed many people because he didn’t have the scary knife with him.”
“Two other girls and a woman,” a speaker said in another recording. Whisper made up additional commentary on race, adding, “There were also two girls and a woman, uh, Black.”
In a third transcript, Whisper invented a nonexistent drug called “hyperactivated antibiotic.”
Researchers aren't certain why Whisper and similar tools hallucinate, but software developers said the fabrications tend to occur amid pauses, background sounds or music.
In an online disclosure, OpenAI recommended against using Whisper in “decision-making situations where flaws in accuracy could lead to obvious flaws in the results.”
Transcribing doctor’s appointments
That warning hasn't stopped hospitals and medical centers from using speech-to-text models such as Whisper to transcribe what's said during doctor's visits, to reduce the time healthcare providers spend taking notes and writing reports.
More than 30,000 clinicians and 40 health systems, including the Mankato Clinic in Minnesota and Children's Hospital Los Angeles, have started using a Whisper-based tool built by Nabla, which has offices in France and the United States.
Martin Raison, Nabla’s chief technology officer, said the tool is fine-tuned to medical terminology to transcribe and summarize patient interactions.
Company officials said they are aware that Whisper can cause hallucinations and are working on the issue.
Nabla’s tools erase the original audio for “data security reasons,” so it’s impossible to compare the transcripts generated by Nabla’s AI to the original recordings, Raison said.
Nabla said the tool has been used to transcribe an estimated 7 million medical visits.
Saunders, the former OpenAI engineer, said erasing the original audio could be a concern if transcripts aren't double-checked or clinicians can't access the recordings to confirm they are accurate.
“If you take away the ground truth, you can’t find errors,” he said.
Nabla said no model is perfect, and that its tool currently requires medical providers to quickly edit and approve transcribed notes, but that could change.
Privacy concerns
Because patient-doctor interviews are confidential, it is difficult to know how AI-generated records are impacting patients.
Rebecca Bauer-Kahan, a California state lawmaker, said she took one of her children to the doctor earlier this year and refused to sign a form the health network provided that sought her permission to share the consultation audio with vendors including Microsoft Azure, the cloud computing system run by OpenAI's largest investor. Bauer-Kahan did not want such intimate medical conversations being shared with tech companies, she said.
“The announcement was very specific that for-profit companies have the right to take this,” said Bauer-Kahan, a Democrat who represents parts of the San Francisco suburbs in the state Legislature. “I thought, ‘This will never happen.’”
John Muir Health spokesman Ben Drew said the health system complies with state and federal privacy laws.
___
Schellmann reported from New York.
___
The AP is solely responsible for all content. Find AP's standards for working with philanthropies, a list of supporters and funded coverage areas at AP.org.
___
The Associated Press and OpenAI have a licensing and technology agreement that allows OpenAI access to part of AP's text archives.