It’s the perennial “cocktail party problem”: standing in a packed room with a drink in hand, trying to hear what the other guests are saying.
In fact, humans are remarkably good at carrying on a conversation with one person while filtering out competing voices.
Yet it is a skill that, until recently, technology has been unable to replicate.
And that matters if audio evidence is to be used in court: audible background voices can obscure who is speaking and what is being said, rendering a recording useless.
Wave Sciences founder and chief technology officer, electrical engineer Keith McElveen, became interested in the problem while working on war crimes cases for the U.S. government.
“What we were trying to find out was who ordered the massacre of civilians. Among the evidence were recordings with many voices all talking at once. That’s when I learned what the ‘cocktail party problem’ was,” he said.
“I had been successful in removing noises from audio, like car sounds, air conditioners and fans, but when I started trying to remove voices from audio, I found that it was not only a very difficult problem, but one of the classic challenges in acoustics.
“The sound is bouncing around the room, and it’s very difficult to solve mathematically.”
The answer, he says, was to use AI to accurately identify and eliminate all competing sounds based on where they were coming from in the room.
The difficulty is not just that other people are talking; sound bouncing around the room also causes a great deal of interference, so the target speaker is heard both directly and indirectly.
In a completely anechoic room (a room with no reverberations), one microphone per speaker would be enough to pick up everyone’s speech, but in a real room you would, in effect, need a microphone for every reflected sound as well.
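That point about reflections can be made concrete with a first-order image-source model, a standard room-acoustics tool. The sketch below uses invented positions and room dimensions: mirroring the talker across each wall yields the phantom source that each first reflection appears to come from.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # metres per second

# Invented geometry for illustration.
room = np.array([6.0, 4.0, 3.0])     # room size (x, y, z) in metres
speaker = np.array([1.5, 2.0, 1.6])  # talker position
mic = np.array([4.5, 2.5, 1.2])      # microphone position

# Direct path, plus one "image source" per wall: mirroring the talker
# across a wall gives the phantom source its first reflection seems
# to radiate from.
sources = [("direct", speaker)]
for axis in range(3):
    for wall in (0.0, room[axis]):
        image = speaker.copy()
        image[axis] = 2.0 * wall - image[axis]
        sources.append((f"reflection via wall {axis}@{wall:g}m", image))

for label, src in sources:
    dist = np.linalg.norm(src - mic)
    print(f"{label:28s} {dist:5.2f} m  {dist / SPEED_OF_SOUND * 1e3:6.2f} ms")
```

Even stopping at first-order reflections, one voice already reaches the microphone as seven arrivals from seven different directions; higher-order reflections multiply that further, which is why simply adding microphones cannot keep up.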
McElveen founded Wave Sciences in 2009 with the goal of developing technology that could separate overlapping voices. Initially, the company used an array of microphones in a technique called array beamforming.
However, feedback from potential commercial partners indicated that the system needed too many microphones to deliver good results at a justifiable cost in many situations, and in many others it did not work at all.
“The consensus was that if we could come up with a solution that addressed these concerns, they would be very interested,” McElveen said.
He adds, “If it can be done with just two ears, I knew there had to be a solution.”
After 10 years of internally funded research, the company finally solved the problem and filed a patent application in September 2019.
What they came up with was an AI that could analyze how sound bounces around a room before it reaches your microphone or ear.
“It takes the sound that hits each microphone, works its way back to where that sound came from, and essentially suppresses any sounds that may not be coming from where you’re sitting,” McElveen says.
In some ways, the effect is similar to when a camera focuses on one subject and blurs the foreground and background.
“The results aren’t crystal clear when the system can only learn from very noisy recordings,” McElveen says, “but they’re still impressive.”
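Wave Sciences has not published the algorithm itself, but the behaviour McElveen describes, working each arrival back to its origin and keeping only what comes from the target, can be loosely illustrated with a classic delay-and-sum beamformer. Everything below (names, geometry, parameters) is illustrative, not the company’s method.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # metres per second

def delay_and_sum(signals, mic_positions, target, fs):
    """Steer a microphone array at `target` by aligning and averaging channels.

    signals:       (n_mics, n_samples) array of synchronized recordings
    mic_positions: (n_mics, 3) microphone coordinates in metres
    target:        (3,) assumed position of the wanted talker
    fs:            sample rate in Hz
    """
    dists = np.linalg.norm(mic_positions - np.asarray(target), axis=1)
    # Shift each channel so the target's wavefront lines up on every mic.
    lags = np.round((dists - dists.min()) / SPEED_OF_SOUND * fs).astype(int)
    aligned = np.zeros_like(signals, dtype=float)
    for ch, (sig, lag) in enumerate(zip(signals, lags)):
        aligned[ch, : len(sig) - lag] = sig[lag:]
    # The target's sound now adds coherently across channels, while sound
    # arriving from other places (or via reflections) stays misaligned
    # and partially cancels, like the background blur of a focused lens.
    return aligned.mean(axis=0)
```

Plain delay-and-sum only becomes selective with many microphones, which is consistent with the cost complaint that early partners raised; the company’s claim is that its approach keeps that selectivity with far fewer.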
The technique saw its first use in real forensic casework in a US murder case, where the evidence it produced proved key to the conviction.
After two hit men were arrested for murdering a man, the FBI wanted to prove that they had been hired by a family embroiled in a child custody dispute. The bureau led the family to believe they were being blackmailed over their involvement, then waited to see how they would react.
While texts and phone records were relatively easy for the FBI to access, the in-person meetings at two restaurants were a different story. But a court authorized the use of Wave Sciences’ algorithm, turning those audio recordings from inadmissible to crucial.
Since then, other government laboratories, including in the UK, have carried out a series of tests, and the company is now selling the technology to the US military, which uses it to analyze sonar signals.
McElveen said the technology could also be used in hostage negotiations or suicide scenarios, allowing both sides of a conversation to be heard rather than only the negotiator speaking through a megaphone.
Late last year, the company released a software application based on its learning algorithms, aimed at government agencies conducting audio forensics and acoustic analysis.
Ultimately, the company aims to introduce customized versions of its product for use in audio recording kits, automotive voice interfaces, smart speakers, augmented and virtual reality, sonar and hearing aid devices.
For example, when you talk to your car or smart speaker, the device can understand what you’re saying, even if there’s background noise.
AI is already being used in other areas of forensic science, according to Teri Armenta, a forensic educator at the Academy of Forensic Sciences.
“ML (machine learning) models analyze speech patterns to determine speaker identity, a process that is particularly useful in criminal investigations where audio evidence needs to be authenticated,” she says.
“Furthermore, AI tools can detect manipulation or alteration of voice recordings, ensuring the integrity of evidence presented in court.”
AI is also making inroads into other aspects of audio analysis.
Bosch has a technology called “SoundSee” that uses audio signal processing algorithms to analyze the sound of a motor, for example, and predict breakdowns before they happen.
“Traditional audio signal processing lacks the ability to interpret sound the way humans do,” says Dr. Samarjit Das, director of research and technology at Bosch USA.
“Audio AI will enable a deeper understanding and semantic interpretation of the sounds around us, including environmental sounds and sounds made by machines.”
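Bosch has not detailed SoundSee’s internals, but one common shape for audio-based predictive maintenance, offered here purely as an illustration, is to learn the spectral fingerprint of a healthy machine and score new recordings by how far they drift from it.

```python
import numpy as np

def band_energies(audio, n_bands=16):
    """Average energy in n_bands equal-width frequency bands."""
    spectrum = np.abs(np.fft.rfft(audio)) ** 2
    return np.array([band.mean() for band in np.array_split(spectrum, n_bands)])

def anomaly_score(audio, healthy_profile):
    """Distance between this clip's spectrum and a known-good baseline."""
    profile = band_energies(audio)
    # Log scale keeps a loud low-frequency hum from dominating the comparison.
    return float(np.linalg.norm(np.log1p(profile) - np.log1p(healthy_profile)))

# Hypothetical usage: baseline from a healthy motor, then monitor new clips.
rng = np.random.default_rng(0)
healthy = band_energies(rng.normal(size=16_000))  # stand-in for real audio
clip = rng.normal(size=16_000)
print(f"anomaly score: {anomaly_score(clip, healthy):.3f}")
```

A rising score over time would be the cue to inspect the machine before it fails.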
Recent tests of the Wave Sciences algorithm have shown that even with just two microphones, the technology can perform as well as the human ear, with more microphones showing even better performance.
And they revealed something else.
“The math in all of our tests is remarkably similar to human hearing,” McElveen said. “There are little quirks in what our algorithm can do, and how accurately it can do it, that are remarkably similar to some of the quirks that exist in human hearing.”
“We suspect that the human brain uses the same mathematics, and that in the process of solving the cocktail party problem, we may have stumbled upon a secret about what really goes on in the brain.”