Voice recognition software can be trained to respond only to a particular person’s voice, so virtual assistants like Siri and Alexa can be set up to answer only their device’s owner.
But new research from the University of Wisconsin-Madison shows that talking through a PVC tube can alter the sound of someone’s voice enough to trick these types of systems.
Kassem Fawaz, an assistant professor in the department of electrical and computer engineering, led the research. He said this type of voice identification security is becoming popular for applications like banking. So, he wanted to test its limits.
Research has already shown audio recordings or “deepfake” technology can be used to trick these systems, Fawaz said. Because of that, some systems have “liveness detection” to weed out attackers who aren’t actually human.
To get around liveness detection, Fawaz’s team wanted to see if they could trick voice recognition security in a completely analog way, using the actual human voice.
They found that talking through a PVC pipe, like the kind you can find at any hardware store, altered the frequency of the voice in a way that could break through voice identification security features, said PhD student Shimaa Ahmed.
“It’s a human who implements that tech, and that human doesn’t need to be a professional impersonator,” she said. “Just a normal human being can change their voice by using simple devices like a tube.”
To conduct the research, they used 3D-printed tubes and celebrity “voice prints.” There are thousands of celebrity voices that can be used in research, Fawaz said, because there are so many existing recordings of them to pull from.
Talking through tubes made to specific lengths and diameters, they were able to impersonate hundreds of celebrities. Ahmed remembers that she was able to alter her own voice to sound like Lisa Kudrow, who played Phoebe on the long-running NBC sitcom “Friends.”
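Why would a tube’s length matter? A pipe acts as an acoustic resonator that boosts certain frequencies, and longer pipes reinforce lower ones. As a rough illustration only (the researchers’ actual acoustic model is not described in this article), here is the textbook formula for the resonant frequencies of an idealized tube open at one end, sketched in Python:

```python
# Illustrative sketch, NOT the UW-Madison team's method: resonant
# frequencies of an ideal tube closed at one end and open at the other,
# f_k = (2k - 1) * c / (4 * L). This only shows how tube length "tunes"
# which parts of a voice get emphasized.

SPEED_OF_SOUND = 343.0  # m/s in air at roughly room temperature

def tube_resonances(length_m: float, n_modes: int = 3) -> list[float]:
    """Return the first n_modes resonant frequencies (Hz) of the tube."""
    return [(2 * k - 1) * SPEED_OF_SOUND / (4 * length_m)
            for k in range(1, n_modes + 1)]

# Halving the tube length doubles every resonant frequency,
# so picking a length shifts which voice frequencies are boosted.
print(tube_resonances(0.25))  # 25 cm tube: [343.0, 1029.0, 1715.0]
print(tube_resonances(0.50))  # 50 cm tube: [171.5, 514.5, 857.5]
```

This idealized model ignores the tube’s diameter and the speaker’s own vocal tract, but it conveys the basic idea that a hardware-store pipe of a chosen length can predictably reshape a voice’s frequency content.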
Still, talking through a tube does not sound like a perfect impersonation to the human ear. Nearly anyone would be able to recognize that Ahmed and Kudrow do not sound the same.
That’s because humans and computers “perceive the world differently,” Fawaz said, and the slight changes in pitch make a big difference to this kind of software.
“While it’s not enough to trick me as a human, it’s enough to trick the machine,” he said.
The findings show the flaws in voice identification security software, especially when it comes to important data like banking information, Fawaz said.
“The security features of these voice-based systems are at best questionable,” he said.
He recommends sticking to passwords and multi-factor authentication, at least until voice technology improves.
“Although they are not convenient, and they’re annoying, they do provide better security features,” he said.
Wisconsin Public Radio, © Copyright 2024, Board of Regents of the University of Wisconsin System and Wisconsin Educational Communications Board.