A patient sits in a hospital bed, a bandage covering his neck with a small opening for the tracheostomy tube that supplies him with oxygen.
Because of his recent surgery, the man featured in this marketing video can’t vocalize. So a doctor holds up a smartphone and records the patient as he mouths a short phrase. An app called SRAVI analyzes the lip movements and in about two seconds returns its interpretation: “I need suction.”
It seems like a simple interaction, and in some respects, SRAVI (Speech Recognition App for the Voice Impaired) is still pretty simplistic. It can only recognize a few dozen phrases, and it does that with about 90 percent accuracy. But the app, which is made by the Irish startup Liopa, represents a massive breakthrough in the field of visual speech recognition (VSR), which involves training AI to read lips without any audio input. It will likely be the first lip-reading AI app available for public purchase.
Researchers have been working for decades to teach computers to lip-read, but it’s proven a challenging task even with the advances in deep learning systems that have helped crack other landmark problems. The research has been driven by a wide array of possible commercial applications—from surveillance tools to silent communication apps and improved virtual assistant performance.
Liopa is in the process of certifying SRAVI as a Class I medical device in Europe, and the company hopes to complete the certification by August, which will allow it to begin selling to healthcare providers.
While their intentions for the technology aren’t clear, many of the tech giants are also working on lip-reading AI. Scientists affiliated with or working directly for Google, Huawei, Samsung, and Sony are all researching VSR systems and appear to be making rapid advances, according to interviews and Motherboard’s review of recently published research and patent applications. The companies either didn’t respond or declined interviews for this story.
As lip-reading AI emerges as a viable commercial product, technologists and privacy watchdogs are increasingly worried about how it’s being developed and how it may one day be deployed. SRAVI, for example, is not the only application of lip-reading AI that Liopa is working on. The company is also in phase two of a project with a UK defense research agency to develop a tool that would allow law enforcement agencies to search through silent CCTV footage and identify when people say certain keywords.
Surveillance company Motorola Solutions has a patent for a lip-reading system designed to aid police. Skylark Labs, a startup whose founder has ties to the U.S. Defense Advanced Research Projects Agency (DARPA), told Motherboard that its lip-reading system is currently deployed in private homes and at a state-controlled power company in India to detect foul and abusive language.
“This is one of those areas, from my perspective, which is a good example of ‘just because we can do it, doesn’t mean we should,’” Fraser Sampson, the UK’s biometrics and surveillance camera commissioner, told Motherboard. “My principal concern in this area wouldn’t necessarily be what the technology could do and what it couldn’t do, it would be the chilling effect of people believing it could do what it says. If that then deterred them from speaking in public, then we’re in a much bigger area than simply privacy, and privacy is big enough.”
The emergence of lip-reading AI is reminiscent of facial recognition technology, which was a niche area of research for decades before it was quietly, but rapidly, commercialized as a surveillance tool beginning in the early 2000s.