Using brain scanning technology, artificial intelligence, and speech synthesizers, scientists have transformed brain patterns into intelligible verbal speech, an advance that could ultimately give voice to those who cannot speak.
It's a shame Stephen Hawking isn't alive to see this, as he may have gotten a real kick out of it. The new speech system, developed by researchers at the neural acoustic processing laboratory at Columbia University in New York, is something the late physicist could have used.
Hawking had amyotrophic lateral sclerosis (ALS), a motor neuron disease that took away his verbal speech, but he continued to communicate using a computer and a speech synthesizer. Using a cheek switch attached to his glasses, Hawking could pre-select words on a computer, which were then read out by a voice synthesizer. It was a bit tedious, but it allowed Hawking to produce around a dozen words per minute.
But imagine if Hawking hadn't had to manually select and trigger those words. Indeed, some individuals, whether they have ALS, locked-in syndrome, or are recovering from a stroke, may not have the motor skills required to control a computer, even by a twitch of the cheek. Ideally, an artificial voice system would capture a person's thoughts directly to produce speech, eliminating the need to control a computer.
A new study published today in Scientific Reports takes an important step toward that goal. But instead of capturing a person's inner thoughts to restore speech, it uses the brain patterns produced while listening to speech.
To develop the speech neuroprosthesis, neurobiologist Nima Mesgarani and his colleagues combined recent advances in deep learning with speech synthesis technologies. Their resulting brain-computer interface, though still rudimentary, captured brain patterns directly from the auditory cortex, which were then decoded by an AI-powered vocoder, or speech synthesizer, to produce intelligible speech. The speech sounded very robotic, but nearly three in four listeners were able to discern the content. It's an exciting advance that could ultimately help people who have lost the ability to speak.
To be clear, Mesgarani's neuroprosthetic device doesn't translate a person's covert speech, that is, the thoughts in our heads, also called imagined speech, directly into words. Sadly, science isn't quite there yet. Instead, the system captured a person's distinctive cognitive responses as they listened to recordings of people speaking. A deep neural network was then able to decode, or translate, these patterns, allowing the system to reconstruct speech.
“This study continues a recent trend in applying deep learning methods to decode neural signals,” Andrew Jackson, a professor of neural interfaces at Newcastle University who wasn't involved in the new study, told Gizmodo. “In this case, the neural signals are recorded from the surface of the human brain during epilepsy surgery. The participants listen to different words and sentences read by actors. Neural networks are trained to learn the relationship between brain signals and sounds, and as a result can then reconstruct intelligible reproductions of the words/sentences based only on the brain signals.”
Epilepsy patients were chosen for the study because they often have to undergo brain surgery. Mesgarani, with the help of Ashesh Dinesh Mehta, a neurosurgeon at the Neuroscience Institute at Northwell Health Physician Partners and a co-author of the new study, recruited five volunteers for the experiment. The team used invasive electrocorticography (ECoG) to measure neural activity as the patients listened to continuous speech sounds. The patients listened, for example, to speakers reciting digits from zero to nine. Their brain patterns were then fed into the AI-enabled vocoder, resulting in synthesized speech.
The results were very robotic-sounding, but fairly intelligible. In tests, listeners could correctly identify the spoken digits about 75 percent of the time. They could even tell whether the speaker was male or female. Not bad, and a result that even came as a “surprise” to Mesgarani, as he told Gizmodo in an email.
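The 75 percent figure is simply classification accuracy on a listening test: each listener guesses which digit a reconstructed clip contained, and the guesses are scored against the truth. A toy version of that scoring, with invented responses (the study's actual trial data are not reproduced here):

```python
# Score made-up listener guesses against the true spoken digits.
# Chance performance over ten digits would be 10%.
true_digits =    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1]
listener_guess = [0, 1, 2, 3, 4, 5, 6, 1, 8, 3, 0, 7]

correct = sum(t == g for t, g in zip(true_digits, listener_guess))
accuracy = correct / len(true_digits)
print(f"identification accuracy: {accuracy:.0%}")  # 9/12 correct = 75%
```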
Recordings of the synthesized speech can be found here (the researchers tested various techniques, but the best result came from combining deep neural networks with a vocoder).
The use of a voice synthesizer in this context, as opposed to a system that can match and recite pre-recorded words, was important to Mesgarani. As he explained to Gizmodo, it's about more than just stringing together the right words.
“Since the goal of this work is to restore speech communication to those who have lost the ability to talk, we aimed to learn the direct mapping from the brain signal to the speech sound itself,” he told Gizmodo. “It is possible to also decode phonemes [distinct units of sound] or words, however, speech carries much more information than just the content, such as the speaker [with their distinct voice and style], intonation, emotional tone, and so on. Therefore, our goal in this paper was to recover the sound itself.”
Looking ahead, Mesgarani would like to synthesize more complicated words and sentences, and to collect brain signals from people who are simply thinking or imagining the act of speaking.
Jackson was impressed by the new study, but he said it's still not clear whether this approach will apply directly to brain-computer interfaces.
“In the paper, the decoded signals reflect actual words heard by the brain. To be useful, a communication device would have to decode words that are imagined by the user,” Jackson told Gizmodo. “Although there is often some overlap between the brain areas involved in hearing, speaking, and imagining speech, we don't yet know exactly how similar the associated brain signals will be.”
William Tatum, a neurologist at the Mayo Clinic who was also not involved in the new study, said the research is important in that it's the first to use artificial intelligence to reconstruct speech from the brain waves involved in processing known acoustic stimuli. The significance is notable, “because it advances the application of deep learning in the next generation of better-designed speech-producing systems,” he told Gizmodo. That said, he felt the sample size of participants was too small, and that the use of data extracted directly from the human brain during surgery is not ideal.
Another limitation of the study is that the neural networks, in order to do more than just reproduce the words zero through nine, would need to be trained on a large number of brain signals from each participant. The system is patient-specific, since we all develop different brain patterns when listening to speech.
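That patient-specific requirement means, in effect, one decoder per participant rather than one shared model. A minimal sketch of what that training loop looks like, again with simulated data and a linear stand-in for the study's deep networks (the subject IDs and array sizes are invented):

```python
# One decoder per participant: each subject's brain responds to speech
# differently, so a separate model is fit on each subject's own data.
import numpy as np

rng = np.random.default_rng(1)

def fit_decoder(neural, spectrogram):
    """Least-squares mapping from neural features to spectrogram frames."""
    W, *_ = np.linalg.lstsq(neural, spectrogram, rcond=None)
    return W

decoders = {}
for subject in ["P1", "P2", "P3", "P4", "P5"]:
    # Simulate a distinct neural-to-sound mapping for this subject.
    W_true = rng.normal(size=(16, 8))
    neural = rng.normal(size=(200, 16))
    spec = neural @ W_true
    decoders[subject] = fit_decoder(neural, spec)

print(len(decoders), "subject-specific decoders trained")
```

A decoder fit to P1 would perform poorly on P2's recordings, which is exactly the generalization question Jackson raises next.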
“It will be interesting in the future to see how well decoders trained for one person generalize to other individuals,” said Jackson. “It's a bit like early speech recognition systems, which needed to be individually trained by the user, as opposed to today's technology, such as Siri and Alexa, which can make sense of anyone's voice, again using neural networks. Only time will tell whether these technologies could one day do the same for brain signals.”
Without a doubt, there's still lots of work to be done. But the new paper is an encouraging step toward implantable speech neuroprosthetics. [Scientific Reports]