Concept Graph & Resume using Claude 3 Opus | ChatGPT-4 | Llama 3:
Resume:
1.- Speech neural prostheses aim to restore natural communication to people with severe paralysis, potentially benefiting over 3 million people in the U.S.
2.- Current assistive technologies, such as spelling with head movements or eye tracking, are far slower than natural speech (about 15 wpm vs. 120+ wpm).
3.- A speech neural prosthesis could decode intended speech from the brain using invasive (ECoG, microelectrode arrays) or non-invasive (EEG) neural interfaces.
4.- Over a century of work has characterized speech in the brain, with recent advances in decoding speech and text in the last 15 years.
5.- Bouchard et al. 2013 showed distinct neural responses for different speech sounds and an articulatory map in speech cortex during syllable production.
6.- Anumanchipalli et al. decoded speech from healthy speakers by mapping brain activity to speech waveforms, but the approach requires residual speech ability.
7.- Makin et al. decoded brain activity into text sentences using a convolutional-recurrent neural network, but it also relied on alignment with overt speech.
8.- The BRAVO clinical trial aims to decode and restore communication and movement to paralyzed individuals using various neural recording interfaces.
9.- Using a 50-word vocabulary, they decoded speech from a paralyzed participant with 25% WER at 15 wpm using an RNN and language model.
10.- Expanding to a spelling-based BCI allows open-vocabulary decoding: 26 letter classes are decoded as the user attempts to spell words.
11.- The spelling-based approach achieves 11% WER and 6% CER with a 1,000-word vocabulary; performance is driven by the neural decoding, with a language model refining the output (see the spelling-decoder sketch after this list).
12.- However, spelling is unnatural and slower than natural speech. Their new approach can synthesize speech from a paralyzed person without requiring overt speech.
13.- Participant Anne, who had a stroke causing loss of intelligible speech, was implanted with a high-density ECoG grid over her speech cortex.
14.- They decode probabilities over phonemes, acoustic speech features, and articulatory gestures from Anne's neural activity as she attempts to speak.
15.- The text decoding model uses CTC loss to map neural activity to phoneme probabilities without requiring pre-aligned data, enabling open-vocabulary decoding via a language model (see the CTC sketch after this list).
16.- This achieved 25% WER at 78 wpm over a 1,000-word vocabulary within a few weeks, with the neural decoding driving performance more than the language model.
17.- Speech was synthesized by decoding speech-unit probabilities and feeding them to a synthesizer conditioned on Anne's pre-stroke voice, yielding a personalized voice (see the unit-synthesis sketch after this list).
18.- Synthesized speech reached up to 90% intelligibility on a 50-phrase set, with lower but promising performance on larger 500- and 1,000-phrase sets.
19.- Articulatory gestures decoded from neural activity drive a live 3D avatar in real time, capturing both speech and expressive facial movements (see the avatar sketch after this list).
20.- Avatar animation from decoded gestures was both intelligible and correlated well with real speakers' face movements during the same speech.
21.- Combined decoding of speech audio, text, and avatar animation provides an embodied neuroprosthetic for more complete communication restoration.
22.- Anne felt the personalized voice synthesis and avatar could enable her to counsel clients again and have fuller self-expression and interaction.
23.- The same articulatory representations remain intact in the brain even years after paralysis, enabling the speech neuroprosthesis to work.
24.- Future work aims to translate these proofs-of-concept into a fully implantable clinical device suitable for day-to-day use at home.
25.- Challenges include robustness across users, miniaturizing a wireless implantable system, improving performance metrics such as accuracy and latency, and expanding the languages supported.
26.- High-performance spelling-based approaches can exceed 99% accuracy but are slow; streaming synthesis could enable more natural back-and-forth conversation.
27.- A continuously streaming approach synthesizes speech audio and text in 80 ms increments during decoding, doubling the speaking rate vs. delayed, sentence-level synthesis (see the streaming sketch after this list).
28.- Streaming synthesis with implicit voice activity detection enables uninterrupted multi-minute use of the decoder without explicit trial windowing.
29.- Error-related neural signals may help identify mistakes and improve the performance and robustness of the brain-computer interface system.
30.- These speech neuroprosthesis approaches work across multiple languages since they rely on decoding speech articulation rather than language-specific features.
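Code sketches (illustrative):

Spelling-decoder sketch (points 10-11): a minimal Python sketch of how per-letter neural probabilities might be combined with a language-model prior to pick a word from a closed vocabulary. The three-word vocabulary, the unigram prior, the simulated letter probabilities, and the lm_weight parameter are all illustrative assumptions, not the trial's actual models.

    import numpy as np

    vocab = {"hello": 0.4, "helps": 0.35, "happy": 0.25}  # toy stand-in for the 1,000-word vocabulary

    def spell_decode(letter_probs, vocab, lm_weight=0.5):
        # Score each candidate word as neural log-likelihood + weighted LM prior.
        best_word, best_score = None, -np.inf
        for word, prior in vocab.items():
            idx = [ord(c) - ord("a") for c in word]
            neural = sum(np.log(letter_probs[t, i]) for t, i in enumerate(idx))
            score = neural + lm_weight * np.log(prior)
            if score > best_score:
                best_word, best_score = word, score
        return best_word

    # Simulate five letter attempts whose decoded probabilities favor "h-e-l-l-o".
    letter_probs = np.full((5, 26), 0.01)
    for t, c in enumerate("hello"):
        letter_probs[t, ord(c) - ord("a")] = 0.75
    letter_probs /= letter_probs.sum(axis=1, keepdims=True)
    print(spell_decode(letter_probs, vocab))  # -> hello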
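CTC sketch (point 15): a minimal PyTorch sketch of training a recurrent decoder with CTC loss, which sums over alignments between neural frames and phoneme sequences internally, so no manual alignment is needed. The 253-channel input, layer sizes, and 39-phoneme inventory are assumptions for illustration, not the actual model.

    import torch
    import torch.nn as nn

    # Assumed sizes: 253 neural channels in, 39 phonemes + 1 CTC blank out.
    N_CHANNELS, N_PHONEMES, BLANK = 253, 39, 0

    class PhonemeDecoder(nn.Module):
        # Bidirectional GRU mapping neural feature frames to phoneme log-probs.
        def __init__(self, hidden=256):
            super().__init__()
            self.rnn = nn.GRU(N_CHANNELS, hidden, num_layers=3,
                              bidirectional=True, batch_first=True)
            self.head = nn.Linear(2 * hidden, N_PHONEMES + 1)  # +1 for CTC blank

        def forward(self, x):                    # x: (batch, time, channels)
            h, _ = self.rnn(x)
            return self.head(h).log_softmax(-1)  # (batch, time, classes)

    model = PhonemeDecoder()
    ctc = nn.CTCLoss(blank=BLANK, zero_infinity=True)

    # Toy batch: 2 trials of 100 neural frames; targets are phoneme-id sequences.
    x = torch.randn(2, 100, N_CHANNELS)
    targets = torch.randint(1, N_PHONEMES + 1, (2, 12))   # ids 1..39 (0 = blank)
    log_probs = model(x).transpose(0, 1)    # CTCLoss expects (time, batch, classes)
    loss = ctc(log_probs, targets,
               torch.full((2,), 100), torch.full((2,), 12))
    loss.backward()   # no frame-to-phoneme alignment was ever supplied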
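Unit-synthesis sketch (point 17): a toy sketch of decoding discrete speech units and conditioning synthesis on a speaker embedding. The stub arithmetic stands in for a real vocoder, and the codebook, embedding size, and conditioning scheme are hypothetical; only the overall shape (unit probabilities -> units -> personalized audio) follows the summary.

    import numpy as np

    def synthesize_units(unit_probs, speaker_emb, unit_codebook):
        # Pick the most probable speech unit per frame, then apply stub
        # "vocoding" that mixes in the speaker embedding (toy arithmetic).
        units = unit_probs.argmax(axis=1)       # (frames,) discrete unit ids
        frames = unit_codebook[units]           # (frames, dim) unit embeddings
        conditioned = frames + speaker_emb      # speaker conditioning (toy)
        return conditioned.reshape(-1)          # flatten to a pseudo-waveform

    rng = np.random.default_rng(2)
    unit_probs = rng.random((20, 100))          # 20 frames x 100 speech units
    codebook = rng.standard_normal((100, 64))
    speaker_emb = rng.standard_normal(64)       # stand-in for a pre-stroke voice embedding
    print(synthesize_units(unit_probs, speaker_emb, codebook).shape)  # (1280,)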
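Avatar sketch (point 19): a toy mapping from decoded articulatory gesture intensities to avatar blendshape weights, one vector per animation frame. The gesture names, blendshape names, and mixing matrix are invented for illustration.

    import numpy as np

    GESTURES = ["jaw_open", "lip_round", "lip_close", "tongue_up"]
    BLENDSHAPES = ["JawOpen", "MouthPucker", "MouthClose", "MouthSmile"]

    # Linear gesture -> blendshape mixing matrix (rows: gestures, cols: blendshapes).
    W = np.array([[0.9, 0.0, 0.0, 0.1],
                  [0.0, 1.0, 0.1, 0.0],
                  [0.1, 0.1, 0.9, 0.0],
                  [0.2, 0.0, 0.0, 0.3]])

    def gestures_to_blendshapes(g):
        # Mix decoded gesture intensities and clamp to valid [0, 1] weights.
        return dict(zip(BLENDSHAPES, np.clip(g @ W, 0.0, 1.0)))

    frame = np.array([0.8, 0.1, 0.0, 0.3])   # one decoded gesture vector
    print(gestures_to_blendshapes(frame))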
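Streaming sketch (points 27-28): a sketch of an 80 ms incremental decode-and-synthesize loop, with a thresholded per-frame speech probability standing in for implicit voice activity detection. decode_frame and synthesize are placeholder callables, not the real decoder or vocoder.

    import numpy as np

    FRAME_MS = 80  # each element of neural_stream covers FRAME_MS of data

    def stream_decode(neural_stream, decode_frame, synthesize, vad_threshold=0.5):
        # Emit audio every 80 ms frame instead of waiting for a full sentence.
        audio_out = []
        for frame in neural_stream:             # one chunk of neural features per frame
            speech_prob, units = decode_frame(frame)
            if speech_prob < vad_threshold:     # implicit VAD: skip non-speech frames
                continue
            audio_out.append(synthesize(units)) # incremental synthesis, no trial window
        return np.concatenate(audio_out) if audio_out else np.array([])

    # Toy stand-ins so the sketch runs end to end.
    rng = np.random.default_rng(1)
    frames = [rng.standard_normal(253) for _ in range(10)]
    decode_frame = lambda f: (rng.random(), rng.standard_normal(16))
    synthesize = lambda u: u * 0.1              # pretend: 16 audio samples per frame
    print(stream_decode(frames, decode_frame, synthesize).shape)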
Knowledge Vault built by David Vivancos 2024