NASA Tech Briefs: What is a sub-vocal speech system?
Chuck Jorgensen: Subvocal speech is silent, or sub-auditory, speech, such as when a person silently reads or talks to himself. Biological signals arise when reading or speaking to oneself, with or without actual lip or facial movement. A person using the subvocal system thinks of phrases and talks to himself so quietly that it cannot be heard, but the tongue and vocal cords still receive speech signals from the brain.
NTB: How did you study the patterns of the complex nerve signals in the throat that control speech?
Jorgensen: In the demonstration that we prepared, we controlled a small Mars rover. We took the words stop, go, left, and right, sent them to the rover on a Mars terrain, and could direct the rover to go to different locations without any audible sound.
In another demonstration, we used the digits 0-9 and some of the control words, like go, to control a modified Web browser. We coded the alphabetic characters in terms of a number sequence, basically using a matrix so that the letter A, for example, would be 1,1 in the matrix, the letter B would be 1,2, and so on. This allowed us to spell out the word NASA in the Web browser. We then used the word “go,” and when it found the results for NASA, instead of the results being highlighted (like in a normal Web browser), we had the browser number them so that each of the items came back as text with a number on the side. This way we could say, “go to 1, go to 3, or go to 4,” and start moving around and browsing the Web. What we were trying to illustrate with these demonstrations was two different ways of using subvocal speech: (1) to control a device and (2) to control an information source or a computer program.
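The matrix coding described above can be sketched in a few lines. The interview only gives two mappings (A is 1,1 and B is 1,2), so the 6-column row-major grid here is an illustrative assumption, not the layout NASA actually used:

```python
# Sketch of the alphabet-to-matrix coding described above. The 6-column,
# row-major grid is an assumed layout; the interview specifies only that
# A -> 1,1 and B -> 1,2.
import string

def letter_to_coords(letter, cols=6):
    """Map 'A'..'Z' to 1-based (row, col) coordinates in a row-major grid."""
    index = string.ascii_uppercase.index(letter.upper())
    return (index // cols + 1, index % cols + 1)

def spell(word):
    """A word becomes the sequence of digit pairs a user would subvocalize."""
    return [letter_to_coords(ch) for ch in word]

print(spell("NASA"))  # four (row, col) pairs, one per letter
```

With this layout, spelling a word reduces to speaking a short stream of digits, which is exactly what a ten-word (0-9) recognizer can handle.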
NTB: How did you “train” the software to recognize speech patterns?
Jorgensen: A wavelet transform is used to create a vector of coefficients. We use about 50 vector elements that are fed to the neural net as inputs. So we have 50 examples of the different words, which are then transformed to produce 50 vectors, one per training sample. Each training sample is fed to the neural net, which associates an input sample with an output category that we assign. So the output category might be that one sample corresponds to when the person said, “Stop,” and another sample corresponds to when the person said, “Go.”
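The pipeline described above, wavelet coefficients in, word category out, can be sketched with a toy example. Everything here is illustrative: the Haar wavelet, the synthetic stand-in signals, and the single-layer softmax classifier are assumptions, not the actual NASA implementation:

```python
# Minimal sketch of the training pipeline described above: a wavelet
# transform turns each raw signal window into a ~50-element coefficient
# vector, and a small neural net maps vectors to word categories.
# The Haar wavelet, toy data, and classifier are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def haar_coeffs(signal):
    """One level of the Haar wavelet transform: pairwise averages + differences."""
    pairs = signal.reshape(-1, 2)
    avg = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)
    diff = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)
    return np.concatenate([avg, diff])

def featurize(window, n_coeffs=50):
    """Keep the first n_coeffs wavelet coefficients as the input vector."""
    return haar_coeffs(window)[:n_coeffs]

def fake_samples(freq, n=50, length=100):
    """Synthetic stand-ins for recorded muscle-signal windows."""
    t = np.linspace(0, 1, length)
    return [np.sin(2 * np.pi * freq * t) + 0.1 * rng.standard_normal(length)
            for _ in range(n)]

# 50 samples per word, as in the interview: category 0 = "stop", 1 = "go".
X = np.array([featurize(w) for w in fake_samples(3) + fake_samples(7)])
y = np.array([0] * 50 + [1] * 50)

# A single-layer softmax "neural net" trained by gradient descent.
W = np.zeros((X.shape[1], 2))
for _ in range(200):
    logits = X @ W
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    W -= 0.1 * X.T @ (p - np.eye(2)[y]) / len(X)

accuracy = ((X @ W).argmax(axis=1) == y).mean()
print(f"training accuracy: {accuracy:.0%}")
```

The real system would replace the synthetic windows with recorded electromyographic signals and a more capable network, but the shape of the computation, transform, truncate to a fixed-length vector, classify, is the same.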
NTB: What was the success rate of recognition?
Jorgensen: It has varied since we first started. In the first report it was 92%, and now, for small numbers and words, we are up to 99% plus. We’ve added a much larger number of words and are taking a look not just at individual words, but at vowels and consonants, which are the building blocks from which the words are made. We’re using about 40 of those right now, and if we are successful in getting those recognition rates up high enough (we are currently somewhere in the 70s for those), we should be able to feed it directly into a full-blown speech recognition engine, so it wouldn’t have to learn individual words anymore. It would function much in the same way as some of the auditory speech recognition systems work, and hopefully we would be able to use some of that technology.
NTB: How could NASA utilize this technology?
Jorgensen: There are basically three ways that we view NASA using this technology. The first is the case where we have either noisy environments or atmospheric conditions, such as the breathing gases that astronauts might be using, that change their acoustic patterns. Additionally, under conditions of low pressure, different gas mixes (much like you would have with a deep-sea diver), and microgravity, which changes the muscle responses all over the body, the voice sounds also change. This has caused communication difficulties and also difficulties for recognition with classical speech techniques. That is one of the first applications for the subvocal speech system.
The third area deals with emergency safety. If there is a physical injury, for example if the voicebox or an arm is injured, and you don’t necessarily have a prosthetic available on Mars, we wanted to have an additional emergency method where someone could take the electrodes, or even the existing medical monitoring sensors that they are going to have, and use their electrical signals to access or control a device. An example would be a spacesuit that is pressurized and so stiff you can’t really type in it, but you could still move your fingers inside the gloves. Because we’re picking up the electrical signals being sent to the fingers, you could potentially perform typing operations that control something inside a station. If a two-man team of astronauts is outside the space station and needs to electronically change some code or enter something, they would have an additional safety system available to them for emergency use.
NTB: Does this system have any commercial applications?
Jorgensen: Quiet cell phones would be one commercial application; communication between divers is possibly another. Anyone who needs to use noisy haz-mat suits or work in high-noise environments could benefit from this technology. There are also environments where you want privacy, such as a teleconference where you want to talk to just one person around the table. The neuro-electronic methods that we are discussing here pick up more than just the word patterns that you might have subvocally. They can also identify who the speaker is, and track whether the speaker is tired, angry, happy, or sad, so we have a possibility here (we have not done this) of speech enrichment as well as just communication.
NTB: What is the next step for this technology?
Jorgensen: One area that we are investigating is to see how much of a speech system we can generate. We are in the equivalent of the early stages of auditory speech recognition, where we only have one speaker and individual words. Ultimately you want to have multiple speakers and continuous speech.
The second thing concerns the sensors. We do not seriously intend for people to walk around with wires hanging from their throats and grounding lines on their hands. There are dry sensors that don’t require conductive gel, and we’ve begun to use those. We are also developing an entirely new type of sensor, called a capacitive sensor, that doesn’t even have to touch the body. What we would like to do is embed those sensors in either clothing or some kind of simple appliance that would be very convenient for someone to have, so that the electrical signals would be picked up in a non-invasive and comfortable way and be available any time they wanted to use them for controlling a device.