Sensory’s updated Speech Chip includes text-to-speech, voice recognition, gestural commands, 24-voice sample-triggering MIDI synth and more…
Yahoo News has a little article about a chip that may be of interest to us:
The NLP-5x is the company’s first to feature text-to-speech, using a voice-morphing algorithm that can generate thousands of voices on the fly, according to Todd Mozer, the company’s chief executive.
Sensory counts major toy manufacturers like JVC, Mattel, and Hasbro among the company’s customers, so the capabilities that the NLP-5x offers could show up in tomorrow’s toys. In total, Sensory chips appear in between 30 and 40 products, over four generations of product. “But we decided it was time to rearchitect the whole chip,” Mozer said.
The NLP-5x upgraded the chip’s internal microcontrollers to a DSP, adding more MIPS horsepower, and upgraded the analog-to-digital and digital-to-analog circuitry from 12-bit to 16-bit sensitivity.
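To put the 12-bit to 16-bit jump in numbers, here is a quick back-of-the-envelope sketch. It assumes an ideal converter and uses the textbook 6.02 × N + 1.76 dB rule for a full-scale sine wave; real converters fall short of that, and the function names are mine, not anything from Sensory.

```python
def quantization_levels(bits):
    """Number of distinct sample values an N-bit converter can represent."""
    return 2 ** bits


def ideal_snr_db(bits):
    """Theoretical signal-to-quantization-noise ratio of an ideal N-bit
    converter driven by a full-scale sine wave (textbook approximation)."""
    return 6.02 * bits + 1.76


for bits in (12, 16):
    print(f"{bits}-bit: {quantization_levels(bits)} levels, "
          f"about {ideal_snr_db(bits):.1f} dB ideal SNR")
```

So the move from 12 to 16 bits means 16 times as many quantization levels (4,096 versus 65,536) and roughly 24 dB more theoretical headroom.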
There are two types of speech recognition. One is speaker-dependent, such as that provided by Nuance and its Dragon NaturallySpeaking product. There, the PC or phone stores an enormous library of phonemes and words, but the system must “train” with the user to manage the complexity of the process. The other is speaker-independent, like the systems Sensory manufactures, which allows for a great deal of latitude in the input voice and doesn’t require training. But the available “vocabulary” is also much smaller.
Still, even speaker-independent systems can factor in complexity. “An oven is a good example,” Mozer said. “It’s quite complex on older chips, compared to what we can do today. On an oven, for example, the unit can ask you, ‘What temperature do you want?’”
When a user replies, he might answer any of many possible combinations from roughly 100 degrees to 500 degrees, with some variance: “three-seventy-five,” for example, or “three-seven-five,” or “three hundred and seventy-five degrees”. The user also can be asked how he wants the meal cooked – convection, baked, or broiled – and for how long. “The natural language function is the most unique feature of the chip,” Mozer said.
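To make the oven example concrete, here is a minimal Python sketch of how the three spoken forms above could all be normalized to the same number once the recognizer has produced words. This is purely my own illustration, not Sensory’s algorithm: the word tables, the `parse_temperature` name, and the 100 to 500 degree limits are assumptions taken from the description above.

```python
import re

UNITS = {"zero": 0, "oh": 0, "one": 1, "two": 2, "three": 3, "four": 4,
         "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13,
         "fourteen": 14, "fifteen": 15, "sixteen": 16, "seventeen": 17,
         "eighteen": 18, "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}
FILLER = {"and", "degree", "degrees"}  # words that carry no numeric value


def _below_hundred(tokens):
    """Value of a 0-99 phrase such as ['seventy', 'five'], or None if malformed."""
    if not tokens:
        return 0
    if len(tokens) == 1:
        for table in (UNITS, TEENS, TENS):
            if tokens[0] in table:
                return table[tokens[0]]
        return None
    if len(tokens) == 2 and tokens[0] in TENS and UNITS.get(tokens[1], 0) > 0:
        return TENS[tokens[0]] + UNITS[tokens[1]]
    return None


def parse_temperature(utterance, low=100, high=500):
    """Turn a spoken temperature into an int, or None if it is not understood
    or falls outside the (assumed) oven range."""
    words = re.split(r"[\s-]+", utterance.lower().strip())
    tokens = [w.strip(".,!?") for w in words]
    tokens = [t for t in tokens if t and t not in FILLER]
    if not tokens or tokens[0] not in UNITS:
        return None
    if all(t in UNITS for t in tokens):
        # Digit-by-digit form: "three seven five"
        value = int("".join(str(UNITS[t]) for t in tokens))
    else:
        # "three hundred seventy five" or the short form "three seventy five"
        has_hundred = len(tokens) > 1 and tokens[1] == "hundred"
        tail = _below_hundred(tokens[2:] if has_hundred else tokens[1:])
        if tail is None:
            return None
        value = UNITS[tokens[0]] * 100 + tail
    return value if low <= value <= high else None


for phrase in ("three-seventy-five", "three-seven-five",
               "three hundred and seventy-five degrees"):
    print(phrase, "->", parse_temperature(phrase))
```

All three phrasings come out as 375, while something like “nine hundred” is rejected as out of range. A real speaker-independent recognizer has to do this on audio rather than on clean text, which is where the hard part lies.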
With the addition of text-to-speech, that oven could go onto the Internet, find a recipe, and then read it to the user, Mozer added. The voice-morphing system allows it to speak thousands of voices, including male and female. But Mozer said that the company hasn’t recorded a celebrity voice. A 24-voice MIDI synthesizer allows it to play back music or MP3 files. Sensory even built in a gesture interface, Mozer added.
An “incredible algorithm” also allows the chip to always be on, Mozer said, “listening” for commands, instead of requiring a button to be depressed. In certain cases, the NLP-5x can receive a “hint” that it might be needed; a Sensory chip inside a Bluetooth headset can be cued by the phone. Even a MIPS-intensive task requires just 30 microamps, Mozer said, extending battery life even further.
“We’re far and away the leader in this space, and this is why,” Mozer said.
OK, now we’re talking! Here’s a list of the highlights for the lazy:
- analog-to-digital and digital-to-analog circuitry, 16-bit sensitivity
- natural language function speech recognition (having recently “trained” my Windows 7 speech recognition, I well know this is an important feature!)
- voice-morphing system allows it to speak thousands of voices, including male and female
- 24-voice MIDI synthesizer allows it to play back music or MP3 files
- built-in gesture interface
- a Sensory chip inside a Bluetooth headset can be cued by the phone
In my mind this all adds up to something which could transmit Voice and Gesture Commands wirelessly, via Bluetooth, and trigger samples via MIDI… all for $2 (in batches of 100,000 per year, apparently, but still…)
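To sketch what the last step might look like on the receiving end, here is a minimal Python example that turns a recognized command word into a raw MIDI note-on message. The three-byte note-on layout (status byte 0x90 plus channel, then note number, then velocity) is standard MIDI; everything else is hypothetical, including the command words, the note mapping, and the function names. I have no idea what interface the NLP-5x actually exposes.

```python
NOTE_ON = 0x90  # MIDI note-on status nibble; the low nibble holds the channel

# Hypothetical mapping from recognized command words to MIDI note numbers.
COMMAND_NOTES = {"kick": 36, "snare": 38, "hat": 42}


def note_on(note, velocity=100, channel=0):
    """Build the three raw bytes of a MIDI note-on message."""
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0-15, note and velocity 0-127")
    return bytes([NOTE_ON | channel, note, velocity])


def command_to_midi(command, channel=0):
    """Return the note-on bytes for a recognized command, or None if unknown."""
    note = COMMAND_NOTES.get(command.lower().strip())
    if note is None:
        return None
    return note_on(note, channel=channel)


print(command_to_midi("kick").hex(" "))
```

Whatever sits on the other side of the Bluetooth link would then only need to forward those bytes to a sampler or synth.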