Voice Again

I have discussed voice (as a channel and as an application) a number of times here, in particular the quality of voice communications in today's telephony, which is at almost the same (hint: poor) level as it was a hundred years ago. We have the Web, movies, and video calling applications on our handsets, yet our voice is still transmitted using narrowband codecs that sound much like the wired rotary phones our ancestors used over the past century. Yes, there is light in that tunnel... 3G mobile networks introduce a number of enhancements here, especially TrFO (Transcoder Free Operation), where transcoders are eliminated from the communication path, improving the overall quality a lot.

But there is more to voice than just person-to-person conversations. Voice is the most natural "interface" we use every day, at least when communicating with people. For person-to-machine conversations, most of the time we use a keyboard (person-to-machine) and a display (machine-to-person). QWERTY keyboards are awkward by design and screens in general are Gutenberg technology from the 15th century. So why can't we just speak to our gadgets and hear them back? The answer is simple: we have not (yet) mastered the process of converting speech to text, and that is the necessary step to let machines understand what we want from them. I should say we have not FULLY mastered this process, as there are many successful speech-based applications available to consumers.

One such example is the speakerphone system I use every day while driving my car. It relies on a very limited vocabulary (a few commands plus digits and names recorded by the user), but it works very well:
  • (me) Dial number
  • (machine) Number please...
  • (me) six - zero - nine
  • (machine) six - zero - nine
  • (me) three - one - five
  • (machine) three - one - five
  • (me) three - six - six
  • (machine) three - six - six
  • (me) Dial!
  • (machine) Dialing...
Not very sophisticated, but it works in so-called "speaker-independent" mode (any person can have such a dialog without any prior "training"), and it works flawlessly, giving me (as a driver) a safe and convenient way to dial phone numbers without touching anything or taking my eyes off the road.
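To show how little logic such a limited-vocabulary dialog needs once the words themselves are recognized, here is a minimal Python sketch of the exchange above as a two-state machine. It is only an illustration under my own assumptions: the class name `VoiceDialer`, the `hear` method, and the "Pardon?" fallback are made up, the speech recognition step is assumed to have already happened, and I have no idea how the actual car kit is implemented.

```python
# Hypothetical sketch of the dialog above; not the real speakerphone's logic.
# Input is assumed to be text that a recognizer has already produced.

DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


class VoiceDialer:
    """Two states: 'idle' (waiting for a command) and 'collecting' (digits)."""

    def __init__(self):
        self.state = "idle"
        self.number = ""

    def hear(self, utterance):
        """Take one recognized utterance and return the machine's spoken reply."""
        # Normalize "six - zero - nine" or "Dial!" into lowercase words.
        words = utterance.lower().replace("-", " ").replace("!", "").split()

        if self.state == "idle":
            if words == ["dial", "number"]:
                self.state = "collecting"
                self.number = ""
                return "Number please..."
            return "Pardon?"  # assumed fallback for anything out of vocabulary

        # state == "collecting"
        if words == ["dial"]:
            if not self.number:
                return "Number please..."  # nothing to dial yet, ask again
            self.state = "idle"
            return "Dialing..."
        if words and all(w in DIGITS for w in words):
            # Store the digits and echo them back so the driver can confirm.
            self.number += "".join(DIGITS[w] for w in words)
            return " - ".join(words)
        return "Pardon?"


if __name__ == "__main__":
    dialer = VoiceDialer()
    for said in ["Dial number", "six - zero - nine", "three - one - five",
                 "three - six - six", "Dial!"]:
        print("(me)", said)
        print("(machine)", dialer.hear(said))
```

The point of the sketch is that with a dozen or so words in the vocabulary the dialog logic is trivial; the hard part, which the sketch skips entirely, is reliably turning any speaker's audio into those words.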

Last week Google introduced the voice search feature for Google Maps for BlackBerry, yet another important development. According to the blog, Google (obviously) uses the voice recognition technology it developed for its original voice search product, GOOG-411. To quote the above-mentioned blog: "using your voice to search for businesses is super useful in situations when you can't type, when the name of the business is long, or when you're not sure how to spell it."

Voice recognition is still in its early stages of development, and the problem itself is very complicated. Suffice it to say, it takes a company like Google to move it forward just a bit. But with the continuous increase in computing power and in the accuracy of the algorithms, we will see the rise of many more voice-based applications. Care to share some ideas?
