Speech recognition
Created | Updated Jul 7, 2002
Most people make the mistake of installing the software and expecting it to work straight out of the box. It doesn't work like that. What you have to realise is that the user is the linchpin of the whole experience. The software isn't complete until the user comes along and adapts the program to understand what he or she is saying. Only when the user has adapted the computer to his or her speech pattern, which is intimately linked to the sound card used, is the program ready to perform to the specifications set forth by the designers.
It's not like a word processor, where you double-click on the icon or the shortcut and you have a product waiting for you, ready to perform to optimum capacity. Speech recognition software cannot do that (yet).
If you want speech recognition software to perform to any standard you would consider useful, you have to train it to understand you. That means you have to actively give it a chance to work well. We live in an age where time is money to the point where it's ridiculous. We don't have time any more to simply learn anything. You can't just start off as a trainee; you're supposed to reach the master stage by the time you're done reading the (very brief) installation guide. Anything else takes away from the time you need to make money (DNA rightly observed that it's not the money that's feeling unhappy, remember?). And this is the standard you expect speech recognition software to meet.
Well, I'm sorry to disappoint you, but you will have to spend a little more effort to make this work well.
You have to train the programme to listen to and understand the way you say things.
And, as icing on the cake, you also have to give it texts you wrote (that means YOU, not your girlfriend or the nerd you bullied into writing your term paper) so that it can build a statistical analysis of the way you write: which words are likely to follow other words you tend to use.
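To give a feel for what "which words are likely to follow other words" means, here is a toy sketch in Python. It is not how any particular product does it; the function names and the sample sentence are invented purely for illustration, and a real recogniser works on far more text with far cleverer statistics.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word in the text, which words follow it and how often."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


def most_likely_next(followers, word):
    """Return the word seen most often after `word`, or None if `word` was never seen."""
    counts = followers.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical sample of "texts you wrote"; feed it your own writing instead.
my_writing = (
    "i would like a cup of tea and i would like a biscuit "
    "and i would not like a lecture"
)
model = train_bigrams(my_writing)
print(most_likely_next(model, "would"))
```

Give it somebody else's term paper and the counts describe somebody else's habits, which is exactly why the texts have to be yours.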
As if that's not enough, you also have to train several individual words, because you, the user, have all sorts of interesting and completely novel ways of pronouncing certain words that the rest of the species does not actually feel comfortable with. But the software doesn't mind, AS LONG AS YOU GIVE IT A CHANCE TO LEARN HOW YOU SPEAK! You have to give something in order to get something back.
Speaking of which: talking to the confounded contraption is complex enough without you louts swapping the RSI you wanted to get rid of by using speech recognition software for the same kind of injury to your vocal cords.
Using speech recognition software is not an examination. Please don't talk to the machine any differently from the way you usually talk when discussing some cute secretary you wanted to snog. You don't get better marks when you injure your vocal cords. If your voice is as good as Moira Stuart's then that's just fine. Keep talking. If you sound like Onslow that's OK too. You may need to train the programme a little longer to make it understand your particular idiosyncrasies when talking, but in the end it'll get there. Recognition will be phenomenal, regardless of whether you sound like the weather girl or a coal miner.
The whole point of the exercise is to talk in your natural voice. Do not adapt to the software; have the software adapt to you. Many people start speaking in ways they are totally unaccustomed to in order to be better understood. In an insultingly short period of time they will develop problems with their larynxes and vocal cords that are very severe and long-lasting. "But how do I know when I'm not speaking correctly?" It's very easy. As soon as you find you're straining to produce the best possible sound for the software to understand, you're doing it wrong. Talk as effortlessly as you can, as calmly as you can, as naturally as you can. Don't sound like a Dalek; use natural rhythm. Pitch and inflection are key components of good recognition. Start talking like a zombie and recognition is out of the window. Also, do not talk in syllables. Speech recognition looks at the contexts words are spoken in. Statistical models determine what the likelihood is that word x is spoken after word y. If you talk in syllables this context is lost and the software will try to understand each syllable as a word. There goes recognition again.
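Here is a small Python sketch of why that context matters. It is a made-up illustration, not the workings of any real recogniser: the tiny corpus and the `pick` function are assumptions invented for the example. Given two words that sound alike, it scores each candidate by how often it followed the previous word, roughly P(x | y) = count(y, x) / count(y). Take the previous word away, as you do when you speak in isolated chunks, and all it has left is plain word frequency.

```python
from collections import Counter

# Hypothetical training text standing in for what the software has learned so far.
corpus = (
    "put the book over there . their dog ate their homework . "
    "their cat sat over there"
).split()

unigrams = Counter(corpus)                  # how often each word occurs
bigrams = Counter(zip(corpus, corpus[1:]))  # how often word x follows word y


def pick(candidates, previous=None):
    """Choose among sound-alike candidates.

    With a known previous word, score each candidate by P(candidate | previous).
    Without one, fall back to how common each candidate is on its own.
    """
    if previous is not None and unigrams[previous] > 0:
        return max(candidates, key=lambda w: bigrams[(previous, w)] / unigrams[previous])
    return max(candidates, key=lambda w: unigrams[w])


print(pick(["their", "there"], previous="over"))  # context available
print(pick(["their", "there"]))                   # context lost
```

With "over" as context the sketch chooses "there"; without it, it falls back on "their" simply because that word turns up more often in the sample text.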
Also, some people get frustrated, angry, disinterested. Ideal circumstances in which to be misunderstood. You're angry? Your voice changes. You're frustrated? You sound different. Recognition, built on the model you created when first teaching the software to understand your voice, suddenly runs into much more difficulty trying to understand you. You get more frustrated because this darn piece of dingo's kidneys isn't working properly, making accuracy even worse... ad infinitum. In the end people are going to observe you shouting at a computer. Think about that. A grown man/woman yelling at a piece of plastic. "So, how's your ability to handle stress on the job?" "It's fine as long as I don't have to talk to that blasted machine"...
Take the time to adapt the software to the way you speak. All the effort you put in at first, you get back in buckets once you start using the software.
Don't sound like a deranged maniac about to shoot the family because you missed a rerun of "I Love Lucy"; speak in your natural voice.
Think carefully before you speak so that you can form longer, more natural sounding sentences. After a while you'll be doing this effortlessly.
Spare your voice. You'll be dictating at upwards of 100 words per minute, so you can afford to take a break and drink some water at regular intervals.
Do yourself a really big favour: expect the software to misrecognise what you said. It WILL make mistakes. Accept it. Don't feed your ulcer, don't clog your arteries. The software is going to make mistakes. You make typos when you're typing. This is the same thing. Just don't worry about it; correct the error and move on.
Speech recognition is not artificial intelligence (yet). It can't grasp what it was you intended to say, it doesn't understand you, it can't make out slurred speech, and it cannot possibly be expected to understand unspoken words. It has limits. Live with them.
Speech recognition software is here to stay. It's not error-free as yet, and chances are it never will be. Language is a hideously complicated concept that you're asking a piece of silicon to grasp. Be reasonable about that. Within the limits of what software can do, you will see tremendous improvements in its range and capabilities. You should expect to see exquisite recognition in all the languages on Earth (at least the ones that make commercial sense to develop) within your lifetime. It will be tremendous and you'll love working with it. If you allow it to misunderstand you at least as much as other people (mis)understand you, you will have a realistic expectation of what it can do for you.
Have a great day.