A Conversation for Speech recognition

Peer Review: A781463 - Speech recognition

Post 1

Bluttsuuft

Entry: Speech recognition - A781463
Author: Bluttsuuft - U174943

I have tried to write an accessible piece on speech recognition from the point of view of the technical support professional who gets all kinds of questions on how to work with the software.
This is not supposed to be a piece about the science of speech recognition. I do not understand this science myself, so I could not write about it in a knowledgeable fashion. I DO get to talk to a lot of people who do not fully grasp the realities of how this kind of software works from a user perspective. Hence the piece.


Post 2

xyroth

I find two problems with this article.

The first (and relatively trivial) is the complete lack of headings.

The second problem comes from your above-mentioned lack of understanding of the technology.

Speech software (contrary to the PR of the companies) is still basically at the travel-dictionary level of competence. It also has the problem that speech interfaces are basically primitive.

Contrary to your statement, unless you already happen to talk fairly like a Dalek (i.e. with no character in the voice), you have to learn how to talk pretty much like a Dalek.

This involves clearly enunciating, and trying to speak exactly the same sentence exactly the same way.

This is because speech and writing have almost nothing in common. This is most clearly demonstrated with dialects, where two people can speak intelligently in chatrooms or forums for years, but cannot understand each other on the phone.

It is still the case that speech software (or language understanding software generally) works best when doing what is known as technical translation. This is when you are using polysyllabic words that specify an idea exactly, and thus leave little room for the software to screw up.

Anyone who tells you that you can read literature or poetry to a computer and it will get it right most of the time is either a liar or a fool. The technology just isn't up to it yet.


Post 3

Spiff


Hi Bluut, smiley - smiley

So, you're back and you've got Schreiblust, eh? smiley - ok

Here's my feedback, along with a for yourself, smiley - smiley

I wasn't initially sure from the title what this piece would be about. Sure, you clear up any confusion straight away in the piece, but it is not clear until you decide to have a look.

Headers are a good suggested improvement, I agree, but do you know about Guide-ML? Do you know about mark-up languages generally? You know, tags and stuff? smiley - smiley

This needs tags around your paragraphs, and some other minor formatting that would make it considerably more readable. It may seem like fancy extras, but it makes a difference. smiley - smiley

As to the content, I admit that I too am dubious about the quality of transcription with these products. I have worked with translators who used them and most had reservations.

This is an interesting group to poll on the value of the things, cos freelance translators would benefit enormously (in terms of productivity) from a good quality speech-recognition programme! They are not flocking to pick up the existing ones, though.

This does *not* make it an unsuitable subject for an entry. Just expressing my personal reservations. smiley - smiley

Best of luck with this and future efforts, smiley - ok
spiff


Post 4

Bluttsuuft

Thank you for your feedback. It is much appreciated.
I see I got another comment on headers and I'll answer that in the reply to that posting.

As for my lack of scientific understanding of the subject: writing speech recognition software is quite complex. I have a basic grasp of how it works on the development level but I have no coding ability and I will do myself the huge favour of not pretending that I do.
I wanted to offer some suggestions to the user of this kind of software on how to go about improving his/her chances of being correctly understood by the software. I know that a lot of people do not (want to) take the trouble of using it correctly. This results in a lot of frustration and agony because getting the wrong result when you're dictating very quickly leads to anger and resentment. Believe me when I say that I know what I'm talking about.
As for talking like a Dalek: the software does work better when you use inflection. Although there are people who always use the same tone of voice, most of us do not. When you use the same tone of voice all day while dictating, you will find that dictating is tedious and after a short while also painful.
I'm not going into translation software; that's something totally different.
Dictating poetry or literature in general is possible too. What you would want to do here is to build your own vocabulary. There is software that allows you to do that. Once it is trained you will notice that your recognition improves dramatically. Having said that, I have tried dictating Shakespeare (and parts of The Guide!) without creating a specific vocabulary or training. I've printed out the result and stuck it to the wall to look at every time I want a good laugh smiley - smiley

I also get a lot of positive feedback from people who use this kind of software in their everyday life. Although there is obviously room aplenty for improvement, we've entered a stage IMHO where the software allows a level of productivity that's practical in an office environment.
I wouldn't want to land a plane using only my voice, though.


Post 5

Bluttsuuft

Spiff,

Thank you very much for your comments as well. Don't be modest with the salt. Rub it in thick. I'm in tech support, pain is my life. I gather that you would have to go to extreme lengths to even register on the scale (I'm not doubting your ability, mind).
Although I haven't written much here yet, I want to add more to my curriculum because I absolutely believe in what the late great DNA wanted to do here. Rather, it is because of my genuine respect for this great man that I haven't added a lot more. The reason for that is: I know nothing. At least that's what it feels like. I am confronted with the limitations of my knowledge on a daily basis and it makes me very careful to add anything here. I want to expose as little of my ignorance as possible. I'm vain that way.

But I'm working on it. I'm teaching myself HTML to create my first webpage and I would have liked to add some headers to make my entry easier to read, but I don't know how to go about doing that yet. It will be a pleasure to find out how to create a decently formatted text. Actually, I just got an idea that should yield good results fairly quickly...

As for the title of my piece, I thought that "speech recognition" would be clear enough but you rightly assume there are more aspects to that than what I covered. Honestly, the title fit in the "Subject" box and that was plenty good enough for me. However, if you want a more elaborate title, I am only too happy to accommodate. I've somewhat of a reputation for creating titles for papers. How about "Some Comments On The Proper Training And Use Of Contemporary Speech Recognition Software - Practical Pointers From A TechSupport Point Of View"?

Also, when I want to find my piece in the "Search" box using the title as criterion, why do I only get a less than 40% probability that I'm getting the right text? Does that make sense?

Greetings.


Post 6

Ausnahmsweise, wie üblich (Consistently inconsistent)

Hi,

Maybe you still need a bit more of a short introductory paragraph? Only after reading quite a way into the entry did I know what kind of speech recognition software it was about. I assume it's solely for dictating a text and having it converted to a computer document. But there's also IVR. And in factory settings, there are machines that require the use of both hands and allow the operator to give additional spoken commands. (A digitizer for semiconductor layouts comes to mind.)

I don't know who Moira Stewart is (but I do know Onslow smiley - winkeye ). But then I've been gone a long time.

There's a story about Dr. Harry Tennant (A.I. / Natural Language Processing) testing some software. It was used to produce custom reports from a financial database. He got a few standard reports OK. Then he asked it "And how about recently?" - It crashed!

Awu.
P.S. I think the writing guidelines say to avoid trying to write in DNA's style. So the reference to dingos' kidneys may have to go smiley - sadface



Post 7

Spiff

don't worry about the search tool. It's not the best thing on h2g2! smiley - biggrin

how about 'Voice Recognition Software'...

But then, as your super-title rightly points out, this piece is more like 'In defence of VRS'. It starts out from the assumption that people have certain negative prejudices about VR.

Either you could do a more introductory piece on what VRS is all about. Or do an out and out 'Apology' style piece, which is more where this one tends, I think.

Or you could do both with links.

All kinds of possibilities. smiley - smiley

seeya
spiff


Post 8

Bluttsuuft

Awu,

Two things on this:
1) Since DNA's work is very much in the realm of literature I feel quite free to quote him or refer to his work. I do that all the time with other writers; he is no different. I use this particular one because it will be readily recognised in this environment. (Granted, I did not actually use quotes, I'm sorry. I'll make sure to do that next time.) You do not need to worry that I would try to emulate him or use his words as my own. I have not the slightest desire to do so; there's no such thing.

2) : the big DNA bear owes me. Big time.
I'm from Belgium, you see. I live in Ghent.

I will freely quote DNA anytime it pleases me to do so.

But I will write you a nice introduction. I do mean to offer quality. It might take a while to get there. Bear with me.

Moira Stewart is a BBC newsreader. Beautiful woman. Beautiful voice.

Be well.



Post 9

xyroth

On the question of a title, "how to use speech recognition software" or "how to get better results from speech recognition software" might do the job.

As to inflection, the current generation of systems does work better if you don't speak in a monotone, but they seriously screw up if you have a good speaking or radio voice, to the extent that you do have to tone it down almost to the level of speaking like a Dalek.

The reason I mention translation is that speech to text is a particularly hard problem in language translation.

In particular, it requires the use of information at every level of understanding just so that it can do a fairly incompetent job.

To get it to the point where you can use speech recognition like it is used in Star Trek (i.e. as standard) is a research-level problem which will involve not just revolutions in speed of processing, but also revolutions in the methods of processing.

This is why, despite doublings in processor speed every 18 months or so, speech recognition systems are still little better than those holiday translation dictionaries you can buy.

They do something useful, but not to the level advertised, and won't in the foreseeable future.

As to how the software works, I do understand it (and why it fails), which is why I have such a low opinion of the current software.


Post 10

Spelugx the Beige, Wizard, Perl, Thaumatologically Challenged

I haven't been in PR for a while, so this may seem like a stupid thing to say/ask, but I'll ask it anyway. Is it really a *problem* for an article submitted to PR to have been written using plain text? Is not having proper headings really a reason not to pick an entry? My view has always been that PR was more about checking whether the article made sense, checking factual accuracy, and rewriting unclear 'technical' (i.e. language specific to the subject being discussed) paragraphs, not turning the article into something that could be copied straight into the edited guide. What are your views?

spelugx -- former scout


Post 11

Spiff


hi Spelugx, smiley - smiley

as you ask for responses: I don't think any formatting is 'necessary for PR' but I *do* think *some* formatting makes a piece more readable, and is therefore helpful to reviewers. smiley - smiley

The suggestions to add headers and para marks were hardly overbearing and came along with other comments of the type you mention. Seems fair enough to me.

seeya
spiff


Post 12

Spelugx the Beige, Wizard, Perl, Thaumatologically Challenged

I would agree that formatting does make an article more readable, and therefore likely to be favoured more by reviewers. What particularly irritated me were the remarks made by xyroth at the top of the thread, but I've now calmed down and re-read them, and they seem in line with what you say. Later, though, you yourself say:

> This needs tags around your paragraphs, and some other minor
> formatting that would make it considerably more readable. It may seem
> like fancy extras, but it makes a difference.

It is totally unnecessary for GuideML to be used; this post is not in GuideML yet I've used paragraphs. *This* is what particularly annoys me, the way that authors are encouraged to do more than just write. Surely writing the article is the most important thing. Polishing the formatting into a publishing format is the responsibility of those who will edit it. The other objection I have about people pushing GuideML is that not everybody wants to learn how to use it, and the continual encouragement to learn it could discourage people from writing. For 'Bluttsuuft' this is his first article. The system should be easily accessible to people using it for the first time; in particular we don't want to set the standard too high.

spelugx -- lost this argument before.


Post 13

Smij - Formerly Jimster

Hi Bluttsuuft,

Just a note on the DNA quotes: you are, of course, free to quote whoever you like in your entries. However, as the Edited Guide is intended to be accessible to all (whether they're Hitchhikers fans or not - this is, after all, not a Douglas Adams fan-site), references to 'dingoes kidneys' will only confuse non-Hitchhiker fans and might spoil an otherwise splendid entry. smiley - bubbly

As a rule, therefore, the editors remove/reword all off-topic Hitchhikers quotes from entries before they hit the front page.

Hope that helps to clarify the issue, smiley - smiley

Jimster


Post 14

Spiff


It's a fair point and I will bear it in mind. Far be it from me to discourage those who want to write but don't want to learn about GuideML. smiley - smiley

Even so, it might be said that by not encouraging authors to use GuideML, we end up making more work for volunteer sub-eds, whose time is valuable too.

A similar point might be raised about fixing typos in PR, I guess.

Anyway, glad to hear you've calmed down! smiley - ok

seeya
spiff
*stifling in Strasbourg*


Post 15

Smij - Formerly Jimster

Hi spelugx,

GuideML is just a recommendation, as you point out. If the entry's good, it'd never be rejected just because it was in plain text (hence why that option exists). A good sub-editor can always do the work there. But it should be noted that GuideML is all part of the fun... well, maybe fun's too strong a word smiley - smiley

I should note that before I started at the Beeb, I'd hardly done any HTML, just very, very basic stuff. But I picked up GuideML very quickly.

Jimster


Post 16

Spiff


Yeah, I guess that's why I might tend to mention it quite often to new authors in PR - I'd never done any HTML at all and learnt here.

spiff
*wondering if Bluttsuuft minds us having a chat about the merits of guide-ml in his thread*

*figuring that if nothing else, at least it is keeping it at the top of the list! smiley - biggrin*


Post 17

Smij - Formerly Jimster

Good point, Spiff - chatting in the thread's a great way to make sure more people see your entry.

Jimster
(who got told off for playing Calvinball in his Robert De Niro thread)


Post 18

Wayfarer-- I only wish I were crackly

Hi Bluttsuuft.
Quotes are really only appropriate when they 1) are relevant to the topic and 2) help to make a point, help explain something, or tell the reader something they may not have known already (e.g. a quotation from a song that is being discussed). DNA semi-quotes are superfluous to an entry about using speech recognition software, and to someone who hasn't read the books or just doesn't remember that part, fetid dingo's kidneys will just seem bizarre and confusing.

And, as Jim pointed out, they get edited out anyway.


Post 19

Bluttsuuft

I am very happy with the attention my post is receiving. It is in itself an interesting interaction, worthy of a post. By observing your reactions one learns what is perceived to be important and what is deemed inappropriate; how these messages are communicated is a lesson all by itself.

Very interesting. Quite entertaining.

Guide-style HTML seems to be both desirable and non-essential. Should we take the trouble to learn it or should we just apply a finger to the key when the text wants to come up for air?

References to The Creator should be shunned it seems, to protect the innocent. Selective amnesia on the other hand is not a problem.

Titles should state the business but not extensively so.

Editors seem to care for the plate but not for the dish.

Things to consider while rewriting the piece.

Stay in touch.







Post 20

Spiff


smiley - laugh

Nice response! smiley - biggrin

"Editors seem to care for the plate but not for the dish."

Please don't take this on my authority! smiley - yikes I think the sub-eds, as volunteers, enjoy the chance to read other researchers' contributions, and work hard getting them into shape for the EG. smiley - smiley

glad to see you enjoying peer review, smiley - smiley,

seeya
spiff

