Tuesday, May 20, 2008

On the soapbox

Interesting discussion of the limits of speech recognition:

http://ixda.org/discuss.php?post=29030&search=voice+ui

Points I don't think I got across very well at the Boston-IA/AVIOS/BVUG meeting in March...
1. We need a new interaction model for voice to work better. Voice enabling the keyboard & mouse is not working.
2. I don't believe "natural language" works - humans are too sloppy.
3. "Next Gen" mainstream voice/speech products and services provide the opportunity to explore new models based on actual human behavior. "User-centered Design" and user research are needed to develop products and services with high adoption rates.
4. Current (frustrated) DNS users - especially technology professionals who use speech - can play a role in this evolution. Hiring VUI professionals is a start, but it is better to hire tech professionals who actually use speech! Require anyone who works on speech apps to use them!

Monday, May 19, 2008

Curious microphone behavior

Over the past week I have witnessed several occurrences of some strange microphone behavior. The recognition window suggests that it hears the following, "him him him him him him him him him". While I regularly use the playback on the correction to hear what has actually been said when I correct errors, I have not been able to catch this on audio to hear what is actually going on. It is inconsistent.

I've also recently noticed that the microphone is capturing a lot of breathing. I believe this indicates that I need to adjust my microphone position, particularly my boom. I actually have a mirror on my desktop facing me so I can monitor my microphone position, and while it looks like my microphone is in the right place I still pick up the breathing sounds. I don't know if the breathing sounds are related to the "him" occurrence.

I'm fairly convinced that the current user file I have is corrupt, but I've been using it for less than two weeks. I think it may be time to resort to my saved pristine user, or scratch the pristine user and retrain a new user.

Thursday, May 8, 2008

Can I phase out capitalization?

I speculate that I could improve my productivity by 45% if I could just ignore capitalization rules. Usually Microsoft Office applications will capitalize the first word in a sentence (which is tremendously helpful), but there are many situations where it does not work. Also, some words will be capitalized when they shouldn't be. I suspect DNS is responsible for this problem - the vocabulary uses the capitalized version of the word even though I intended to use the other. I spend so much time correcting capitalization errors. I'm sure I miss up to 10% of the errors in my documents. My life would be so much easier if I could just set the computer to all lowercase and get away with it!

I think part of the problem is the word "cap". I wish I could replace it with another word that is more recognizable and easier to say.

I also get very confused when I train words. Sometimes the word appears capitalized in the training dialog box, and other times the spoken form appears, "Cap+word". I don't say "cap" when I see a capitalized word in the training dialog box, but I do say "cap" + word when the spoken form is in the training dialog box. I wonder if there are best practices to improve the situation.

Is Excel hopeless?

While I train, and train, and train "sell", "cell", and others like "spell", DNS only gets it right 50% of the time. Also, my alphanumeric cell coordinates are misrecognized over half the time. This is such a drag!

I need to look into the alternatives... maybe there are solutions I am not aware of. If identifying a cell is a command, I could perhaps use command mode. I also toggle between using a single letter and using the military alphabet word - I wonder if this gets me into trouble.
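To make the single-letter versus military-alphabet question concrete, here is a rough Python sketch of how a spoken cell reference could be normalized to an Excel coordinate. This is not how Dragon or Excel handle it. The function name `spoken_to_cell` and the assumption that numbers arrive already written as digits are both made up for illustration.

```python
# Hypothetical helper: not part of Dragon or Excel.
# Maps military (NATO) alphabet words to column letters.
NATO = {
    "alpha": "A", "alfa": "A", "bravo": "B", "charlie": "C", "delta": "D",
    "echo": "E", "foxtrot": "F", "golf": "G", "hotel": "H", "india": "I",
    "juliet": "J", "juliett": "J", "kilo": "K", "lima": "L", "mike": "M",
    "november": "N", "oscar": "O", "papa": "P", "quebec": "Q", "romeo": "R",
    "sierra": "S", "tango": "T", "uniform": "U", "victor": "V",
    "whiskey": "W", "xray": "X", "x-ray": "X", "yankee": "Y", "zulu": "Z",
}


def spoken_to_cell(phrase):
    """Turn 'bravo 12' or 'b 12' into 'B12'; return None if it is not a cell."""
    letters = []
    digits = ""
    for token in phrase.lower().split():
        if token in NATO:
            letter = NATO[token]
        elif len(token) == 1 and "a" <= token <= "z":
            letter = token.upper()
        elif token.isdigit():
            digits += token  # assumes the number was transcribed as digits
            continue
        else:
            return None  # an unexpected word such as "sell"
        if digits:
            return None  # letters must come before the row number
        letters.append(letter)
    if not letters or not digits:
        return None
    return "".join(letters) + digits
```

Under these assumptions both speaking styles end up at the same coordinate, so the mixing itself only matters if the recognizer hears the two forms with different reliability.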

Monday, May 5, 2008

Tracking performance & recognition accuracy

I wish I had a tool that I could run concurrently with Dragon to measure performance and recognition accuracy over a session. I think recognition accuracy could be measured fairly reliably, but I am less sure how you would measure performance. I would love to see how many times the "?????" was displayed with a link to the audio that led to the distress signal. I would also like to see some sort of report on my corrections. It would be helpful to see a summary of my corrections to detect patterns and make corrections. I feel like I make the same errors & corrections over and over again, but I can never sit down at the end of a session and remember the details of my experience.
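A minimal sketch of what the accuracy half of such a tool might compute, in Python. It assumes you could somehow get hold of what was recognized and what you actually meant (for example by logging corrections by hand). It does not talk to Dragon at all, and the function names are invented for this example. Word error rate here is the word-level edit distance divided by the number of intended words, and the comparison is case-sensitive so that capitalization errors count.

```python
from collections import Counter


def word_edit_distance(reference, hypothesis):
    """Word-level Levenshtein distance between two lists of words."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(intended, recognized):
    """Fraction of intended words that were wrong, missing, or extra."""
    ref = intended.split()
    hyp = recognized.split()
    if not ref:
        raise ValueError("intended text is empty")
    return word_edit_distance(ref, hyp) / len(ref)


def tally_corrections(log):
    """Count repeated (recognized, corrected) pairs from a session log."""
    return Counter(log)
```

For example, `word_error_rate("select cell B twelve", "select sell B twelve")` counts one wrong word out of four, and `tally_corrections(session_log).most_common(5)` would list the five corrections made most often in a session.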

While I am in my dream world, I would love to be able to send my audio file to someone who could tell me what speech patterns I have that are problematic for speech recognition software. Sometimes I feel like I do mumble or slur. I've always been told that I am a fast talker, and sometimes I feel like that may get in my way with Dragon. I suspect that someone with some professional experience in voice/speech (not necessarily a speech pathologist, but maybe even a public speaking coach) could listen to me and point out unproductive speech patterns that I don't even begin to hear.

This post is inspired by several weeks of poor quality user files. I have replaced my "pristine" backup user, and recognition & accuracy still sucks. Grump.

Thursday, May 1, 2008

Stream of consciousness thoughts while Creating a New User File

Mysteries of the "Accuracy Center"
1. How do you train commands versus vocabulary?
2. How do you deal with capitalizations during training? (The command "Cap" is my most recognized, most irritating misrecognition).
3. Are links in the Accuracy Center to "Run in the Vocabulary Optimizer" the same as "Add words from your documents to the vocabulary" & "Increase accuracy from e-mail"? Duh. Usability (or is it information architecture) 101. Group similar items. Use consistent labeling for items. Don't use a lot of jargon and expect any of the features to be understood by the end-user.
4. Should I assume that I can train a single word in my vocabulary (and avoid the view or edit vocabulary dialog box) by clicking the "Add a single word to your vocabulary" box?
5. "Check your audio settings". What in the hell is speech to noise ratio? The system offers very little help to those with a volume that is too soft, or a bad ratio.
6. The Acoustic and Language Model Optimizer is regarded with much suspicion in the user community. I wish someone could explain to me what it actually does and how to roll back if it does not help.
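On item 5: a guess at what "speech to noise ratio" means, assuming it is the usual signal-to-noise ratio with speech as the signal. That ratio is commonly expressed in decibels as 10 times the base-10 log of speech power over background-noise power. The Python sketch below shows that general formula on plain lists of audio samples. It is not Dragon's actual calculation, which I have no way of seeing.

```python
import math


def mean_power(samples):
    """Average of the squared sample values."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech_samples, noise_samples):
    """Signal-to-noise ratio in decibels: 10 * log10(P_speech / P_noise)."""
    speech_power = mean_power(speech_samples)
    noise_power = mean_power(noise_samples)
    if noise_power == 0:
        return math.inf   # no measurable background noise
    if speech_power == 0:
        return -math.inf  # silence where speech was expected
    return 10 * math.log10(speech_power / noise_power)
```

A higher number means the voice stands out more from the room. Speaking too softly lowers the speech power, and a noisy room raises the noise power, so either one pulls the ratio down.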

Common misrecognitions that I train when starting with a new user file...
Cell/sell
Be/the
Cap
Nope
him
this
an/and

7. What happens when you toggle Options/Correction/Automatically add words to vocabulary (?) on & off?

How bad does Dragon suck?

Most people who use Dragon NaturallySpeaking will admit that the program requires time, patience, and perhaps creativity (not to mention luck) to get it to work well. Nonusers often ask how difficult the program is to use. It is hard to give an answer because it is used for many different things in many different contexts. I would consider Dragon an amazing product if my goal were purely to dictate. Or if I used Dragon to transcribe my digital recordings I would think it was a miracle every time the words appeared on the screen. Not surprisingly, Dragon is most popular in the medical & legal fields. However, I use Dragon for hands-free computing, what I believe is called "command & control", completing tasks that require rapidly switching between multiple programs. While Nuance markets the product to the disabled community, product team insiders are always defensive about the product's shortcomings, claiming that the product was not designed with hands-free computing in mind. Alas, the suck factor is relative.

I define success with Dragon in terms of productivity & lack of pain after working on the computer. Metrics of success include recognition accuracy, system performance, and how well I'm able to apply my knowledge of DNS to the task at hand and avoid reaching for the keyboard & mouse. Accuracy & performance & pain are all subjective; there are not really accurate ways to measure any of these. Determining how successful I am with Dragon comes down to a gut feeling.

I don't understand the relationship between recognition accuracy & system performance, but occasionally they work really well together, and when one sucks the other one seems to suck in equal proportion. This makes troubleshooting very difficult & frustrating. When Dragon is not working well there is like a list of at least 35 things to test & tinker with. I will get to that list later...