KVoiceControl - User's Guide
Introduction
KVoiceControl is a tool that gives you voice control over your Unix commands.
It uses a speaker-dependent, isolated-word recognition system based on
template matching with non-linear time normalization (dynamic time warping, DTW).
Note that "isolated word" does not necessarily mean one single
word. It just names one class of recognition systems among the following:
- isolated word
- connected words
- continuous speech
- spontaneous speech
Consider the following example. KVoiceControl knows five utterances (each
connected to an appropriate command):
- <connect to internet>
- <netscape communicator>
- <one xterm please>
- <launch emacs NOT VI!>
- <how are you today?>
So the recognition vocabulary consists of five "words" (= utterances) that
can only be recognized one at a time, i.e. you cannot connect the words to
build "sentences" like "<how are you today?> please launch
<netscape communicator> and <connect to internet>"!
Important Note:
As mentioned above, an utterance does not need to be one single word,
and it is strongly recommended to use longer sequences! This
is due to the basic concept of this recognition system: template matching.
If you used short commands like, say, <xterm> and <xedit>,
confusability would increase significantly and recognition
accuracy would drop rapidly!
Basics
KVoiceControl uses a speakermodel that contains sample utterances
for the recognition process.
These speakermodels can be loaded/saved, so one can create different
speakermodels for different speakers or even have different models for
the same speaker.
A speakermodel contains references that consist of the following elements
(*):
- The reference's name (normally you would enter here what is being said)
- The command to execute (when KVoiceControl has recognized this reference)
- Sample utterances for this reference
All references are listed in the ListBox of KVoiceControl's main
GUI.
Edit References
The buttons to the right of the ListBox can be used to edit the references.
- New: creates a new reference (untitled) and adds it to the ListBox
- Delete: deletes a reference from the list
- Edit ...: invokes the reference editor dialog
Within the reference editor you can adjust the elements listed at (*).
Text contains the name of the reference, and Command contains
the command to execute.
Note: You can use the special command detect_mode_off
to switch off recognition mode!
The ListBox below contains the sample utterances for this reference.
You should enter between two and five (or even more) sample utterances
per reference in order to ensure good recognition performance! (The
more the better, but more samples also need more computing power!)
Beware:
Within this dialog, KVoiceControl automatically detects signals coming
from the soundcard. That means that your sample utterances are
recorded automagically: just speak! ;-)
BTW: A prefetch and a postfetch sound buffer ensure that the beginning and
end of the signal are not cut off, so automatic recording works fine!
Recorded utterances are always replayed through your soundcard, so
you can check whether the recording was OK! After recording (and preprocessing),
the utterance is added to the ListBox with the current system date and
time as the entry name. Broken utterances can be removed with the Delete
button.
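The automatic recording described above can be pictured roughly as follows. This is only a minimal sketch, not KVoiceControl's actual code: it assumes the sound signal has already been reduced to one level value per frame, and the function name segment_utterances and its parameters threshold, accepted_silence and prefetch are hypothetical. The defaults 10 and 4 echo the Recording Threshold and Accepted Silence Frames values mentioned in the Options section; the prefetch size of 2 is made up for illustration.

```python
from collections import deque


def segment_utterances(levels, threshold=10, accepted_silence=4, prefetch=2):
    """Split a sequence of per-frame levels into recorded utterances.

    Recording starts when a frame reaches `threshold` and stops once more
    than `accepted_silence` consecutive frames stay below it. The last
    `prefetch` frames seen before the trigger are prepended, and the
    trailing silent frames are kept, acting as a simple postfetch.
    All names and defaults are assumptions made for this sketch.
    """
    utterances = []
    history = deque(maxlen=prefetch)  # quiet frames just before speech starts
    current = None                    # frames of the utterance in progress
    silent_run = 0                    # consecutive quiet frames while recording

    for level in levels:
        if current is None:
            if level >= threshold:
                # Trigger: start recording, including the prefetch frames.
                current = list(history) + [level]
                history.clear()
                silent_run = 0
            else:
                history.append(level)
        else:
            current.append(level)
            if level >= threshold:
                silent_run = 0
            else:
                silent_run += 1
                if silent_run > accepted_silence:
                    # Too much silence: the utterance is finished.
                    utterances.append(current)
                    current = None
                    silent_run = 0

    if current is not None:
        utterances.append(current)  # input ended while still recording
    return utterances


if __name__ == "__main__":
    levels = [0, 1, 0, 20, 30, 0, 0, 25, 0, 0, 0, 0, 0, 0, 1, 40, 0]
    for utterance in segment_utterances(levels):
        print(utterance)
```

The short pause between the frames with levels 30 and 25 does not end the first recording, because it is shorter than the accepted number of silence frames. This is what makes multi-word utterances possible.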
Recognition Mode
Select Detect Voice from the Options menu to let KVoiceControl
enter "action mode". With Detect Voice selected, KVoiceControl again
automatically detects sound signals, records what you say, pattern-matches
this utterance against all references, and executes the command of the
best-fitting reference.
An utterance is accepted if:
- its score is below a specified threshold and
- the best and second-best scores belong to the same reference, or
  the distance between the best and second-best scores is higher than
  a given threshold
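The acceptance rule above can be written down in a few lines. This is a hedged sketch, not KVoiceControl's implementation: the function name accept, the (reference name, score) pair layout and the treatment of a single candidate are assumptions. The default values 15.0 and 3.5 are the ones quoted in the Options section, and lower scores are taken to mean better matches.

```python
def accept(scored, rejection_threshold=15.0, min_distance=3.5):
    """Return the accepted reference name, or None if nothing is accepted.

    `scored` is a list of (reference_name, score) pairs, one per stored
    sample utterance; a lower score means a better match (assumption).
    """
    ranked = sorted(scored, key=lambda pair: pair[1])
    if not ranked:
        return None

    best_ref, best_score = ranked[0]
    if best_score > rejection_threshold:
        return None  # even the best match is too far away

    if len(ranked) == 1:
        return best_ref  # assumption: a lone candidate has no competitor

    second_ref, second_score = ranked[1]
    same_reference = second_ref == best_ref
    clear_winner = (second_score - best_score) > min_distance
    return best_ref if (same_reference or clear_winner) else None


if __name__ == "__main__":
    print(accept([("xterm", 4.0), ("xterm", 5.0), ("emacs", 9.0)]))
    print(accept([("xterm", 4.0), ("emacs", 5.0)]))
```

In the second call the two best scores belong to different references and lie closer together than the minimum distance, so nothing is accepted.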
Options
You can adjust the following options within KVoiceControl:
- Recording Threshold
  This is the minimum level (an integer value) that triggers the automatic
  sound recording process.
  A value around 10 works fine for me. (This value can now be adjusted
  automatically using the calibration functionality!)
- Accepted Silence Frames
  Here you can specify how many silence frames (1 frame = 0.125
  sec) are accepted during recording without stopping it. This is needed
  to be able to use multi-word utterances that contain silence frames.
  My system accepts 4 frames.
- Adjustment Window Width
  This value is used within the pattern-matching process. Roughly
  speaking, the bigger the value, the better the recognition, but the more
  computing power is needed. For more details, contact kiecza@ira.uka.de.
  (I use a value around 70.)
- Rejection Threshold
  The score of the best-matching reference utterance must not be bigger
  than this threshold; otherwise nothing is recognized. (15.0 works
  for me.)
- Minimum Distance Between Different Hypos
  A reference is accepted as the recognition result when the two
  best-scoring utterances both belong to that reference (and are below the
  rejection threshold), or when only the best utterance belongs to that
  reference and the score distance between the best and second-best
  utterances is bigger than the value specified here. (I use 3.5 here.)
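To give an idea of what the Adjustment Window Width does, here is a minimal DTW sketch. It assumes that the option corresponds to the usual band constraint on the warping path (frames i and j may only be matched if they are at most `window` positions apart); the guide does not confirm this. The function name dtw_distance is hypothetical, and plain numbers stand in for the per-frame feature vectors a real recognizer would compare.

```python
import math


def dtw_distance(a, b, window=70):
    """Accumulated DTW cost between two sequences of numbers.

    `window` limits how far the warping path may move away from the
    diagonal. It is widened to at least |len(a) - len(b)| so that the
    end point stays reachable. Sketch under stated assumptions only.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    w = max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        # Only cells inside the band around the diagonal are evaluated.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch b
                cost[i][j - 1],      # stretch a
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


if __name__ == "__main__":
    print(dtw_distance([1, 2, 3], [1, 1, 2, 2, 3, 3]))
    print(dtw_distance([1, 2, 3], [2, 3, 4], window=0))
    print(dtw_distance([1, 2, 3], [2, 3, 4], window=70))
```

A wider window lets the search consider more alignments, which can only lower the cost of the best one, but it also means more cells of the table have to be filled in. That is the trade-off between recognition quality and computing power mentioned above.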
Train References From File
To train several references conveniently, select Train From File ...
from the Options menu.
In the file dialog that follows, choose a .txt file that
contains one entry per line in this form:
<Reference Name>TAB<Associated Unix Command>
(see commands.txt for an example)
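For illustration, a file in this format could be read as sketched below. This is not the parser KVoiceControl uses: the function name parse_training_lines is hypothetical, skipping blank lines is an assumption, and the sample entries are invented rather than taken from commands.txt.

```python
def parse_training_lines(lines):
    """Parse '<Reference Name>TAB<Associated Unix Command>' lines.

    Returns a list of (name, command) tuples. Blank lines are skipped
    (an assumption); a line without a TAB or with an empty field
    raises ValueError.
    """
    entries = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue
        # Split at the first TAB only, so the command may contain TABs.
        name, separator, command = line.partition("\t")
        if not separator or not name.strip() or not command.strip():
            raise ValueError(f"line {number}: expected <name>TAB<command>")
        entries.append((name.strip(), command.strip()))
    return entries


if __name__ == "__main__":
    sample = ["one xterm please\txterm\n", "\n", "launch emacs\temacs &\n"]
    for name, command in parse_training_lines(sample):
        print(f"{name!r} -> {command!r}")
```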
After you have selected the file, a Reference Trainer dialog pops up. The use
of this dialog should be self-explanatory.
Remember: sample utterance recording is done automatically here, too!
Calibrate Microphone
KVoiceControl uses a calibration dialog to adjust your microphone's levels
(start level and stop level).
For this purpose choose Calibrate Micro ... from the Options
menu.
You are then asked to start a mixer program (like kmix), which is needed
to adjust the input level of the soundcard's microphone.
The next dialog then shows the current level value coming from
MIC IN. You must adjust the mic level in the mixer so that this value
is stable at zero!
Pushing OK takes you to the calibration of the start recording level.
You are asked to talk into your microphone for a few seconds. KVoiceControl
then extracts a proper recording level automatically.
If the level values seem plausible, calibration is done. Otherwise KVoiceControl
restarts the calibration process.
Panel Docking
KVoiceControl docks onto the panel and shows two LED lamps. They work
as follows:
The upper (yellow) LED is on when KVoiceControl is in voice autodetection
mode. When you start speaking, and for as long as you speak, this LED
blinks. After the utterance is finished, the LED switches off and
the lower LED blinks red, meaning that recognition is in progress.
If the recognition is successful, this LED switches to green; otherwise
it switches off. After recognition is done, KVoiceControl automatically
switches back to voice autodetection mode.
Mouse control on panel:
- left click: toggle the state of the main window between hidden
  and on-screen
- right click: toggle voice autodetection mode
Last changed: 14. Jun 1998
Daniel Kiecza