VB and Voice Recognition, Part 3: The Voice Controls

My previous articles, "VB and Voice Recognition" and "VB and Voice Recognition, Part 2," covered a few of the basic properties and methods of the voice recognition controls. You now are going to look at each control's properties, methods, and events. For each one, I give a very basic description of what it does, the variables it uses, and a short list of valid settings. The download includes a module containing all the constants detailed in this article.

Some of the properties and methods of the two controls are not listed here. This is either because they are not fully implemented yet (the AttributeMemory property, for example, is marked 'Under construction' in the SDK help files) or because they are methods that work with such properties and are themselves unfinished. The description of the CopyToMemory method, for example, reads (and I quote the SDK help files): 'If an application wishes to free the information, both chunks must be freed with [??]. Also, the application must call [???] (only once) on each of the [???] returned by [???].'

Right, so take a look at the Voice Command Control first. Here are the properties.

Properties

AutoGainEnable () As Long

This property forwards the applied gain setting to or from the sound card, if the sound card supports Auto gain.

Valid Settings: 0 to 100 (0 = No gain, 100 = 100% Auto gain adjustment)

AwakeState () As Long

Although this property is defined as Long, it uses a Boolean (True or False) setting. When set to True (awake), the control listens for commands from any voice menu in the application. When set to False (asleep), the control listens only for commands listed in the application's sleep menu.

This property differs significantly from the Activate and Deactivate methods. Those two methods activate or deactivate only the specified menu, not the entire control.

CountCommands (Menu as long) As Long

Passing a command menu number to the property will return the number of commands currently on the menu.

Device () As Long

Returns or sets the device identifier of the Wave-In device used by the control. You need not alter this property unless you have multiple audio capture devices installed on the PC.

Enabled () As Long

Again, although this property is defined as a long, it uses a Boolean setting. This enables or disables the control. When disabled, the control will not respond to any voice commands, even those on the sleep menu.

EnableMenu () As Long

Enables the specified voice menu.

Note: Identify the menu by the value returned by MenuCreate.

HWnd () As Long

The Window handle of the control.

Initialized () As Long

Returns or sets the initialised state of the control. Most methods and properties will automatically initialise the control.

Valid Settings: 0, 1 (0 = Not initialised, 1 = Initialised)

LastError () As Long - Read only

Returns the result code of the last method or property called.

MenuCreate (Application as String, State as String, Flags as long) As Long

Creates a Voice Menu object in the control and returns its unique identifier.

Valid Flags:

  • VCMDMC_CREATE_ALWAYS: Creates an empty menu. If the menu already exists, it is cleared.
  • VCMDMC_CREATE_NEW: Creates an empty menu. If the menu already exists, an error is returned.
  • VCMDMC_OPEN_ALWAYS: Opens the menu. If it does not exist, an empty menu is created.
  • VCMDMC_OPEN_EXISTING: Opens the menu. If it does not exist, an error is returned.
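As a rough sketch of how MenuCreate might be used, the form code below opens (or creates) a menu when the form loads and releases its identifier when the form unloads. The control name Vcommand1 and the state name "Main" are assumptions made for this example, and the VCMDMC_ constant is expected to come from the constants module in the download.

```vb
' Form-level code. Assumes a Voice Command control named Vcommand1 on the form
' (hypothetical name) and the constants module from the download.
Option Explicit

Private VCmenu As Long   ' menu identifier returned by MenuCreate

Private Sub Form_Load()
    ' "Main" is a hypothetical state name chosen for this sketch.
    VCmenu = Vcommand1.MenuCreate(App.EXEName, "Main", VCMDMC_OPEN_ALWAYS)
End Sub

Private Sub Form_Unload(Cancel As Integer)
    ' Free the identifier once the form no longer needs the menu.
    Vcommand1.ReleaseMenu VCmenu
End Sub
```

VCMDMC_OPEN_ALWAYS is used here so that an existing menu is reused; VCMDMC_CREATE_ALWAYS would be the choice if the menu should start empty on every run.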

Microphone () As String

The name of the type of microphone used.

Speaker () As String

The name of the currently set speaker (User). The control can be set up for multiple users, each with their own training files.

SRMode () As String

The GUID of the recognition engine mode.

SuppressExceptions () As Integer

Enables or Disables the suppression of Exception errors. When enabled, you can check the LastError property to get the Error code of the last method or property call.
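One possible way to combine SuppressExceptions with LastError is sketched below. It assumes a control named Vcommand1 (a hypothetical name), that a non-zero SuppressExceptions value turns suppression on, and that a LastError of 0 means the call succeeded; none of these assumptions comes from the SDK text quoted in this article.

```vb
' Form-level helper. Assumes a Voice Command control named Vcommand1
' (hypothetical name) and a menu identifier obtained from MenuCreate.
Private Function TryActivate(ByVal Menu As Long) As Boolean
    Vcommand1.SuppressExceptions = 1   ' assumed: non-zero enables suppression
    Vcommand1.Activate Menu

    ' Assumption: a LastError of 0 means the call succeeded.
    TryActivate = (Vcommand1.LastError = 0)
    If Not TryActivate Then
        Debug.Print "Activate failed, LastError = &H" & Hex$(Vcommand1.LastError)
    End If
End Function
```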

Threshold () As Long

The percentage level against which the control compares each match score. Any match scoring below the threshold is discarded as unrecognised. A high value accepts only close matches to the spoken command. A low value returns more matches, not all of which are exact.

Valid Settings: 0 to 100 (0 = Worst case match, 100 = Best case match)

The Methods

Activate (Menu as Long) As Long

Activates the specified menu in the control. Until the menu has been activated, no command on the menu will be recognised.

AddCommand (Menu As Long, id As Long, command As String, description As String, category As String, flags As Long, action As String)

Adds a command to the specified menu, or modifies a command already on it.

Valid Flags:

  • VCMDCMD_DISABLED_PERM: Disables this command on the menu.
  • VCMDCMD_VERIFY: Asks for verification before passing the command to the application.
  • VCMDCMD_CANTRENAME: Locks the command on the menu so that it cannot be changed.

CmdMimic (Application as String, State as String, Command as string)

Causes the control to respond as though the command has been spoken by the user.

Deactivate (Menu as Long)

Deactivates the specified menu in the control.

EnableItem (Menu As Long, Enable As Long, Cmdnum As Long, Flag As Long)

Enables or disables items on the menu. Although defined as Long, Enable accepts a Boolean (True or False) value.

Valid Flags:

  • VCMD_BY_IDENTIFIER: Cmdnum is the identifier of the menu item.
  • VCMD_BY_POSITION: Cmdnum is the position in the list of the menu item.

GetCommand (Menu As Long, Index As Long, Command As String, Description As String, Category As String, Flags As long, Action As String)

Returns details about the specified command on the given menu.

Valid Flags:

  • VCMDCMD_DISABLED_PERM: This command is disabled.
  • VCMDCMD_VERIFY: This command requires verification before passing to the application.
  • VCMDCMD_CANTRENAME: This command is locked and cannot be changed.

ListGet (Menu As Long, List As String, Listdata As String)

Returns the list stored on the specified menu by ListSet.

ListSet (Menu As Long, List As String, ListNum As Long, Listdata As String)

Sets a list of phrases in the voice menu. The user can speak any phrase in the list in place of the list name. The usage here is best described with an example:

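' Vcmenu holds the menu identifier returned by MenuCreate.
' The list name passed to ListSet matches the name inside the angle brackets.
AddCommand Vcmenu, 1, "Today is <Weekday>", "list example", "", 2, ""
ListSet Vcmenu, "Weekday", 3, "Monday|Tuesday|Friday"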
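Because ListSet takes both a phrase count and the phrase data, the two can drift apart when a list is edited by hand. The helper below is a small sketch that derives both from one array. It assumes a control named Vcommand1 (hypothetical), the "|" separator used in the example above, and that the list name should match the name inside the angle brackets of the command.

```vb
' Form-level helper. Assumes a Voice Command control named Vcommand1
' (hypothetical name). Phrases is expected to hold a one-dimensional array.
Private Sub SetPhraseList(ByVal Menu As Long, ByVal ListName As String, _
                          ByVal Phrases As Variant)
    Dim Count As Long

    ' Derive the count from the array so it always agrees with the data.
    Count = UBound(Phrases) - LBound(Phrases) + 1

    ' "|" is the separator used in the article's ListSet example.
    Vcommand1.ListSet Menu, ListName, Count, Join(Phrases, "|")
End Sub
```

A call might look like this: SetPhraseList VCmenu, "Weekday", Array("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")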

MenuDelete (Application As String, State As String)

Deletes a menu from the control. You cannot delete a menu if it is currently active.

ReleaseMenu (Menu As Long)

Releases the menu identifier and frees up the allocated memory.

Remove (Menu As Long, Index as long)

Removes the specified command from the menu.

SetCommand (Menu As Long, Index As Long, id As Long, command As String, description As String, category As String, flags As Long, action As String)

Changes the information for an existing command on the menu specified. (Options are the same as for AddCommand.)

The Events

AttribChanged (Attrib as Long)

Occurs when one of the attributes of the control is changed.

ClickIn (X As Long, Y As Long)

Occurs when the user clicks the object's icon.

CommandOther (CmdName As String, Command As String)

Occurs when a phrase was not recognized as being from the application's command set. This event is mostly ignored in applications.

CommandRecognize (ID As Long, CmdName As String, Flags As Long, Action As String, NumLists As Long, ListValues As String, command As String)

Occurs when a spoken phrase was recognized as being from the application's command set, and returns all of that command's settings.
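A common way to act on a recognized command is to branch on the id that was given to AddCommand. The function below is only a sketch of that idea: CMD_TODAY is a hypothetical id, and the function works on plain values so that it does not depend on the exact declaration of the event procedure.

```vb
Option Explicit

Private Const CMD_TODAY As Long = 1   ' hypothetical id passed to AddCommand

' Turns a recognized command into a message the application can act on.
Private Function DescribeCommand(ByVal ID As Long, ByVal Command As String) As String
    Select Case ID
        Case CMD_TODAY
            DescribeCommand = "Date command heard: " & Command
        Case Else
            DescribeCommand = "Unhandled command id " & CStr(ID) & ": " & Command
    End Select
End Function
```

A CommandRecognize event procedure could pass its ID and command arguments straight to this function. Letting the VB code window generate the event stub avoids having to guess the exact parameter declarations.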

CommandStart ()

Occurs when recognition processing has begun for a command.

MenuActivate (cmdName As String, bActive As Boolean)

Occurs when a voice menu has been activated or deactivated.

UtteranceBegin ()

Occurs when the speech recognition engine has detected the beginning of an utterance or sound.

UtteranceEnd ()

Occurs when an utterance is finished. Typically, this is when 0.25 to 0.5 seconds of silence is detected.

VUMeter (Level As Long)

Notifies the application of the current VU level.

Next, you will look at the Dictation Control.


The Dictation Control has quite a few more properties and methods than the Voice Command Control. I will explain only those that differ from the Voice Command Control.

Note: All the properties, methods, and events marked * are the same as in the Voice Command Control.

The Voice Dictation Control Properties

*AutoGainEnable () As Long

CountCommands (FGlobal as Long) As Long - Read Only

Although FGlobal is defined as long, it requires a Boolean (True or False) setting. Returns the number of commands in the global (FGlobal = True) or application-specific (FGlobal = False) command set.

CountGlossary (FGlobal As Long) As Long - Read Only

Returns the number of glossary entries in the global (FGlobal = True) or application-specific (FGlobal = False) glossary entry set.

CountSpeakers () As Long

Returns the number of speakers known to the speech recognition engine.

Echo () As Long

Although this property is defined as Long, it requires a Boolean (True or False) setting. When Echo is set to True, the recognition engine treats soft noises and mumbling (for example, a radio in the background or loud neighbours) as white noise and does not try to recognise them. When Echo is set to False, the recognition engine tries to recognise any noise or sound it detects.

EnergyFloor () As Long

Decibel value of the residual noise in a room or office (signal-to-noise ratio). Set a low value for a quiet office (high SNR) or a higher value for a noisy office (low SNR).

Valid Settings: SRATTR_MINENERGYFLOOR to SRATTR_MAXENERGYFLOOR (&H00 to &HFFFF).

Flags () As Long

Indicates the state and settings of the correction window.

Valid Settings:

  • VDCTGUIF_VISIBLE: The correction window is visible.
  • VDCTGUIF_DONTMOVE: The correction window's position is locked. (Read only)

*HWnd () As Long

*Initialized () As Long

IsAnyoneDictating (HWnd as Long) As String

Returns the application that activated the dictation control for the supplied window handle. If HWnd is null, returns the application for the global dictation control.

*LastError () As Long - Read only

*Microphone () As String

Mode () As long

Indicates the current mode of operation for the current application for both the Voice Dictation and Voice Command controls.

Valid Settings:

  • VSRMODE_DISABLED: Dictation and voice command are disabled.
  • VSRMODE_CMDPAUSED: Voice command is paused and listening for the wake-up phrases.
  • VSRMODE_CMDONLY: Voice command is active. Dictation is not active.
  • VSRMODE_DCTONLY: Dictation is active. Voice command is not active.
  • VSRMODE_CMDANDDCT: Dictation and voice command are both active.

Option (Option as string) As Boolean

Returns the Boolean state of the Option specified. A list of options can be obtained by using OptionsEnum.

OptionsEnum () As String - Read Only

Returns the names and descriptions of all the normalization options available in the following format (name of option = description).

RealTime () As Long

Sets the time the processor spends processing speech, as a percentage of the duration of the speech itself. (This is a guide only; the processor will try to match the setting as closely as possible.) With a value of 100, the processor will take 1 minute to process 1 minute of speech; with a value of 50, the processor will take 30 seconds to process 1 minute of speech. For non-real-time applications (pre-recorded audio), a value of 200 or more can be used. The effect this has on the accuracy of recognition is unknown at the moment, but presumably the more processor time allocated to recognition, the more accurate the recognition will be.
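The arithmetic behind those figures can be written as a one-line function. This is only an illustration of the percentages quoted above, not something the control exposes; the function name is made up for this example.

```vb
Option Explicit

' Estimated processing time, in seconds, for a stretch of speech,
' following the percentages quoted above (100 = as long as the speech itself).
Public Function EstimatedProcessingSeconds(ByVal SpeechSeconds As Double, _
                                           ByVal RealTimePercent As Long) As Double
    EstimatedProcessingSeconds = SpeechSeconds * RealTimePercent / 100
End Function

Public Sub DemoRealTime()
    Debug.Print EstimatedProcessingSeconds(60, 100)   ' 60
    Debug.Print EstimatedProcessingSeconds(60, 50)    ' 30
    Debug.Print EstimatedProcessingSeconds(60, 200)   ' 120
End Sub
```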

*Speaker () As String

SpeakerGet (Index as Long) As String - Read only

Returns the name of the speaker in the Index position.

*SuppressExceptions () As Integer

*Threshold () As Long

TimeoutComplete () As Long

The time the recognition engine waits before it regards a phrase complete, in milliseconds.

TimeoutIncomplete () As Long

The time the recognition engine waits before it discards a phrase as incomplete, in milliseconds. (Should be longer than TimeoutComplete.)

The Methods

Activate ()

Activates the dictation control. Before dictation can take place, however, the control must be set to the appropriate Mode.
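A minimal sketch of starting and stopping dictation might look like the following. It assumes a Voice Dictation control named Vdict1 on the form (a hypothetical name) and the VSRMODE_ constants from the module in the download. The order of the two calls in StartDictation is also an assumption, so try both orders if dictation does not start.

```vb
' Form-level code. Assumes a Voice Dictation control named Vdict1
' (hypothetical name) and the constants module from the download.
Private Sub StartDictation()
    Vdict1.Activate
    ' Use VSRMODE_CMDANDDCT instead to keep voice commands available as well.
    Vdict1.Mode = VSRMODE_DCTONLY
End Sub

Private Sub StopDictation()
    Vdict1.Deactivate
End Sub
```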

*AddCommand (Global As Long, Id As Long, Command As String, Description As String, Category As String, Flags As Long, Action As String)

Although defined as Long, the Global variable accepts Boolean (True or False) values. This method does the same as AddCommand in the Command Control; the only difference is the first variable. Setting Global to True adds the command to the global commands. Setting Global to False adds the command to the application's commands.

AddGlossary (Global As Long, Id As Long, Glossary As String, Description As String, Category As String, Flags As Long, Action As String)

Adds a glossary item to the global glossary (Global = True) or the application glossary (Global = False). Unlike command items, glossary items cannot contain lists.

BookmarkAdd (ID As Long, Posn As Long)

Adds a bookmark into the text. The meaning of the bookmark is left up to the developer.

BookmarkRemove (ID As Long)

Removes the bookmark.

*Deactivate ()

FX (FX As Long)

Performs the specified action on the selected text.

Valid Settings:

  • VDCTFX_CAPFIRST: Capitalizes the first letter of each selected word.
  • VDCTFX_LOWERFIRST: Lowercases the first letter of each selected word.
  • VDCTFX_TOGGLEFIRST: If the first letter is lower case, the first letter is capitalized and all other letters are lowercased. If the first letter is upper case, the whole word is lowercased.
  • VDCTFX_CAPALL: Capitalizes all the letters of the selected words.
  • VDCTFX_LOWERALL: Lowercases all the letters of the selected words.
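The toggle rule is the least obvious of these settings, so here it is modelled as a plain VB function for a single word. This is only an illustration of the rule as described in the list above; it is not the control's own code, and the function name is made up for this example.

```vb
Option Explicit

' Models the VDCTFX_TOGGLEFIRST rule, as described above, for one word.
Public Function ToggleFirst(ByVal Word As String) As String
    Dim First As String

    If Len(Word) = 0 Then Exit Function   ' an empty word stays empty
    First = Left$(Word, 1)

    If StrComp(First, LCase$(First), vbBinaryCompare) = 0 Then
        ' First letter is lower case: capitalize it, lowercase the rest.
        ToggleFirst = UCase$(First) & LCase$(Mid$(Word, 2))
    Else
        ' First letter is upper case: lowercase the whole word.
        ToggleFirst = LCase$(Word)
    End If
End Function

Public Sub DemoToggleFirst()
    Debug.Print ToggleFirst("hELLO")   ' Hello
    Debug.Print ToggleFirst("Hello")   ' hello
End Sub
```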

GetBookMark (Start As Long, Length As Long, Index As Long, Id As Long, Posn As Long)

Returns the Bookmark Information within the specified text block.

GetChanges (NewStart As Long, NewEnd As Long, OldStart As Long, OldEnd As Long)

Returns the start and end of changes made to the internal text. NewStart and NewEnd return the start and end of the new text to be inserted. OldStart and OldEnd return the start and end of the old text to be removed.
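An application that mirrors the internal text in its own text box has to apply each reported change to its copy. The function below sketches that step on plain strings. It assumes that positions are zero-based character offsets and that the end position is exclusive. Both are assumptions, so check them against the values GetChanges actually returns.

```vb
Option Explicit

' Replaces the characters from OldStart up to (but not including) OldEnd
' with NewText. Offsets are assumed to be zero-based.
Public Function ApplyChange(ByVal Text As String, ByVal OldStart As Long, _
                            ByVal OldEnd As Long, ByVal NewText As String) As String
    ApplyChange = Left$(Text, OldStart) & NewText & Mid$(Text, OldEnd + 1)
End Function

Public Sub DemoApplyChange()
    ' Replace "cat" (offsets 4 to 7) with "dog".
    Debug.Print ApplyChange("the cat sat", 4, 7, "dog")   ' the dog sat
End Sub
```

In a real application, NewText would presumably be read from the control with TextGet, using the NewStart and NewEnd values returned by GetChanges.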

*GetCommand (FGlobal As Long, Index As Long, Command As String, Description As String, Category As String, Flags As Long, Action As String)

See note on AddCommand.

GetGlossary (FGlobal As Long, Index As Long, Glossary As String, Description As String, Category As String, Flags As Long, Action As String)

Returns information about a glossary entry.

Lock ()

Locks the internal text object and prevents the dictation control from doing any further processing. The application can still make changes to the internal text with TextSet and TextRemove.

*RemoveCommand (FGlobal As Long, Index As Long)

See the note on AddCommand.

RemoveGlossary (FGlobal As Long, Index As Long)

Removes the specified glossary entry from the glossary entries.

*SetCommand (FGlobal As Long, Index As Long, Id As Long, Command As String, Description As String, Category As String, Flags As Long, Action As String)

See the note on AddCommand.

SetGlossary (FGlobal As Long, Index As Long, Id As Long, Glossary As String, Description As String, Category As String, Flags As Long, Action As String)

Changes the information for the glossary entry specified. (Options are the same as for AddGlossary.)

SpeakerDelete (Name As String)

Deletes all information that the recognition engine has about the named speaker.

SpeakerNew (Name As String)

Creates a new speaker with the specified name. If the name already exists, the SpeakerNew method returns an error.

SpeakerQuery (Name As String, Data As String)

Returns the name of the currently selected speaker in the Data Variable. Name is listed as 'Not used'.

SpeakerSelect (Name As String, Lock As Boolean)

Changes the current speaker to the one specified. If the named speaker does not exist, a new speaker is created. If Lock is set to True, only calls from the application can change the speaker again.

TextGet (Start As Long, NumChars As Long, Data As String)

Returns the specified text from the internal buffer.

TextRemove (Start As Long, NumChars As Long, Reason As Long)

Removes the specified text from the internal buffer.

Valid Settings for Reason:

  • VDCT_TEXTCLEAN
  • VDCT_TEXTKEEPRESULTS

TextSelGet (SelStart As Long, SelLen As Long)

Returns the position of the text selection in the internal buffer. Applications should keep this in sync with the text they display to the user.

TextSelSet (SelStart As Long, SelLen As Long)

Sets the position of the selection in the internal buffer. Applications should keep this in sync with the text they display to the user.

TextSet (NewText As String, Start As Long, Numchars As Long, Reason As Long)

Replaces the block of text defined by Start and Numchars with NewText.

Valid Settings for Reason:

  • VDCT_TEXTCLEAN
  • VDCT_TEXTKEEPRESULTS

TrainGeneralDlg (HWnd As Long, Title As String)

Displays a General Training dialog box.

TrainMicDlg (HWnd As Long, Title As String)

Displays a Microphone Training dialog box.

Unlock ()

Unlocks the text object so that dictation can take place. Also resets the Start and End values for GetChanges.

The Events

*AttribChanged (lAttrib As Long)

*ClickIn (x As Long, y As Long)

CommandBuiltIn (Command As String)

Occurs when a spoken command was recognized and handled. Most applications ignore this event.

*CommandOther (Command As String)

*CommandRecognize (Id As Long, Flags As Long, Action As String, Command As String)

Dictating (Appstring As String, Flags As Long)

Occurs when an application starts or stops dictation.

Valid Values for Flags: 0, 1 (0 = Dictation has stopped, 1 = Dictation has started)

Interference (Attrib As Long)

Notifies the application that the engine cannot recognize speech properly for a known reason.

Valid Settings:

  • SRMSGINT_NOISE: Background noise is too loud.
  • SRMSGINT_TOOLOUD: The Speaker is talking too loudly.
  • SRMSGINT_TOOQUIET: The Speaker is talking too softly.

PhraseFinish (Flags As Long, Phrase As String)

Informs the application that the speaker has finished a phrase and the speech recognition engine is certain about the words that were spoken. Most applications ignore this event.

Valid Settings for Flags:

  • ISRNOTEFIN_RECOGNIZED: If the Recognition threshold was reached, this flag will be set.
  • ISRNOTEFIN_THISGRAMMAR: If this Phrase scored the highest, this flag will be set.

PhraseHypothesis (Flags As Long, Phrase As String)

Informs the application that the recognition engine has a hypothesis (general idea) about the phrase. Most applications ignore this event. Settings are the same as for PhraseFinish.

PhraseStart ()

Notifies the application that the engine has started processing speech. Most applications ignore this event.

TextBookmarkChanged (Val As Long)

Notifies the application that a bookmark has been changed.

TextChanged (Reason As Long)

Notifies the application that the internal text has been changed.

Valid Settings:

  • VDCT_TEXTADDED: Text was added.
  • VDCT_TEXTREMOVED: Text was removed.
  • VDCT_TEXTREPLACED: Text was replaced.
  • VDCT_TEXTMOVED: Text was moved.

TextSelChanged ()

Notifies the application that the selection has changed. Some built-in commands might cause the selection to change.

Training (Train as Long)

Notifies the application that the engine requires training. The application should call the appropriate training dialog box for the engine.

Valid Settings:

  • SRGNSTTRAIN_MICROPHONE: Training required for the current microphone.
  • SRGNSTTRAIN_GRAMMAR: Training required for the current grammar.
  • SRGNSTTRAIN_GENERAL: General training required.
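One way to respond to the Training event is to map the value it reports to one of the training dialogs listed under the methods. The sketch below assumes a Voice Dictation control named Vdict1 (a hypothetical name), the SRGNSTTRAIN_ constants from the module in the download, and that the values may be combined as bit flags, which is why And is used rather than a straight comparison. The dialog titles are placeholders.

```vb
' Form-level helper. Assumes a Voice Dictation control named Vdict1
' (hypothetical name) and the constants module from the download.
Private Sub ShowTrainingDialog(ByVal Train As Long)
    If (Train And SRGNSTTRAIN_MICROPHONE) <> 0 Then
        Vdict1.TrainMicDlg Me.hWnd, "Microphone training"
    ElseIf (Train And SRGNSTTRAIN_GENERAL) <> 0 Then
        Vdict1.TrainGeneralDlg Me.hWnd, "General training"
    End If
    ' No grammar training dialog is covered in this article, so
    ' SRGNSTTRAIN_GRAMMAR is left unhandled here.
End Sub
```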

*UtteranceBegin ()

*UtteranceEnd ()

*VUMeter (vu As Long)



About the Author

Richard Newcombe

Richard Newcombe has been involved with computers since the time of the Commodore 64. Today, he excels at programming and design. Richard is in his mid-30s and, if you are looking for him, look no further than his computer. He is always willing to help and give advice where he can on computer-related subjects. At present, he is working as a .NET 2008 Software Developer for Syncrony Web Services, South Africa.
