This part covers using the Microsoft Diction Control, and adding it to your previous project to make a full speech-to-text application. Adding the Diction control to the application is almost as easy as adding the Voice commands.
Just a few notes about the Diction control before I start. The diction control has a text buffer that has to be kept up to date with the text in the displayed text box. The current cursor position in the text box has to be updated in the control as well. When dictating, the diction control will update the internal copy and pass only the updated text to the application. You can use a standard text box to work with this method, but a Rich Text Box is recommended.
Now, add the Diction control to your Project form and call it 'VoiceDic'. To initialise the voice diction and set the correct recognition mode, as with the Voice command control, use the following:
    Private Sub Init_Diction()
        VoiceDic.Initialized = 1
        ' Set the Diction control mode to Diction only.
        VoiceDic.Mode = VSRMODE_DCTONLY ' (&H20)
        VoiceDic.Activate
    End Sub
Note: The Mode property sets how the diction control reacts to any speech. More options for the property, and what they do, are listed in Part 3.
This will start up the Voice diction; any speech that is not part of the voice commands will now be passed through the diction. This in turn is passed to the application through an event called 'TextChanged'. The event tells you only that the internal text of the diction has changed, and the reason why (new text, changed text, replaced text, or deleted text). Within this event, you have to query the diction's internal text for the changes and apply them to your rich text box. The following is a simplified version of the code.
    Private Sub VoiceDic_TextChanged(ByVal Reason As Long)
        Dim newStart As Long
        Dim newend As Long
        Dim oldStart As Long
        Dim oldEnd As Long
        Dim selstart As Long
        Dim sellen As Long
        Dim Text As String

        ' Lock or pause the diction control temporarily.
        VoiceDic.Lock
        ' Request the currently buffered changes.
        VoiceDic.GetChanges newStart, newend, oldStart, oldEnd

        ' Was text deleted?
        If (oldStart < oldEnd) Then
            RTBox.selstart = oldStart
            RTBox.SelLength = oldEnd - oldStart
            ' Delete the text from the Rich text box.
            RTBox.SelText = ""
        End If

        ' Was text added?
        If (newend > newStart) Then
            RTBox.selstart = newStart
            RTBox.SelLength = 0
            ' Get the new text from the control.
            VoiceDic.TextGet newStart, newend - newStart, Text
            ' Add the new text to the Rich text box.
            RTBox.SelText = Text
        End If

        ' Release the Lock on the Diction Control.
        VoiceDic.Unlock
    End Sub
Don't be too worried about why the text has changed; simply ignore the Reason argument. First, you have to Lock the diction control to prevent it from reactivating and changing the text before you can query the current changes.
The GetChanges method queries the Diction control for the start and end of any changes. oldStart and oldEnd refer to deleted text, so you select that range in the rich text box and remove it. newStart and newend refer to added text.
You now set your rich text box to the same start position to insert the new text. Calling the TextGet method with newStart and the length (newend - newStart) returns the text that needs to be inserted into your Rich text box. After doing that, you can unlock the diction control.
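To make the offset arithmetic concrete, here is a minimal sketch that applies one change to a plain String instead of a Rich text box. ApplyChange is a hypothetical helper, not part of the Diction control. It assumes, as the handler above does, that the offsets are 0-based like the rich text box's SelStart, and that the deletion is applied before the insertion.

```vb
' Hypothetical helper for illustration only (not part of the Diction control).
' Applies one change to a plain String using 0-based offsets.
Private Function ApplyChange(ByVal Doc As String, _
                             ByVal oldStart As Long, ByVal oldEnd As Long, _
                             ByVal newStart As Long, _
                             ByVal NewText As String) As String
    ' Was text deleted? Remove the range oldStart..oldEnd.
    If oldStart < oldEnd Then
        Doc = Left$(Doc, oldStart) & Mid$(Doc, oldEnd + 1)
    End If

    ' Was text added? Insert it at newStart.
    If Len(NewText) > 0 Then
        Doc = Left$(Doc, newStart) & NewText & Mid$(Doc, newStart + 1)
    End If

    ApplyChange = Doc
End Function
```

For example, ApplyChange("the cat sat", 4, 7, 4, "dog") returns "the dog sat": the three characters starting at offset 4 are removed, and the new text is inserted at the same offset.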
But what if the user has manually edited the text, or clicked in a new position to add text in the middle of the document? All these events need to be passed back to the diction control to tell it where to carry on. When the user uses the mouse to select text, you can use the Click event to notify the diction control of the new selection using the TextSelSet method, giving it the start position and length of the selection.
    Private Sub RTBox_Click()
        ' Lock or pause the diction control temporarily.
        VoiceDic.Lock
        VoiceDic.TextSelSet RTBox.selstart, RTBox.SelLength
        ' Release the Lock on the Diction Control.
        VoiceDic.Unlock
    End Sub
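A click only covers the mouse. If the user moves the caret with the keyboard, the same TextSelSet call could be made from the rich text box's KeyUp event, which fires after the caret has moved. The following is a minimal sketch under that assumption. It reuses only the Lock, TextSelSet, and Unlock calls shown above; the choice of KeyUp and the list of navigation keys are illustrative assumptions, not something the control prescribes.

```vb
' Sketch only: report keyboard caret movement to the Diction control.
' Assumes VoiceDic and RTBox exist on the form as in the listings above.
Private Sub RTBox_KeyUp(KeyCode As Integer, Shift As Integer)
    Select Case KeyCode
        Case vbKeyLeft, vbKeyRight, vbKeyUp, vbKeyDown, _
             vbKeyHome, vbKeyEnd, vbKeyPageUp, vbKeyPageDown
            ' Lock or pause the diction control temporarily.
            VoiceDic.Lock
            ' Pass on the caret position (and any Shift+arrow selection).
            VoiceDic.TextSelSet RTBox.SelStart, RTBox.SelLength
            ' Release the Lock on the Diction Control.
            VoiceDic.Unlock
    End Select
End Sub
```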
If the user edits the selected text, you can use the KeyPress event to pass the new text back into the diction control. The TextSet method updates the internal copy of the text. Its parameters are Text, Start position, Text length, and Reason.
    Private Sub RTBox_KeyPress(KeyAscii As Integer)
        Dim s As String
        ' Lock or pause the diction control temporarily.
        VoiceDic.Lock
        ' Set the start of the diction's selected text to the
        ' same as our Rich Text Box.
        VoiceDic.TextSelSet RTBox.selstart, 0
        ' Place the pressed key character in the correct place
        ' in the diction control.
        VoiceDic.TextSet Chr$(KeyAscii), _
                         RTBox.selstart, RTBox.SelLength, _
                         VDCT_TEXTCLEAN ' (&H10000)
        ' Release the Lock on the Diction Control.
        VoiceDic.Unlock
    End Sub
By using some of the methods included with the Rich text box, you also can trap Cut, Copy, and Paste events and, by using code similar to the above, place the changes in the Diction control's text. Drag and Drop methods also can be added.
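As a rough illustration of the Paste case, the sketch below traps Ctrl+V in the KeyDown event, which fires before the Rich text box performs the paste, so SelStart and SelLength still describe the text about to be replaced. It uses only the Diction control calls already shown (Lock, TextSelSet, TextSet, Unlock) together with the standard VB Clipboard object. Whether TextSet accepts a multi-character string in a single call is an assumption that should be checked against the control's documentation, and other paste routes (Shift+Insert, a menu item) would need the same treatment.

```vb
' Sketch only: mirror a Ctrl+V paste into the Diction control's buffer.
' Assumes VoiceDic and RTBox exist on the form as in the listings above,
' and that TextSet accepts a multi-character string (unverified).
Private Sub RTBox_KeyDown(KeyCode As Integer, Shift As Integer)
    Dim pasted As String

    ' Ctrl+V pressed, and the clipboard holds plain text?
    If KeyCode = vbKeyV And (Shift And vbCtrlMask) <> 0 Then
        If Clipboard.GetFormat(vbCFText) Then
            pasted = Clipboard.GetText(vbCFText)

            ' Lock or pause the diction control temporarily.
            VoiceDic.Lock
            VoiceDic.TextSelSet RTBox.SelStart, 0
            ' Replace the current selection with the pasted text.
            VoiceDic.TextSet pasted, _
                             RTBox.SelStart, RTBox.SelLength, _
                             VDCT_TEXTCLEAN
            ' Release the Lock on the Diction Control.
            VoiceDic.Unlock
        End If
    End If
End Sub
```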
Then, you have the problem of some users trying to type and dictate at the same time. You might think this funny, but it does happen. So, what you are looking for is a method to either stop the diction control when typing, or prevent typing while dictating. The Lock and Unlock methods in the KeyPress event take care of pausing the diction control while typing. This still isn't perfect, so you have to look at preventing typing while dictating.
The Diction control gives you two events that you can use to prevent typing during diction: UtteranceBegin and UtteranceEnd. You can use these two events to enable and disable the Rich text box whenever there is diction taking place. UtteranceBegin is triggered as soon as the diction control detects a sound, and UtteranceEnd is triggered about 0.25 to 0.5 seconds after silence is detected. So, now you can add the following to your code:
    Private Sub VoiceDic_UtteranceBegin()
        RTBox.Enabled = False
    End Sub

    Private Sub VoiceDic_UtteranceEnd()
        RTBox.Enabled = True
    End Sub
The example code adds a simple but fully functional Voice diction to your application, but there are not many variations to this. The diction control has the option of returning commands, like the Voice command control does, but only in a single menu set. A few advanced options are available with the Diction control, including Changing and Adding new Speakers (Users) to the Recognition Engine, and calling up the Voice Training Dialogs and automatic User detection. Details of these methods are listed in the next article.
In the download file, there is a simple application that uses what you covered in this article as well as the Voice commands that you covered in Part 1.
The next article will look at some of the other Events and Functions available in the Voice Command and Diction controls, and you will add the Text to speech control to your simple application in the last article.
A small note: The diction control requires a minimum of three hours of voice training to get an average 90% accuracy; even then, it will make what appear to be major mistakes. The Speech recognition engine uses a phonetic alphabet, where some phonemes are very similar, and the engine also uses a common word grouping system to try to increase the accuracy. If one word is misrecognised somewhere in the middle of a sentence, the engine may change some of the words before and after it to make an understandable sentence. This can be frustrating at times but, more often than not, it does make correct assumptions.
Just remember to talk a bit slower than normal and to mouth the words correctly. Slang words can cause havoc to the recognition. Use "can not" rather than "can't", "is not" or "am not" rather than "ain't", and "mortuary" rather than "morgue."