FlickStart Speech Module

Introduction

The speech module provides another way to trigger the FlickStart commands that drive apps on the phone and watch (or on PCs if you use FlickNet with FlickStart).

FlickStart can be made to turn on the speech recognition process at any time by just tapping the phone or speaking loudly into the microphone. The speech recognition process can be started whether the screen is off or on and regardless of which app is visible.

If the recognition process returns a set of words which match a command in FlickStart, the command will be run.

Using speech as the trigger has an advantage over movement in that the commands can be far more flexibly defined. For instance, with speech, a command can now be given different information on each invocation (eg. you can provide a different contact name to a command which might access the phone's SMS database).

In addition, you can design the matching process so that the tags containing the commands can be left active even if you aren't expecting to use the tags in the short term. For example, you could leave the tag for controlling the TV remote app in your phone active even though you can't use it when you aren't home. The speech pattern for controlling the TV remote app would be so different to the speech patterns triggering commands in other tags that you wouldn't accidentally trigger the commands for the TV.

The speech module also uses a lot of text-to-speech. However, the text-to-speech sections only work in Android 5.0 or higher.

Follow this link for a description of the speech-based commands that come built-in to FlickStart.

Engaging the Speech Module

The 'MOVEMENT' tab has a button for turning speech recognition on and off, and a slider for adjusting the trigger volume:

While the speech button is on, FlickStart waits for a louder than normal sound and then calls Google's Speech Recognition engine.

The slider position determines how loud a sound has to be to wake up the speech recognizer. To run a voice-based command, you must first say something loudly enough to trigger the recognizer. The recognizer will beep to indicate it is ready. Then you say the sequence of words needed to trigger a command (eg. say 'OK' loudly, then say 'sms last' to have FlickStart read out the last received SMS). After you stop speaking, the recognizer automatically hands the text to FlickStart, or double beeps if it couldn't understand anything. If FlickStart gets some text from the recognizer but can't match the text to a command, it will tell you.
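The loudness trigger can be pictured with a small sketch. This is not FlickStart's code: the function names, the use of an RMS level, and the 0.0 to 1.0 slider range are all assumptions made for illustration.

```python
import math

def rms(samples):
    """Root-mean-square level of a block of 16-bit PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def should_wake(samples, slider_position, max_level=32767.0):
    """Return True when the block is louder than the slider threshold.

    Assumption: slider_position runs from 0.0 (most sensitive) to 1.0
    (least sensitive) and scales linearly to the maximum sample level.
    """
    threshold = slider_position * max_level
    return rms(samples) > threshold

# Hypothetical audio blocks: normal speech versus a loud 'OK'.
quiet = [200, -180, 150, -220]
loud = [20000, -21000, 19500, -20500]
print(should_wake(quiet, 0.5))  # False
print(should_wake(loud, 0.5))   # True
```

In this sketch, moving the slider up raises the threshold, so a louder sound is needed before the recognizer would be called.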

Speech-based Commands

A command which uses speech rather than movement as the trigger specifies a speech pattern that is matched against the text returned by the speech recognition process.

A simple pattern might be "sms last". For such a pattern FlickStart would check that the speech recognizer returned two words and the words were exactly the two in the pattern.

More complex patterns can leave room to pick up parameters (eg. a contact name) and pass them on to the target app.

After any run of the speech recognizer that returns some text FlickStart runs through its list of speech-based commands trying to match each command's pattern against the returned text.

Any commands that match the returned text are run.
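The matching of simple patterns described above can be sketched as follows. The command names are hypothetical, and ignoring upper and lower case is an assumption; the sketch only illustrates that the word count and the words must agree, and that every matching command is run.

```python
def matches(pattern, heard):
    """Exact match: same number of words and the same words.

    Assumption: case is ignored when comparing words.
    """
    return pattern.lower().split() == heard.lower().split()

def run_matching(commands, heard):
    """Check every command in turn; return the names of all that matched."""
    ran = []
    for name, pattern in commands:
        if matches(pattern, heard):
            ran.append(name)  # a real app would launch the command here
    return ran

# Hypothetical command list: two commands share the pattern 'sms last'.
commands = [
    ("read_last_sms", "sms last"),
    ("announce_last_sms", "sms last"),
    ("tv_power", "tv power"),
]
print(run_matching(commands, "SMS last"))         # ['read_last_sms', 'announce_last_sms']
print(run_matching(commands, "sms last please"))  # []
```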

Finding Speech-based Commands

If you go to the 'TAGS' tab and scroll through it, you will find tags whose names end in "(Speech)", indicating (by convention rather than by definition) that the commands in the tag are speech-based:

If we tap on one of these (we will use the SMS (Speech) tag here) we will see the list of commands in the tag:

Any row that shows 'Speech Handheld' in the 'Flick type' column is a command triggered by speech rather than by movement.

In the commands we have put together you will find a fairly wordy description, giving an indication of what you should say to make the command run, together with a little elaboration about what the command does.

Where a part of what should be said has square brackets around it ([]), this indicates a place where you put in a number or a word or perhaps a number of words.
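To show how a bracketed slot could pick up a word or several words, here is a minimal sketch. It borrows the square-bracket notation of the descriptions purely for illustration; the syntax FlickStart actually accepts in the 'Speech pattern' field is covered elsewhere in the manual and may differ. The template 'call [name]' and the function names are hypothetical.

```python
import re

def template_to_regex(template):
    """Build a regex from a template such as 'call [name]'.

    Plain words must match literally; each [slot] becomes a named group
    that captures one or more words.
    """
    parts = []
    for token in template.split():
        if token.startswith("[") and token.endswith("]"):
            slot = token[1:-1]
            # Capture as few words as possible so later words can still match.
            parts.append(rf"(?P<{slot}>\S+(?:\s+\S+)*?)")
        else:
            parts.append(re.escape(token))
    return re.compile(r"\s+".join(parts) + r"$", re.IGNORECASE)

def extract(template, heard):
    """Return a dict of slot values, or None if the text does not fit."""
    m = template_to_regex(template).match(heard.strip())
    return m.groupdict() if m else None

print(extract("call [name]", "call John Smith"))  # {'name': 'John Smith'}
print(extract("call [name]", "sms last"))         # None
```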

Quirks using the Speech Recognizer

Language Choices

If you find that the speech recognizer can be started (you will hear the beep), but it stops again before you get a chance to say anything (another beep or two), then the cause is likely to be that some language choices are conflicting.

The Google Speech Recognizer seems to require that the language used in various places be the same.

For instance if you have the language for the phone as 'English (UK)' and the language for the keyboard is 'English (US)', the speech recognizer will refuse to work.

Contact Names

Speech patterns where a contact name is expected are set to swallow up one or two words as the shorthand for a contact.

If the contact's nickname, given name or family name is enough to uniquely identify the contact in the database, you can just use the one word.

If you need to use two words to uniquely identify a contact, use the given and family names (in any order).
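The one-word and two-word rules above can be sketched like this. The Contact class, the field names and the sample contacts are hypothetical; the sketch only shows the idea that a spoken name is accepted when it identifies exactly one contact.

```python
from dataclasses import dataclass

@dataclass
class Contact:
    given: str
    family: str
    nickname: str = ""

def find_contact(contacts, spoken):
    """Return the single contact identified by one or two spoken words.

    Returns None when nothing matches or when the match is ambiguous.
    """
    words = spoken.lower().split()
    if len(words) == 1:
        # One word may be the nickname, the given name or the family name.
        hits = [c for c in contacts
                if words[0] in (c.given.lower(), c.family.lower(), c.nickname.lower())]
    elif len(words) == 2:
        # Two words are the given and family names, in either order.
        hits = [c for c in contacts
                if sorted(words) == sorted([c.given.lower(), c.family.lower()])]
    else:
        hits = []
    return hits[0] if len(hits) == 1 else None

contacts = [Contact("Ann", "Lee", "Annie"), Contact("Ann", "Smith"), Contact("Bob", "Lee")]
print(find_contact(contacts, "annie"))      # the Ann Lee contact
print(find_contact(contacts, "ann"))        # None, two contacts are called Ann
print(find_contact(contacts, "smith ann"))  # the Ann Smith contact
```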

Pausing to Think

The Google Speech Recognizer can be quite picky about pauses while you say something to it.

If you pause for what might seem like a mere moment, the recognizer is likely to stop listening and hand what it has heard to FlickStart for processing without giving you a chance to complete what you had in mind.

Some Words are Too Hard

Sometimes you can find that the speech recognizer has trouble with particular words you say.

If that happens with words which are part of the pattern for a command, it is usually easier to edit the command and replace the troublesome word with another.

For example if the recognizer regularly has difficulty with the word 'last' (in something like 'sms last'), you could change the word in the pattern to 'recent'.

To change a command, tap on it and edit the 'Speech pattern' field. It's a good idea to also edit the 'Description' field with a matching change to avoid later confusion.

Word Order and Choice

The sample commands are easily adjusted if you find things like word order or word choice don't suit.

For instance, you might find that saying 'sms last' to have FlickStart read out the last received SMS feels wrong while saying 'last sms' feels more natural. You can edit the command easily to make such a change. Go to the list of commands in the SMS tag by starting on the 'MOVEMENT' tab and tapping on the line with 'SMS (Speech)':

Scroll down to the command which is used to read out the last SMS and tap the command:

Tap on the tag with the SMS commands to see the list of commands:

Tap on the command for 'sms last' to get the command details panel:

Then edit the 'Speech pattern' field from 'sms last' to 'last sms':

Finally, exit each panel with the 'Save' button until you are back at the complete list of tags. The change is then complete.

Using apps that make Sounds

Apps like music players generate a lot of sound if they are directed to the speaker on the phone, and that sound will interfere with attempts to use speech recognition.

In scenarios where you want to combine sound output and speech recognition, you can overcome the problems by using either wired or Bluetooth headsets so that the microphone hears your voice but not the sound from your app.

Making the Recognizer and Commands Active

If you try to start the recognizer by making a loud noise and nothing happens, it's probably because you haven't made speech recognition active via the button on the 'MOVEMENT' tab.

If you say something which you are sure matches a command but FlickStart says it didn't match an active command, then it's likely that you haven't made the tag containing the command active. Check the 'TAGS' tab. Only tags with green backgrounds are going to match against the speech recognizer.
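The two checks above can be summarised in a short troubleshooting sketch. The data layout, the tag named 'TV (Speech)', the pattern 'tv power' and the returned messages are hypothetical and are not taken from FlickStart itself.

```python
def diagnose(recognition_on, tags, heard):
    """Suggest why spoken text did not run a command.

    tags is a list of dicts with 'name', 'active' and 'patterns' keys
    (an assumed layout, used only for this illustration).
    """
    if not recognition_on:
        return "turn on speech recognition on the 'MOVEMENT' tab"
    words = heard.lower().split()
    owners = [t for t in tags
              if any(p.lower().split() == words for p in t["patterns"])]
    if not owners:
        return "no command has this speech pattern"
    if not any(t["active"] for t in owners):
        return "make the tag active on the 'TAGS' tab: " + owners[0]["name"]
    return "ok"

tags = [
    {"name": "SMS (Speech)", "active": False, "patterns": ["sms last"]},
    {"name": "TV (Speech)", "active": True, "patterns": ["tv power"]},
]
print(diagnose(False, tags, "sms last"))  # turn on speech recognition on the 'MOVEMENT' tab
print(diagnose(True, tags, "sms last"))   # make the tag active on the 'TAGS' tab: SMS (Speech)
print(diagnose(True, tags, "tv power"))   # ok
```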

Making Speech-based Commands

Once you are in the panel where you fill in the details for a command, select the 'Sensor device' spinner:

That will leave the command creation panel ready to make a speech-based command. You will need to define what the user has to say in order to trigger the command:

Check the section on using patterns in speech commands in the manual for information about structuring the speech pattern field.