Offline Voice Command and Text-To-Speech


You’ve all heard of Siri 🙂 It’s an online voice command and text-to-speech engine. Recently I ran into a great framework for offline voice command and text-to-speech, and it works great. It’s called OpenEars, and you can read more about it at that link.

I wanted to try it out, so I created a really simple app to test the framework. I was surprised at how easy it was to use and how well it worked. So let’s get down to the code.

Prepare the project

Download your copy of the framework from the OpenEars website, unpack it, and drag the ‘Framework’ directory into the Xcode project navigator. Now all you have to do is link against the ‘AVFoundation’ and ‘AudioToolbox’ frameworks, and you’re set to go.


In your view controller header file import:

#import <OpenEars/OpenEarsEventsObserver.h>
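
And mark your class as conforming to the OpenEarsEventsObserverDelegate protocol. A minimal sketch of the header, assuming your view controller is called DAViewController (use your own class name here):

@interface DAViewController : UIViewController <OpenEarsEventsObserverDelegate>

@end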

Now let’s move to the implementation file. If you don’t want text-to-speech, these three imports are all you need:

#import <OpenEars/LanguageModelGenerator.h>
#import <OpenEars/AcousticModel.h>
#import <OpenEars/PocketsphinxController.h>

If you want to use text-to-speech, add these two imports on top of the three above:

#import <Slt/Slt.h>
#import <OpenEars/FliteController.h>

Implementation

Next you’ll need instance variables for the events observer, the PocketsphinxController and, if you want text-to-speech, the Slt voice and FliteController. A minimal sketch of the declarations, assuming they live in a class extension at the top of the implementation file:
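
@interface DAViewController () {
    OpenEarsEventsObserver *_eventsObserver;
    PocketsphinxController *_pocketSphinxController;
    Slt *_slt;
    FliteController *_fliteController;
}
@end

With those declared, let’s allocate the controllers lazily: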

- (void)createEventsObserver {
    if (!_eventsObserver) {
        _eventsObserver = [OpenEarsEventsObserver new];
        _eventsObserver.delegate = self;
    }
}

- (void)createPocketSphinxController {
    if (!_pocketSphinxController) {
        _pocketSphinxController = [PocketsphinxController new];
        _pocketSphinxController.outputAudio = YES;
    }
}

- (void)createSlt {
    if (!_slt) {
        _slt = [Slt new];
    }
}

- (void)createFLiteController {
    if (!_fliteController) {
        _fliteController = [FliteController new];
    }
}
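
These setup methods need to be called before you start listening. A minimal sketch, assuming everything is wired up in viewDidLoad (the last call refers to the method we’ll write next):

- (void)viewDidLoad {
    [super viewDidLoad];

    // Create the observer, the recognizer and (optionally) the TTS objects,
    // then build the language model and start listening.
    [self createEventsObserver];
    [self createPocketSphinxController];
    [self createSlt];
    [self createFLiteController];
    [self createCommandsArrayAndStartListening];
}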

Now we’ll create the commands array and start listening for the commands:

- (void)createCommandsArrayAndStartListening {
    // The words we want PocketSphinx to recognize.
    NSArray *commandsArray = [NSArray arrayWithObjects:@"RED", @"BLUE", @"GREEN", @"YELLOW", nil];

    LanguageModelGenerator *languageModelGenerator = [[LanguageModelGenerator alloc] init];

    // On success, OpenEars returns noErr and packs the generated file paths
    // into the NSError's userInfo dictionary.
    NSError *error = [languageModelGenerator generateLanguageModelFromArray:commandsArray
                                                             withFilesNamed:@"FirstOpenEarsDynamicLanguageModel"
                                                     forAcousticModelAtPath:[AcousticModel pathToModel:@"AcousticModelEnglish"]];
    if ([error code] == noErr) {
        NSDictionary *dynamicLanguageGenerationResultsDictionary = [error userInfo];

        NSString *lmPath = [dynamicLanguageGenerationResultsDictionary objectForKey:@"LMPath"];
        NSString *dictionaryPath = [dynamicLanguageGenerationResultsDictionary objectForKey:@"DictionaryPath"];

        [_pocketSphinxController startListeningWithLanguageModelAtPath:lmPath
                                                      dictionaryAtPath:dictionaryPath
                                                   acousticModelAtPath:[AcousticModel pathToModel:@"AcousticModelEnglish"]
                                                   languageModelIsJSGF:NO];
    } else {
        NSLog(@"Language model generation failed: %@", [error localizedDescription]);
    }
}
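
A quick note before moving on: when you want to stop recognizing (for example when the view goes away), you can shut the recognizer down again. A minimal sketch, assuming you started listening in viewDidLoad:

- (void)viewWillDisappear:(BOOL)animated {
    [super viewWillDisappear:animated];

    // Stop the recognition loop when the view disappears.
    [_pocketSphinxController stopListening];
}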

All that’s left is to implement the delegate callback, and you’re set:

- (void)pocketsphinxDidReceiveHypothesis:(NSString *)hypothesis
                        recognitionScore:(NSString *)recognitionScore
                             utteranceID:(NSString *)utteranceID {
    if ([hypothesis isEqualToString:@"RED"]) {
        [self redButtonAction:nil];
    } else if ([hypothesis isEqualToString:@"BLUE"]) {
        [self blueButtonAction:nil];
    } else if ([hypothesis isEqualToString:@"GREEN"]) {
        [self greenButtonAction:nil];
    } else if ([hypothesis isEqualToString:@"YELLOW"]) {
        [self yellowButtonAction:nil];
    } else {
        [_fliteController say:[NSString stringWithFormat:@"Unknown command %@", hypothesis]
                    withVoice:_slt];
    }
}

As you can see, in the callback I’m just checking whether the detected command matches one of the words from the array and calling a dummy action for it. The callback will only ever return words you provided in the array, but it might return several of them in sequence: if you say ‘Blue Red’ without a one-second pause between the words, you’ll get a ‘BLUE RED’ hypothesis. None of the branches match it, so the text-to-speech engine kicks in and the Slt voice says ‘Unknown command…’
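
If you’d rather act on each word of a multi-word hypothesis instead of treating it as unknown, one option (not part of the original sample) is to split the hypothesis on spaces and dispatch each word separately. A sketch, using a hypothetical helper:

- (void)handleHypothesis:(NSString *)hypothesis {
    // Split a hypothesis like @"BLUE RED" into individual commands.
    NSArray *words = [hypothesis componentsSeparatedByString:@" "];
    for (NSString *word in words) {
        if ([word isEqualToString:@"RED"]) {
            [self redButtonAction:nil];
        } else if ([word isEqualToString:@"BLUE"]) {
            [self blueButtonAction:nil];
        } else if ([word isEqualToString:@"GREEN"]) {
            [self greenButtonAction:nil];
        } else if ([word isEqualToString:@"YELLOW"]) {
            [self yellowButtonAction:nil];
        }
    }
}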

Well, this has been a fun little tutorial. I hope you enjoy it and play around with it; you can download the code here: DAVoiceControlTest.

Have a nice day 🙂
Dejan
