AVSpeechSynthesizer is part of the AVFoundation framework and converts text to spoken audio. It's easy to use and works surprisingly well. In this short article we'll use AVSpeechSynthesizer in a small demo app.

Text to Speech

A long, long time ago I wrote an article on how to use offline speech recognition. Part of that article covers the text-to-speech capability of the OpenEars library. If you need speech recognition, that's a great option. But if you only need text to speech, AVSpeechSynthesizer is the better choice: it's very simple to use, you don't have to install a third-party library, and, as an added bonus, it works completely offline.

After initialising the class, all you have to do is pass it an instance of AVSpeechUtterance and it will read out the text. In most cases, initialising the AVSpeechUtterance with the default parameters will be enough, since they work so well. You can have some fun and fine-tune the utterance with custom values for the pitch and the rate of speech. But you probably don't want that high-pitched helium voice in your production app, right :) Let's jump into a quick example and see it at work...
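Before the demo app, here is a minimal sketch (not taken from the demo app) of what "default parameters" versus custom values look like. The helper name `makeUtterance` is hypothetical; the properties are AVSpeechUtterance's own. According to Apple's documentation, `pitchMultiplier` accepts 0.5 to 2.0 and defaults to 1.0, `volume` accepts 0.0 to 1.0 and defaults to 1.0, and `rate` defaults to `AVSpeechUtteranceDefaultSpeechRate`.

```swift
import AVFoundation

// Hypothetical helper (not part of the demo app): builds an utterance and
// only overrides the values that are actually passed in.
func makeUtterance(_ text: String, pitch: Float? = nil, rate: Float? = nil) -> AVSpeechUtterance {
    let utterance = AVSpeechUtterance(string: text)
    if let pitch = pitch {
        // Apple documents a range of 0.5...2.0 for pitchMultiplier.
        utterance.pitchMultiplier = pitch
    }
    if let rate = rate {
        // Should stay between AVSpeechUtteranceMinimumSpeechRate and AVSpeechUtteranceMaximumSpeechRate.
        utterance.rate = rate
    }
    return utterance
}

// Default parameters: pitch 1.0, volume 1.0, rate AVSpeechUtteranceDefaultSpeechRate.
let normal = makeUtterance("Hello there")

// Maximum pitch and a slightly faster rate: the helium voice.
let helium = makeUtterance("Hello there", pitch: 2.0, rate: AVSpeechUtteranceDefaultSpeechRate * 1.2)
```

Either utterance would then be handed to an `AVSpeechSynthesizer` instance with `speak(_:)`, exactly as the demo app does below.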

The Example

We'll use an existing app we wrote a while back when we talked about dependency injection. You can check out the branch here. It's a simple app that connects to 'themoviedb', downloads a list of popular movies and lets you view some details about a selected movie. Our app will speak the title of the selected movie and read out its short description when the user taps a button. The app will also have sliders so we can control the pitch, rate and volume. The interface will be pretty simple.

We'll have some private properties that we'll use later:

SwiftAVSpeechSynthesizerVars.swift

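// One synthesizer, created up front and reused for every utterance.
private let speechSynthesizer = AVSpeechSynthesizer()
// Values driven by the sliders; 1.0 is the default pitch multiplier and volume.
private var pitch: Float = 1.0
private var rate = AVSpeechUtteranceDefaultSpeechRate
private var volume: Float = 1.0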

At the top is our synthesizer, which we initialise immediately. Note that the 'rate' property has lower and upper bounds defined by 'AVSpeechUtteranceMinimumSpeechRate' and 'AVSpeechUtteranceMaximumSpeechRate', so we'll set our rate slider's minimum and maximum to those values:

SwiftAVSpeechSynthesizerRateSlider.swift

self.rateSlider.minimumValue = AVSpeechUtteranceMinimumSpeechRate
self.rateSlider.maximumValue = AVSpeechUtteranceMaximumSpeechRate
self.rateSlider.value = AVSpeechUtteranceDefaultSpeechRate
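The rate isn't the only bounded value: Apple documents `pitchMultiplier` as accepting 0.5 to 2.0 and `volume` as accepting 0.0 to 1.0. If a value can reach the utterance from somewhere other than a correctly configured slider, clamping it first is cheap insurance. Here's a sketch; the `SpeechBounds` enum and the `clamped(to:)` extension are hypothetical names, not part of AVFoundation.

```swift
import AVFoundation

extension Comparable {
    // Pins the value to the closed range: below the range -> lowerBound, above -> upperBound.
    func clamped(to range: ClosedRange<Self>) -> Self {
        return min(max(self, range.lowerBound), range.upperBound)
    }
}

// Hypothetical namespace collecting the documented ranges in one place.
enum SpeechBounds {
    static let rate: ClosedRange<Float> = AVSpeechUtteranceMinimumSpeechRate...AVSpeechUtteranceMaximumSpeechRate
    static let pitch: ClosedRange<Float> = 0.5...2.0
    static let volume: ClosedRange<Float> = 0.0...1.0
}
```

The same ranges could configure the other two sliders, for example `self.pitchSlider.minimumValue = SpeechBounds.pitch.lowerBound`, where `pitchSlider` is a hypothetical outlet name.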

The slider actions are pretty simple; they only update the stored properties:

SwiftAVSpeechSynthesizerActions.swift

@IBAction func onPitchChanged(_ sender: UISlider) {
    self.pitch = sender.value
}
    
@IBAction func onRateChanged(_ sender: UISlider) {
    self.rate = sender.value
}
    
@IBAction func onVolumeChanged(_ sender: UISlider) {
    self.volume = sender.value
}

At the top of the interface we have a 'Read' button. We want this button to be disabled while the synthesizer is speaking. We can easily do this in the delegate callbacks:

SwiftAVSpeechSynthesizerDelegate.swift

extension DetailsViewController: AVSpeechSynthesizerDelegate {
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didStart utterance: AVSpeechUtterance) {
        self.readButton.isEnabled = false
    }
    
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        self.readButton.isEnabled = true
    }
}
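These callbacks only fire once the synthesizer's `delegate` is set. The snippets here don't show that assignment; if your project doesn't already do it, something like `self.speechSynthesizer.delegate = self` in `viewDidLoad` would. The property is weak, so the delegate has to be retained elsewhere. It's also worth handling `didCancel`, which is called instead of `didFinish` when speech is stopped early (for example with `stopSpeaking(at:)`). Below is a UIKit-free sketch of the same idea; `SpeakingStateObserver` and its `onChange` closure are hypothetical names.

```swift
import AVFoundation

// Hypothetical observer: turns the delegate callbacks into a single "is speaking" flag.
final class SpeakingStateObserver: NSObject, AVSpeechSynthesizerDelegate {
    private(set) var isSpeaking = false
    // Receives the new state; the demo app would toggle readButton.isEnabled here.
    var onChange: ((Bool) -> Void)?

    private func update(_ speaking: Bool) {
        isSpeaking = speaking
        onChange?(speaking)
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didStart utterance: AVSpeechUtterance) {
        update(true)
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        update(false)
    }

    // Without this, stopping speech early would leave the button disabled.
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didCancel utterance: AVSpeechUtterance) {
        update(false)
    }
}

let synthesizer = AVSpeechSynthesizer()
let observer = SpeakingStateObserver()  // keep a strong reference: `delegate` is weak
synthesizer.delegate = observer
```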

When our view is loaded we'll have the synthesizer read out the title of the movie. And the code couldn't be simpler:

SwiftAVSpeechSynthesizerSpeakSimple.swift

let utterance = AVSpeechUtterance(string: self.titleLabel.text ?? "")
self.speechSynthesizer.speak(utterance)

All you have to do is initialise an 'AVSpeechUtterance' with the string and pass it to the synthesizer. We could have used an if-let here, but for the sake of clarity we didn't :) If you want to control the pitch, rate and volume, you can do that too, as our next example shows:

SwiftAVSpeechSynthesizerSpeak.swift

@IBAction func read() {
    let utterance = self.getUtterance(self.detailsLabel.text ?? "")
    
    self.speechSynthesizer.speak(utterance)
}
    
private func getUtterance(_ text: String) -> AVSpeechUtterance {
    let utterance = AVSpeechUtterance(string: text)
    utterance.pitchMultiplier = self.pitch
    utterance.rate = self.rate
    utterance.volume = self.volume
    
    return utterance
}
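Both snippets fall back to `?? ""`, which still hands an empty string to the synthesizer. If you prefer the if-let approach mentioned earlier, a guarded helper could look like this sketch; `titleUtterance(for:)` is a hypothetical name, and it also skips empty text.

```swift
import AVFoundation

// Hypothetical helper: returns nil when there's nothing worth saying,
// so the caller can skip speak(_:) entirely.
func titleUtterance(for title: String?) -> AVSpeechUtterance? {
    guard let title = title, !title.isEmpty else {
        return nil
    }
    return AVSpeechUtterance(string: title)
}
```

In the view controller that would read `if let utterance = titleUtterance(for: self.titleLabel.text) { self.speechSynthesizer.speak(utterance) }`.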

Now, when you run the app and select a movie from the list, you should hear its title spoken to you. Try pressing the 'Read' button and see what happens (turn up your speakers).

You have your app speaking to you with just a few lines of code. Pretty cool, isn't it :)

Conclusion

A while back I wrote about offline speech recognition and text to speech. That was a fun article to write, and OpenEars is a great library. But sometimes you just don't need speech recognition, and in that case AVSpeechSynthesizer is perfect for you.

The use cases for text to speech are numerous. If your app stays in the foreground and you want to give the user feedback without drawing their attention to the screen, you can simply read the important information out loud. Navigation apps are a perfect example of this. You could just as easily build a news reader app that reads articles to the user. With all that being said, you definitely shouldn't use AVSpeechSynthesizer as an accessibility aid for the visually impaired. There's an API for that, and you should use it instead.

This was a short and sweet article. I hope you've learned something new today and that you had fun :) You can find a complete example, along with the code snippets, in the GitLab repo. Have a nice day :) ~D;

agostini.tech

Using AVSpeechSynthesizer
