Tell me a story - Using Amazon's Polly service to create text-to-speech audio and store it in an object store.

A friend of mine recently gifted me my very first audiobook. I read quite a bit, but the book in question was only available in audio format. It took me a while, 16 hours and 57 minutes spread over long walks, dish washing, some hoovering and some passive gaming, but I finally finished the book. I really enjoyed it and will likely 'read' an audiobook again.

The book was "Heaven's River" by Dennis E. Taylor, a great book if you have read the others in the series.

With this in mind, I've taken a look at Polly, Amazon's text-to-speech engine.

Amazon Polly overview

It's available on the AWS Free Tier and isn't crazy expensive for playing around with.

Amazon Polly Pricing

The idea is you send the Polly service a block of text and it returns a stream of audio. There are optimizations that can be built in to correct some of the oddness that typically comes with text-to-speech engines like Polly. The first is a programmable "Lexicon", which is a way to add some nuance to how particular words are pronounced. Lexicons are written in the XML-based Pronunciation Lexicon Specification (PLS). Alongside that is the "Speech Synthesis Markup Language" (SSML), an XML-like structure you wrap around the input text to define how you want the speech engine to output words and phrasing.
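As a rough sketch of what SSML input might look like, the helper below wraps plain text in a <speak> element and puts a <break> pause between lines. The names to_ssml and speak_ssml and the 500 ms pause are my own illustrative choices, not part of Polly. The one piece that comes from the service is the TextType="ssml" parameter, which tells synthesize_speech to treat the input as markup rather than plain text.

```python
from xml.sax.saxutils import escape


def to_ssml(text, pause_ms=500):
    """Wrap plain text in SSML, inserting a pause between non-empty lines."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Escape &, < and > so the text cannot break the XML structure.
    body = f'<break time="{pause_ms}ms"/>'.join(escape(line) for line in lines)
    return f"<speak>{body}</speak>"


def speak_ssml(polly_client, ssml, voice_id="Joey"):
    """Send SSML to Polly and return the audio bytes (hypothetical helper)."""
    response = polly_client.synthesize_speech(
        VoiceId=voice_id,
        OutputFormat="mp3",
        TextType="ssml",  # parse Text as SSML instead of plain text
        Text=ssml,
    )
    return response["AudioStream"].read()


print(to_ssml("Now this is a story\nall about how"))
```

The print call only shows the generated markup; speak_ssml needs a configured boto3 Polly client passed in before it can produce audio.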

The second big feature is a wide array of voices, some of which are "Neural" voices. The Neural voices try to read the text the way a human would, and they are more expensive to use.
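Switching between the standard and Neural voices appears to come down to a single Engine parameter on the request. The sketch below only builds the keyword arguments, so it runs without an AWS account; build_request is a hypothetical helper of mine, and I am assuming the chosen voice supports the neural engine in your region, which is not true of every voice.

```python
def build_request(text, voice_id="Joey", neural=False):
    """Build keyword arguments for polly_client.synthesize_speech."""
    return {
        "VoiceId": voice_id,
        "OutputFormat": "mp3",
        "Text": text,
        # Engine picks the synthesis engine; not every voice supports "neural".
        "Engine": "neural" if neural else "standard",
    }


params = build_request("Tell me a story", neural=True)
print(params["Engine"])
# With a configured client: polly_client.synthesize_speech(**params)
```

If in doubt about a voice, calling polly_client.describe_voices(Engine="neural") should list the voices that can use the neural engine.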

Getting started with Polly is pretty easy once you have an AWS account.

To run the script below you'll need an AWS account with an access key pair set up. You will also need both the boto3 and requests Python libraries.

To install both, use pip3.

pip3 install boto3 requests

The requests requirement is really just because I wanted to demonstrate reading from a web resource and sending the text directly to Polly. In this case, test.txt is just the lyrics to the Fresh Prince of Bel-Air theme song.

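import boto3
import requests

# Replace the placeholders with your own AWS credentials.
accesskey = "<your amazon access key goes here>"
secretkey = "<your amazon secret access key goes here>"

# Create a Polly client in the eu-west-1 region.
session = boto3.Session(
    aws_access_key_id=accesskey,
    aws_secret_access_key=secretkey,
    region_name="eu-west-1",
)
polly_client = session.client("polly")

# Fetch the text to read aloud and fail early on an HTTP error.
tellme = requests.get("https://tlokko.cloud.caringo.com/polly/test.txt")
tellme.raise_for_status()

# Ask Polly to synthesize the text as MP3 using the "Joey" voice.
response = polly_client.synthesize_speech(
    VoiceId="Joey",
    OutputFormat="mp3",
    Text=tellme.text,
)

# Save the returned audio stream to a local file.
with open("speech.mp3", "wb") as audio_file:
    audio_file.write(response["AudioStream"].read())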

The final section of the script saves the audio locally as an MP3 file. It's a fun toy and I can see how you would use something like it in the future to make audiobooks for the masses. I think you'll still miss out on some of the character that a voice actor adds to something like this. Note that the example above uses none of the optimizations the service advertises: no SSML and no Neural voices here. This is a link to the sample.

Amazon Polly Audio

Sample Text used
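One caveat if you want to go from song lyrics to a whole book: as far as I know, synthesize_speech only accepts a few thousand characters per request (the documentation has quoted 3,000 billed characters), so longer text has to be sent in pieces. Below is a minimal sketch of that idea. chunk_text and synthesize_long are hypothetical helpers of mine, the max_chars default is an assumption to check against the current limits, and splitting on whitespace throws away the original line breaks.

```python
def chunk_text(text, max_chars=3000):
    """Split text into pieces of at most max_chars, breaking on whitespace."""
    chunks, current = [], ""
    for word in text.split():
        if len(word) > max_chars:
            raise ValueError("a single word exceeds max_chars")
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The next word would not fit, so close the current chunk.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def synthesize_long(polly_client, text, path="speech.mp3", voice_id="Joey"):
    """Synthesize each chunk in turn and append the audio to one file."""
    with open(path, "wb") as audio_file:
        for chunk in chunk_text(text):
            response = polly_client.synthesize_speech(
                VoiceId=voice_id, OutputFormat="mp3", Text=chunk
            )
            audio_file.write(response["AudioStream"].read())


print(chunk_text("one two three four", max_chars=9))
```

Appending MP3 streams back to back usually plays fine in my experience, but I have not verified it with every player, so treat the single output file as a convenience rather than a guarantee.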


By Tony Lokko