
Dynamically Modulating Spoken Language

Working with the text-to-speech engine is a lot of fun, and it has potential for all sorts of uses.  It has one problem, though…

It sounds horrible.  It sounds like it’s from a bad ’80s sci-fi movie.

[Image: sine waves of different frequencies]

Now, TTS voices have come a long way since the ’80s.  I realize that the default voice included with Windows 7 is light-years better than what would have been possible even just a few years ago.  But I wanted my project to have a voice that is easy to understand and just plain sounds good.

I started looking at different voices available online.  There are a lot out there.  I wasn’t sure whether all of them would actually work with the basic Windows TTS setup, and a lot of them are quite expensive.  I finally came across several blog posts about add-on voices that work with Windows, are fairly inexpensive, and still sound good.

Ivona has a bunch of competitively priced voices that all sound very good (http://www.ivona.com/us/voices/).  They have a demo you can download, so I figured that would be the smartest place to start.  I got home from work, installed the demo, and tried the voices out with my current demo script.  To my surprise, everything worked without a hitch, and my demo now sounded quite a bit more like a real person or digital assistant.  Pleased with the quality, I went through all the voices they had available and chose one.

I ended up selecting the IVONA 2 voice Salli (http://www.ivona.com/us/products/voice-salli/).  To me it sounded the clearest and was very easy to understand.  I ran the demo version a bit longer with a bunch of text strings to see how it sounded, and finally broke down and purchased it.  At the time it was only $45 (the price may have changed since then).

Now that I had a decent voice for my digital assistant, I started looking for a way to test it with a lot of text.  Up to this point I had only been listening to the generated audio live… but I wanted to generate a lot of audio and not have to sit at my desk to hear it.  I started looking into ways to write the generated speech to an audio file instead of just playing it through the computer’s sound system.

After a lot of looking I found several links with hints and partial solutions, and finally one that had exactly what I needed.

ptts, part of the Jampal mp3 library (http://jampal.sourceforge.net/ptts.html), was exactly what I wanted. It is simple, supports input from a text file as well as command-line parameters, and offers some additional control over how the resulting speech audio is output.
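For anyone who wants to script this step, here is a minimal Python sketch that only builds the command line for a ptts run. Treat every detail as an assumption to check against the ptts page: the flag names (-u for the input text file, -w for the output WAV file, -voice, -r), the script name ptts.vbs, and running it through cscript are not confirmed anywhere in this post. The file names and the voice name are hypothetical placeholders.

```python
def build_ptts_command(text_file, wav_file, voice=None, rate=0,
                       script="ptts.vbs"):
    """Build an argument list for a ptts run that writes a WAV file.

    The flags used here (-u, -w, -voice, -r) are assumptions; verify them
    against the usage text of your installed copy of ptts.
    """
    cmd = ["cscript", "//nologo", str(script),
           "-u", str(text_file),   # assumed flag: read text from this file
           "-w", str(wav_file)]    # assumed flag: write speech to this WAV file
    if voice:
        cmd += ["-voice", voice]   # assumed flag: choose an installed voice
    if rate:
        cmd += ["-r", str(rate)]   # assumed flag: speech rate adjustment
    return cmd


if __name__ == "__main__":
    # Hypothetical file names and voice name, for illustration only.
    cmd = build_ptts_command("alice_excerpt.txt", "alice_excerpt.wav",
                             voice="IVONA 2 Salli")
    print(" ".join(cmd))
    # On a Windows machine with ptts installed, the list could be handed to
    # subprocess.run(cmd, check=True) to actually produce the file.
```

Building the command as a list rather than one long string avoids quoting problems with voice names that contain spaces.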

Now that I had a way to convert text to audio files, I started looking for a good text source to try.  I looked around my computer and tried it with a few random license.txt and readme files, but none of those were very interesting.  Then I remembered a great source for large amounts of text: Project Gutenberg!  I quickly navigated there and selected a text from their library.  Alice in Wonderland was a perfect test.
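One practical wrinkle with Project Gutenberg plain-text files is that the book is wrapped in a header and a license footer, which are not much fun to listen to. The sketch below trims them, assuming the file uses marker lines beginning with "*** START OF" and "*** END OF"; that marker format is an assumption and has varied over the years, so the function falls back to the whole text when a marker is missing. The function name is made up for this example.

```python
import re

# Assumed marker format: lines such as
# "*** START OF THE PROJECT GUTENBERG EBOOK ... ***" and the matching END line.
_START = re.compile(r"^\*\*\* ?START OF .*$", re.MULTILINE)
_END = re.compile(r"^\*\*\* ?END OF .*$", re.MULTILINE)


def book_body(raw_text):
    """Return the text between the START and END marker lines.

    If a marker is not found, that side of the text is left untrimmed.
    """
    start = _START.search(raw_text)
    begin = start.end() if start else 0
    end = _END.search(raw_text, begin)
    finish = end.start() if end else len(raw_text)
    return raw_text[begin:finish].strip()
```

The trimmed text can then be saved to a new file and fed to the TTS tool in place of the raw download.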

Below I’ve attached two test files along with the excerpt I originally tested with.  I didn’t want to wait for the full text to be converted the first time, and I was also unsure how much disk space it would take.
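The disk space question can be answered roughly before converting anything. Uncompressed PCM WAV audio takes sample rate × bytes per sample × channels bytes per second, so the size follows from how long the speech runs. The sketch below assumes a speaking rate of 150 words per minute and 22,050 Hz, 16-bit, mono output; those defaults are illustrative guesses, not measurements of the Salli voice or of what ptts actually writes.

```python
def estimate_wav_bytes(word_count, words_per_minute=150,
                       sample_rate=22050, bits_per_sample=16, channels=1):
    """Rough size in bytes of an uncompressed PCM WAV file of spoken text.

    All default values are illustrative assumptions.
    """
    seconds = word_count / words_per_minute * 60
    bytes_per_second = sample_rate * (bits_per_sample // 8) * channels
    # 44 bytes is the size of the canonical PCM WAV header.
    return int(seconds * bytes_per_second) + 44


if __name__ == "__main__":
    # 1,500 words at 150 wpm is 10 minutes of audio.
    print(estimate_wav_bytes(1500))  # 26460044 bytes, roughly 26 MB
```

Under these assumptions every ten minutes of speech costs about 26 MB, and doubling the sample rate or switching to stereo doubles that figure.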

[list-attachments]

As you can easily hear, the Salli voice is hands down easier to listen to and sounds much more realistic in most cases.  Of course, this is also a very raw example, as I did not put any effort into processing the text and adding speech synthesis markup (SSML, http://www.w3.org/TR/2001/WD-speech-synthesis-20010103/) to enhance the quality of the output.
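As a starting point for that kind of processing, here is a minimal sketch that wraps plain text in SSML, turning blank-line-separated paragraphs into p elements with a pause after each. It uses the speak, p, and break elements and the namespace from the later SSML 1.0 Recommendation rather than the 2001 draft linked above. Whether a particular voice or ptts honors this markup is an assumption to test, and the 600 ms pause is an arbitrary value.

```python
import re
from xml.sax.saxutils import escape


def to_ssml(text, pause_ms=600, lang="en-US"):
    """Wrap plain text in a minimal SSML document.

    Blank lines are treated as paragraph breaks; each paragraph is followed
    by a pause of pause_ms milliseconds.
    """
    paragraphs = [" ".join(p.split())
                  for p in re.split(r"\n\s*\n", text) if p.strip()]
    body = "\n".join(
        f'  <p>{escape(p)}</p>\n  <break time="{pause_ms}ms"/>'
        for p in paragraphs
    )
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang="{lang}">\n{body}\n</speak>'
    )


if __name__ == "__main__":
    print(to_ssml("Down the rabbit-hole.\n\nAlice & her sister\nsat on the bank."))
```

Escaping the text matters here because characters such as & and < in the source would otherwise produce invalid XML.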