First Speech Recognition and Text To Speech Test

After my initial burst of motivation, I successfully set up a quick script using Python, pywin32, and pyspeech (the Python 3 version) that both reacts to specific voice commands and answers them with basic spoken responses.

The first hurdle I’ve run into is using Python 3.x. Not many libraries have been ported to it yet, but I’m usually one to stick with the latest and greatest no matter how challenging it may be. So, with a flurry of keystrokes and mouse clicks, I happened upon a blog post with exactly what I needed. The comments on the pyspeech examples page also helped me fix the issues I had getting it up and running.

With everything installed, I copied and pasted the examples from the pyspeech page and soon had my first script listening for voice commands and responding with simple responses like “Hello” and “Goodbye”. Not easily satisfied, I took it a step further and started building functions to respond to requests for the current time and date. These weren’t difficult, since the system clock already provides everything they need.
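
A minimal sketch of what time and date callbacks along these lines could look like. This is not the code from the script itself: the helper names `time_sentence` and `date_sentence` are made up for illustration, the `(phrase, listener)` callback signature is an assumption about how pyspeech calls listeners, and `say` is a stand-in for the text-to-speech call (it defaults to `print` so the sketch runs without pyspeech).

```python
from datetime import datetime


def time_sentence(now):
    """Build a spoken-style sentence for the time on a 12-hour clock."""
    hour = now.hour % 12 or 12  # hours 0 and 12 are both read as "12"
    suffix = "AM" if now.hour < 12 else "PM"
    return "It is {}:{:02d} {}".format(hour, now.minute, suffix)


def date_sentence(now):
    """Build a spoken-style sentence for the date."""
    return "Today is {}, {} {}, {}".format(
        now.strftime("%A"), now.strftime("%B"), now.day, now.year)


def whattimeisit(phrase, listener, say=print):
    # `say` is a placeholder for the text-to-speech function (assumption).
    say(time_sentence(datetime.now()))


def whatdateisit(phrase, listener, say=print):
    say(date_sentence(datetime.now()))
```

Keeping the sentence building separate from the speaking makes it possible to check the wording without a microphone or speakers attached.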

Next I tied in weather requests for current conditions and the forecast. This was fairly easy using pywapi and the Google Weather API.
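
As a rough illustration of the weather responses, the sketch below turns a weather report dictionary into spoken sentences. Everything here is an assumption: the dictionary layout (`current_conditions`, `forecasts`, and the keys inside them) is modeled on what pywapi is believed to have returned for Google Weather, the function names are hypothetical, and `sample_report` is made-up data, not a real reading.

```python
def current_conditions_sentence(report):
    """Describe the current conditions from a report dict (assumed layout)."""
    now = report["current_conditions"]
    return "It is currently {} and {} degrees".format(
        now["condition"].lower(), now["temp_f"])


def forecast_sentence(report):
    """Describe the first forecast entry from a report dict (assumed layout)."""
    today = report["forecasts"][0]
    return "The forecast for {} is {} with a high of {} and a low of {}".format(
        today["day_of_week"], today["condition"].lower(),
        today["high"], today["low"])


# Made-up sample data in the assumed layout, for illustration only.
sample_report = {
    "current_conditions": {"condition": "Partly Cloudy", "temp_f": "68"},
    "forecasts": [
        {"day_of_week": "Tue", "condition": "Sunny", "high": "75", "low": "55"},
    ],
}

print(current_conditions_sentence(sample_report))
print(forecast_sentence(sample_report))
```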

Next I figured it wouldn’t be too hard to link this to my music library, or more specifically, to the three Logitech Squeezeboxes I have throughout my house. I found the PyLMS library, which adds support for connecting to my Squeezeboxes. With a little more coding, I had my script accepting commands to start the music, stop it, skip to the next song, and change the volume. Ultimately I’ll probably set up my Squeezeboxes to broadcast the audio playing on my computer, so the responding voice from Brain can be heard throughout the house. But for now it’s a pretty cool demo.
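
To give an idea of how a spoken volume command might be turned into a player call, here is a small sketch. It is not the script’s actual handler. The method names `set_volume`, `volume_up`, and `volume_down` are assumed to match the PyLMS player object, `FakePlayer` is a hypothetical stand-in so the example runs without a Squeezebox, and it is also an assumption that the recognizer delivers numbers as digits rather than words.

```python
import re


class FakePlayer:
    """Hypothetical stand-in for a real player; it only records the calls."""

    def __init__(self):
        self.calls = []

    def set_volume(self, level):
        self.calls.append(("set_volume", level))

    def volume_up(self):
        self.calls.append(("volume_up",))

    def volume_down(self):
        self.calls.append(("volume_down",))


def volumechange(phrase, player):
    """Apply a phrase such as 'volume 40' or 'volume up' to the player.

    Returns a short response to speak, or None if the phrase was not understood.
    """
    words = phrase.lower().split()
    number = re.search(r"\d+", phrase)
    if number:
        level = min(100, int(number.group()))  # clamp to the 0-100 range
        player.set_volume(level)
        return "Volume set to {}".format(level)
    if "up" in words:
        player.volume_up()
        return "Volume up"
    if "down" in words:
        player.volume_down()
        return "Volume down"
    return None
```

In a real script the player would be a connected PyLMS player, and the function would be wrapped (for example with `functools.partial`) so it fits the callback shape the speech library expects.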

Here are some of the Python listeners I built for voice commands:

speech.listenfor(['stop program', 'goodbye computer', 'goodbye jarvis', 'goodbye brain'], stopprogram)
speech.listenfor(['Computer', 'Brain', 'Jarvis', '* Computer', '*+ Brain', '*+ Jarvis'], conversationstartcallback)
speech.listenfor(['what time *+', 'what is the time', 'what *+ time *+', '*+ what time *+', "what's *+ time", "what's *+ time *+", "*+ what's *+ time *+"], whattimeisit)
speech.listenfor(['*+ what *+ weather *+', '*+ what is *+ weather *+', "*+ what's *+ weather *+", "What's *+ weather"], weathernow)
speech.listenfor(['*+ what *+ date *+', '*+ what is *+ date *+', "*+ what's *+ date *+", "what's *+ date *+", "what's *+ date", "what day is it"], whatdateisit)
speech.listenfor(["what *+ temperature", "what's *+ temperature"], weathernowtemp)
speech.listenfor(["*+ What's *+ forecast", "What's *+ forecast", "What *+ forecast", "*+ What *+ forecast", "What's *+ forecast *+"], weatherforcast)
speech.listenfor(["*+ mute music", "*+ mute audio", "pause music", "*+ pause music", "*+ pause audio", "stop music", "*+ stop music", "*+ stop audio", "music stop", "*+ music stop"], audiopause)
speech.listenfor(["play music", "play audio", "*+ play music", "*+ play audio", "*+ resume music","resume music","resume audio", "*+ resume audio", "*+ audio start", "*+ music start"], audioplay)
speech.listenfor(["volume *+"], volumechange)
speech.listenfor(["next track", "music next", "next song", "*+ next track", "*+ go to *+ next track", "*+ next song", "*+ go to *+ next song"], audionext)
speech.listenfor(['*+ what *+ track *+', '*+ what *+ song *+', "what is this music", "what is playing", "what is this song", "what is this track", "what song is this", "what track is this"], musicinfo)

This is definitely not the most efficient approach right now, but it worked for the demo. I think I’ve decided on a different solution for processing the text for commands, but that will be another post.
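
One simple way to cut down the repetition in phrase lists like the ones above is to generate the wildcard variants from a short core phrase. The sketch below is only an illustration of that idea, not the different solution mentioned here. The helper names `variants` and `expand` are hypothetical, and the code only builds strings; it makes no claim about how the speech engine interprets `*+`.

```python
def variants(core):
    """Return the core phrase plus copies with a leading/trailing '*+'."""
    return [
        core,
        "*+ " + core,
        core + " *+",
        "*+ " + core + " *+",
    ]


def expand(cores):
    """Expand several core phrases, dropping duplicates but keeping order."""
    seen = set()
    phrases = []
    for core in cores:
        for phrase in variants(core):
            if phrase not in seen:
                seen.add(phrase)
                phrases.append(phrase)
    return phrases


# Possible usage with the listeners above (requires pyspeech, so left as a comment):
# speech.listenfor(expand(["what *+ weather", "what's *+ weather"]), weathernow)
print(expand(["next track", "next song"]))
```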

I’ve attached a short video demo of the test I created. The buzzing in the middle is due to filming with my phone and getting a text. As you’ll see, relying on specific, non-natural-language commands makes it easy to forget the correct command or term.


I have also uploaded the file that I currently have installed and that I have modified slightly to get it working. Below are the attached files for this post:



  • Jacob Valenta
    May 22, 2012 - 5:52 pm | Permalink

    I was the one that wrote the Python 3.2 port for pyspeech. As I recall, it could only handle TTS and crashed when trying to start recognition.

    Did it magically work for you? Or did you do something? I see that you reference some other issues with getting it up and running… is this what you are referring to?

    • May 22, 2012 - 6:05 pm | Permalink

      There were a few errors I found through other people’s comments… like line 176:

      return any(returns) # was at least one listening?

      I changed to:

      #return any(returns) # was at least one listening?
      for x in returns:
      ____if x:
      ________return True
      return False

      I think I also found some tab formatting issues in the file. But other than that, I installed it on Windows 7 64-bit with Python 3, pywin32, and this version of pyspeech. So I’m not sure if maybe I just got lucky.

    • May 22, 2012 - 6:16 pm | Permalink

      I’ll upload and attach my modified and currently installed file. I don’t remember if I modified anything else. It’s been a few days since I got it up and running.
