How to use text to speech and speech to text conversion with Microsoft Speech Platform 11?

Download: MSP_11.zip - TTS and STT helper files
Download: Microsoft_Speech_Platform.zip - the whole example

This article explains how to use text-to-speech and speech-to-text with Microsoft Speech Platform (Version 11) together with Ozeki VoIP SIP SDK. After reading this page you will be able to read text aloud in different languages and recognize incoming speech. Below you can see what you will need to create your own solution:

Figure 1 - Text to speech conversion

Download: Microsoft Speech Platform - Software Development Kit (SDK) (Version 11)
Download: Microsoft Speech Platform - Runtime (Version 11)
Download: Microsoft Speech Platform - Runtime Languages (Version 11)

What is Microsoft Speech Platform 11 used for?

Text-to-speech conversion means that a program reads aloud the text you have typed in. This can be useful, for example, when a person who cannot speak wants to communicate in voice calls. Text-to-speech conversion can also be used in interactive voice response (IVR) systems if you want the computer to read out the IVR tree navigation information.
You can learn more about this conversion from the How to use TextToSpeech article.

Speech recognition can be used for many things. It works by applying recognition algorithms to the incoming audio to identify words. Its most essential use is communicating with a deaf person over a voice call: you talk into the microphone, and at the other end the deaf user sees what you said in written form.
You can learn more about this feature from the How to implement Voice Recognition article.

Figure 2 - Speech to text conversion

How to use Microsoft Speech Platform 11 in C#?

There are a few important steps to complete before you can start developing your softphone (or other application):

  1. Download and install the Microsoft products listed above. Please note that in the language selection, the languages are marked with "SR" (Speech Recognition) and "TTS" (Text-To-Speech) tags according to their purpose.
  2. Download the MSP_11.zip file, which contains two classes:
    • MSSpeechPlatformSTT: a class for voice recognition
    • MSSpeechPlatformTTS: a class for Text-To-Speech
    You will have to add these files to your project.
  3. Create a new Visual Studio project, and:
    • Add a reference to Ozeki VoIP SIP SDK.
    • Add a reference to Microsoft.Speech.dll in the same way as you did with ozeki.dll.
    • Add the classes downloaded above to the project.
    • If you are using a 64-bit edition, make sure that the "Prefer 32-bit" checkbox on the "Build" tab of the project properties is not checked; otherwise it may cause errors.
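
The bitness note in the last step can be checked at run time. The following is a minimal sketch and not part of the Ozeki or Microsoft samples: the class name BitnessCheck and its Describe helper are illustrative names, and the only framework members used are the standard Environment.Is64BitOperatingSystem and Environment.Is64BitProcess properties (available from .NET Framework 4.0).

```csharp
using System;

static class BitnessCheck
{
    // Returns a short diagnostic for the given OS/process combination.
    // (Hypothetical helper, introduced only for this illustration.)
    public static string Describe(bool is64BitOs, bool is64BitProcess)
    {
        if (is64BitOs && !is64BitProcess)
            return "32-bit process on a 64-bit OS: check that \"Prefer 32-bit\" is unchecked.";
        return is64BitProcess ? "64-bit process." : "32-bit process on a 32-bit OS.";
    }

    static void Main()
    {
        // Reports how the current process is actually running.
        Console.WriteLine(Describe(Environment.Is64BitOperatingSystem,
                                   Environment.Is64BitProcess));
    }
}
```

Calling such a check once at startup makes a wrong build setting visible immediately, before any speech engine is loaded.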

After these steps, you can begin to develop your softphone. Since you are already familiar with the Text-To-Speech and Speech-To-Text implementations, only the new steps and tasks are introduced here.

Text-To-Speech: for this conversion, call the TextToSpeech object's AddTTSEngine() method and pass it a new instance of MSSpeechPlatformTTS. After that, you can list the available voices by calling the object's GetAvailableVoices() method, and select one with its ChangeLanguage() method.

Speech-To-Text: set a new instance of MSSpeechPlatformSTT as the engine of the SpeechToText object with its ChangeSTTEngine() method. After that, you can list all the available recognizers with the object's GetRecognizers() method, and select one with its ChangeRecognizer() method.
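
Both paragraphs above come down to picking one entry by its language code. The sketch below shows one possible way to do that with a fallback when the exact culture (for example "en-GB") is not installed. It is an illustration under stated assumptions, not the SDK's own logic: CulturePicker and Pick are hypothetical names, and the sample data is a plain string array standing in for the lists returned by GetAvailableVoices() or GetRecognizers().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CulturePicker
{
    // Returns the first item whose culture equals `wanted` (e.g. "en-GB"),
    // otherwise the first item sharing the language part ("en"),
    // otherwise default(T) (null for reference types).
    public static T Pick<T>(IEnumerable<T> items, Func<T, string> cultureOf, string wanted)
    {
        var list = items.ToList();
        var cmp = StringComparison.OrdinalIgnoreCase;

        // Pass 1: exact culture match.
        foreach (var item in list)
            if (string.Equals(cultureOf(item), wanted, cmp)) return item;

        // Pass 2: same language, any region.
        string language = wanted.Split('-')[0];
        foreach (var item in list)
        {
            string culture = cultureOf(item) ?? "";
            if (string.Equals(culture.Split('-')[0], language, cmp)) return item;
        }
        return default(T);
    }

    static void Main()
    {
        // Stand-in data; a real application would pass the voice or recognizer list.
        var cultures = new[] { "de-DE", "en-US", "fr-FR" };
        Console.WriteLine(Pick(cultures, c => c, "en-GB")); // prints "en-US"
    }
}
```

With the objects used in this article, the selector would presumably be something like v => v.Language for voices and r => r.Culture.Name for recognizers, matching the properties the usage example reads.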

Usage example of Microsoft Speech Platform 11 in C#

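using System;
using System.Threading;
using Ozeki.Media;

namespace Microsoft_Speech_Platform
{
    class Program
    {
        static Speaker _speaker;
        static Microphone _microphone;
        static MediaConnector _connector;
        static TextToSpeech _tts;
        static SpeechToText _stt;

        static void Main(string[] args)
        {
            _microphone = Microphone.GetDefaultDevice();
            _speaker = Speaker.GetDefaultDevice();
            _connector = new MediaConnector();

            // The setup calls are missing from this listing; both methods are
            // defined below, so they are presumably invoked here.
            SetupTextToSpeech();
            SetupSpeechToText();

            // Keep the process alive so playback and recognition can continue.
            while (true) Thread.Sleep(10);
        }

        static void SetupTextToSpeech()
        {
            _tts = new TextToSpeech();
            _tts.AddTTSEngine(new MSSpeechPlatformTTS());

            // Select a British English voice, if one is installed.
            var voices = _tts.GetAvailableVoices();
            foreach (var voice in voices)
            {
                if (voice.Language.Equals("en-GB"))
                {
                    _tts.ChangeLanguage(voice.Language, voice.Name);
                }
            }

            // Route the synthesized audio to the speaker and read the text.
            _connector.Connect(_tts, _speaker);
            _tts.AddAndStartText("Hello World!");
        }

        static void SetupSpeechToText()
        {
            // Only the words listed here can be recognized.
            string[] words = {"Hello", "Welcome"};
            _stt = SpeechToText.CreateInstance(words);
            _stt.WordRecognized += stt_WordRecognized;
            _stt.ChangeSTTEngine(new MSSpeechPlatformSTT());

            // Select a British English recognizer, if one is installed.
            var recognizers = _stt.GetRecognizers();
            foreach (var recognizer in recognizers)
            {
                if (recognizer.Culture.Name == "en-GB")
                {
                    // The original call is missing from this listing. The text
                    // above names ChangeRecognizer(); the argument is an assumption.
                    _stt.ChangeRecognizer(recognizer.Id);
                }
            }

            // Feed the microphone audio into the recognizer.
            _connector.Connect(_microphone, _stt);
        }

        static void stt_WordRecognized(object sender, SpeechDetectionEventArgs e)
        {
            Console.WriteLine("Word recognized: {0}", e.Word);
        }
    }
}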
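
The WordRecognized handler in the example only prints the recognized word. As a possible next step, the sketch below maps each word to an action and derives the recognizer's word list from the same table, so the two cannot drift apart. WordDispatcher and its members are hypothetical names introduced for this illustration; nothing here is part of the Ozeki SDK or the Speech Platform, and the recognition event is simulated with plain strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class WordDispatcher
{
    // Case-insensitive map from a recognizable word to the action it triggers.
    private readonly Dictionary<string, Action> _actions =
        new Dictionary<string, Action>(StringComparer.OrdinalIgnoreCase);

    public void Register(string word, Action action) { _actions[word] = action; }

    // The vocabulary to hand to the recognizer, derived from the registered words.
    public string[] Words { get { return _actions.Keys.ToArray(); } }

    // Runs the action for `word`; returns false when the word is unknown.
    public bool Dispatch(string word)
    {
        Action action;
        if (word == null || !_actions.TryGetValue(word, out action)) return false;
        action();
        return true;
    }

    static void Main()
    {
        var dispatcher = new WordDispatcher();
        dispatcher.Register("Hello", () => Console.WriteLine("Greeting heard"));
        dispatcher.Register("Welcome", () => Console.WriteLine("Welcome heard"));

        // Simulates what a WordRecognized handler could do with e.Word.
        dispatcher.Dispatch("hello");   // prints "Greeting heard"
        dispatcher.Dispatch("Goodbye"); // unknown word, nothing is printed
    }
}
```

In the usage example, dispatcher.Words would take the place of the words array, and the handler body would call dispatcher.Dispatch(e.Word).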
