Shows how to use SpeechToText Google API


How to convert Speech to Text using C#
with the help of Google Cloud Platform

Download:

google-stt.zip

This example demonstrates how to implement a speech-to-text feature in C# that converts audio data to text. The conversion is based on the Google Cloud Speech API. The input can be an audio file, an audio stream or real-time human voice. Any audio supported by Ozeki VoIP SIP SDK is accepted. To follow this article, please read the following tutorial as well:
How to configure Google Cloud Platform to your Ozeki VoIP SDK projects

You can choose from all languages supported by the Google Cloud Speech API.

Internet access is required. To try this example, you need to have Ozeki VoIP SIP SDK installed, and your Visual Studio project must reference OzekiSDK.dll.

Figure 1 - Speech to Text conversion

What is Speech-to-Text used for?

A speech-to-text (STT) system converts spoken language into text, and many languages are supported. The recognised speech can be saved as text files and read or analysed later. The text results can serve several purposes: for example, you could store phone conversations in written form, save the text in an SQL database, forward it by e-mail or SMS, or search it for keywords.

Speech-to-Text refers to the ability to listen to an audio stream and convert it to text. STT engines for different languages, dialects and specialized vocabularies are available through the Google Cloud Speech API. Check that the language you need is supported.

How to implement the Google speech-to-text feature
in your Ozeki VoIP SIP SDK project

First you need to register on the Google Cloud Platform, then set the API access credentials on your operating system and install the Google Cloud Speech SDK. After the installation is finished, reboot your computer before testing the example code in Ozeki VoIP SIP SDK. The tutorial mentioned above describes in detail how to set up and try the examples.
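
The client library has to find the credentials before SpeechClient.Create() can succeed. As a minimal sketch, assuming the credentials were configured through the GOOGLE_APPLICATION_CREDENTIALS environment variable (the usual mechanism for Google Cloud client libraries; the linked tutorial may describe a different one), the following standalone check reports whether the variable is set and points to an existing file. The class and method names are illustrative and are not part of either SDK.

```csharp
using System;
using System.IO;

static class CredentialCheck
{
    // Returns true when the path points to an existing file.
    // 'message' always receives a short human-readable explanation.
    public static bool IsUsable(string path, out string message)
    {
        if (string.IsNullOrWhiteSpace(path))
        {
            message = "GOOGLE_APPLICATION_CREDENTIALS is not set.";
            return false;
        }

        if (!File.Exists(path))
        {
            message = "Credentials file not found: " + path;
            return false;
        }

        message = "Credentials file found: " + path;
        return true;
    }

    static void Main()
    {
        // Assumption: the credentials are configured through this variable.
        string path = Environment.GetEnvironmentVariable(
            "GOOGLE_APPLICATION_CREDENTIALS");

        string message;
        IsUsable(path, out message);
        Console.WriteLine(message);
    }
}
```

Running a check like this before starting the example may help to separate credential problems from audio or network problems. An environment variable set after a process has started is not visible to that process, which is one possible reason a restart is needed.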

The sample projects can be downloaded as google-stt.zip. Each project contains a basic example that combines the functionality of our SDK with the features provided by the Google Cloud Speech API, presented in a simple C# class, GoogleSTT. The GoogleSTT class demonstrates how to implement Speech-to-Text functionality with OzekiSDK and the Google Cloud Speech API. The language is passed as a constructor parameter, and a wide variety of languages is available (for example, an instance is created in lines 25 to 27 of the Program.cs example below). The instance receives audio from whichever sender object it is connected to; in this example that is the microphone (line 29). The instance in the current example recognises United Kingdom English speech arriving through the microphone and converts it to text.
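
The GoogleLanguage type and its GetCode() helper are part of the downloadable sample and are not listed in this article. As a rough sketch of the idea, an enum value can be mapped to the language code string that the speech service expects (for example "en-GB" for United Kingdom English). Every name below (DemoLanguage, DemoLanguageExt, FromCode) is hypothetical and chosen only for illustration; the sample's real implementation may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical enum standing in for the sample's GoogleLanguage type.
enum DemoLanguage
{
    English_United_Kingdom,
    English_United_States,
    Hungarian_Hungary
}

static class DemoLanguageExt
{
    // Enum value -> language code string passed to the recognizer.
    static readonly Dictionary<DemoLanguage, string> Codes =
        new Dictionary<DemoLanguage, string>
        {
            { DemoLanguage.English_United_Kingdom, "en-GB" },
            { DemoLanguage.English_United_States, "en-US" },
            { DemoLanguage.Hungarian_Hungary, "hu-HU" },
        };

    public static string GetCode(this DemoLanguage language)
    {
        return Codes[language];
    }

    // Reverse lookup; throws for codes that are not in the table.
    public static DemoLanguage FromCode(string code)
    {
        foreach (var pair in Codes)
        {
            if (string.Equals(pair.Value, code,
                              StringComparison.OrdinalIgnoreCase))
            {
                return pair.Key;
            }
        }

        throw new ArgumentException("Unsupported language code: " + code);
    }
}

static class LanguageDemo
{
    static void Main()
    {
        // prints "en-GB"
        Console.WriteLine(DemoLanguage.English_United_Kingdom.GetCode());
    }
}
```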

Microphone signals are converted to text
in C# using the Google Cloud Speech API

'Program.cs'

using Ozeki.Media;
using System;

namespace Google_Speech_To_Text_V1
{
    class Program
    {
        static MediaConnector connector;

        static Microphone microphone;

        static GoogleSTT googleSTT;

        public static void Main(string[] args)
        {
            Console.OutputEncoding = System.Text.Encoding.UTF8;

            connector = new MediaConnector();
            microphone = Microphone.GetDefaultDevice();
         
            var format = new WaveFormat(48000, 16, 1);

            microphone.ChangeFormat(format);

            googleSTT = new GoogleSTT(
                GoogleLanguage.English_United_Kingdom,
                format.AsVoIPMediaFormat());

            connector.Connect(microphone, googleSTT);

            microphone.Start();

            googleSTT.Start();

            Console.WriteLine("Speak !!");

            Console.ReadLine();

            Console.WriteLine("Disconnect");

            connector.Disconnect(microphone, googleSTT);
           
            Console.WriteLine("Google dispose");

            googleSTT.Dispose();
            googleSTT = null;

            Console.WriteLine("microphone dispose");

            microphone.Dispose();
            microphone = null;

            Console.WriteLine("connector dispose");

            connector.Dispose();
            connector = null;
        }
    }
}

'GoogleSTT.cs'

The 'GoogleSTT.cs' example class provides Speech-to-Text functionality through the Google Cloud Speech API. You can write similar classes of your own.

Lines 80 to 86 handle the results of the speech-to-text conversion. 'result.Alternatives' is a list of objects, each holding one possible transcript and its confidence level. When the Google Cloud servers convert speech, they may interpret it as several different texts and assign each of them a confidence level from 0.0 to 1.0.

This example selects the text with the highest confidence value and writes it to the console.

using Ozeki.Media;
using Google.Cloud.Speech.V1Beta1;

using System;
using System.Threading.Tasks;
using System.Threading;
using System.Linq;

namespace Google_Speech_To_Text_V1
{
    class GoogleSTT : AudioReceiver
    {
        SpeechClient speech;
        SpeechClient.StreamingRecognizeStream streamingCall;

        Task printResponses;

        AudioFormat _format;

        private string _languageCode;

        private GoogleLanguage _language;
        public GoogleLanguage Language
        {
            get { return _language; }
            set
            {
                _language = value;
                _languageCode = _language.GetCode();
            }
        }

        public GoogleSTT(string languageCode)
            : this(GoogleLanguageExt.GetGoogleLanguageFromCode(languageCode),
                   new AudioFormat())
        { }

        public GoogleSTT(GoogleLanguage languageCode, AudioFormat format)
        {
            Language = languageCode;

            SetReceiveFormats(format);

            _format = format;

            Init();
        }

        private void Init()
        {
            speech = SpeechClient.Create();

            streamingCall = speech.StreamingRecognize();

            streamingCall.WriteAsync(
               new StreamingRecognizeRequest()
               {
                   StreamingConfig = new StreamingRecognitionConfig()
                   {
                       Config = new RecognitionConfig()
                       {
                           Encoding =
                           RecognitionConfig.Types.AudioEncoding.Linear16,
                           SampleRate = _format.SampleRate,
                           LanguageCode = _languageCode,
                           MaxAlternatives = 5
                       },
                       InterimResults = true,
                   }
               }).Wait(); // send the configuration before any audio is written

            printResponses = Task.Run(async () =>
            {
                while (await streamingCall.ResponseStream.MoveNext(
                    default(CancellationToken)))
                {
                    foreach (var result in streamingCall.ResponseStream
                        .Current.Results)
                    {
                        if (result.IsFinal)
                        {
                            var top = result.Alternatives
                                .OrderByDescending(x => x.Confidence).First();

                            Console.WriteLine(top.Transcript);
                        }
                    }
                }
            });
        }

        object writeLock = new object();

        public bool IsRunning { get; private set; }

        public void Stop()
        {
            IsRunning = false;
        }

        public void Start()
        {
            IsRunning = true;
        }

        protected override void OnDataReceived(object sender, AudioData data)
        {
            if (!IsRunning) return;

            lock (writeLock)
            {
                var request = new StreamingRecognizeRequest();
                request.AudioContent = Google.Protobuf.ByteString
                            .CopyFrom(data.Data, 0, data.Data.Length);

                try
                {
                    streamingCall.WriteAsync(request).Wait();
                }
                catch (Exception)
                {
                    // The stream is no longer writable: close it and open a new one.
                    streamingCall.WriteCompleteAsync();
                    Init();
                }
            }
        }

        protected override void Dispose(bool disposing)
        {
            Stop();

            if (printResponses != null)
            {
                printResponses = null;
            }

            if (streamingCall != null)
            {
                streamingCall = null;
            }

            if (speech != null)
            {
                speech = null;
            }

            base.Dispose(disposing);
        }
    }
}
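
The selection step described above, choosing the alternative with the highest confidence, can be tried on its own without a Google account. The sketch below uses a hypothetical Alternative class as a stand-in for the API's result type, with made-up transcripts and confidence values; only the LINQ calls are real library methods.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one recognition alternative.
class Alternative
{
    public string Transcript { get; set; }
    public float Confidence { get; set; }
}

static class AlternativePicker
{
    // Returns the alternative with the highest confidence,
    // or null when the sequence is empty.
    public static Alternative Best(IEnumerable<Alternative> alternatives)
    {
        return alternatives
            .OrderByDescending(a => a.Confidence)
            .FirstOrDefault();
    }
}

static class PickerDemo
{
    static void Main()
    {
        // Made-up sample data for illustration only.
        var alternatives = new List<Alternative>
        {
            new Alternative { Transcript = "wreck a nice beach", Confidence = 0.41f },
            new Alternative { Transcript = "recognise speech", Confidence = 0.92f },
        };

        // prints "recognise speech"
        Console.WriteLine(AlternativePicker.Best(alternatives).Transcript);
    }
}
```

Sorting in descending order matters here: an ascending OrderBy followed by First() would return the least confident transcript instead.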
