Google Cloud Text-to-Speech Just Got Better


At the beginning of this year (in February, to be exact), the Google Cloud team announced a wide range of updates to GCP's AI-powered text-to-speech and speech-to-text functionality. These updates included multiple new device profiles, multi-channel recognition, and a host of new languages synthesized with the WaveNet AI system.

At the end of August, Google built on that announcement by revealing that the number of variants in Google Text-to-Speech has increased by almost 70%. A total of 33 languages are now covered by the Alphabet DeepMind WaveNet AI, including Hindi, Greek, Finnish, Czech, Mandarin Chinese, Vietnamese, and many more. All of these come with at least one AI-generated voice of their own. Google has also added another 76 voices for companies to choose from when looking for the right sound for their text-to-speech functionality. This means that Cloud Text-to-Speech from Google now comes with 187 voices in total.

There are also 95 WaveNet voices to choose from, which means you should have little trouble finding the ideal voice for your AI strategy.

Transforming Google Cloud Text-to-Speech

According to the Google Cloud Team, the decision to make these vital upgrades to the selection of voices and languages on Text-to-Speech means that developers will have more opportunities than ever. If you've been waiting for the perfect opportunity to add text-to-speech functionality into your app or website, now could be it. Developers finally have the tools they need to reach millions of people around the globe with the same intelligent application.

What's more, if you weren't already impressed by the huge upgrades that Google has made to its Text-to-Speech functionality, you're sure to feel differently in the near future. Google's experts have already announced that they intend to deliver a host of additional languages in the months to come. This means that companies will be able to unlock text-to-speech for a broad range of new use cases. For instance, you can use the AI-enhanced tool for:

Call center IVR and routing of customer conversations
Interacting with IoT devices around the home and in the car
Creating audiobooks and downloads
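
To make the call center case a little more concrete, here is a minimal sketch of what a request body for the service's REST interface might look like. It only builds the JSON and never sends it. The helper name build_synthesize_request is hypothetical, and the voice name and telephony device profile are illustrative choices rather than anything recommended in Google's announcement, so check the current API documentation before relying on them.

```python
import json

# Assumed v1 REST endpoint; shown for context only, nothing is sent here.
SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_synthesize_request(text,
                             language_code="en-US",
                             voice_name="en-US-Wavenet-D",
                             device_profile="telephony-class-application"):
    """Hypothetical helper: return the JSON body for a synthesize call."""
    return {
        "input": {"text": text},
        # A WaveNet voice is selected by name; the default is an assumption.
        "voice": {"languageCode": language_code, "name": voice_name},
        "audioConfig": {
            "audioEncoding": "MP3",
            # Device profile tuned for phone lines, chosen to suit an IVR.
            "effectsProfileId": [device_profile],
        },
    }


body = build_synthesize_request("Thanks for calling. How can we help?")
print(json.dumps(body, indent=2))
```

Swapping the language code and voice name is all it should take to target another of the supported languages, which is where the larger voice catalog pays off.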

The update to the WaveNet range of voices is particularly exciting for today's developers. If you haven't encountered WaveNet before, it mimics things like intonation and tone to create a more human-sounding voice. According to Google experts, WaveNet delivers better voice experiences through "prosody" because it can identify tonal patterns in common speech.

WaveNet produces far more convincing voices than other AI speech generation options. According to Google, WaveNet has already closed the quality gap with human speech by 70%, based on mean opinion score. WaveNet is more efficient too. It runs on Google's tensor processing units, or TPUs, which are custom chips with circuits specifically optimized for AI model training. This means that a voice sample lasting one second takes around 50 milliseconds to generate.
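
To put the 50 millisecond figure in perspective, the short sketch below works out the implied real-time factor. The function names are made up for illustration, and the calculation assumes generation time scales linearly with audio length, which the announcement does not actually state.

```python
def real_time_factor(audio_seconds, generation_seconds):
    """Ratio of generation time to audio length; below 1 is faster than real time."""
    return generation_seconds / audio_seconds


def estimated_generation_time(audio_seconds, rtf):
    """Rough estimate, assuming generation time grows linearly with length."""
    return audio_seconds * rtf


# Figures quoted above: one second of audio in around 50 milliseconds.
rtf = real_time_factor(1.0, 0.050)
speedup = 1 / rtf

print(f"Real-time factor: {rtf:.2f}")
print(f"Roughly {speedup:.0f}x faster than real time")
print(f"A 30-second prompt: about {estimated_generation_time(30, rtf):.1f} s")
```

In other words, audio is produced around twenty times faster than it plays back, which is what makes the voices practical for interactive uses like IVR.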

The Best Cloud Text-to-Speech Experience

The new selection of updates to Cloud Text-to-Speech helps GCP leapfrog competitors like Amazon Polly and Microsoft Azure when it comes to the number of AI voices on offer. Google believes that this upgrade will make its AI-enhanced offering the go-to option for companies that need intelligently generated voices for things like contact center IVRs and custom apps.

Now that customers expect to hear human-sounding voices when they interact with IoT devices, call into contact centers, or listen to audio versions of text-based content, it's important to have access to the right text-to-speech solution. Companies that can offer more realistic, human-sounding voices will be able to deliver better experiences for their customers. What's more, thanks to Google, those experiences can now be provided in a wide variety of languages and countries.

Text-to-Speech from Google is currently free for businesses for the first million characters processed through the GCP API.
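
If you want a rough idea of how far that free allowance goes, a few lines of Python can tally your text before you send anything. This is only a sketch: the one million figure comes from the paragraph above, the function name is hypothetical, and it counts plain characters, so it may not match exactly how Google meters usage (for example, with markup in the input).

```python
FREE_CHARACTERS = 1_000_000  # free allowance quoted in this article


def free_tier_usage(texts):
    """Hypothetical helper: return (characters used, free characters left)."""
    used = sum(len(text) for text in texts)
    remaining = max(FREE_CHARACTERS - used, 0)
    return used, remaining


prompts = [
    "Thanks for calling. How can we help?",
    "Please hold while we connect you.",
]
used, remaining = free_tier_usage(prompts)
print(f"Used {used} characters, {remaining} still free")
```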