Microsoft has been working to incorporate artificial intelligence (AI) across its products, from consumer-focused Microsoft Office to business-focused Microsoft 365 Copilot. At its Ignite 2023 conference, Microsoft announced numerous new AI-based products, including Copilot Studio and Windows AI Studio, and renamed Bing Chat to simply Copilot. The company also introduced a text-to-speech avatar feature in Azure AI Speech that can help create talking avatar videos. The feature is now rolling out in public preview. Here is what is known about it so far.
Text-to-Speech Avatar on Microsoft Azure
The Azure AI Speech text-to-speech avatar converts text into a 2D video of a human-like speaking avatar. According to Microsoft, the neural text-to-speech avatar models are trained with deep neural networks on samples of human video recordings, and a text-to-speech voice model generates the avatar’s voice. From text input alone, users can create instructional videos, product introductions, customer testimonials, and other digital interactions.
How does it work?
The Azure AI Speech avatar content-generation pipeline has three components: the text analyzer, the TTS audio synthesiser, and the TTS avatar video synthesiser. First, the user provides text input, which the text analyzer converts into a phoneme sequence. The TTS audio synthesiser then predicts the acoustic features of the input text and synthesises the voice. Text-to-speech voice models power both of these components. Finally, the neural text-to-speech avatar model predicts lip-synced images from the acoustic features, producing the synthetic video.
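To make the data flow concrete, here is a minimal toy sketch of the three stages in Python. It is not Microsoft's implementation: every name in it (`analyze_text`, `synthesize_audio`, `synthesize_avatar`, `AcousticFrame`, `VideoFrame`) is hypothetical, and the rules inside each function are crude stand-ins for what neural models do in the real service. It only illustrates how text becomes phonemes, phonemes become acoustic features, and acoustic features drive the lip-synced frames.

```python
from dataclasses import dataclass
from typing import List

# Toy stand-ins for the three pipeline stages. All names and rules here
# are hypothetical; the real service uses trained neural models.

@dataclass
class AcousticFrame:
    phoneme: str
    duration_ms: int

@dataclass
class VideoFrame:
    mouth_shape: str
    duration_ms: int

VOWELS = set("aeiou")

def analyze_text(text: str) -> List[str]:
    """Stage 1: text -> phoneme sequence (here: one pseudo-phoneme per letter)."""
    return [ch for ch in text.lower() if ch.isalpha()]

def synthesize_audio(phonemes: List[str]) -> List[AcousticFrame]:
    """Stage 2: phonemes -> acoustic features (here: only a made-up duration)."""
    return [AcousticFrame(p, 120 if p in VOWELS else 60) for p in phonemes]

def synthesize_avatar(frames: List[AcousticFrame]) -> List[VideoFrame]:
    """Stage 3: acoustic features -> lip-synced frames (here: open or closed mouth)."""
    return [
        VideoFrame("open" if f.phoneme in VOWELS else "closed", f.duration_ms)
        for f in frames
    ]

def run_pipeline(text: str) -> List[VideoFrame]:
    """Chain the three stages in the order described above."""
    return synthesize_avatar(synthesize_audio(analyze_text(text)))

if __name__ == "__main__":
    for frame in run_pipeline("Hi Azure"):
        print(frame)
```

The point of the sketch is the ordering: the video stage never sees the raw text, only the acoustic features produced by the audio stage, which is why the avatar's lips can stay in sync with the voice.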
Azure AI Speech offers two kinds of voices. The first is the prebuilt neural voice, which provides natural-sounding voices out of the box. Users can access it by creating an Azure account and subscribing to the Speech service. They can then use the Speech SDK or the Speech Studio portal to choose from a library of prebuilt voices.
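As a rough illustration of the Speech SDK route, the sketch below speaks a sentence with a prebuilt voice using the Python package `azure-cognitiveservices-speech`. Several things are assumptions rather than details from the announcement: the voice name `en-US-JennyNeural` is just one example from the prebuilt library, the environment variables `SPEECH_KEY` and `SPEECH_REGION` are a convention chosen here for the subscription key and region, and the helper names `build_ssml` and `speak` are made up. The sketch covers only the voice, not the avatar video, which is generated through a separate interface.

```python
import os
from xml.sax.saxutils import escape

def build_ssml(text: str, voice: str = "en-US-JennyNeural", lang: str = "en-US") -> str:
    """Wrap plain text in SSML that selects a prebuilt voice (example voice name)."""
    return (
        f'<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang="{lang}">'
        f'<voice name="{voice}">{escape(text)}</voice></speak>'
    )

def speak(text: str) -> None:
    """Play the text through the default speaker using the Speech SDK."""
    # Imported lazily so build_ssml works without the SDK installed.
    # Install with: pip install azure-cognitiveservices-speech
    import azure.cognitiveservices.speech as speechsdk

    # SPEECH_KEY and SPEECH_REGION are assumed environment variable names.
    speech_config = speechsdk.SpeechConfig(
        subscription=os.environ["SPEECH_KEY"],
        region=os.environ["SPEECH_REGION"],
    )
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
    result = synthesizer.speak_ssml_async(build_ssml(text)).get()
    if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
        raise RuntimeError(f"Synthesis did not complete: {result.reason}")

# Example call (needs a Speech resource and the two environment variables):
# speak("Hello from a prebuilt neural voice.")
```

Building the SSML in a separate function keeps the voice choice in one place and lets the markup be checked without network access or credentials.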
The second is Custom Neural Voice, which lets users build bespoke neural voices. It is a self-service tool for developing a natural-sounding brand voice. To support responsible use, Microsoft currently provides only limited access to this capability.
The AI-Powered Communication of the Future
The TTS avatar from Microsoft is just the beginning of a new age of AI-powered communication. As AI technology progresses, we should expect to see more complex and lifelike avatars that can interact with us in ways once thought impossible. These avatars are likely to play a growing part in our lives, helping us connect with people, learn new things, and experience the world in novel and exciting ways.