Published on November 16, 2023, 3:29 pm

At Microsoft Ignite 2023, the company unveiled an unexpected tool that creates photorealistic avatars and animates them to say things the person depicted never actually said. Called Azure AI Speech text-to-speech avatar, the feature is now in public preview. Users generate a video of a speaking avatar by uploading images of the desired person and writing a script. The tool trains a model to drive the animation, while a separate text-to-speech model reads the script aloud.

According to Microsoft, the tool lets users efficiently produce training videos, product introductions, customer testimonials, and similar content. It can also be used to build conversational agents, virtual assistants, chatbots, and other applications. Avatars can speak multiple languages and can even draw on AI models like OpenAI’s GPT-3.5 to respond to off-script questions from customers.

While the tool has many possible applications, Microsoft acknowledges the potential for abuse. Similar avatar-generating technologies have been misused in the past to produce propaganda and false news reports. In light of these concerns, most Azure subscribers will initially have access only to prebuilt avatars. Custom avatars are currently restricted to certain use cases, with access granted through registration.

The introduction of such a tool raises ethical questions about the use of actors’ likenesses without proper compensation or notification. While Microsoft has not explicitly addressed this issue, it requires customers using custom avatars to obtain explicit written consent from the avatar talent. It also requires customers to include disclosures stating that the avatars were created with AI.

In addition to the avatar tool, Microsoft showcased another generative AI capability at Ignite: Personal voice, part of its custom neural voice service. Personal voice lets users replicate their own voice from just a one-minute speech sample used as an audio prompt. Microsoft presents the feature as a way to create personalized voice assistants, dub content into different languages, and generate bespoke narrations for stories, audiobooks, and podcasts.

To avoid potential legal challenges, Microsoft has implemented certain restrictions. It prohibits the use of prerecorded speech and requires users to give explicit consent in the form of a recorded statement. Microsoft verifies that the statement matches the other training data before allowing customers to synthesize new speech with Personal voice. Access to the feature is currently gated behind a registration form, and customers are barred from using Personal voice with open-ended or user-generated content.

Microsoft didn’t initially provide details on compensation for actors contributing their voices or on plans for watermarking AI-generated voices, but the company later clarified that watermarks would be added automatically to personal voices. Using Microsoft’s watermark detection service, however, requires approval from the company.

Tools like Azure AI Speech text-to-speech avatar and Personal voice offer exciting possibilities but also raise important ethical considerations. With proper safeguards and regulations in place, these generative AI capabilities could bring a new level of personalization and creativity to a range of industries.

