Microsoft is previewing a new feature in its Edge browser, and this one is truly exciting: real-time audio translation on videos.
Language has always created barriers in business. This is especially noticeable as our world has shrunk and become more globalized. A couple of decades ago, language barriers rarely mattered until a business reached a certain scale and went international. But today, we can all access nearly anything, nearly instantly — including a ton of content that isn’t in the language(s) we speak.
This new feature could make that kind of video content instantly accessible to you. And it can make English-language content instantly accessible to others who don't speak English or simply prefer another language.
How AI Audio Translation Works
The name is long and not all that clear, so allow us to clear it up for you. This feature uses AI to do five things in quick succession:
- AI analyzes the audio track of a video.
- Generative AI breaks down what that audio is about and ties concepts to times in the video.
- AI translation then translates that breakdown into the target language.
- Audio processing GenAI tools synthesize the translated information into human-like vocal speech, keeping it in sync with the pace of the video.
- Edge places that new, translated audio track as an alternate track and mutes the original.
The end user then watches the video with a new, AI-generated audio track automatically playing a live translation in natural-sounding speech, almost as if the video had been professionally dubbed.
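To make the flow of those five steps concrete, here is a minimal Python sketch of the pipeline. Every function name, data structure, and sample string below is hypothetical; the real work happens inside Edge's AI models, and this is only a stand-in showing how timed transcription, translation, and speech synthesis chain together.

```python
# Hypothetical sketch of the five-step dubbing pipeline described above.
# All function bodies are placeholders, not Edge's actual implementation.

from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds into the video
    end: float
    text: str


def transcribe(audio_track: bytes) -> list[Segment]:
    """Steps 1-2: analyze the audio and tie spoken content to timestamps."""
    # Placeholder: a real implementation would run speech recognition here.
    return [Segment(0.0, 2.5, "Hola, bienvenidos al video.")]


def translate(segments: list[Segment], target_lang: str) -> list[Segment]:
    """Step 3: translate each timed segment into the target language."""
    # Tiny demo lookup table standing in for an AI translation model.
    demo = {"Hola, bienvenidos al video.": "Hello, welcome to the video."}
    return [Segment(s.start, s.end, demo.get(s.text, s.text)) for s in segments]


def synthesize(segments: list[Segment]) -> list[tuple[float, float, bytes]]:
    """Step 4: produce speech clips aligned to the original timing."""
    # Placeholder: encodes the text instead of generating real audio.
    return [(s.start, s.end, s.text.encode()) for s in segments]


def build_dub_track(video_audio: bytes, target_lang: str) -> list[tuple[float, float, bytes]]:
    """Step 5: build the alternate track that replaces the muted original."""
    return synthesize(translate(transcribe(video_audio), target_lang))
```

The key design point is that translation happens per timed segment, not on the transcript as a whole, which is what keeps the synthesized speech in sync with the pace of the video.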
How to Use the AI Audio Translation Preview
Once you enable the feature in settings and navigate to a supported site (like YouTube), Edge will show a small floating bar with options related to this feature. You simply tell Edge your preferred language, and eligible videos in other languages will automatically be pulled into this workflow.
Current Limitations
This is a pretty cutting-edge and resource-intensive feature, so there are some limitations for now. First, you’ll need a reasonably powerful computer with at least 12 GB of memory. It seems that Microsoft is doing at least some of the processing for this on-device, which means you need enough power to make it happen.
Second, at the start, only three languages are supported: English, Spanish, and Korean. We’re sure this will expand over time.
Third, as with any GenAI tool, there may be occasional factual mistakes or mistranslations.
Why This Matters For You (and For Us)
Not every small business regularly deals in multilingual training, so it's fair to ask: why does this new feature matter?
Asheville and this part of the Carolinas are an increasingly global locale. It’s increasingly common to find tourists, residents, and business professionals whose native language is not English. This may be true for parts of your workforce.
We can think of several small-scale ways this new tech could really help small businesses like yours (and like ours):
- Use free screen capture tools to record basic computer training videos in English, then encourage workers to play them with their native language selected. They will hear a real-time, natural-sounding translation that aligns with the screen/video content.
- Instantly understand video content from overseas suppliers or partners.
- Access industry insights or training materials recorded in another language, with zero barrier.
Perhaps even more exciting is how this tech can open doors in business that you previously would have considered closed. Ordering products from a non-English supplier website, onboarding global outsourced talent without worrying about translators or localization, and many other tasks that were once out of reach or cost-prohibitive could now be possible.
That’s it for this week. Got questions about how to use live translation, or about what kind of hardware you need to be ready for all these new AI-driven features in Windows 11 and Microsoft 365? Reach out to our team anytime — we’re here to help.