MeetDot: Videoconferencing with Live Translation Captions

The recent pandemic made videoconferencing an indispensable part of our working lives.

To help people who speak different languages communicate effectively, a recent paper on arXiv.org proposes a videoconferencing solution with live translation captions.

Image credit: Mbrickn via Wikimedia (CC BY 4.0)

There, participants can see an overlaid translation of other participants' speech in their preferred language. The incoming speech signal is processed in a streaming fashion, transcribed in the speaker's language, and used as input to a machine translation system. The researchers implement several features to improve the user experience, such as smooth pixel-wise scrolling of the captions and fading of text that is likely to change.
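The cascade described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not MeetDot's code. The word-by-word `toy_translate` stands in for a real MT system. Each new partial ASR hypothesis is re-translated from scratch. The rule that the last `unstable_tail` words of the caption are "likely to change", and so would be drawn faded, is a hypothetical heuristic chosen for illustration.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical toy MT "system": word-by-word dictionary lookup (assumption for illustration).
TOY_DICT = {"hola": "hello", "a": "to", "todos": "everyone"}


def toy_translate(text: str) -> str:
    """Stand-in for a real MT service: translates word by word, passing unknown words through."""
    return " ".join(TOY_DICT.get(w, w) for w in text.split())


def retranslate_stream(
    partial_transcripts: Iterable[str],
    translate: Callable[[str], str],
    unstable_tail: int = 2,
) -> Iterator[tuple[str, str]]:
    """Re-translation cascade: every new ASR partial hypothesis is translated from scratch.

    Yields (stable, fading) caption parts. The last `unstable_tail` words are treated
    as likely to change and would be rendered faded (hypothetical heuristic).
    """
    for transcript in partial_transcripts:
        words = translate(transcript).split()
        cut = max(len(words) - unstable_tail, 0)
        yield " ".join(words[:cut]), " ".join(words[cut:])


if __name__ == "__main__":
    partials = ["hola", "hola a", "hola a todos"]  # growing ASR hypotheses
    for stable, fading in retranslate_stream(partials, toy_translate, unstable_tail=1):
        print(f"[{stable}] ({fading})")
```

With `unstable_tail=1`, the three updates produce the captions `[] (hello)`, `[hello] (to)` and `[hello to] (everyone)`. A real system would presumably also mark the whole caption as stable once the ASR finalizes an utterance.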

A comprehensive evaluation suite is integrated to efficiently compute metrics such as latency, caption flicker, and accuracy, and to support rapid development guided by these metrics.
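Caption flicker is often quantified as erasure. The article does not give MeetDot's exact formula, so the sketch below assumes a definition commonly used in re-translation studies. An update erases every token of the previous caption that lies beyond the longest common prefix with the new caption. The total is then normalized by the length of the final caption. The function names are illustrative.

```python
def common_prefix_len(a: list[str], b: list[str]) -> int:
    """Number of leading tokens shared by two token lists."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def normalized_erasure(caption_updates: list[str]) -> float:
    """Total tokens erased across successive caption updates, divided by final caption length.

    A token counts as erased when it must be deleted from the previous caption
    to reach the new one, i.e. it lies beyond their longest common prefix.
    """
    if not caption_updates:
        return 0.0
    erased = 0
    prev: list[str] = []
    for update in caption_updates:
        cur = update.split()
        erased += len(prev) - common_prefix_len(prev, cur)
        prev = cur
    return erased / max(len(prev), 1)


if __name__ == "__main__":
    updates = ["I", "I saw", "I see the", "I see the cat"]
    print(normalized_erasure(updates))  # one erased token ("saw") over 4 final tokens
```

Purely append-only caption streams score 0. Every rewrite of already-displayed words raises the score, which is why re-translation trades flicker for the ability to correct earlier output.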

We present MeetDot, a videoconferencing system with live translation captions overlaid on screen. The system aims to facilitate conversation between people who speak different languages, thereby reducing communication barriers between multilingual participants. Currently, our system supports speech and captions in four languages and combines automatic speech recognition (ASR) and machine translation (MT) in a cascade. We use the re-translation strategy to translate the streamed speech, resulting in caption flicker. Additionally, our system has very strict latency requirements to achieve acceptable call quality. We implement several features to enhance user experience and reduce cognitive load, such as smooth scrolling captions and reduced caption flicker. The modular architecture allows us to integrate different ASR and MT services in our backend. Our system provides an integrated evaluation suite to optimize key intrinsic evaluation metrics such as accuracy, latency and erasure. Finally, we present an innovative cross-lingual word-guessing game as an extrinsic evaluation metric to measure end-to-end system performance. We plan to make our system open-source for research purposes.
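The abstract's point about a modular architecture that can plug in different ASR and MT services can be illustrated with structural interfaces and a name-based registry. This is a hypothetical sketch, not MeetDot's backend. The `ASRService`/`MTService` protocols, the toy backends, the registry keys, and the language codes are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Protocol


class ASRService(Protocol):
    """Hypothetical interface an ASR backend is assumed to satisfy."""
    def transcribe(self, audio_chunk: bytes, lang: str) -> str: ...


class MTService(Protocol):
    """Hypothetical interface an MT backend is assumed to satisfy."""
    def translate(self, text: str, src: str, tgt: str) -> str: ...


class FakeASR:
    """Toy ASR: pretends the audio bytes are already UTF-8 text."""
    def transcribe(self, audio_chunk: bytes, lang: str) -> str:
        return audio_chunk.decode("utf-8")


class EchoMT:
    """Toy MT baseline: returns the input unchanged."""
    def translate(self, text: str, src: str, tgt: str) -> str:
        return text


class TaggingMT:
    """Toy MT backend that marks the language pair instead of translating."""
    def translate(self, text: str, src: str, tgt: str) -> str:
        return f"[{src}->{tgt}] {text}"


# Registries: swapping a service becomes a configuration change (hypothetical names).
ASR_BACKENDS = {"fake": FakeASR}
MT_BACKENDS = {"echo": EchoMT, "tagging": TaggingMT}


@dataclass
class Cascade:
    """ASR followed by MT, with each stage replaceable independently."""
    asr: ASRService
    mt: MTService

    def caption(self, audio_chunk: bytes, src: str, tgt: str) -> str:
        return self.mt.translate(self.asr.transcribe(audio_chunk, src), src, tgt)


def build_cascade(asr_name: str, mt_name: str) -> Cascade:
    """Instantiate a cascade from backend names; raises KeyError for unknown names."""
    return Cascade(asr=ASR_BACKENDS[asr_name](), mt=MT_BACKENDS[mt_name]())


if __name__ == "__main__":
    print(build_cascade("fake", "tagging").caption(b"hola", "es", "en"))  # [es->en] hola
```

Because the cascade depends only on the two small interfaces, a real cloud or on-premise service could be registered alongside the toy ones without touching the captioning code.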

Research paper: Arkhangorodsky, A., "MeetDot: Videoconferencing with Live Translation Captions", 2021. Link: https://arxiv.org/abs/2109.09577