SpeakerStew: Scaling to Many Languages with a Triaged Multilingual Text-Dependent and Text-Independent Speaker Verification System
I’ll share the system design and research I’ve been working on recently as part of the Speaker, Voice & Language team at Google NYC. The work has practical implications: it allows a single speaker recognition model to recognize speakers in 46 languages, and the model generalizes well to unseen languages. Beyond reducing training time and engineering cost, we also propose a triage mechanism that combines the text-dependent and text-independent components to reduce computational cost and improve latency. The paper was accepted at Interspeech 2021.
Bio
Róża Chojnacka is a research software engineer at Google. She earned her BSc and MSc at the University of Warsaw, with a thesis on semantic search. She was a Tech Lead Manager at Google Lens, contributing to projects such as apparel detection in photos, Style Match, spatial and semantic AR autocompletion, and the viral Art Selfie. She currently works on automatic speech recognition. Her work on computer vision, augmented reality, and speech has resulted in many patents. She occasionally writes at roza-chojnacka.medium.com.