Language Technologies Institute Colloquium

  • Posner Hall A35 and Zoom
  • In Person and Virtual ET
  • TAKUYA YOSHIOKA
  • Principal Research Manager
  • Microsoft Cognitive Services Research Group
  • Microsoft Research

New Approaches to Natural Conversation Transcription: Continuous Speech Separation and End-to-End Speaker-Attributed Speech Recognition

Modern speech recognition systems can accurately transcribe pre-segmented utterances recorded in acoustically moderate environments, and they have become woven into our daily lives. Yet many challenges must be addressed before speech recognition technology becomes usable more broadly. Audio recordings of unsegmented natural conversations, in which multiple speakers may talk over each other, are far more complex in terms of acoustics, linguistics, and turn-taking dynamics. Transcribing natural conversations and attributing the words to the corresponding speakers remains a challenging problem; it requires improvements in speech separation, speech recognition, and speaker diarization, as well as orchestration of the various pieces. Our research group is adopting a systems approach to build individual technologies and examine their impact on the end transcription quality. This talk will describe our findings, with a focus on two lines of research that have emerged from our holistic approach: continuous speech separation and end-to-end modeling of speaker-attributed speech recognition.

Takuya Yoshioka is a Principal Research Manager at the Microsoft Cognitive Services Research Group, where he leads the group's effort to develop natural conversation processing technologies for both human communication and machine transcription. At Microsoft, he developed the continuous speech separation approach and contributed to the development of Conversation Transcription of Microsoft Azure Cognitive Services, which combines speech recognition and speaker diarization and powers various Microsoft products, including Teams meeting room devices and the transcription feature of Word. Prior to joining Microsoft, he worked at NTT Communication Science Laboratories, where he developed WPE, a popular dereverberation algorithm, and led the lab's effort in the CHiME-3 challenge, in which the team won first place by a large margin.

The LTI Colloquium is generously sponsored by Abridge.

In Person and Zoom Participation. See announcement.
