Siddharth Dalmia

Hi! I am Sid [s-ih-d]. I am a Research Scientist at Google DeepMind, where I work on Gemini, in particular on building reliable evaluations for audio and long-context capabilities.

I graduated with a PhD from the Language Technologies Institute in the School of Computer Science at Carnegie Mellon University. I was fortunate to be advised by Florian Metze (now at Meta), Alan W Black and Shinji Watanabe.

During my PhD, I worked on making sequence models amenable to resource-constrained scenarios (limited data and limited compute) by incorporating compositionality principles of system building, such as task simplification, reusability, transferability, and data pooling, into the sequence models used for various speech and language tasks.

I have also spent time doing research at Google Brain (2021), Amazon AWS AI (2020), Facebook AI Research (2019, 2020) and INRIA (2015, 2016). There I was fortunate to work with many amazing mentors who have helped me evolve as a researcher: Yu Zhang, Ron J Weiss, Tara Sainath and Alexis Conneau (Google Brain); Yuzong Liu, Srikanth Ronanki and Katrin Kirchhoff (Amazon AWS AI); Mike Lewis and Abdelrahman Mohamed (Facebook AI Research); Emmanuel Vincent and Irina Illina (INRIA).

I received my undergraduate degree in Computer Science from BITS, Pilani (Hyderabad Campus) in 2016.

The best way to reach me is through email - sdalmia[at]cs.cmu.edu

CV / Google Scholar / Twitter / LinkedIn / GitHub

Publications

Outdated! Sorry, I am working on updating this. In the meantime, please see Google Scholar for my latest work.

Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding
Siddhant Arora*, Alissa Ostapenko*, Vijay Viswanathan*, Siddharth Dalmia*, Florian Metze, Shinji Watanabe, Alan W Black
The 22nd Annual Conference of the International Speech Communication Association (INTERSPEECH 2021)

BibTeX / Website / PDF

Differentiable Allophone Graphs for Language Universal Speech Recognition
Brian Yan, Siddharth Dalmia, David Mortensen, Florian Metze, Shinji Watanabe
The 22nd Annual Conference of the International Speech Communication Association (INTERSPEECH 2021)

BibTeX / Code / PDF

ESPnet-ST IWSLT 2021 Offline Speech Translation System
Hirofumi Inaguma*, Brian Yan*, Siddharth Dalmia, Pengcheng Guo, Jiatong Shi, Kevin Duh, Shinji Watanabe
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

BibTeX / Code / PDF

Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks
Siddharth Dalmia, Brian Yan, Vikas Raunak, Florian Metze, Shinji Watanabe
2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2021)

BibTeX / Code / PDF

Highland Puebla Nahuatl Speech Translation Corpus for Endangered Language Documentation
Jiatong Shi, Jonathan D. Amith, Xuankai Chang, Siddharth Dalmia, Brian Yan, Shinji Watanabe
Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP 2021)

BibTeX / Dataset / PDF

NoiseQA: Challenge Set Evaluation for User-Centric Question Answering
Abhilasha Ravichander, Siddharth Dalmia, Maria Ryskina, Florian Metze, Eduard Hovy, Alan W Black
16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021)

BibTeX / Project Page / Data / Code / PDF

Transformer-Transducers for Code-Switched Speech Recognition
Siddharth Dalmia, Yuzong Liu, Srikanth Ronanki, Katrin Kirchhoff
46th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)

BibTeX / PDF

On Long Tailed Phenomena in Neural Machine Translation
Vikas Raunak, Siddharth Dalmia, Vivek Gupta, Florian Metze
Findings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)

BibTeX / PDF / Code

Universal Phone Recognition with a Multilingual Allophone System
Xinjian Li, Siddharth Dalmia, Juncheng Li, Matthew Lee, Patrick Littell, Jiali Yao, Antonios Anastasopoulos, David R Mortensen, Graham Neubig, Alan W Black, Florian Metze
45th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020)

BibTeX / PDF / Code

Towards Zero-shot Learning for Automatic Phonemic Transcription
Xinjian Li, Siddharth Dalmia, David R. Mortensen, Juncheng Li, Alan W Black, Florian Metze
34th AAAI Conference on Artificial Intelligence (AAAI 2020)

BibTeX / PDF

Enforcing Encoder-Decoder Modularity in Sequence-to-Sequence Models
Siddharth Dalmia, Abdelrahman Mohamed, Mike Lewis, Florian Metze, Luke Zettlemoyer
arXiv 2019

BibTeX / PDF

Cross-Attention End-to-End ASR for Two-Party Conversations
Suyoun Kim, Siddharth Dalmia, Florian Metze
20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019)

BibTeX / PDF

Multilingual Speech Recognition with Corpus Relatedness Sampling
Xinjian Li, Siddharth Dalmia, Alan W Black, Florian Metze
20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019)

BibTeX / PDF

SANTLR: Speech Annotation Toolkit for Low Resource Language
Xinjian Li, Zhong Zhou, Siddharth Dalmia, Alan W Black, Florian Metze
20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019), Show and Tell Track

BibTeX / PDF / Demo

The ARIEL-CMU Systems for LoReHLT18
Aditi Chaudhary, Siddharth Dalmia, Junjie Hu, Xinjian Li, Austin Matthews, Aldrian Obaja Muis, Naoki Otani, Shruti Rijhwani, Zaid Sheikh, Nidhi Vyas, Xinyi Wang, Jiateng Xie, Ruochen Xu, Chunting Zhou, Peter J Jansen, Yiming Yang, Lori Levin, Florian Metze, Teruko Mitamura, David R Mortensen, Graham Neubig, Eduard Hovy, Alan W Black, Jaime Carbonell, Graham V Horwood, Shabnam Tafreshi, Mona Diab, Efsun S Kayi, Noura Farra, Kathleen McKeown
CMU System Description for Low Resource Human Language Technologies (LoReHLT 2018)

BibTeX / PDF / Project

Gated Embeddings in End-to-End Speech Recognition for Conversational-Context Fusion
Suyoun Kim, Siddharth Dalmia, Florian Metze
57th Annual Meeting of the Association for Computational Linguistics (ACL 2019)

BibTeX / PDF

Phoneme Level Language Models for Sequence Based Low Resource ASR
Siddharth Dalmia, Xinjian Li, Alan W Black, Florian Metze
44th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019)

BibTeX / PDF

Situation Informed End-to-End ASR for CHiME-5 Challenge
Suyoun Kim*, Siddharth Dalmia*, Florian Metze
5th International Workshop on Speech Processing in Everyday Environments (CHiME 2018)

BibTeX / PDF

Domain Robust Feature Extraction for Rapid Low Resource ASR Development
Siddharth Dalmia*, Xinjian Li*, Florian Metze, Alan W. Black
7th IEEE Workshop on Spoken Language Technology (SLT 2018)

BibTeX / PDF

Sequence-based Multi-lingual Low Resource Speech Recognition
Siddharth Dalmia, Ramon Sanabria, Florian Metze, Alan W. Black
43rd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018)

BibTeX / PDF / Code / Slides

Epitran: Precision G2P for Many Languages
David R. Mortensen, Siddharth Dalmia, Patrick Littell
11th International Conference on Language Resources and Evaluation (LREC 2018)

BibTeX / PDF / Code

An Approach for Self-Training Audio Event Detectors Using Web Data
Benjamin Elizalde*, Ankit Shah*, Siddharth Dalmia*, Min Hun Lee*, Rohan Badlani*, Anurag Kumar*, Bhiksha Raj, Ian Lane
25th European Signal Processing Conference (EUSIPCO 2017)

BibTeX / PDF

Robust ASR using neural network based speech enhancement and feature simulation
Sunit Sivasankaran, Aditya Arie Nugraha, Emmanuel Vincent, Juan A Morales-Cordovilla, Siddharth Dalmia, Irina Illina, Antoine Liutkus
14th IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU 2015)

BibTeX / PDF

Academic Services
ICML 2019, LREC 2020, ACL 2020 (SRW), EMNLP 2020, NeurIPS 2020, AACL 2020 (SRW), EACL 2021, AAAI 2021, NAACL 2021, ACL 2021, ICML 2021, INTERSPEECH 2021, ACMMM 2021, EMNLP 2021, NeurIPS 2021, ICLR 2022
Teaching Assistant
Spring 2019: Large Scale Multimedia Analysis, Graduate Course @ CMU

Fall 2019: Speech Recognition and Understanding, Graduate Course @ CMU


Website source from Jon Barron here