Anurag Kumar

PhD Candidate
Machine Learning and Signal Processing Group
Google Scholar

Contact:
GHC 5509
Language Technologies Institute
Carnegie Mellon University
Pittsburgh, PA, USA
Email: alnu [AT] andrew [DOT] cmu [DOT] edu
            alnu [AT] cs [DOT] cmu [DOT] edu

Theory is the first term in the Taylor series expansion of practice. - Thomas Cover

I have had my results for a long time, but I do not yet know how I am to arrive at them. - Carl Friedrich Gauss


Hi! This is my dugout on the web.

I am currently a PhD student in the Language Technologies Institute (LTI), School of Computer Science, at Carnegie Mellon University. I joined CMU in Fall 2013 and am advised by Prof. Bhiksha Raj, who leads the Machine Learning and Signal Processing group. The name of my research group, Machine Learning and Signal Processing, more or less gives away my broad research interests.
I primarily work on Acoustic Intelligence and Machine Perception of Sounds, which involves developing methods that let machines understand natural sounds as humans can. This includes content analysis of audio recordings in terms of sound events as well as natural language understanding for sounds. Large-scale Audio Event Detection using weakly labeled and web data is an important part of my research. You can find more information about my research by looking at my publications. Before joining CMU, I did my undergraduate studies at the Indian Institute of Technology, Kanpur, where I obtained a B.Tech-M.Tech Integrated Dual Degree in Electrical Engineering in 2013.

I successfully finished my Thesis Proposal in Dec. 2016. My thesis title is Acoustic Intelligence: Machine Perception of Non-Speech Sounds. My thesis committee consists of Prof. Bhiksha Raj (advisor), Dan Ellis (Google), Alex Hauptmann (LTI), LP Morency (LTI) and Rita Singh (LTI). I am graduating in Spring (or maybe Summer) 2018.

I interned with Christian Fuegen in the Speech and Audio Team at Facebook Research during the summer of 2017. My work focused on video understanding using Audio Event Detection, relying primarily on weakly labeled data.
Previously, in the summer of 2015, I interned with Dinei Florencio in the Multimedia, Interaction, and Communication (MIC) group at Microsoft Research, Redmond. My research there focused on Speech Enhancement using Deep Neural Networks. You can find details about this work in the corresponding papers below.

PUBLICATIONS

arXiv Preprints/Tech Reports

A Closer Look at Weak Label Learning for Audio Events [Under Review]
Ankit Shah*, Anurag Kumar*, Alex Hauptmann and Bhiksha Raj [arXiv] (*Equal Contribution)

Features and Kernels for Audio Event Recognition
Anurag Kumar, Bhiksha Raj [arXiv]


Published (Chronological)

Classifier Risk Estimation under Limited Labeling Resources
Anurag Kumar, Bhiksha Raj [arXiv Version]
22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), (Long Presentation), 2018

Knowledge Transfer From Weakly Labeled Audio Using Convolutional Neural Network For Sound Events and Scenes
Anurag Kumar, Maksim Khadkevich, Christian Fügen [arXiv Version]
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018.
Sets state-of-the-art results on the AudioSet and ESC-50 datasets. Companion Webpage here

Content Based Representations Of Audio Using Siamese Neural Networks
Pranay Manocha, Rohan Badlani, Anurag Kumar, Ankit Shah, Benjamin Elizalde, Bhiksha Raj [arXiv Version]
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018

Framework For Evaluation Of Sound Event Detection In Web Videos
Rohan Badlani, Ankit Shah, Benjamin Elizalde, Anurag Kumar, Bhiksha Raj [arXiv Version]
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018

Deep CNN Framework for Audio Event Recognition using Weakly Labeled Web Data
Anurag Kumar, Bhiksha Raj [arXiv]
in NIPS Workshop on Machine Learning for Audio, 2017

Audio Content based Geotagging in Multimedia
Anurag Kumar, Benjamin Elizalde, Bhiksha Raj [arXiv Version]
in Interspeech, 2017.

An Approach for Self-Training Audio Event Detectors Using Web Data
Ankit Shah, Rohan Badlani, Anurag Kumar, Benjamin Elizalde, Bhiksha Raj [arXiv Version]
in 25th European Signal Processing Conference (EUSIPCO), 2017

Audio Event and Scene Recognition: A Unified Approach using Strongly and Weakly Labeled Data
Anurag Kumar, Bhiksha Raj [arXiv Version]
IEEE International Joint Conference on Neural Networks (IJCNN), 2017.

Discovering Sound Concepts and Acoustic Relations In Text
Anurag Kumar, Bhiksha Raj, Ndapandula Nakashole [arXiv Version]
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017. Companion Webpage here

Experiments on the DCASE Challenge 2016: Acoustic Scene Classification and Sound Event Detection in Real Life Recording
Benjamin Elizalde, Anurag Kumar, et al. [arXiv Version]
Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2016.

Audio Event Detection using Weakly Labeled Data
Anurag Kumar, Bhiksha Raj
in 24th ACM International Conference on Multimedia (ACM Multimedia), 2016. [arXiv Version]

Speech Enhancement In Multiple-Noise Conditions using Deep Neural Networks
Anurag Kumar, Dinei Florencio [arXiv Version]
in Interspeech, 2016. Companion Webpage here.

Weakly Supervised Scalable Audio Content Analysis
Anurag Kumar, Bhiksha Raj
IEEE International Conference on Multimedia & Expo (ICME), 2016. [arXiv Version]

A Novel Ranking Method For Multiple Classifier Systems
Anurag Kumar, Bhiksha Raj
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.

Informedia @ TRECVID 2014 MED and MER
CMU Aladdin MED Team

Unsupervised Fusion Weight Learning in Multiple Classifier Systems
Anurag Kumar, Bhiksha Raj [arXiv]

Detecting Sound Objects In Audio Recordings
Anurag Kumar, Rita Singh, Bhiksha Raj
22nd European Signal Processing Conference (EUSIPCO), 2014.

Undergraduate Papers

Monaural Speaker Segregation Using Group Delay Spectral Matrix Factorization
Karan Nathwani, Anurag Kumar and Rajesh Hegde
20th National Conference on Communications (NCC), 2014 [Nominated for Best Paper Award].

Event Detection in Short Duration Audio Using Gaussian Mixture Model and Random Forest Classifier
Anurag Kumar, Rajesh Hegde, Rita Singh and Bhiksha Raj
21st European Signal Processing Conference (EUSIPCO), 2013.

Audio event detection from acoustic unit occurrence patterns
Anurag Kumar, Pranay Dighe, Rita Singh, Sourish Chaudhuri and Bhiksha Raj
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012.


OTHERS - Other academic information