Publications

Conference papers

Spectrogram Dimensionality Reduction with Independence Constraints, Kevin W. Wilson and Bhiksha Raj, Submitted to ICASSP 2010

A Hybrid Physical and Statistical Dynamic Articulatory Framework Incorporating Analysis-by-Synthesis for Improved Phone Classification, Ziad Al Bawab, Bhiksha Raj, Richard M. Stern, Submitted to ICASSP 2010

Ultrasonic Sensing for Robust Speech Recognition, Sundararajan Srinivasan, Bhiksha Raj, Tony Ezzat, Submitted to ICASSP 2010

Latent-Variable Decomposition Based Dereverberation of Monaural and Multi-Channel Signals, Rita Singh, Bhiksha Raj, Paris Smaragdis, Submitted to ICASSP 2010

Synthesizing Speech from Doppler Signals, Arthur Toth, Bhiksha Raj, Kaustubh Kalgaonkar and Tony Ezzat, Submitted to ICASSP 2010

Knowledge-assisted validation of recognition hypotheses, Benjamin Lambert, Bhiksha Raj and Scott Fahlman, Submitted to ICASSP 2010

Properties and Applications of Ultrasonic Doppler Sensing in Human-Computer Interaction, Chris Harrison, Bhiksha Raj, Paul Dietz, Submitted to Tangible Embedded and Embodied Interaction (TEI10), 2010

A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds, Paris Smaragdis, Madhusudana Shashanka, Bhiksha Raj, to be presented at NIPS 2009

Missing Data Imputation for Spectral Audio Signals, Paris Smaragdis, Bhiksha Raj, Madhusudana Shashanka, IEEE International Workshop for Machine Learning in Signal Processing, September 2009

Signal Separation for Robust Speech Recognition based on Phase Difference Information obtained in the Frequency Domain, Chanwoo Kim, Kshitiz Kumar, Bhiksha Raj, Richard Stern, Interspeech 2009

Towards Fusion of Feature Extraction and Acoustic Model Training: A Top Down Process for Robust Speech Recognition, Yu-Hsiang Bosco Chiu, Bhiksha Raj, Richard M. Stern, Interspeech 2009

Towards Speech Synthesis from Electromagnetic Articulograph Data using a Physical Model of the Vocal Tract, Ziad Al-Bawab, Lorenzo Turicchia and Bhiksha Raj, Interspeech 2009.

Probabilistic Factorization of Non-Negative Data with Co-occurrence Constraints, Paris Smaragdis, Madhusudana Shashanka, Bhiksha Raj and Gautham J. Mysore, 8th International Conference on Independent Component Analysis and Signal Separation, 2009

Word Particles applied to Information Retrieval, Evandro Gouvea and Bhiksha Raj, Proc. European Conference on Information Retrieval (ECIR), 2009

One-handed Gesture Recognition using Ultrasonic Doppler Sonar, Kaustubh Kalgaonkar and Bhiksha Raj, Proc. ICASSP 2009

A Joint Decoding Algorithm for Multiple-Example-Based addition of Words to a Pronunciation Lexicon, Dhananjay Bansal, Nishanth Nair, Rita Singh and Bhiksha Raj, Proc. ICASSP 2009

Recognizing Talking Faces from Acoustic Doppler Measurements, Kaustubh Kalgaonkar and Bhiksha Raj, IEEE Intl. Conf. on Automatic Face and Gesture Recognition, 2008

Regularized Non-negative Matrix Factorization with Temporal Dependencies for Speech Denoising, Kevin Wilson, Bhiksha Raj and Paris Smaragdis, Interspeech 2008

Analysis-by-synthesis Features for Speech Recognition, Al-Bawab, Z., Raj, B. and Stern, R., ICASSP 2008.

Ultrasonic Doppler Sensor for Speaker Recognition, Kalgaonkar, K. and Raj, B. ICASSP 2008.

Sparse and Shift-invariant Feature Extraction from Non-negative Data, Smaragdis, P., Raj, B. and Shashanka, M. ICASSP 2008.

Speech Denoising using Non-negative Matrix Factorization with Priors, Wilson, K., Raj, B., Smaragdis, P. and Divakaran, A. ICASSP 2008.

Sparse Overcomplete Latent Variable Decomposition of Counts Data, Shashanka, M., Raj, B. and Smaragdis, P., NIPS 2007

Example Driven Bandwidth Expansion, Smaragdis, P. and Raj, B. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2007).

Probabilistic Deduction of Symbol Mappings for Extension of Lexicons, Singh, R., Gouvea, E. and Raj, B. Interspeech 2007, September 2007.

Acoustic Doppler Sonar for Gait Recognition, Kalgaonkar, K. and Raj, B. IEEE International Conference on Advanced Video and Signal-based Surveillance (AVSS2007), September 2007.

Supervised and Semi-Supervised Separation of Sounds from Single-Channel Mixtures. Shashanka, M., Smaragdis, P. and Raj, B. 7th International Conference on Independent Component Analysis and Signal Separation (ICA 2007), September 2007.

Sensor and Data Systems, Audio-Assisted Cameras and Acoustic Doppler Sensors, Kalgaonkar, K., Smaragdis, P. and Raj, B. To be presented at the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007), June 2007.

Separating a Singing Voice from Background Music for Song Personalization. Raj, B.; Smaragdis, P.; Shashanka, M. and Singh, R. Proc. Frontiers of Research in Speech and Music, 2007.

Bandwidth Expansion with a Polya Urn Model. Raj, B.; Singh, R.; Shashanka, M. and Smaragdis, P. IEEE International Conference on Acoustics Speech and Signal Processing, 2007.

Sparse Overcomplete Decomposition for Single Channel Speaker Separation. Shashanka, M., Raj, B. and Smaragdis, P. IEEE International Conference on Acoustics Speech and Signal Processing 2007.

An Integrated Approach to Improve Speech Recognition Rate for Non-native Speakers. Yunbin Deng, Xiaokun Li, Chiman Kwan, Roger Xu, Bhiksha Raj, Richard Stern and David Williamson, Proc. Interspeech 2006, Pittsburgh, PA. September 2006.

An Acoustic Doppler Based Front End for Hands-free Spoken User Interfaces. Kalgaonkar, K. and Raj, B. Proc. IEEE Spoken Language Technologies Workshop (SLT), Aruba, Dec 2006.

A Probabilistic Latent Variable Model for Acoustic Modeling. Smaragdis, P.; Raj, B. and Shashanka, M. Proc. Neural Information Processing Systems Workshop on Advances in Models for Acoustic Processing, 2006.

Two New Techniques for Natural Spoken Language Interfaces. Weinberg, G.; Raj, B. and Kalgaonkar, K. User Interface Software Technology 2006, Demonstration Poster.

Latent Dirichlet Decomposition for Single Channel Speaker Separation, Raj, B.; Shashanka, M.V.S.; Smaragdis, P., IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2006

A Robust Voice Activity Detector Using an Acoustic Doppler Radar, Hu, R; Raj, B., IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 171-176, November 2005

Reconstructing Spectral Vectors with Uncertain Spectrographic Masks for Robust Speech Recognition, Raj, B.; Singh, R., IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp. 27-32, November 2005

Latent Variable Decomposition of Spectrograms for Single Channel Speaker Separation, Raj, B.; Smaragdis, P., IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 17-20, October 2005

Recognizing Speech from Simultaneous Speakers, Raj, B.; Singh, R.; Smaragdis, P., Eurospeech, September 2005

Feature Compensation with Secondary Sensor Measurements, Raj, B.; Singh, R., 13th European Signal Processing Conference (EUSIPCO), September 2005

A Comparison between Spoken Queries and Menu-based Interfaces for In-Car Digital Music Selection, Clifton Forlines, Bent Schmidt-Nielsen, Bhiksha Raj, Kent Wittenburg, Peter Wolf, INTERACT 2005, Rome, Italy, 2005

A Companding Front End for Noise Robust Automatic Speech Recognition, Jethran Guinness, Bhiksha Raj, Bent Schmidt-Nielsen, Lorenzo Turicchia, Rahul Sarpeshkar, IEEE International Conference on Acoustics, Speech and Signal Processing, 2005.

SpokenQuery: An Alternate Approach to Choosing Items with Speech, Peter Wolf, Joseph Woelfel, Jan Van Gemert, Bhiksha Raj, David Wong, International Conference on Speech and Language Processing, INTERSPEECH 2004, Jeju Korea, 2004

A Minimum Mean Squared Error Estimator for Single Channel Speaker Separation, Aarthi M. Reddy and Bhiksha Raj, International Conference on Speech and Language Processing, INTERSPEECH 2004, Jeju Korea, 2004

Soft Mask estimation for single channel speaker separation, Aarthi M. Reddy, Bhiksha Raj, ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio (SAPA 2004), Jeju Korea, 2004.

A Speech-In List-Out Approach to Spoken User Interfaces, Jan Van Gemert, Bhiksha Raj, Peter Wolf, Joe Woelfel, Human Language Technologies Conference, Boston, 2004.

On tracking noise with Linear Dynamical System Models, Bhiksha Raj, Rita Singh, and Richard M. Stern (2004), IEEE International Conference on Acoustics, Speech and Signal Processing, 2004

Speech recognizer based maximum likelihood beamforming, B. Raj, M. L. Seltzer, and M. J. Reyes-Gomez (2003), Proc. NSF workshop on perspectives on speech separation, Montreal, Canada, 2003.

Multi-Channel source separation by beamforming trained with factorial HMMs, M. J. Reyes-Gomez, B. Raj, and D.P.W.Ellis (2003), Proc. IEEE workshop on applications of signal processing to audio and acoustics (WASPAA), Mohonk, NY, 2003.

Audio-assisted news video browsing using a GMM based generalized sound recognition framework, R. Radhakrishnan, Z. Xiong, B. Raj, and A. Divakaran (2003), Proc. SPIE Internet Multimedia Management Systems IV, Orlando, Florida, 2003.

Classification with free energy at raised temperatures, R. Singh, M. Warmuth, B. Raj and P. Lamere (2003), Proc. European Conf. on Speech Communication and Technology, Geneva, Switzerland 2003.

Design of the CMU Sphinx-4 Decoder, P. Lamere, P. Kwok., W. Walker, E. Gouvea, R. Singh, B. Raj, and P. Wolf (2003), Proc. European Conf. on Speech Communication and Technology, Geneva, Switzerland 2003.


Lossless Compression of Language Model Structure and Word Identifiers, B. Raj and E.W.D. Whittaker (2003), IEEE Intl. Conf. on Acoustics Speech and Signal Processing, Hong Kong, China 2003.


Multi-Channel Source Separation by Factorial HMMs, M. J. Reyes-Gomez, B. Raj, and D.P.W. Ellis (2003), IEEE Intl. Conf. on Acoustic Speech and Signal Processing, Hong Kong, China 2003.


Tracking Noise via Dynamical Systems with a Continuum of States, R. Singh and B. Raj (2003), IEEE Intl. Conf. on Acoustics Speech and Signal Processing, Hong Kong, China 2003.

The MERL SpokenQuery Information Retrieval System: A System for Retrieving Pertinent Documents from a Spoken Query, P. Wolf and B. Raj (2002), IEEE Conference and Multimedia Expo, Lausanne, Switzerland, 2002


Speech Recognizer-based Microphone Array Processing for Robust Hands-Free Speech Recognition, M. L. Seltzer, B. Raj, R. M. Stern (2002), IEEE Intl. Conf. on Acoustics Speech and Signal Processing, Orlando, Florida 2002.


Distributed Speech Recognition with Codec Parameters, B. Raj, J. Migdal, and R. Singh (2001), Proc. Automatic Speech Recognition and Understanding Workshop, 2001

Robust Speech Recognition: The case for restoring missing features, B. Raj, M. L. Seltzer, and R.M. Stern (2001), ISCA workshop on consistent and reliable perceptual cues (CRAC), September 2001.

Comparison of width-wise and length-wise compression of language models, E. W. D. Whittaker and B. Raj (2001), European Conference on Speech Communication and Technology (Eurospeech),  September 2001.


Quantization based language model compression, E. W. D. Whittaker and B. Raj (2001), European Conference on Speech Communication and Technology (Eurospeech), September 2001.

Calibration of Microphone Arrays for Improved Speech Recognition, M. L. Seltzer and B. Raj (2001), European Conference on Speech Communication and Technology (Eurospeech),  September 2001.


A Boosting Approach for Confidence Scoring, Pedro Moreno, Beth Logan and Bhiksha Raj, 7th European Conference on Speech Communication and Technology (Eurospeech), September 2001.


Speech in noisy environments: robust automatic segmentation, feature extraction, and hypothesis combination, R. Singh, M. L. Seltzer, B. Raj, and R. M. Stern, Proc. IEEE Intl. Conf. of Acoustic Speech and Signal Processing (ICASSP), Salt Lake City, Utah, 2001.


Reconstruction of Damaged spectrographic features for robust speech recognition, B. Raj, M. Seltzer, and R.M. Stern (2000), Proceedings of Intl. Conf. on Speech and Language Processing (ICSLP), Beijing, China, September 2000.

Automatic Generation of Phone Sets and Lexical Transcriptions, R. Singh, B. Raj, and R.M. Stern (2000), Proc. IEEE Intl. Conf. on Acoustics Speech and Signal Processing (ICASSP), May 2000.


Domain Adduced State Tying for Cross-Domain Acoustic Modelling, R. Singh, B. Raj, and R.M. Stern (1999), Proc. 6th European Conference on Speech Communication and Technology (Eurospeech), Budapest, September 1999.


Automatic Clustering and Generation of Contextual Questions for Tied States in Hidden Markov Models, R. Singh, B. Raj, and R.M. Stern (1999), Proc. IEEE Intl. Conf. on Acoustics Speech and Signal Processing (ICASSP), Phoenix, AZ, 1999.


Inference of Missing Spectrographic Features for Robust Speech Recognition, B. Raj, R. Singh, and R. M. Stern (1998), Proc. Intl. Conf. on Speech and Language Processing (ICSLP), Sydney, Australia, 1998.

The Effects of Background Music on Speech Recognition Accuracy, B. Raj, V. Parikh and R.M. Stern (1997), Proc. IEEE Intl. Conf. on Acoustic Speech and Signal Processing (ICASSP), Munich, Germany, 1997.

Speaker Adaptation and Environmental Compensation for the 1996 Broadcast News Task, V. N. Parikh, B. Raj, and R. M. Stern (1997), Proceedings of the DARPA Speech Recognition Workshop, 1997

Automatic Segmentation, Classification and Clustering of Broadcast News Audio, M. Siegler, U. Jain, B. Raj, and R. M. Stern (1997), Proceedings of the DARPA Speech Recognition Workshop, 1997


Cepstral Compensation using Statistical Linearization, B. Raj, E. B. Gouvea, and R. M. Stern (1997), Proc. of the ESCA Tutorial and Research Workshop on Robust Speech Recognition for Unknown Communication Channels, April 1997, Pont-à-Mousson, France

Compensation for Environmental Degradation in Automatic Speech Recognition, R. M. Stern, B. Raj, and P. J. Moreno (1997), Proc. of the ESCA Tutorial and Research Workshop on Robust Speech Recognition for Unknown Communication Channels, April 1997, Pont-à-Mousson, France, pp. 33-42

A Vector Taylor Series Approach For Environment-Independent Speech Recognition, P. J. Moreno, B. Raj, and R. M. Stern, Proc. of the ICASSP, Atlanta, GA, May 1996.

Cepstral Compensation By Polynomial Approximation For Environment-Independent Speech Recognition, B. Raj, E. Gouvêa, P. J. Moreno, and R. M. Stern, Proc. of the ICSLP, Philadelphia, PA, Oct. 1996.

Adaptation and Compensation: Approaches To Microphone And Speaker Independence In Automatic Speech Recognition, E. B. Gouvea, P. J. Moreno, B. Raj, T. M. Sullivan, and R. M. Stern, Proceedings of the ARPA Workshop on Speech Recognition Technology, Harriman, NY, Morgan Kaufmann, D. Pallett, Ed.

Recognition Of Continuous Broadcast News With Multiple Unknown Speakers And Environments, U. Jain, M. A. Siegler, S.-J. Doh, E. Gouvea, P. J. Moreno, B. Raj, and R. M. Stern, Proceedings of the ARPA Workshop on Speech Recognition Technology, Harriman, NY, Morgan Kaufmann, D. Pallett, Ed.

Multivariate-Gaussian-Based Cepstral Normalization for Robust Speech Recognition, P. J. Moreno, B. Raj, E. Gouvêa, and R. M. Stern, Proc. of the ICASSP, Detroit, Michigan, 1995.

A Unified Approach to Robust Speech Recognition, P. J. Moreno, B. Raj, R. M. Stern, Proc. of Eurospeech-95, Madrid, Spain, September, 1995.

Approaches to Microphone Independence in Automatic Speech Recognition, P. J. Moreno, U. Jain, B. Raj, and R. M. Stern, Proc. of the Eighth Spoken Language Systems Technology Workshop, 1995.

Approaches to Environment Compensation in Automatic Speech Recognition, P. J. Moreno, B. Raj, and R. M. Stern, Proc. 15th International Conference on Acoustics, Trondheim, Norway, Vol. III, pp. 109-112, June, 1995.

A Computer Tutor with Voice I/O in Hindi, Rao, P.V.S., Raj, B., Sen, A. and Mallavadhani, G.R., Intl. Conf. on Knowledge Based Computing Systems, Dec. 1996, Mumbai.

Journal papers

Probabilistic Latent Variable Models as Non-negative Factorizations, Shashanka, M., Raj, B., and Smaragdis, P., Computational Neuroscience, May 2008

Probabilistic Latent Variable Model for Sparse Decomposition of Non-negative Data. Shashanka, M., Raj, B. and Smaragdis, P. IEEE Transactions on Pattern Analysis and Machine Intelligence 2008 (to appear)

Ultrasonic Doppler Sensor for Voice Activity Detection. Kalgaonkar, K. and Raj, B. IEEE Signal Processing Letters, Vol. 14, Issue 10, pp. 754-757, Oct. 2007

Shift Invariant Probabilistic Latent Component Analysis. Smaragdis, P. and Raj, B. Journal of Machine Learning Research, 2008 under review.

An FFT-based Companding Front-end for Noise-Robust Speech Recognition. Raj, B.; Turicchia, L.; Schmidt-Nielsen, B. and Sarpeshkar, R. EURASIP Journal on Audio, Speech, and Music Processing Volume 2007 (2007).

Voice Driven Applications in Non-Stationary and Chaotic Environments. Kwan, C.; Li, X.; Lao, D.; Deng, Y.; Raj, B.; Singh, R. and Stern, R. International Journal of Signal Processing, Vol. 3, No. 4, 2006.

Continuous Feature Adaptation for Non-native Speech Recognition. Deng, Y.; Li, X.; Kwan, C.; Raj, B. and Stern, R., International Journal of Signal Processing, Vol. 3, No. 4, 2006.

Soft Mask methods for Single Channel Speaker Separation, Reddy, A.M. and Raj, B. IEEE Transactions on Audio, Speech and Language Processing, to appear.

Missing Feature Methods for Robust Automatic Speech Recognition, Bhiksha Raj and Richard M. Stern, IEEE Signal Processing Magazine, Sep 2005

Classification in Likelihood Spaces, R. Singh and B. Raj, Technometrics, August 2004, vol. 46, no. 3, pp. 318-329

Likelihood-Maximizing Beamforming for Robust Hands-Free Speech Recognition, M. L. Seltzer, B. Raj, and R. M. Stern, IEEE Trans. on Speech and Audio Processing, 12(5): 489-498, September 2004.

Reconstruction of Missing Features for Robust Speech Recognition, B. Raj, M. L. Seltzer, and R. M. Stern, Speech Communication Journal 43(4): 275-296, September 2004.

A Bayesian Framework for Spectrographic Mask Estimation for Missing Feature Speech Recognition, M. L. Seltzer, B. Raj, and R. M. Stern, Speech Communication Journal 43(4): 379-393, September 2004.

Speech Recognizer Based Filter Optimization for Microphone Array Processing, M.L. Seltzer, and B. Raj, IEEE Signal Processing Letters, Vol. 10, Issue 3, pp. 69-71, March 2003

Classifier-based non-linear projection for adaptive endpointing of continuous speech, Bhiksha Raj and Rita Singh, Computer Speech and Language, Vol. 17, Issue 1, pp. 5-26, January 2003

Automatic generation of sub-word units for speech recognition systems, R. Singh, B. Raj and R. M. Stern (2002), IEEE Transactions on Speech and Audio Processing, vol. 10(2), pp. 89-99, Feb. 2002

Data Driven Environmental Compensation for Speech Recognition: A Unified View. Pedro J. Moreno, Bhiksha Raj, R.M. Stern, Speech Communication, 24, 267-285, 1998

VOICE: A Voice-oriented Interactive Computing Environment, Ajuja, R., Raj B., Bondale, N., Furtado, X., Jose, T., Krishnan, S., Poddar, P., Rao, P.V.S., Samudravijaya, K. and Sen, A. Vivek, Spl. Issue on KBCS, Vol. 5, No. 4, 1992.

Book Chapters

Speech-based User Interfaces for the Automotive Environment. Schmidt-Nielsen, B.; Harsham, B.; Raj, B. and Forlines, C. Handbook of Research on User Interface Design and Evaluation for Mobile Technology, Ed. Joanne Lumsden, Idea Group Reference, 2007.

Sensor and Data Systems, Audio-Assisted Cameras and Acoustic Doppler Sensors. Paris Smaragdis, Bhiksha Raj, Kaustubh Kalgaonkar. Multimodal Surveillance: Sensors, Algorithms and Systems, Eds. Zhigang Zhu and Tom Huang, Artech House, 2007.

Speech recognizer based maximum likelihood beamforming, B. Raj, M. L. Seltzer, and M. J. Reyes-Gomez (2003), Perspectives on Speech Separation, Pierre Divenyi (ed.), Kluwer Academic Publishers, July 2004.

Signal and Feature Compensation Methods for Robust Speech Recognition, R. Singh, R. M. Stern and B. Raj (2002), CRC Handbook on Noise Reduction in Speech Applications, Ed. Gillian Davis, CRC Press, May 2002

Model Compensation and Matched Condition Methods for Robust Speech Recognition, R. Singh, B. Raj, and R. M. Stern (2002), CRC Handbook on Noise Reduction in Speech Applications, Ed. Gillian Davis, CRC Press, May 2002