Pedro J. Moreno, "Speech Recognition in Noisy Environments," Ph.D. Thesis, ECE Department, CMU, May 1996.

Abstract

The accuracy of speech recognition systems degrades severely when the systems are operated in adverse acoustical environments. In recent years many approaches have been developed to address the problem of robust speech recognition, using feature-normalization algorithms, microphone arrays, representations based on human hearing, and other approaches. Nevertheless, to date the improvement in recognition accuracy afforded by such algorithms has been limited, in part because of inadequacies in the mathematical models used to characterize the acoustical degradation.

This thesis begins with a study of the reasons why speech recognition systems degrade in noise, using Monte Carlo simulation techniques. From observations of these simulations we propose a simple yet effective model of how the environment affects the parameters used to characterize speech recognition systems and their input.

The proposed model of environmental degradation is applied to two different approaches to environmental compensation: data-driven methods and model-based methods. Data-driven methods learn how a noisy environment affects the characteristics of incoming speech from direct comparisons of speech recorded in the noisy environment with the same speech recorded under optimal conditions. Model-based methods use a mathematical model of the environment and attempt to use samples of the degraded speech to estimate the parameters of the model. In this thesis we argue that a careful mathematical formulation of environmental degradation improves recognition accuracy for both data-driven and model-based compensation procedures. The representation we develop for data-driven compensation approaches can be applied both to incoming feature vectors and to the stored statistical models used by speech recognition systems.
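The stereo-data idea behind data-driven compensation can be sketched roughly as follows. This is a simplified illustration, not the thesis's actual RATZ or STAR algorithm: it assumes a diagonal-covariance GMM trained on clean speech and learns one correction vector per Gaussian from simultaneously recorded clean and noisy features; all function names and shapes here are illustrative.

```python
import numpy as np

# Hypothetical sketch of stereo-data compensation: learn per-Gaussian
# shifts r_k between clean and noisy features, then correct incoming
# noisy vectors by subtracting the posterior-weighted expected shift.

def learn_corrections(clean, noisy, means, variances, weights):
    """clean, noisy: (T, D) stereo recordings of the same speech.
    means (K, D), variances (K, D), weights (K,): clean-speech GMM."""
    # posterior p(k | clean_t) for each frame under the clean GMM
    log_lik = -0.5 * (((clean[:, None, :] - means[None]) ** 2) / variances[None]
                      + np.log(2 * np.pi * variances[None])).sum(axis=2)
    post = weights[None] * np.exp(log_lik - log_lik.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)             # (T, K)
    # per-Gaussian mean shift between noisy and clean features
    diff = noisy - clean                                # (T, D)
    r = (post[:, :, None] * diff[:, None, :]).sum(0) / post.sum(0)[:, None]
    return r                                            # (K, D)

def compensate(noisy_frame, means, variances, weights, r):
    # MMSE-style correction: subtract the posterior-weighted shift
    # (a real system would score the frame against the noisy-speech
    # distribution; the clean GMM is used here only to keep the sketch short)
    log_lik = -0.5 * (((noisy_frame[None] - means) ** 2) / variances
                      + np.log(2 * np.pi * variances)).sum(axis=1)
    post = weights * np.exp(log_lik - log_lik.max())
    post /= post.sum()
    return noisy_frame - post @ r
```

The same learned shifts can also be applied in the opposite direction, moving the recognizer's stored Gaussian means toward the noisy domain instead of moving incoming features toward the clean domain, which is the distinction the abstract draws between feature-space and model-space compensation.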
These two approaches to data-driven compensation are referred to as RATZ and STAR, respectively. Finally, we introduce a new approach to model-based compensation with a solution based on vector Taylor series, referred to as the VTS algorithms.

The proposed compensation algorithms are evaluated in a series of experiments measuring recognition accuracy for speech from the ARPA Wall Street Journal database that is corrupted by additive noise artificially injected at various signal-to-noise ratios (SNRs). For any particular SNR, the upper bound on recognition accuracy provided by practical compensation algorithms is the recognition accuracy of a system trained with noisy data at that SNR. The RATZ, VTS, and STAR algorithms achieve this bound at global SNRs as low as 15, 10, and 5 dB, respectively. The experimental results also demonstrate that the recognition error rate obtained using the algorithms proposed in this thesis is significantly better than what could be achieved using the previous state of the art. We include a small number of experimental results indicating that the improvements in recognition accuracy provided by our approaches extend to degraded speech recorded in natural environments as well.

We also introduce a generic formulation of the environment compensation problem and its solution via vector Taylor series. We show how the use of vector Taylor series in combination with a Maximum Likelihood formulation produces dramatic improvements in recognition accuracy.
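The vector Taylor series idea can be illustrated on the standard additive-noise model in the log-spectral domain, where a noisy observation satisfies y = x + log(1 + exp(n - x)) for clean speech x and noise n. A first-order Taylor expansion of this nonlinearity around the clean and noise means yields closed-form approximations for the moments of y. The sketch below assumes diagonal covariances and element-wise features; it illustrates the expansion only, not the thesis's full ML estimation procedure.

```python
import numpy as np

def g(x, n):
    # environment function in the log-spectral domain: y = x + g(x, n),
    # i.e. the mismatch introduced by additive noise n on clean speech x
    return np.log1p(np.exp(n - x))

def vts_moments(mu_x, var_x, mu_n, var_n):
    """First-order vector Taylor series approximation of the mean and
    (diagonal) variance of y = x + g(x, n), expanded around (mu_x, mu_n)."""
    # dg/dn at the expansion point is the sigmoid of (mu_n - mu_x);
    # dg/dx is its negative, so dy/dx = 1 - s and dy/dn = s
    s = 1.0 / (1.0 + np.exp(-(mu_n - mu_x)))
    mu_y = mu_x + g(mu_x, mu_n)
    var_y = (1.0 - s) ** 2 * var_x + s ** 2 * var_n
    return mu_y, var_y
```

The expansion behaves sensibly in both limits: when speech dominates (mu_x >> mu_n), s is near 0 and the noisy statistics collapse to the clean ones; when noise dominates, s is near 1 and they collapse to the noise statistics. In a model-based scheme these approximate moments replace the clean Gaussian parameters of the recognizer, and the noise parameters themselves can be re-estimated iteratively from the degraded speech.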