Privacy Preserving Voice Processing

**Machine Learning for Signal Processing**

Discrete data such as text are often modelled as having been generated by draws from a discrete random variable (RV). Continuous-valued data such as images and sound spectra, on the other hand, are commonly modelled as draws from a continuous-valued RV. But what about the intersection of the two?

In this project we investigate this intermediate space. We model the discrete-valued *support* of the continuous-valued RV as a discrete-valued RV, and the continuous value at each point of the support as a normalized count of the number of draws of the corresponding discrete element.

For example, the spectrogram of a speech signal shows the energy at a discrete set of frequencies, at a discrete set of time indices. In our model, time and frequency are treated as RVs, and the value of the spectrogram at any time-frequency pair is treated as the count of the number of draws of that pair from a discrete random process.
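The counts interpretation above can be illustrated with a minimal sketch (toy values, hypothetical code): normalize a small non-negative "spectrogram" into a joint distribution over time-frequency pairs, draw many pairs from it, and observe that the normalized histogram of draws recovers the normalized spectrogram.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "spectrogram": non-negative energies over 4 time indices x 3 frequencies.
# (Illustrative values only.)
S = np.array([[4.0, 1.0, 0.0],
              [2.0, 2.0, 1.0],
              [0.0, 3.0, 3.0],
              [1.0, 0.0, 2.0]])

# Normalize to a joint distribution P(t, f) over time-frequency pairs.
P = S / S.sum()

# Draw many (t, f) pairs from this discrete distribution and count them.
n_draws = 200_000
flat_idx = rng.choice(P.size, size=n_draws, p=P.ravel())
counts = np.bincount(flat_idx, minlength=P.size).reshape(P.shape)

# The normalized counts approximate the normalized spectrogram,
# with the approximation improving as n_draws grows.
estimate = counts / n_draws
print(np.max(np.abs(estimate - P)))
```

The point of the sketch is only that a non-negative array, once normalized, is indistinguishable from a histogram of draws from a discrete random process over its support.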

This model has some surprising properties, yielding remarkably simple algorithms for tasks such as monaural source separation and the discovery of atomic units in sounds, images, video and text, and even potential solutions to problems such as image deblurring and deconvolution of sounds.

Mathematically, the model can be shown to be identical to the popular technique of non-negative matrix factorization (NMF). However, it also provides us with a simple framework for applying various priors, and enables us to employ statistical models and methods that have been developed for discrete data such as text. Conversely, the techniques we develop, particularly the model that obtains *sparse overcomplete decompositions*, are observed to be effective models for discrete data.
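The equivalence to NMF can be made concrete with a short sketch. The code below is a generic implementation of the standard Lee-Seung multiplicative updates for NMF under the generalized KL divergence, whose fixed points coincide with the EM updates of the counts model; the function name, rank, and toy data are illustrative assumptions, not the group's actual code.

```python
import numpy as np

def kl_nmf(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Factor a non-negative matrix V as W @ H by minimizing the
    generalized KL divergence D(V || W @ H), using the Lee-Seung
    multiplicative updates. These updates are mathematically
    equivalent to EM for the discrete counts (PLCA-style) model."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, T)) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1) + eps)
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
    return W, H

# Toy non-negative "spectrogram" built from two latent sources.
rng = np.random.default_rng(1)
V = rng.random((8, 2)) @ rng.random((2, 20))
W, H = kl_nmf(V, rank=2)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # relative error
```

Because the data here truly has rank 2, the factorization reconstructs it closely; in the counts view, the columns of W are learned spectral distributions and the rows of H their time-varying mixture weights.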


**Speech Recognition Systems**

**Unusual Secondary Sensors**

**Speech Recognition with Spectro-Temporal Models**

**Knowledge-Base-Augmented Speech Recognition**

Sub-projects in search of a home:

- Secure Speaker Identification
- Gaussian quantization for fast speech recognition
- Updating the CMU Sphinx trainer for:
  - Fast initialization
  - Discriminative training
- LM adaptation
- Speech recognition on distributed networks using MapReduce
- Doppler sensors for denoising
- Patch models
- PLSA and sparse overcomplete decompositions for signal separation and denoising