Compensation Schemes for SPINE
Assumes that the training data is clean
Assumes that the test data has been corrupted by linear filtering and
additive noise.
y = x + h + IDCT(log(1 + exp(DCT( n - x - h))))
y = cepstrum of noisy speech
x = cepstrum of clean speech that was corrupted to give noisy speech
h = cepstrum of impulse response of linear filter
n = cepstrum of noise
Estimates the value of h (cepstrum of impulse response of the
linear filter) and n (cepstrum of the additive noise) and
compensates for them to estimate x from y
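The relation above can be sketched numerically as follows. This is only an illustration under stated assumptions: the transform names follow the formula as written (DCT takes cepstra to log spectra, IDCT takes them back), the transform is taken to be square and orthonormal with no truncation of cepstral coefficients, and the helper names (`dct_matrix`, `softplus`, `noisy_cepstrum`) are hypothetical.

```python
import math

def dct_matrix(size):
    """Orthonormal DCT-II matrix; its transpose is its inverse."""
    rows = []
    for k in range(size):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / size)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / size)
                     for i in range(size)])
    return rows

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def softplus(v):
    """Numerically stable log(1 + exp(v))."""
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))

def noisy_cepstrum(x, h, n):
    """y = x + h + IDCT(log(1 + exp(DCT(n - x - h))))."""
    dct = dct_matrix(len(x))
    idct = [list(col) for col in zip(*dct)]  # transpose = inverse
    diff = [ni - xi - hi for xi, hi, ni in zip(x, h, n)]
    log_spectral = matvec(dct, diff)
    correction = matvec(idct, [softplus(v) for v in log_spectral])
    return [xi + hi + ci for xi, hi, ci in zip(x, h, correction)]
```

When the noise is far below the filtered speech in every band, the correction term vanishes and y reduces to x + h.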
The recipe:
- Estimate a Gaussian mixture distribution from the cepstra of clean
training speech
- For each test utterance, obtain ML estimate of linear filter and additive
noise parameters h and n, based on this distribution and the
test utterance itself
- Use a Minimum Mean Squared Error estimator to compensate for the
effect of the linear filter and the additive noise and estimate x
from y
Models the effect of linear filtering and additive noise as a shift
of the means of the Gaussians in the Gaussian mixture distribution
Variances of the Gaussians are assumed to be invariant with increasing
noise and filtering
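A minimal sketch of the MMSE step under the mean-shift model, assuming diagonal covariances and assuming the per-Gaussian mean shifts have already been estimated. The function names and the `shifts` argument are hypothetical, not taken from any particular implementation.

```python
import math

def log_gaussian(y, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (yi - m) ** 2 / v)
               for yi, m, v in zip(y, mean, var))

def posteriors(y, weights, means, variances):
    """P(k | y) for each Gaussian k, computed stably in the log domain."""
    logs = [math.log(w) + log_gaussian(y, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    unnorm = [math.exp(l - top) for l in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def mmse_clean_estimate(y, weights, clean_means, shifts, variances):
    """x_hat = y - sum_k P(k | y) * shift_k.

    The noisy-speech means are the clean means plus the shifts;
    the variances are left unchanged, as the model assumes.
    """
    noisy_means = [[m + s for m, s in zip(mean, shift)]
                   for mean, shift in zip(clean_means, shifts)]
    post = posteriors(y, weights, noisy_means, variances)
    x_hat = list(y)
    for p, shift in zip(post, shifts):
        for d, s in enumerate(shift):
            x_hat[d] -= p * s
    return x_hat
```

With a single Gaussian the estimate is simply y minus that Gaussian's shift; with several, each shift is weighted by how likely its Gaussian is to have produced y.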
Assumes that the training data is clean
Assumes that the test data has been corrupted by linear filtering and
additive noise.
y = x + h + log(1 + exp(n - x - h))
(Note - these are log-spectral relations now. No DCT involved)
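The log-spectral relation is the log of the statement that the power of the filtered speech and the power of the noise add: exp(y) = exp(x + h) + exp(n). A small sketch that evaluates it per frequency band (the function name is hypothetical):

```python
import math

def noisy_log_spectrum(x, h, n):
    """y = x + h + log(1 + exp(n - x - h)), evaluated per band."""
    y = []
    for xi, hi, ni in zip(x, h, n):
        d = ni - xi - hi
        # Stable form of log(1 + exp(d)) that avoids overflow for large d.
        y.append(xi + hi + max(d, 0.0) + math.log1p(math.exp(-abs(d))))
    return y
```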
Estimates the log-spectral values of the impulse response of the
linear filter and the additive
noise and compensates for them.
The recipe:
- Estimate a Gaussian mixture distribution from the log spectra of clean
speech
- For each test utterance, obtain ML estimates of h, the log spectrum
of the impulse response of the linear filter, and of the mean and
variance of n, the log spectrum of the additive noise, based on this
distribution and the test utterance itself
- Use a Minimum Mean Squared Error estimator to compensate for the
effect of the linear filter and the additive noise
Models the effect of linear filtering and additive noise as a shift
of the means and a scaling of the variances of the Gaussians in the
Gaussian mixture distribution
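One way to obtain such a mean shift and variance change is a first-order Taylor expansion of the log-spectral relation around the Gaussian means. The sketch below assumes that approach, diagonal covariances, and independence of speech and noise; these are illustrative assumptions, and the function names are hypothetical.

```python
import math

def softplus(v):
    """Numerically stable log(1 + exp(v))."""
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))

def sigmoid(z):
    """Numerically stable 1 / (1 + exp(-z))."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def compensate_gaussian(mean_x, var_x, h, mean_n, var_n):
    """First-order approximation of the noisy-speech mean and variance."""
    mean_y, var_y = [], []
    for mx, vx, hi, mn, vn in zip(mean_x, var_x, h, mean_n, var_n):
        d = mn - mx - hi
        a = sigmoid(-d)  # dy/dx = 1 / (1 + exp(d)); dy/dn = 1 - a
        mean_y.append(mx + hi + softplus(d))
        var_y.append(a * a * vx + (1.0 - a) ** 2 * vn)
    return mean_y, var_y
```

In bands where speech dominates, the result approaches the clean Gaussian shifted by h; in bands where noise dominates, it approaches the noise mean and variance.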
Unlike CDCN, variances are updated. Processing is done in the log-spectral
domain, rather than the cepstral domain
More effective than CDCN, but also more unstable (blows up if the
linear filter/additive noise model is incorrect)
Assumes that the training data is clean
Assumes that the test data has been corrupted by linear filtering and
additive noise.
Estimates the log-spectral values of the impulse response of the
linear filter and the additive
noise and modifies HMMs to account for them
The recipe:
- For each test utterance, obtain ML estimate of linear filter and the
mean and variance of the additive noise parameters based on clean speech
HMM and the test utterance itself
- Modify the means and variances of the Gaussians in the recognizer
to account for the channel and the noise
- Decode using modified recognizer
Models the effect of linear filtering and additive noise as a shift
of the means and a scaling of the variances of the Gaussians in the
Gaussian mixture distribution
Computationally and implementationally far more complex than VTS
Requires two passes of decoding (one to obtain a hypothesis, the
other to obtain noise and channel estimates based on this hypothesis.
This can be iterated)
Assumes that the training data has been corrupted by linear filtering and
additive noise
Assumes that the test data has also been corrupted by linear filtering and
additive noise.
Estimates linear filter and additive noise cepstral values both
during training and decoding
The recipe:
- For each training utterance, estimate linear filter and additive noise
parameters using current HMM parameters. Compensate utterance for linear filter
and noise before adding to training buffers
- For each test utterance, obtain ML estimate of linear filter and the
mean and variance of the additive noise parameters based on clean speech
HMM and the test utterance itself
- Modify the means and variances of the Gaussians in the recognizer
to account for the channel and the noise
- Decode using modified recognizer
Models the effect of linear filtering and additive noise as a shift
of the means and a scaling of the variances of the Gaussians in the
Gaussian mixture distribution
Computationally and implementationally very complex, especially during
training
Assume means of Gaussians have been transformed using an affine transform
Estimate parameters of this transform and update the means
Recognize using updated means
Greater effectiveness using principal component MLLR or
inter-class MLLR for short utterances (ref. Sam Joo Doh)
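A minimal sketch of the mean update, assuming the affine transform (matrix A and bias b) has already been estimated. Only the application step is shown; estimating A and b from adaptation data is not. The function name is hypothetical.

```python
def transform_means(means, A, b):
    """Return A * mu + b for every Gaussian mean mu."""
    updated = []
    for mean in means:
        updated.append([sum(a * m for a, m in zip(row, mean)) + bias
                        for row, bias in zip(A, b)])
    return updated
```

For example, with A = [[2, 0], [0, 1]] and b = [1, -1], the mean [1, 2] becomes [3, 1].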
Train separate models for separate noise conditions
Model optimal HMM for test data as an interpolation of these models
Estimate the interpolation factor (e.g. from the estimated SNR)
Interpolate between the various models to obtain HMMs for decoding
For more details, refer to Juan Huerta
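A sketch of the interpolation step for two models, assuming that only the Gaussian means are interpolated and that the interpolation factor is a clamped linear function of the estimated SNR. Both choices are illustrative assumptions, and all names here are hypothetical.

```python
def interpolation_weight(snr_db, low_db, high_db):
    """Weight of the high-SNR model, clamped to the range [0, 1]."""
    if high_db <= low_db:
        raise ValueError("high_db must be greater than low_db")
    weight = (snr_db - low_db) / (high_db - low_db)
    return min(1.0, max(0.0, weight))

def interpolate_means(means_low, means_high, weight_high):
    """Linear interpolation between corresponding Gaussian means."""
    return [[(1.0 - weight_high) * lo + weight_high * hi
             for lo, hi in zip(mean_lo, mean_hi)]
            for mean_lo, mean_hi in zip(means_low, means_high)]
```

With models trained at 0 dB and 20 dB, a test utterance estimated at 10 dB gets a weight of 0.5, so each mean lands halfway between the two models.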
The simplest noise reduction scheme
The recipe:
- Obtain running estimate of noise spectrum based on silence regions
- Subtract noise spectrum from the power spectrum of noisy speech
- Compute cepstra from the noise-subtracted power spectrum
Can be performed on both training and test speech
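The first two steps of the recipe can be sketched as below. The silence labels are assumed to come from an external silence detector, and the exponential smoothing constant and the spectral floor (which keeps the result positive so that a log can be taken later) are assumptions added for illustration. The cepstrum computation is omitted.

```python
def spectral_subtract(frames, is_silence, smoothing=0.9, floor=0.01):
    """Subtract a running noise estimate from each power-spectrum frame.

    frames: list of power spectra (lists of non-negative floats)
    is_silence: one boolean per frame; True marks a silence frame
    """
    noise = None
    cleaned = []
    for frame, silent in zip(frames, is_silence):
        if silent:
            if noise is None:
                noise = list(frame)  # first silence frame initializes
            else:
                noise = [smoothing * nz + (1.0 - smoothing) * p
                         for nz, p in zip(noise, frame)]
        if noise is None:
            cleaned.append(list(frame))  # no noise estimate available yet
        else:
            cleaned.append([max(p - nz, floor * p)
                            for p, nz in zip(frame, noise)])
    return cleaned
```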
Throw everything in and train
No additional processing during decoding