DistribSet method: cdcn (Codeword Dependent Cepstral Normalisation)

Syntax: <distrib set> cdcn <distrib1> <distrib2> <feature> [optional flags]

Example: CDCNdss cdcn CDCN SIL(|) LOGSPEC -itcount 30

The CDCN algorithm is executed on <feature>. The <distrib set> contains the joined CDCN distribution <distrib1>. Although <distrib1> holds the information for both silence and speech, <distrib2> is needed to get the number of codebook vectors for silence. As a result, the channel-compensated version of <feature> is placed in the feature designated in the codebook-set description file.

Optional flags:

-itcount <number>: Number of iterations. <number> can be any number greater than 0. <number> iterations are performed to estimate the additive noise and the linear distortion. For clean speech a value between 6 and 10 is sufficient. If the SNR of the processed data is low, a greater number of iterations can lead to better channel compensation.
-n <feature FMatrix>: Return estimated additive noise. The estimated value of the additive noise is returned.
-q <feature FMatrix>: Return estimated linear distortion. The estimated value of the linear distortion is returned.
-f <feature FMatrix>: Return the a posteriori probabilities. The a posteriori probabilities calculated in the final iteration are returned.
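
As an illustration of the optional flags, the call from the example above could be extended as sketched below. This is only a sketch: the feature names NOISE, DISTORTION and POSTERIOR are hypothetical, and we assume that each flag takes the name of an FMatrix feature which receives the corresponding result.

```tcl
# Hypothetical feature names NOISE, DISTORTION and POSTERIOR (not part of the original setup).
# Assumption: each flag takes the name of an FMatrix feature that receives the estimate.
CDCNdss cdcn CDCN SIL(|) LOGSPEC -itcount 30 \
        -n NOISE \
        -q DISTORTION \
        -f POSTERIOR
```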

Basics

CDCN is an algorithm that performs channel compensation for both linear distortion and additive noise. To estimate the linear distortion and the additive noise, CDCN uses its own codebook, which represents natural speech. The codebook is divided into two parts: one for silence and one for speech. The a posteriori probabilities calculated from the codebook are used to iteratively estimate the linear distortion and the additive noise.
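
The a posteriori computation can be pictured with a small stand-alone Tcl sketch. It is not the JANUS implementation: the procedure names logGauss and posteriors, the toy codebook, and the use of prior weights with diagonal variances are assumptions made for illustration. The sketch also assumes that the codeword means already include the current correction for distortion and noise. It needs Tcl 8.5 or later (lassign).

```tcl
# Log-density of a diagonal-covariance Gaussian evaluated at vector z.
proc logGauss {z mean var} {
    set pi 3.141592653589793
    set sum 0.0
    foreach zi $z mi $mean vi $var {
        set d [expr {$zi - $mi}]
        set sum [expr {$sum - 0.5 * (log(2.0 * $pi * $vi) + $d * $d / $vi)}]
    }
    return $sum
}

# A posteriori probability of every codeword given one frame z.
# codebook: list of {weight mean var} entries (silence part first, then speech part).
proc posteriors {z codebook} {
    set scores {}
    set max ""
    foreach entry $codebook {
        lassign $entry w mean var
        set s [expr {log($w) + [logGauss $z $mean $var]}]
        lappend scores $s
        if {$max eq "" || $s > $max} { set max $s }
    }
    # Normalise in a numerically stable way by subtracting the largest score.
    set total 0.0
    foreach s $scores { set total [expr {$total + exp($s - $max)}] }
    set post {}
    foreach s $scores { lappend post [expr {exp($s - $max) / $total}] }
    return $post
}

# Toy joined codebook: one silence and one speech codeword with two channels each.
set codebook {
    {0.5 {0.0 0.0} {1.0 1.0}}
    {0.5 {4.0 4.0} {1.0 1.0}}
}
puts [posteriors {3.5 4.2} $codebook]
```

In an iterative scheme these probabilities would be recomputed after every update of the distortion and noise estimates, which is what the -itcount flag controls.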

Strictly speaking, the implemented version should be called CDLSN (Codeword Dependent Log-Spectral Normalisation), but it differs very little from the original CDCN, so we kept the name CDCN. The basic difference is that we work in the log-spectral domain rather than the cepstral domain. We can regard this as an implementation detail, because the Fourier transform that relates the two domains is linear. For more information see [1].
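
The relation between the two domains can be made explicit with the environment model from [1]. The block below is a sketch in our own notation, not taken from the implementation: x is the clean speech, z the observed speech, q the linear distortion, n the additive noise, c_k the k-th codeword and f_k its a posteriori probability, all per log-spectral channel unless marked otherwise.

```latex
% Log-spectral form (per filterbank channel), as assumed for this implementation:
z = x + q + \ln\!\left(1 + e^{\,n - q - x}\right)

% Cepstral form of the original CDCN: the same nonlinearity, wrapped in DFT/IDFT:
z_{\mathrm{cep}} = x_{\mathrm{cep}} + q_{\mathrm{cep}}
  + \mathrm{IDFT}\!\left\{ \ln\!\left(1 + e^{\,\mathrm{DFT}\{\, n_{\mathrm{cep}} - q_{\mathrm{cep}} - x_{\mathrm{cep}} \,\}}\right) \right\}

% Correction vector for codeword c_k and the resulting compensated frame:
r_k = \ln\!\left(1 + e^{\,n - q - c_k}\right), \qquad
\hat{x} = z - q - \sum_k f_k \, r_k
```

In the log-spectral domain the correction term is computed channel by channel without any transform, which is why the two variants differ only in this detail.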

How to use CDCN

The use of CDCN can be divided into two steps:

Train the CDCN codebook.

Use CDCN.

Train the CDCN codebook

CDCN needs a codebook with two parts, one for silence and one for speech. To get this codebook we train two separate codebooks. The resulting codebooks are joined by an initialisation step before CDCN uses them.

First we need several description files:

Codebook set description file.
Distribution set description file.
Distribution tree description file.
Feature description file (for the training of the CDCN Codebook).

With these files we train the two codebooks.

Note: The files shown are only examples. The feature description file in particular differs from system to system.

Codebook set description file for CDCN:

SIL             CDCNFEA                 50      30      DIAGONAL 
SPEECH          CDCNFEA                 200     30      DIAGONAL

Distribution set description file for CDCN:

SIL(|)          SIL           
SPEECH(|)       SPEECH

Distribution tree description file for CDCN:

ROOT-b          {0=SIL} LSPEECH LSIL -  -
ROOT-m          {0=SIL} LSPEECH LSIL -  -
ROOT-e          {0=SIL} LSPEECH LSIL -  -

LSPEECH         {}      - - - SPEECH(|)
LSIL            {}      - - - SIL(|)

Feature description file (for the training of the CDCN Codebook):

To give the CDCN codebook a model of channel-compensated speech, we apply mean subtraction (meansub) to the training features.

#--------------------------------------------------------------------------
#fes    command         name            source          parameter
#--------------------------------------------------------------------------
$fes    readADC         ADC             $arg(ADCFILE)   -h $arg(ADCHEADER) \
                                                        -v 0 -offset mean
#----------------- mel filter bank ----------------------------------------
set melN 30

$fes   spectrum        FFT             ADC             16ms 


if { [llength [objects FBMatrix matrixMEL]] != 1} {
   set points [$fes:FFT configure -coeffN]
   set rate   [expr 1000 * [$fes:FFT configure -samplingRate]]
   [FBMatrix matrixMEL] mel -N $melN -p $points -rate $rate
}
$fes   filterbank       MEL             FFT            matrixMEL

$fes   log              CDCNFEA         MEL            1.0 1.0
$fes   meansub          CDCNFEA         CDCNFEA        -a 0

Use CDCN:

Using cdcn takes only two steps:

Initialise the data structures with "cdcnInit" (only once!).
Use the DistribSet method cdcn.

Initialise the data structures with "cdcnInit":

Once the two CDCN codebook parts are trained, we use the Tcl script "cdcn.tcl" to initialise the data structures. The script loads the description files, then loads the corresponding codebook weights, and joins the two codebooks "SIL(|)" and "SPEECH(|)" into one codebook named "CDCN".

# -----------------------------------------------------------------------
#initialise CDCN -> basic Object = CDCNdss
# -----------------------------------------------------------------------
source ../cdcn_desc/cdcn.tcl
cdcnInit        $SID -dssdesc ../cdcn_desc/cdcnDistribSet -dssparam ../cdcn_create_melv/3i.dss.gz \
                     -cbsdesc ../cdcn_desc/cdcnCodebookSetMel -cbsparam ../cdcn_create_melv/3i.cbs.gz
# -----------------------------------------------------------------------

Use the DistribSet method cdcn:

A short feature description file shows the usage of cdcn:

#--------------------------------------------------------------------------
#fes    command         name            source          parameter
#--------------------------------------------------------------------------
$fes    readADC         ADC             $arg(ADCFILE)   -h $arg(ADCHEADER) \
                                                        -v 0 -offset mean
#----------------- mel filter bank ----------------------------------------
set melN 30

$fes   spectrum        FFT             ADC             16ms 


if { [llength [objects FBMatrix matrixMEL]] != 1} {
   set points [$fes:FFT configure -coeffN]
   set rate   [expr 1000 * [$fes:FFT configure -samplingRate]]
   [FBMatrix matrixMEL] mel -N $melN -p $points -rate $rate
}
$fes   filterbank       CDCNFEA         FFT            matrixMEL
$fes   log              MCEP            CDCNFEA            1.0 1.0

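#----------------- CDCN ----------------------------------------
# cdcn reads the log spectrum from MCEP and writes the compensated result
# to CDCNFEA, the feature named in the codebook-set description file.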
global CDCNdss
CDCNdss cdcn            CDCN            SIL(|)         MCEP   -itcount 30

#----------------- cepstrum ----------------------------------------
set cepN 13

if { [llength [objects FMatrix matrixCOS]] != 1} {
   set n [$fes:CDCNFEA configure -coeffN]
   [FMatrix matrixCOS] cosine $cepN $n -type 1
}
$fes    matmul          MCEP            CDCNFEA          matrixCOS
#----------------- context -----------------------------------------------

Remarks:

In the implementation described here it only makes sense to call "cdcn" directly after taking the logarithm of the spectrum. Otherwise the calculation of the correction vectors will not work (see [2]).

It is possible to use other transforms, such as the cube root, instead of the logarithm. However, this cannot be done without internal changes to the "cdcn" method.

References:

[1] Acero, Alejandro: Acoustical and Environmental Robustness in Automatic Speech Recognition. Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15231, 13.9.1990

[2] Baumgärtner, Rainer: Kanalkompensation in der Spracherkennung. Diplomarbeit, Universität Karlsruhe, Institut für Logik, Komplexität und Deduktionssysteme, 1996

rainerb@ira.uka.de