Homework 7
Out: Feb-03 Due: Feb-08 Wednesday night (12:00)
To submit: Send to Stan (scjou@cs.cmu.edu) the NFS path containing your work.

In this homework we are going to train context-dependent (CD) speech recognizers using previously generated training labels. The full training process consists of LDA estimation, sampling, K-means, and label (and/or Viterbi and/or EM) training. Please follow the steps below:
- As described in Exercise-7, your script should use the CD setup you generated in Homework-6 (cbs, dss, etc.) and the training labels generated in Homework-5. Train on the whole training set 0[36]*.
- Following Exercise-7, estimate the LDA transform. Note that the LDA should be estimated on -7/+7 adjacent frames, with the output dimension cut to 40.
- Sample the training set.
- Run K-means on the samples and store the GMMs to LDAd40/CD-{cbs|dss}.i0.*.
- Do label training for 12 iterations with split-and-merge, followed by two iterations of Viterbi training. For label training, run the methods path bload your_label_file and path map hmm (as opposed to path viterbi hmm or path fwdBwd hmm). For split-and-merge, set maxGaussians to 32 and mergeThresh to 128. Store the trained GMMs to LDAd40/CD-{cbs|dss}.i12.*.
- Train two more systems with the LDA dimension cut to 32 and 24, respectively, and store the trained GMMs to LDAd32/ and LDAd24/. That is, repeat the steps above (LDA, sampling, K-means, training) with the different LDA cuts.
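As a conceptual illustration only of what the -7/+7 stacking and the LDA dimension cut compute (the actual estimation is done with the Janus tools; the function names and the regularization constant below are made up for this sketch, not part of the toolkit):

```python
import numpy as np

def stack_frames(feats, left=7, right=7):
    """Stack each frame with its left/right neighbour frames.

    feats: (T, D) array of per-frame features.
    Returns (T, (left+right+1)*D); edge frames are padded by
    repeating the first/last frame.
    """
    pad = np.concatenate([np.repeat(feats[:1], left, 0), feats,
                          np.repeat(feats[-1:], right, 0)])
    w = left + right + 1
    return np.stack([pad[t:t + w].ravel() for t in range(len(feats))])

def lda_transform(stacked, labels, out_dim=40):
    """Return an (in_dim, out_dim) LDA matrix: the leading generalized
    eigenvectors of the between-class vs. within-class scatter."""
    mu = stacked.mean(0)
    d = stacked.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(labels):
        x = stacked[labels == c]
        mc = x.mean(0)
        Sw += (x - mc).T @ (x - mc)                 # within-class scatter
        Sb += len(x) * np.outer(mc - mu, mc - mu)   # between-class scatter
    # Solve Sb v = w Sw v via symmetric whitening (small ridge for stability)
    sw_vals, sw_vecs = np.linalg.eigh(Sw + 1e-6 * np.eye(d))
    isqrt = sw_vecs @ np.diag(1.0 / np.sqrt(sw_vals)) @ sw_vecs.T
    w, p = np.linalg.eigh(isqrt @ Sb @ isqrt)       # eigenvalues ascending
    return (isqrt @ p)[:, ::-1][:, :out_dim]        # keep the top out_dim
```

With -7/+7 stacking the input dimension is 15 times the per-frame dimension; cutting out_dim to 40 (or 32, 24 for the other two systems) keeps only the most discriminative directions.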
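Likewise, the K-means codebook initialization can be sketched in plain NumPy; this is a conceptual stand-in for the Janus K-means step, not the tool itself, and the function name and iteration count are assumptions of this sketch:

```python
import numpy as np

def kmeans(samples, k, iters=10, seed=0):
    """Plain k-means on sampled training frames.

    samples: (N, D) float array; returns a (k, D) codebook of means,
    which would seed the initial GMMs before label training.
    """
    rng = np.random.default_rng(seed)
    # initialize the codebook from k distinct random samples
    means = samples[rng.choice(len(samples), k, replace=False)]
    for _ in range(iters):
        # assign every sample to its nearest codebook vector
        dist = ((samples[:, None, :] - means[None, :, :]) ** 2).sum(-1)
        assign = dist.argmin(1)
        # re-estimate each codebook vector from its assigned samples
        for j in range(k):
            pts = samples[assign == j]
            if len(pts):
                means[j] = pts.mean(0)
    return means
```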
Please send the NFS paths of your work to Stan.
Last modified: Fri Feb 3 12:53:29 EST 2006
Maintainer: scjou@cs.cmu.edu.