Homework 5
Out: Jan-17 Due: Jan-22 Sunday night (12:00)
To submit: Send to Stan (scjou@cs.cmu.edu) the NFS path containing your work.
In this homework we are going to experiment with various training setups, analyze the scores, visualize the paths, and write out training labels. Please follow the steps below:
- While we've defined the speakers 0[36]* as the training set, here we additionally define speaker 090 as the development set (also called the cross-validation set). Please build the database for the development set if necessary.
- Rewrite your start-up.tcl to append the Tcl code that loads the flat cbs and dss parameters and opens the training database.
- Follow Exercise-5 to train the context-independent speech recognizer:
Step a. Source start-up.tcl
Step b. Train 10 iterations using fwdBwd
Step c. Train 10 iterations using viterbi
Step d. Train 10 iterations using fwdBwd, with split-and-merge
Step e. Train 10 iterations using viterbi, with split-and-merge
- Note that for split-and-merge, please set maxGaussians to 32 and mergeThresh to 128. To run Forward-Backward, it is suggested to use a topN option, e.g. path fwdBwd hmm -topN 2000, in order to speed up training and avoid crashes due to huge memory consumption.
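The training steps above can be sketched roughly as follows. This is only an outline, assuming a hypothetical helper proc trainIteration that loops over the training utterances, runs the given path method (fwdBwd or viterbi), and updates the parameters; adapt it to your own scripts from Exercise-5.

```tcl
# Step a: load codebooks, distributions, and the training database
source start-up.tcl

# Steps b and c: 10 iterations of fwdBwd, then 10 of viterbi.
# trainIteration is a placeholder for your own per-iteration training proc.
for {set iter 1}  {$iter <= 10} {incr iter} { trainIteration fwdBwd  $iter }
for {set iter 11} {$iter <= 20} {incr iter} { trainIteration viterbi $iter }

# Steps d and e: same again, but run split-and-merge after each iteration,
# with maxGaussians 32 and mergeThresh 128 as specified above.
for {set iter 21} {$iter <= 30} {incr iter} {
    trainIteration fwdBwd  $iter
    splitAndMerge 32 128   ;# placeholder for your split-and-merge code
}
for {set iter 31} {$iter <= 40} {incr iter} {
    trainIteration viterbi $iter
    splitAndMerge 32 128
}
```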
- To analyze the scores, use path viterbi hmm as shown in Exercise-5. For every utterance in the training set, accumulate the score and the number of frames. After each training iteration, compute the average score from the accumulated score and accumulated frame count. You can do the same on the development set.
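The score accumulation for one iteration might look like the sketch below. The variable names (uttList, frameN, loadUtterance) are placeholders for whatever your Exercise-5 scripts use; frameN is assumed to hold the number of frames of the current utterance.

```tcl
set totalScore  0.0
set totalFrames 0
foreach UTTID $uttList {
    loadUtterance $UTTID          ;# placeholder: build hmm for this utterance
    set score [path viterbi hmm]
    set totalScore  [expr {$totalScore + $score}]
    set totalFrames [expr {$totalFrames + $frameN}]
}
set avgScore [expr {$totalScore / double($totalFrames)}]
puts "iteration $iter: average score per frame = $avgScore"
```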
- To visualize the training results, use path stateMatrix gamma as shown in Exercise-5. In the training script, save the gamma FMatrix so that you can visualize it later using gamma display. Name the gamma file gamma.$UTTID.$iter.fmat . Note that you should not visualize gamma during training; treat the visualization as a post-processing step.
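Saving gamma during training could be as simple as the sketch below, assuming gamma is the FMatrix object filled as in Exercise-5 and that it supports a bsave method (if not, use whichever save method Exercise-5 uses).

```tcl
# Fill the FMatrix "gamma" with the state occupation probabilities
path stateMatrix gamma
# Save it for later visualization with "gamma display" (post-processing)
gamma bsave gamma.$UTTID.$iter.fmat
```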
- Remember to save the training results! (cbs/dss) :-)
- Task 5.1: Training analysis.
(i) For every iteration, compute the average score on the training set. Make a plot showing how the score improves over the training iterations.
(ii) Pick any three iterations that yield interesting gamma visualizations on one utterance of the training set. Submit the visualization plots; the purpose is to show the training progress.
(iii) Do (i), but on the development set.
(iv) Do (ii), but on the development set.
(v) After 40 iterations, save the training labels using the path method bsave for every training utterance. Name each label LBL5.1/${SPKID}/${UTTID}.lbl.gz .
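Writing the labels might look like the sketch below, assuming $SPKID and $UTTID are set for the current utterance; file mkdir is standard Tcl and creates the speaker directory if it does not exist.

```tcl
# Ensure the per-speaker label directory exists, then save the path
file mkdir LBL5.1/$SPKID
path bsave LBL5.1/$SPKID/$UTTID.lbl.gz
```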
- Task 5.2: Do the same thing as Task 5.1, except use fwdBwd only, i.e. change Steps c and e to fwdBwd. Save the labels to LBL5.2/ .
- Task 5.3: Do the same thing as Task 5.1, except use viterbi only, i.e. change Steps b and d to viterbi. Save the labels to LBL5.3/ .
Please send the NFS paths of your work to Stan. Also prepare a simple report containing your gamma visualizations and score plots. The preferred report format is .pdf or .ps; Word .doc is acceptable. The purpose of the report is to show the training progress, so you don't need to write much; showing the graphs is enough.
Last modified: Mon Jan 16 13:07:12 EST 2006
Maintainer: scjou@cs.cmu.edu.