11-756 / 18799D Design and Implementation of ASR Systems

11-756/18799D ASR: Assignment 7, Training Phoneme Models

In this assignment we will train phoneme HMMs from the digit recordings used in Assignment 6 (both your own recordings and Aurora).

Problem 1: Train phoneme models from continuous-speech digit recordings. Use the small corpus of continuous digit recordings you recorded for Assignment 6.

Specifically, you will have to train models for the following phonemes. Model each phoneme using THREE emitting states:

AX, AH, AY, EH, EY, F, IH, IY, K, N, OW, R, S, T, TH, UW, V, W, Z

To do so, express each of the digits from zero through nine as phoneme sequences using the following dictionary:

ONE:    W AX N
TWO:    T UW
THREE:  TH R IY
FOUR:   F OW R
FIVE:   F AY V
SIX:    S IH K S
SEVEN:  S EH V EH N
EIGHT:  EY T
NINE:   N AY N
ZERO:   Z IY R OW

The training procedure is now no different from that of training word models from continuous recordings, except that you will now be training phoneme models.

Use the dictionary to represent your digit sequences as phoneme sequences, then apply the procedure you used to train digit models from continuous recordings. Model silences as before (i.e., add a silence model at the beginning and end of each recording, and at locations where you know you paused between words).

As an example, if you recorded the digit sequence 123456 as training data, and you have silences at the beginning and end of the recording, you would represent the digit sequence in the following manner to train your phoneme models:

SIL W AX N T UW TH R IY F OW R F AY V S IH K S SIL

If you know you actually also paused between 3 and 4 in the recording, you'd model it as:

SIL W AX N T UW TH R IY SIL F OW R F AY V S IH K S SIL
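
The expansion from a digit string to a phoneme transcript can be sketched in a few lines of Python. This is only an illustration, not required code: the names DIGIT_WORDS, LEXICON, to_phoneme_transcript and the pauses_after argument are made up for this example, and the pronunciations are copied from the dictionary above.

```python
# Map digit characters to the dictionary words used in this assignment.
DIGIT_WORDS = {
    "0": "ZERO", "1": "ONE", "2": "TWO", "3": "THREE", "4": "FOUR",
    "5": "FIVE", "6": "SIX", "7": "SEVEN", "8": "EIGHT", "9": "NINE",
}

# Pronunciation dictionary copied from the assignment text.
LEXICON = {
    "ONE": ["W", "AX", "N"],
    "TWO": ["T", "UW"],
    "THREE": ["TH", "R", "IY"],
    "FOUR": ["F", "OW", "R"],
    "FIVE": ["F", "AY", "V"],
    "SIX": ["S", "IH", "K", "S"],
    "SEVEN": ["S", "EH", "V", "EH", "N"],
    "EIGHT": ["EY", "T"],
    "NINE": ["N", "AY", "N"],
    "ZERO": ["Z", "IY", "R", "OW"],
}


def to_phoneme_transcript(digits, pauses_after=()):
    """Expand a digit string into a phoneme list with silences.

    digits: string of digit characters, e.g. "123456".
    pauses_after: 0-based positions of digits that are followed by a
        known pause (hypothetical argument, for illustration only).
    """
    phones = ["SIL"]  # silence at the start of the recording
    last = len(digits) - 1
    for i, digit in enumerate(digits):
        phones.extend(LEXICON[DIGIT_WORDS[digit]])
        # Known pause between words; the final SIL is added below.
        if i in pauses_after and i != last:
            phones.append("SIL")
    phones.append("SIL")  # silence at the end of the recording
    return phones


print(" ".join(to_phoneme_transcript("123456")))
print(" ".join(to_phoneme_transcript("123456", pauses_after={2})))
```

The second call marks a pause after the third digit (position 2), which corresponds to the pause between 3 and 4 in the example.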

For recognition, use the dictionary above to compose digit models from the phoneme models you have trained. Recognize the same digit sequences you recognized for Problem 1 of Assignment 6. Note: You will NOT be recognizing phoneme sequences. You will compose word models and perform recognition of words!
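
One way to picture the composition is as a concatenation of phoneme states. The sketch below is a bookkeeping illustration only: it assumes a left-to-right topology with self-loops and no skips (the assignment only fixes three emitting states per phoneme), and the names word_state_sequence and word_transitions are hypothetical. A state is identified by a (phoneme, index) pair, so a phoneme that occurs twice in a word reuses the same trained states.

```python
STATES_PER_PHONEME = 3  # three emitting states per phoneme, as specified

# Pronunciation dictionary copied from the assignment text.
LEXICON = {
    "ONE": ["W", "AX", "N"],
    "TWO": ["T", "UW"],
    "THREE": ["TH", "R", "IY"],
    "FOUR": ["F", "OW", "R"],
    "FIVE": ["F", "AY", "V"],
    "SIX": ["S", "IH", "K", "S"],
    "SEVEN": ["S", "EH", "V", "EH", "N"],
    "EIGHT": ["EY", "T"],
    "NINE": ["N", "AY", "N"],
    "ZERO": ["Z", "IY", "R", "OW"],
}


def word_state_sequence(word):
    """Return the word model as an ordered list of (phoneme, state) ids."""
    return [
        (phoneme, state)
        for phoneme in LEXICON[word]
        for state in range(STATES_PER_PHONEME)
    ]


def word_transitions(word):
    """List allowed (from, to) position pairs in the composed word model.

    Assumed topology: each state has a self-loop and an arc to the next
    state; the last state of one phoneme feeds the first of the next.
    """
    n = len(word_state_sequence(word))
    arcs = []
    for i in range(n):
        arcs.append((i, i))          # self-loop
        if i + 1 < n:
            arcs.append((i, i + 1))  # advance to the next state
    return arcs


states = word_state_sequence("SIX")
print(len(states))       # positions in the composed word model
print(len(set(states)))  # distinct trained states behind those positions
```

For SIX the composed model has 12 positions but only 9 distinct states, because both occurrences of S point at the same three trained states.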

The one key new concept you will have to use now is initialization. In Assignment 6 you initialized word models from your isolated word recordings. We do not have isolated phoneme recordings, so that procedure cannot be used.

Instead we will use the following procedure, still using the original isolated word recordings as our training set for initialization:

Problem 2: Train phoneme models from the Aurora training data handed to you for Assignment 6. Use the models trained in problem 1 as initialization. Compose word models from the trained phoneme models and use those to recognize the test set. Report performance.
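
The assignment does not say here which performance measure to report. If you report word error rate, a minimal sketch, assuming the usual definition (substitutions + insertions + deletions, divided by the number of reference words), could look like this. The function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two word lists."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]  # deleting the first i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                          # deletion
                cur[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word)  # substitution or match
            ))
        prev = cur
    return prev[-1]


def word_error_rate(pairs):
    """Pooled WER over (reference, hypothesis) word-list pairs."""
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    words = sum(len(ref) for ref, _ in pairs)
    if words == 0:
        raise ValueError("no reference words to score against")
    return errors / words


# Made-up example transcripts, not real recognition output.
pairs = [
    ("ONE TWO THREE".split(), "ONE THREE".split()),
    ("FOUR FIVE".split(), "FOUR NINE FIVE".split()),
]
print(word_error_rate(pairs))
```

Errors are pooled over all utterances before dividing, so long utterances count for more than short ones.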

The Aurora data have an additional digit: "OH". For this, use the following dictionary entry:

OH:  OW

You will note that the phoneme required for this is already trained from problem 1, so you will need no additional work to initialize this model.
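
You can confirm this with a quick coverage check. The sketch below uses illustrative names (PROBLEM1_LEXICON, AURORA_EXTRA) and takes the pronunciations from the two dictionaries in this handout.

```python
# Problem 1 dictionary, copied from the assignment text.
PROBLEM1_LEXICON = {
    "ONE": ["W", "AX", "N"],
    "TWO": ["T", "UW"],
    "THREE": ["TH", "R", "IY"],
    "FOUR": ["F", "OW", "R"],
    "FIVE": ["F", "AY", "V"],
    "SIX": ["S", "IH", "K", "S"],
    "SEVEN": ["S", "EH", "V", "EH", "N"],
    "EIGHT": ["EY", "T"],
    "NINE": ["N", "AY", "N"],
    "ZERO": ["Z", "IY", "R", "OW"],
}

# Extra Aurora entry from this problem.
AURORA_EXTRA = {"OH": ["OW"]}

# Phonemes that receive training data in Problem 1.
trained = {ph for pron in PROBLEM1_LEXICON.values() for ph in pron}

# Phonemes needed by the new entries that Problem 1 never saw.
needed = {ph for pron in AURORA_EXTRA.values() for ph in pron}
missing = needed - trained

# Problem 1 words that supply training data for OW.
words_with_ow = sorted(w for w, pron in PROBLEM1_LEXICON.items() if "OW" in pron)

print(missing)
print(words_with_ow)
```

An empty set of missing phonemes means every phoneme in the new entry already has an initial model from Problem 1.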

As before, use the loopy digit grammar to recognize the test set.

Due date: 8 May 2013