Homework 8
Out: Feb-10   Due: Feb-15, Wednesday night (12:00 midnight)

To submit: Send the NFS path containing your work to Stan (scjou@cs.cmu.edu).

In this homework we are going to build language models (LMs). Please follow the steps below:

  1. Use the SRI LM Toolkit (SRILM) and follow Exercise-8 to do the tasks below. The training, development, and test sets are /project/Class-11-753/data/CH/trl.utf8.set/trl.utf8.{train|dev|test}, respectively.
  2. Task 8-2: Build word-based language models (1-, 2-, and 3-gram) for Mandarin from the training data and measure their perplexity on the training and development sets.
  3. Task 8-3: Build character-based language models (1-gram through 6-gram) for Mandarin from the training data and measure their perplexity on the training and development sets.
  4. Task 8-4: Collect more language model data and add it to the training data. Rebuild the language models and measure the perplexity again.
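
The tasks above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure from Exercise-8: it assumes the data files hold one whitespace-separated sentence per line, that treating every non-whitespace character as one token is an acceptable character segmentation for Task 8-3, and that the SRILM programs ngram-count and ngram are on your PATH. The function names and the LM file name word.3gram.lm are hypothetical. The script only prints the command lines; it does not run SRILM.

```python
import math


def to_characters(line):
    """Re-tokenize a line so that every non-whitespace character is one token.

    Assumption: this naive split is good enough for a character-based LM;
    Latin words and digits are also split character by character.
    """
    return " ".join(ch for ch in line if not ch.isspace())


def srilm_commands(order, train_path, eval_path, lm_path):
    """Return (train, score) argument lists for an N-gram LM of the given order.

    Only the basic options -order, -text, -lm, and -ppl are used;
    smoothing options are left at the toolkit defaults.
    """
    train = ["ngram-count", "-order", str(order),
             "-text", train_path, "-lm", lm_path]
    score = ["ngram", "-order", str(order),
             "-lm", lm_path, "-ppl", eval_path]
    return train, score


def perplexity(total_log10_prob, num_tokens):
    """Perplexity from a total base-10 log probability over num_tokens tokens."""
    return 10 ** (-total_log10_prob / num_tokens)


if __name__ == "__main__":
    data = "/project/Class-11-753/data/CH/trl.utf8.set/trl.utf8."
    # "word.3gram.lm" is a hypothetical output name for the trained model.
    train, score = srilm_commands(3, data + "train", data + "dev", "word.3gram.lm")
    print(" ".join(train))
    print(" ".join(score))
    print(to_characters("我们 喜欢 语言"))
```

For Task 8-3, each data file would be passed through to_characters line by line before training, and the order would range from 1 to 6. Word-level and character-level perplexities are normalized over different token counts, so they are not directly comparable.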

Last modified: Fri Feb 10 15:07:04 EST 2006
Maintainer: scjou@cs.cmu.edu.