
# make sure we use utf-8 encoding
encoding system utf-8
source start-up9.tcl
# This loads the search dictionary (UTF-8):
# data/CH/trl.utf8.dict/dictionary.utf8.desc
# load the acoustic models from homework 7
# This requires the correct feature description file!
# Therefore, do not forget to load the LDA matrix from homework 7!
#
# Because some of the objects required for search
# (LCMSet, RCMSet) query a list of phones and tags from
# the SenoneSet, it is necessary to create the
# SenoneSet with phones and tags objects during start-up!
# E.g.
# SenoneSet sns {dsStream} -phones phonesSet:PHONES -tags tags
# open database
# IMPORTANT: for demonstration purposes we will use the utterances from training.
# However, in homework 9 you should use the data reserved
# for system tuning to tune the parameters of the search.
set uttDB utterance
DBase db
db open ${uttDB}.dat ${uttDB}.idx -mode "r"
# The single pass decoder does not use the AModelSet;
# instead it uses the Phonetic Hidden Markov Model Set (PHMMSet).
# However, we still need the entry node for the topology tree.
PHMMSet phmmSet topoTree ROOT
# This object queries and stores left context models at word boundaries.
# These are the models used for different left word contexts.
LCMSet lcmSet phmmSet
# This object queries and stores right context models at word boundaries.
# These are the models used for different right word contexts.
RCMSet rcmSet phmmSet
# This is a word-based language model from homework 8.
set lmFile ../lm/train.3.arpabo.gz
[LingKS lm NGramLM] load $lmFile
# The search vocabulary is the list of words that can appear in the hypotheses of the decoder.
# There are two types of words:
# words that are known by the LingKS object, and words that we call fillers.
# Filler words get a constant penalty, independent of the context in which they occur.
# Very often silence is not modeled within the language model, and therefore it is treated as a filler word.
#
# The file format for the search vocabulary is very simple:
# one word per line (optionally quoted in {}), and if the
# word should be treated as a filler word, a "1" as the second element.
# Two special words, "(" and ")", are required to model utterance start and end.
# These words are mapped automatically to "<s>" and "</s>".
# The following are the first 6 lines of data/CH/trl.utf8.dict/train.svocab
# {(}
# {)}
# {$} 1
# {国务院}
# {召开}
# {第}
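#
# As a quick illustration of this format (not part of the original
# start-up script), here is a pure-Tcl helper that counts the entries
# marked as filler words; it assumes every line of the file is a valid
# Tcl list, as in the example above:
proc countFillers {svocabFile} {
    set fp [open $svocabFile r]
    set fillerN 0
    while {[gets $fp line] >= 0} {
        # a "1" as the second element marks a filler word
        if {[llength $line] > 1 && [lindex $line 1] == 1} {
            incr fillerN
        }
    }
    close $fp
    return $fillerN
}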
set svocabFile data/CH/trl.utf8.dict/train.svocab
SVocab svocab dict
SVMap svmap svocab lm
svocab read $svocabFile
svmap map base
# Search Network
STree stree svmap lcmSet rcmSet
# Linguistic Tree (propagates the Language Model through the search network)
LTree ltree stree
# the single pass decoder
SPass spass stree ltree
# The right configuration of the components is very important!
# configure the language model cache (a trade-off between memory and speed)
# However, if the cache gets too large it can slow down the recognition, because
# the cache also needs time to initialize/clear.
ltree configure -cacheN 50 -ncacheN 10
# language model weight, word transition penalty and filler penalty
# These values are very important to achieve good performance.
# Unfortunately the settings of these values depend on many factors
# and are usually optimized by searching the parameter space.
svmap configure -phonePen 0.0 -wordPen 0.0 -filPen 30 -lz 30
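#
# A sketch of such a parameter search (commented out; decodeAndScore is
# a hypothetical helper that would decode the tuning set and return the
# error rate for the current settings):
# foreach lz {20 30 40} {
#     foreach filPen {10 30 50} {
#         svmap configure -lz $lz -filPen $filPen
#         # set wer [decodeAndScore]
#     }
# }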
# to get a result in a reasonable time the search space for decoding
# has to be pruned. The single pass decoder has several beams that can be set
spass configure -stateBeam 130
spass configure -morphBeam 80
spass configure -wordBeam 90
# a good relation between transN and morphN seems to be a factor of 4
spass configure -transN 20 -morphN 5
# compute the features for utterance spk030_utt1
set xKey spk030_utt1
set uttInfo [db get $xKey]
fs eval $uttInfo
# do the decoding
spass run
# get the hypothesis from the decoder without time stamps
set hypo1 [spass.stab trace -v 0]
# get the hypothesis from the decoder with time stamps
set hypo2 [spass.stab trace -v 2]
# Question: What is a word graph (lattice)?
# Question: What is a word graph good for?
# We did not collect a word graph during the decoding above.
# If the glat sub-object is configured with a topN > 0,
# a word graph is created during decoding.
spass.glat configure -topN 100 -alphaBeam 100
spass run
set ref {国务院 召开 第 45 次 常务 会议 李鹏 主持 讨论 通过 矿产 资源 法 修正案 和 电影 管理 条例}
# Align computes the lattice error rate
# Make sure that sentence start/end "(|)" is provided
spass.glat align "( $ref )" -v 1
spass run
spass.glat configure
# {-name spassLat} {-useN 0} {-nodeN 4301} {-linkN 5619} {-topN 100} {-alphaBeam 100.000000} {-singularLCT 0} {-expert 0} {-status CREATE} {-frameShift 0.010000}
spass.glat connect
spass.glat configure
# {-name spassLat} {-useN 0} {-nodeN 4301} {-linkN 6567} {-topN 100} {-alphaBeam 100.000000} {-singularLCT 0} {-expert 0} {-status CREATE} {-frameShift 0.010000}
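# Note: connecting the lattice added links (linkN 5619 -> 6567) while
# the number of nodes (nodeN 4301) stayed the same.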
# TASK: Write two Tcl procedures, spassWriteTRL and glatWriteTRL, which append the hypotheses
# to a file in the format expected by the scoring tool sclite:
# w1 w2 w3 (key)
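#
# A minimal sketch of the first procedure (an assumption, not a
# reference solution): it relies on spass.stab trace -v 0 returning
# just the hypothesis words; glatWriteTRL would work analogously on
# the word graph.
proc spassWriteTRL {trlFile key} {
    set hypo [spass.stab trace -v 0]
    set fp [open $trlFile "a"]
    puts $fp "$hypo ($key)"
    close $fp
}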
#
# Settings for sctk-2.1.1
#
# SGML.pl is not in the default search path for Perl modules.
# Therefore we have to add the following lines (csh commands, to be run in your shell):
setenv PERL5LIB /usr/lib/perl5/vendor_perl/5.8.5
setenv PATH /project/Class-11-753/tools/sctk-2.1.1/bin:$PATH
setenv MANPATH /project/Class-11-753/tools/sctk-2.1.1/man:${MANPATH}
#
# sclite -i swb -h ${file1}.trn trn -r ${file2}.trn trn -o all dtl
#
# e.g. "sclite -i swb -h dev_z30_p0_f30.hypo trn -r data/CH/trn.utf8.set/trn.utf8.dev trn -o all dtl"
# This will create the files:
# dev_z30_p0_f30.hypo.dtl, dev_z30_p0_f30.hypo.sys, and dev_z30_p0_f30.hypo.pra
# These files contain a summary of the performance and a detailed analysis of errors.
# When you report the word error rate (WER), you find this information in the *.dtl file.
#
# Task: Compute the hypotheses for different decoder settings and calculate the error rates (see the homework for details)
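#
# A sketch of such a decoding loop (commented out; the key list is
# hypothetical, and each key must exist in the database):
# foreach xKey {spk030_utt1 spk030_utt2} {
#     set uttInfo [db get $xKey]
#     fs eval $uttInfo
#     spass run
#     spassWriteTRL dev_z30_p0_f30.hypo $xKey
# }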