19 Nov 1999
Sphinx Speech Group, CMU-SCS (rkm@cs.cmu.edu)
Bigram Backoff Language Model
·Two issues with large-vocabulary bigram LMs:
·With vocabulary size V and N word exits per frame, there are N×V cross-word transitions to evaluate per frame
·Bigram probabilities are very sparse; for most word pairs the LM “backs off” to unigram probabilities
·Optimize cross-word transitions using a “backoff node”:
·A Viterbi decision at the backoff node selects the single best predecessor; only the sparse explicit bigram transitions remain per-predecessor, so the N×V expansion shrinks to roughly N + V transitions through the node plus the explicit bigrams
[Figure: lexicon words A and B exit either directly to A's bigram successors and B's bigram successors (explicit bigram arcs) or through the shared backoff node]
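The scheme above can be sketched in a few lines of Python. This is a minimal toy model, not the Sphinx implementation: the LM tables, backoff weights, and function names are all illustrative. It contrasts the full N×V expansion with the backoff-node version, where the only per-predecessor work is the sparse explicit bigrams, and a single Viterbi max at the node handles all backed-off paths.

```python
import math

# Hypothetical toy LM in log space: explicit bigram log-probs, unigram
# log-probs, and per-word backoff weights (values are illustrative).
unigram = {"a": -1.0, "b": -1.5, "c": -2.0}
bigram = {("a", "b"): -0.5, ("b", "c"): -0.7}
backoff_wt = {"a": -0.3, "b": -0.2, "c": -0.1}

def naive_transitions(exits):
    """Full N x V expansion: every exiting word to every vocab word."""
    scores = {}
    for prev, prev_score in exits.items():
        for w in unigram:
            if (prev, w) in bigram:
                s = prev_score + bigram[(prev, w)]
            else:  # Katz-style backoff to the unigram
                s = prev_score + backoff_wt[prev] + unigram[w]
            scores[w] = max(scores.get(w, -math.inf), s)
    return scores

def backoff_node_transitions(exits):
    """Optimized: sparse explicit bigrams, plus one shared backoff node."""
    scores = {}
    # Per-predecessor work is only the explicit bigram successors.
    for prev, prev_score in exits.items():
        for (p, w), lp in bigram.items():
            if p == prev:
                scores[w] = max(scores.get(w, -math.inf), prev_score + lp)
    # Viterbi decision at the backoff node: keep the single best predecessor.
    node = max(s + backoff_wt[prev] for prev, s in exits.items())
    # Fan out once from the node to all V unigram successors.
    for w, lp in unigram.items():
        scores[w] = max(scores.get(w, -math.inf), node + lp)
    return scores
```

Note the design trade-off: in the node version a word with an explicit bigram can also be reached via the backoff path, so in principle the score is an upper-bound approximation of exact Katz backoff; the Viterbi max simply keeps whichever path wins.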