Disclaimer: Information provided here may not represent the standpoints of Carnegie Mellon University or the CMU Sphinx Group.
Here are the steps.
You can download the extended code here.
This is not a standard distribution (such as the ones you find on SourceForge), so compilation is a little different. Try the following:
> ./autogen.sh ; ./autogen.sh
> make
The compilation takes about 5 minutes. When it finishes, you will find the programs you need in ./src/programs/. You will need decode, decode_anytopo and lm_convert.
Make sure you also test the code. If you grep for PASS in the test log, you should get 81 matching lines.
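The PASS count can be checked with a small helper like the one below. This is only a sketch: the function name count_pass and the file name test.log are placeholders of mine, since the text does not say where the test log is written.

```shell
# count_pass is a hypothetical helper, not part of the Sphinx distribution.
# It prints the number of lines in a log file that contain the string PASS.
count_pass() {
    # $1: path to a saved test log (the log name is an assumption; use
    # whatever file you captured the test output into)
    grep -c PASS "$1"
}

# Usage, once the test output has been saved to a file:
#   count_pass test.log    # the text above expects 81
```

grep -c counts matching lines rather than individual matches, which is what "line count" in the text refers to.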
You can download the extension here.
You **need** to compile the code in 32-bit mode. I am not very good with make and configure, so I will just give you one hack that works.
a, Go to ./src/Makefile.in, change "CFLAGS := @CFLAGS@" to "CFLAGS := @CFLAGS@ -DTHIRTYTWOBITS"
Then do the standard dance: configure, make.
b, Make sure you test the code with make test32. You should get a count of 22 in this case.
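If you prefer not to edit the file by hand, step a can be scripted roughly as follows. This is a sketch under assumptions: the helper name add_32bit_flag is mine, and it expects the CFLAGS line to appear in Makefile.in exactly as quoted above, with no trailing whitespace.

```shell
# add_32bit_flag is a hypothetical helper, not part of the distribution.
# It appends -DTHIRTYTWOBITS to the "CFLAGS := @CFLAGS@" line of a Makefile.in.
add_32bit_flag() {
    # $1: path to Makefile.in (./src/Makefile.in in the text above)
    # The pattern is anchored at both ends, so a line that was already
    # patched no longer matches and running the helper twice is harmless.
    sed 's/^CFLAGS := @CFLAGS@$/CFLAGS := @CFLAGS@ -DTHIRTYTWOBITS/' "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1"
}

# Usage, from the top of the source tree:
#   add_32bit_flag ./src/Makefile.in
#   ./configure && make && make test32
```

The helper writes to a temporary file and then renames it instead of using sed -i, because in-place editing behaves differently across sed implementations.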
To create the LM, just follow the standard procedure, as you would find it in version 2.
To use the LM, you can feed it directly into decode and decode_anytopo. However, we strongly recommend that you first use lm_convert to convert the model to DMP format.
By default, if an LM in ARPA format has more than 65536 words, lm_convert will automatically choose a binary layout with 32-bit data structures. If you want to force this layout, use format type DMP32 as the output format.
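To see in advance which side of the 65536-word threshold a model falls on, you can read the unigram count from the ARPA header, where it appears on a line of the form "ngram 1=N". The sketch below does that. Both function names are hypothetical, and the comparison only mirrors the "more than 65536" wording above; I have not checked it against the exact test lm_convert performs.

```shell
# Both helpers are hypothetical and only inspect the ARPA text file;
# they do not call lm_convert.

# Print the unigram count N from the first "ngram 1=N" header line.
arpa_unigram_count() {
    # $1: path to an LM in ARPA text format
    sed -n 's/^ngram 1=\([0-9][0-9]*\).*$/\1/p' "$1" | head -n 1
}

# Succeed (exit status 0) when the vocabulary has more than 65536 words,
# following the threshold as stated in the text.
needs_32bit_layout() {
    [ "$(arpa_unigram_count "$1")" -gt 65536 ]
}

# Usage (my.arpa is a placeholder file name):
#   if needs_32bit_layout my.arpa; then echo "expect the 32-bit layout"; fi
```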
Binary layouts in both Sphinx 3.x and the CMU-Cambridge Language Modeling Toolkit were the most difficult issue in the development. Here are some hints on how to use the tools without getting hurt.
As a final note, the code at this point (20060415) is still not very well tested, so please let me know about any positive or negative results you get. At some point, we will also incorporate the code into the canonical Sphinx. I hope you have fun using this toolkit.