Sphinx-3 Decoders Codewalk

Mosur K. Ravishankar (aka Ravi Mosur)
Sphinx Speech Group
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
rkm@cs.cmu.edu
03 Mar 2000

Contents

What's Around and Where
s3.2
s3 (Original)
s2 FBS8

What's Around and Where

Name	Remarks
CVSROOT	Location: `/net/alf20/usr1/rkm/cvsroot`. Note that not everything listed below is under CVS.
s3.2	Location: `$CVSROOT/s3.2`. Fast Sphinx-3 decoder using lextree organization: 5-10x real time speed on large vocabulary tasks; continuous density acoustic models only; batch-mode operation only. `gausubvq`: builds sub-vector clustered acoustic models, which are needed for fast acoustic model evaluation. Documentation: `s3.2/doc/s3.html` (user manual); `s3.2/doc/s3-2.ppt` (decoder tutorial).
libutil	Location: `$CVSROOT/libutil`. Miscellaneous utilities needed by s3.2 (some of them by Eric Thayer and Paul Placeway); not Sphinx-specific. Contents: platform-independent data types; command-line argument parsing; hash tables; heap structures (for sorting); memory allocation; CPU usage profiling; error reporting.
s3	Location: `/net/alf20/usr2/rkm/s3`. Original Sphinx-3 decoder. Slow: 50-100x real time speed on large vocabulary tasks. Handles any kind of acoustic model (discrete, semi-continuous, continuous, others). Major applications: `s3decode` and `s3decode-anytopo` (speech-to-text decoding); `s3align` (forced alignment); `s3allphone` (allphone decoding); `s3astar` (A* search, N-best generation); `s3dag` (shortest-path search). Other utilities: `stseg-read` (state-segmentation binary file reader); `sen2s2` (creates a Sphinx-II "sendump" file from a Sphinx-3 acoustic model). Documentation: `s3/doc/s3.html` (user manual; needs to be updated).
s2 (fbs8)	Location: open source (search for "CMU Sphinx"). Sphinx-II decoder. Real-time operation. Semi-continuous Sphinx-II acoustic models only (Sphinx-II format). User application support: compiled into a library with a straightforward API for building speech-enabled applications; continuous-listening support; dynamic language model loading and switching. Several test applications: basic dictation with and without "push-to-talk"; basic audio recording and playback; audio segmentation using the continuous listener. Additional recognition modes: forced alignment; allphone decoding; A* search, N-best generation; shortest-path search.
lm3g2dmp	Location: `$CVSROOT/lm3g2dmp`. Converts an "arpabo" format language model file to the binary ("dump") format used by all decoders.
SphinxOCX	Location: `//ppro7/c/users/rkm/SphinxOCX`. OCX wrapper for the fbs8 libraries. Windows NT only.
data	Location: `/net/alf20/usr/rkm/SHARED`. An attempt at collecting all available data and models under one roof (not entirely successful): cepstrum files; control files; HMMs (acoustic models); language models; dictionaries. Implemented mainly via symbolic links rather than physical copies.
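
The lm3g2dmp entry above mentions the "arpabo" language model format. As a rough illustration of what such a file looks like on the input side, the sketch below reads the n-gram counts declared in the header of an ARPA-style text LM (a `\data\` line followed by `ngram N=COUNT` lines). It is not taken from lm3g2dmp; the function name `arpa_header_counts` and the `MAX_ORDER` limit are hypothetical, and the text is parsed from memory rather than from a file to keep the example short.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ORDER 3   /* assumption: unigrams, bigrams, trigrams only */

/*
 * Scan the header of an ARPA-style LM held in memory, e.g.
 *
 *     \data\
 *     ngram 1=5
 *     ngram 2=12
 *     ngram 3=7
 *
 *     \1-grams:
 *     ...
 *
 * and store the declared number of n-grams in counts[n-1].
 * Returns the highest order seen, or -1 if no \data\ line is found.
 */
int arpa_header_counts(const char *text, long counts[MAX_ORDER])
{
    const char *p;
    int in_data = 0, max_seen = 0;

    assert(text != NULL && counts != NULL);
    memset(counts, 0, MAX_ORDER * sizeof counts[0]);

    for (p = text; *p != '\0'; ) {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);
        int order;
        long n;

        if (!in_data) {
            /* Everything before \data\ is treated as comment text. */
            if (len >= 6 && strncmp(p, "\\data\\", 6) == 0)
                in_data = 1;
        } else if (len > 0 && p[0] == '\\') {
            break;  /* next section marker (e.g. \1-grams:) ends the header */
        } else if (sscanf(p, "ngram %d=%ld", &order, &n) == 2) {
            if (order >= 1 && order <= MAX_ORDER) {
                counts[order - 1] = n;
                if (order > max_seen)
                    max_seen = order;
            }
        }
        p = eol ? eol + 1 : p + len;   /* advance to the next line */
    }
    return in_data ? max_seen : -1;
}
```

A caller would pass the file contents and a `long counts[MAX_ORDER]` array; a converter could use the returned counts to size its tables before reading the n-gram sections themselves.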

s3.2

See `s3.2/doc/s3.html`.

 s3types.h
 logs3.c
 logs3.h
 bio.c
 bio.h
 vector.c
 vector.h
 corpus.c
 corpus.h
 agc.c
 agc.h
 cmn.c
 cmn.h
 feat.c
 feat.h
 dict.c
 dict.h
 lm.c
 lm.h
 lmtest.c
 mdef.c
 mdef.h
 mdeftest.c
 cont_mgau.c
 cont_mgau.h
 subvq.c
 subvq.h
 svqtest.c
 fillpen.c
 fillpen.h
 tmat.c
 tmat.h
 wid.c
 wid.h
 dict2pid.c
 dict2pid.h
 dict2pidtest.c
 kbcore.c
 kbcore.h
 hyp.h

 main.c
 ascr.c
 ascr.h
 beam.c
 beam.h
 hmm.c
 hmm.h
 kb.h
 lextree.c
 lextree.h
 vithist.c
 vithist.h

 gausubvq.c
 gautest.c
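
For readers new to the front-end modules in this list, here is a minimal sketch of cepstral mean normalization, assuming that is what `cmn.c` provides, as its name suggests. The function name `cmn_sketch`, its signature, and the flat row-major array layout are all hypothetical and are not taken from the s3.2 sources.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Subtract the per-utterance mean from every cepstral dimension, in place.
 * mfc holds n_frame feature vectors of veclen floats each, stored row-major
 * (frame 0 first). This layout is an assumption made for the sketch.
 */
void cmn_sketch(float *mfc, size_t n_frame, size_t veclen)
{
    size_t f, i;

    assert(mfc != NULL || n_frame == 0);
    if (n_frame == 0)
        return;

    for (i = 0; i < veclen; i++) {
        double sum = 0.0;
        float mean;

        /* Mean of dimension i over the whole utterance. */
        for (f = 0; f < n_frame; f++)
            sum += mfc[f * veclen + i];
        mean = (float)(sum / (double)n_frame);

        /* Remove that mean from every frame. */
        for (f = 0; f < n_frame; f++)
            mfc[f * veclen + i] -= mean;
    }
}
```

Because the mean is computed over the whole utterance, this form only fits batch-mode processing, where all frames are available before decoding starts.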

s3 (Original)

See `s3/doc/s3.html` (needs to be updated) and `s3/tests`.

s2 FBS8

Open source (search for CMU Sphinx)

Ravishankar Mosur

Last modified: Wed Mar 8 13:25:39 EST 2000
