Sphinx Decoders

Mosur K. Ravishankar (aka Ravi Mosur)
Sphinx Speech Group
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

What's Around and Where

The table below describes the available versions of the Sphinx decoder. Entries shown in gray are older versions that are no longer maintained. Follow the links to find instructions on how to download the packages.

Name	Remarks
Sphinx-3.5 (Live-mode APIs, Speaker Adaptation and Code Convergence)	Location: Open source, release available. Based on Sphinx-3.4 and Sphinx-3.0. Live-mode APIs are implemented and thoroughly tested. Speaker adaptation based on a single regression class is implemented. Four tools from s3.0 are now incorporated in Sphinx 3.x: `align` (`s3align` in 3.0): forced alignment; `allphone` (`s3allphone` in 3.0): allphone decoding; `astar` (`s3astar` in 3.0): A* search, N-best generation; `dag` (`s3dag` in 3.0): shortest-path search. The feature extraction libraries of Sphinx-3 are now exactly the same as those of SphinxTrain. Corresponding changes in SphinxTrain: two new tools are introduced: `mllr_solve`, which computes the regression matrix using the MLLR algorithm, and `mllr_transform`, which, given a regression matrix, transforms the means by the corresponding linear transformation. The command-line interfaces of all SphinxTrain tools are now unified.
Sphinx-3.4 (Fast GMM computation)	Location: Open source, module archive_s3/s3.4 in cvs tree. Based on Sphinx-3.3. Fast GMM computation is implemented: frame down-sampling; CI-based GMM selection; VQ-based and SVQ-based Gaussian selection. Support for SVQ with an arbitrary number of sub-vectors. Phoneme look-ahead. Support for class-based LMs and multiple LMs.
Sphinx-3.3 (fast decoder)	Location: Open source, module archive_s3/s3.3 in cvs tree. Fast Sphinx-3 decoder using lextree organization: 5-10x real time on large vocabulary tasks (measured in 1999); continuous density acoustic models only; batch-mode or live operation. Other tools: `gausubvq`: sub-vector clustered acoustic model building, needed for fast acoustic model evaluation.
Sphinx-3.2	Location: Open source, module archive_s3/s3.2 in cvs tree. Same features as s3.3, but capable of batch-mode operation only.
Sphinx-3 (slow decoder)	Location: Open source, module archive_s3/s3 in cvs tree. Original Sphinx-3 decoder. Slow: 50-100x real time on large vocabulary tasks (measured in 1999). Supports any kind of acoustic model (discrete, semi-continuous, continuous, others). Major applications: `s3decode` and `s3decode-anytopo`: speech-to-text decoding; `s3align`: forced alignment; `s3allphone`: allphone decoding; `s3astar`: A* search, N-best generation; `s3dag`: shortest-path search. Other utilities: `stseg-read`: state-segmentation binary file reader; `sen2s2`: Sphinx-II "sendump" file creation from a Sphinx-3 acoustic model.
Sphinx-2 (fbs8)	Location: Open source, release available. Sphinx-II decoder. Real-time operation. Semi-continuous, Sphinx-II acoustic models only (Sphinx-II format). User application support: compiled into a library with a straightforward API for building speech-enabled applications; continuous-listening support; dynamic language model loading and switching. Several test applications: basic dictation with and without "push-to-talk"; basic audio recording and playback; audio segmentation using the continuous listener. Additional recognition modes: forced alignment; allphone decoding; A* search, N-best generation; shortest-path search.
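
The Sphinx-3.5 entry mentions speaker adaptation with a single regression class and a tool that transforms the means given a regression matrix. As a rough illustration of that idea only, the sketch below applies one affine transform, mean' = A * mean + b, to every Gaussian mean. It is not the code or file format of `mllr_transform`; the function names `mllr_transform_mean` and `adapt_means`, the split of the regression matrix into `A` and `b`, and the toy values are all assumptions made for this example.

```python
from typing import List, Sequence


def mllr_transform_mean(
    A: Sequence[Sequence[float]],
    b: Sequence[float],
    mean: Sequence[float],
) -> List[float]:
    """Return A * mean + b for a single Gaussian mean vector.

    A and b are hypothetical names for the linear part and the bias
    of an MLLR regression matrix; they are not Sphinx identifiers.
    """
    if len(A) != len(b) or any(len(row) != len(mean) for row in A):
        raise ValueError("dimension mismatch between A, b, and mean")
    return [
        sum(a_ij * m_j for a_ij, m_j in zip(row, mean)) + b_i
        for row, b_i in zip(A, b)
    ]


def adapt_means(
    A: Sequence[Sequence[float]],
    b: Sequence[float],
    means: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Single regression class: the same (A, b) is applied to every mean."""
    return [mllr_transform_mean(A, b, m) for m in means]


if __name__ == "__main__":
    # Toy 2-dimensional example with made-up numbers.
    A = [[2.0, 0.0], [0.0, 0.5]]
    b = [1.0, -1.0]
    means = [[1.0, 2.0], [0.0, 4.0]]
    print(adapt_means(A, b, means))  # [[3.0, 0.0], [1.0, 1.0]]
```

With more than one regression class, each group of Gaussians would get its own (A, b) pair; the single-class case shown here simply shares one pair across the whole acoustic model.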


Last modified: Sun Sep 12 12:30:01 EDT 2004