Sphinx Decoders

Mosur K. Ravishankar (aka Ravi Mosur)
Sphinx Speech Group
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

What's Around and Where

The table below describes the available versions of the Sphinx decoder. Entries shown in gray are older versions that are no longer maintained. Follow the links for instructions on downloading each package.

Name Remarks
Sphinx-3.5 (Live-mode APIs, Speaker Adaptation and Code Convergence)
  • Location: Open source, release available.
  • Based on Sphinx-3.4 and Sphinx 3.0.
  • Live-mode APIs are implemented and thoroughly tested.
  • Speaker adaptation based on single regression class is implemented.
  • Four tools of s3.0 are now incorporated in Sphinx 3.x
    • align (s3align in 3.0): Forced alignment
    • allphone (s3allphone in 3.0): Allphone decoding
    • astar (s3astar in 3.0): A* search, nbest generation
    • dag (s3dag in 3.0): Shortest-path search
  • The feature extraction libraries of Sphinx 3 are now exactly the same as those of SphinxTrain
  • Corresponding changes in SphinxTrain
  • Two new tools are introduced:
    • mllr_solve: computes the regression matrix using the MLLR algorithm.
    • mllr_transform: given a regression matrix, applies the linear transformation to the model means.
  • The command-line interface of all SphinxTrain tools is now unified.
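As a rough illustration of what a mean transformation under a single regression class involves, the sketch below applies one affine transform, mu' = A*mu + b, to every Gaussian mean. It is not the mllr_transform source; the function name, the plain-list matrix layout, and the toy values are assumptions made for this example.

```python
def apply_mllr_transform(means, A, b):
    """Apply mu' = A * mu + b to every mean vector.

    A single regression class is assumed, so one (A, b) pair is shared
    by all means. `means` is a list of vectors, `A` a list of rows.
    """
    transformed = []
    for mu in means:
        new_mu = [
            sum(a_ij * m_j for a_ij, m_j in zip(row, mu)) + b_i
            for row, b_i in zip(A, b)
        ]
        transformed.append(new_mu)
    return transformed


# Toy values (hypothetical): two 2-dimensional Gaussian means.
means = [[1.0, 2.0], [0.0, -1.0]]
A = [[2.0, 0.0], [0.0, 0.5]]   # regression matrix
b = [1.0, -1.0]                # bias term
adapted = apply_mllr_transform(means, A, b)
print(adapted)
```

Because only one transform is shared by all means, a small amount of adaptation data is enough to shift the whole model toward a new speaker.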
Sphinx-3.4 (Fast GMM computation)
  • Location: Open source, module archive_s3/s3.4 in cvs tree.
  • Based on Sphinx-3.3, with fast GMM computation implemented through:
    • Frame down-sampling
    • CI-based GMM selection
    • VQ-based and SVQ-based Gaussian Selection
    • Support for SVQ with an arbitrary number of sub-vectors
  • Phoneme look-ahead
  • Support for class-based LMs and multiple LMs
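Frame down-sampling, listed in the Sphinx-3.4 entry above, is commonly understood as evaluating the Gaussian mixtures only on every N-th frame and reusing the most recent score for the frames in between. The sketch below shows that general idea only; it is an assumption about the technique, not the Sphinx-3.4 code, and `score_fn`, `ratio`, and the toy frames are hypothetical names and values.

```python
def downsampled_scores(frames, score_fn, ratio):
    """Score every `ratio`-th frame; reuse the last score otherwise."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    scores = []
    last = None
    for t, frame in enumerate(frames):
        if t % ratio == 0:
            last = score_fn(frame)   # the expensive GMM evaluation
        scores.append(last)          # skipped frames reuse the last score
    return scores


calls = []

def toy_score(frame):
    """Stand-in for a GMM log-likelihood; records each evaluation."""
    calls.append(frame)
    return -(frame[0] ** 2)


frames = [[0.0], [1.0], [2.0], [3.0], [4.0]]
scores = downsampled_scores(frames, toy_score, ratio=2)
print(scores, len(calls))
```

With a ratio of 2, the scoring function runs on three of the five frames, which is where the speed-up comes from; the cost is that skipped frames carry slightly stale scores.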
Sphinx-3.3 (fast decoder)
  • Location: Open source, module archive_s3/s3.3 in cvs tree.
  • Fast Sphinx-3 decoder using lextree organization:
    • 5-10x real-time speed on large-vocabulary tasks (measured in 1999)
    • Continuous density acoustic models only
    • Batch-mode or live operation
  • Other tools
    • gausubvq: Sub-vector clustered acoustic model building, needed for fast acoustic model evaluation
Sphinx-3.2
  • Location: Open source, module archive_s3/s3.2 in cvs tree.
  • Same features as s3.3, but capable of batch-mode operation only.
Sphinx-3 (slow decoder)
  • Location: Open source, module archive_s3/s3 in cvs tree.
  • Original Sphinx-3 decoder
  • Slow; 50-100x real-time speed on large-vocabulary tasks (measured in 1999)
  • Any kind of acoustic model (discrete, semi-continuous, continuous, others)
  • Major applications:
    • s3decode and s3decode-anytopo: Speech-to-text Decoding
    • s3align: Forced alignment
    • s3allphone: Allphone decoding
    • s3astar: A* search, nbest generation
    • s3dag: Shortest-path search
  • Other utilities:
    • stseg-read: State-segmentation binary file reader
    • sen2s2: Sphinx-II "sendump" file creation from Sphinx-3 acoustic model
Sphinx-2 (fbs8)
  • Location: Open source, release available.
  • Sphinx-II decoder
  • Real-time operation
  • Semi-continuous, Sphinx-II acoustic models only (Sphinx-II format)
  • User applications support:
    • Compiled into a library with a straightforward API for building speech-enabled applications
    • Continuous-listening support
    • Dynamic language model loading and switching
  • Several test applications:
    • Basic dictation with and without "push-to-talk"
    • Basic audio recording and playback
    • Audio segmentation using the continuous listener
  • Additional recognition modes:
    • Forced alignment
    • Allphone decoding
    • A* search, nbest generation
    • Shortest-path search
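The shortest-path search mode listed for several decoders above can be pictured as finding the lowest-cost path through a word graph (DAG). The sketch below is a generic dynamic-programming version, not the s3dag implementation; the edge format `(src, dst, cost, word)`, the assumption that nodes are numbered in topological order (`src < dst`), and the toy lattice are all hypothetical.

```python
def shortest_path(num_nodes, edges, start, end):
    """Return (cost, words) of the cheapest start-to-end path in a DAG.

    Every edge is (src, dst, cost, word) with src < dst, so visiting
    edges in sorted order relaxes each node after all its predecessors.
    """
    INF = float("inf")
    best = [INF] * num_nodes
    back = [None] * num_nodes          # (previous node, word) per node
    best[start] = 0.0
    for src, dst, cost, word in sorted(edges):
        if best[src] + cost < best[dst]:
            best[dst] = best[src] + cost
            back[dst] = (src, word)
    # Trace the best path backwards from the end node.
    words = []
    node = end
    while node != start:
        if back[node] is None:
            return INF, []             # end is unreachable
        node, word = back[node]
        words.append(word)
    words.reverse()
    return best[end], words


# Toy lattice: costs stand in for negated log scores.
edges = [
    (0, 1, 2.0, "the"), (0, 1, 2.5, "a"), (0, 2, 4.0, "that"),
    (1, 2, 1.0, "cat"), (1, 2, 0.5, "cap"), (2, 3, 1.0, "sat"),
]
cost, words = shortest_path(4, edges, start=0, end=3)
print(cost, words)
```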
 
Last modified: Sun Sep 12 12:30:01 EDT 2004