Sphinx-3 Decoders Codewalk

Mosur K. Ravishankar (aka Ravi Mosur)
Sphinx Speech Group
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
rkm@cs.cmu.edu
03 Mar 2000


What's Around and Where

Name    Remarks
CVSROOT
s3.2
  • Location: $CVSROOT/s3.2
  • Fast Sphinx-3 decoder using lextree organization:
    • 5-10x real-time speed on large-vocabulary tasks
    • Continuous density acoustic models only
    • Batch-mode operation only
  • gausubvq: Sub-vector clustered acoustic model building
    • Needed for fast acoustic model evaluation
  • Documentation:
libutil
  • Location: $CVSROOT/libutil
  • Miscellaneous utilities needed by s3.2 (some of them by Eric Thayer and Paul Placeway):
    • Platform-independent data types
    • Command-line arguments parsing
    • Hash tables
    • Heap structures (for sorting)
    • Memory allocation
    • CPU usage profiling
    • Error reporting
  • Not Sphinx specific
s3
  • Location: /net/alf20/usr2/rkm/s3
  • Original Sphinx-3 decoder
  • Slow: 50-100x real-time speed on large-vocabulary tasks
  • Any kind of acoustic model (discrete, semi-continuous, continuous, others)
  • Major applications:
    • s3decode and s3decode-anytopo: Speech-to-text decoding
    • s3align: Forced alignment
    • s3allphone: Allphone decoding
    • s3astar: A* search, nbest generation
    • s3dag: Shortest-path search
  • Other utilities:
    • stseg-read: State-segmentation binary file reader
    • sen2s2: Sphinx-II "sendump" file creation from Sphinx-3 acoustic model
  • Documentation:
s2 (fbs8)
  • Location: Open source (search for "CMU Sphinx")
  • Sphinx-II decoder
  • Real-time operation
  • Semi-continuous acoustic models only, in Sphinx-II format
  • User applications support:
    • Compiled into a library with a straightforward API for building speech-enabled applications
    • Continuous-listening support
    • Dynamic language model loading and switching
  • Several test applications:
    • Basic dictation with and without "push-to-talk"
    • Basic audio recording and playback
    • Audio segmentation using the continuous listener
  • Additional recognition modes:
    • Forced alignment
    • Allphone decoding
    • A* search, nbest generation
    • Shortest-path search
lm3g2dmp
  • Location: $CVSROOT/lm3g2dmp
  • Conversion from "arpabo" (ARPA backoff) format language model files to the binary ("dump") format used by all the decoders
SphinxOCX
data
  • Location: /net/alf20/usr/rkm/SHARED
  • An attempt at collecting all available data and models under one roof (not entirely successful):
    • Cepstrum files
    • Control files
    • HMMs (acoustic models)
    • Language models
    • Dictionaries
  • Implemented mainly via symbolic links, rather than physical copies
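The lm3g2dmp entry above converts an "arpabo" (ARPA backoff) language model into the binary dump format. Any reader of an ARPA file starts with its "\data\" header, which declares how many n-grams of each order follow. The C sketch below shows only that first step. It is not lm3g2dmp's code: the function name arpa_read_counts and the MAX_ORDER limit are hypothetical, and it assumes the whole file has already been read into a string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical limit: highest n-gram order this sketch records (trigram). */
#define MAX_ORDER 3

/*
 * Scan the "\data\" header of an ARPA-format LM held in a string and
 * record each "ngram N=count" line in counts[N]. Returns the highest
 * order found, or -1 if the "\data\" marker is missing.
 */
static int arpa_read_counts(const char *text, long counts[MAX_ORDER + 1])
{
    const char *p;
    int max_order = 0;

    assert(text != NULL && counts != NULL);
    p = strstr(text, "\\data\\");
    if (p == NULL)
        return -1;
    for (int i = 0; i <= MAX_ORDER; i++)
        counts[i] = 0;

    /* Walk line by line until the first "\N-grams:" section begins. */
    p = strchr(p, '\n');
    while (p != NULL && *p != '\0') {
        int order;
        long n;

        p++;                          /* step past the newline */
        if (*p == '\\')               /* e.g. "\1-grams:" ends the header */
            break;
        if (sscanf(p, "ngram %d=%ld", &order, &n) == 2
            && order >= 1 && order <= MAX_ORDER) {
            counts[order] = n;
            if (order > max_order)
                max_order = order;
        }
        p = strchr(p, '\n');          /* blank or other lines are skipped */
    }
    return max_order;
}
```

A converter would use these counts to size its unigram, bigram and trigram arrays before reading the n-gram sections themselves.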


s3.2

See s3.2/doc/s3.html.

 s3types.h
 logs3.c
 logs3.h
 bio.c
 bio.h
 vector.c
 vector.h
 corpus.c
 corpus.h
 agc.c
 agc.h
 cmn.c
 cmn.h
 feat.c
 feat.h
 dict.c
 dict.h
 lm.c
 lm.h
 lmtest.c
 mdef.c
 mdef.h
 mdeftest.c
 cont_mgau.c
 cont_mgau.h
 subvq.c
 subvq.h
 svqtest.c
 fillpen.c
 fillpen.h
 tmat.c
 tmat.h
 wid.c
 wid.h
 dict2pid.c
 dict2pid.h
 dict2pidtest.c
 kbcore.c
 kbcore.h
 hyp.h

 main.c
 ascr.c
 ascr.h
 beam.c
 beam.h
 hmm.c
 hmm.h
 kb.h
 lextree.c
 lextree.h
 vithist.c
 vithist.h

 gausubvq.c
 gautest.c
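gausubvq.c and subvq.c deal with the sub-vector clustered models that the s3.2 entry describes as needed for fast acoustic model evaluation. The general idea, sketched below in C under illustrative assumptions, is that with diagonal covariances a Gaussian's log-likelihood is a sum of per-dimension terms. The feature vector can therefore be split into sub-vectors, each sub-vector of every Gaussian mapped to an entry in a small shared codebook, and each codebook entry scored only once per frame. All names (codeword_t, subvq_eval) and the tiny fixed sizes are hypothetical. This is not the s3.2 data layout, and it covers only the approximate scoring, not how the decoder then uses those scores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, tiny sizes chosen only for illustration. */
#define N_SUBVEC   2   /* feature vector is split into this many sub-vectors */
#define SUBVEC_LEN 2   /* dimensions per sub-vector */
#define N_CODE     2   /* codewords in each sub-vector codebook */
#define N_GAU      3   /* Gaussians sharing the codebooks */

typedef struct {
    double mean[SUBVEC_LEN];
    double ivar[SUBVEC_LEN];   /* 1 / (2 * variance), precomputed */
    double logconst;           /* log normalization term, precomputed */
} codeword_t;

/* Log-domain score of one sub-vector of x against one codeword. */
static double codeword_score(const double *x, const codeword_t *cw)
{
    double s = cw->logconst;
    for (int d = 0; d < SUBVEC_LEN; d++) {
        double diff = x[d] - cw->mean[d];
        s -= diff * diff * cw->ivar[d];
    }
    return s;
}

/*
 * Approximate log-likelihoods of all Gaussians for one frame x.
 * Every codeword is evaluated once; each Gaussian's score is then a
 * sum of N_SUBVEC table lookups through its codeword map.
 */
static void subvq_eval(const double x[N_SUBVEC * SUBVEC_LEN],
                       const codeword_t cb[N_SUBVEC][N_CODE],
                       const int map[N_GAU][N_SUBVEC],
                       double out[N_GAU])
{
    double table[N_SUBVEC][N_CODE];

    for (int s = 0; s < N_SUBVEC; s++)
        for (int k = 0; k < N_CODE; k++)
            table[s][k] = codeword_score(x + s * SUBVEC_LEN, &cb[s][k]);

    for (int g = 0; g < N_GAU; g++) {
        out[g] = 0.0;
        for (int s = 0; s < N_SUBVEC; s++) {
            assert(map[g][s] >= 0 && map[g][s] < N_CODE);
            out[g] += table[s][map[g][s]];
        }
    }
}
```

Per frame this costs N_SUBVEC times N_CODE codeword evaluations plus N_GAU times N_SUBVEC additions, instead of one full-dimensional evaluation per Gaussian; the saving grows as more Gaussians share each codebook.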

s3 (Original)

See s3/doc/s3.html (needs updating) and s3/tests.


s2 FBS8

Open source (search for CMU Sphinx)


Ravishankar Mosur
Last modified: Wed Mar 8 13:25:39 EST 2000