
This is too short to count as documentation, but
it is a reminder of what to do when training.

See 
    http://www.speech.cs.cmu.edu/sphinxman/scriptman1.html
    http://www.speech.cs.cmu.edu/sphinxman/s3manual.html
for much more detail.

This is a quick note on how to use the scripts to train S3 models
and convert them for use with S2.

Go to a new directory where you have enough space
   mkdir time
   cd time
Set up the directory structure, scripts and config files
   $SPHINXTRAINDIR/scripts_pl/setup_SphinxTrain time
The main config file is put into etc/sphinx_train.cfg

Put your waveform files in wav/

Create the following files
   etc/time.fileids  
       List of file ids for each wav file e.g.
            time0001
            time0002
            ...
   etc/time.phone
       The phones in the phone set
            SIL
            AA
            AE
            ...
   etc/time.transcription
       Word transcription of each file. Each line must end in (FILEID).
       The words are probably upper case, but filler words can be included too
           <s> THE TIME IS NOW ... </s> (time0001)
   etc/time.filler
       List of filler words and their pronunciation (using the phones) e.g.
           <s> SIL
           </s> SIL
           <sil> SIL
           /NOISE/ +NOISE+
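
The files above have to agree with each other, so a quick check before
training can save time. The following is only a rough sketch, not part of
SphinxTrain: check_transcripts is a made-up helper name, and it assumes one
file id per line and that the (FILEID) at the end of each transcription line
contains no spaces.

```python
import re


def check_transcripts(fileid_lines, transcript_lines):
    """Return a list of mismatches between the fileids and transcription files.

    Hypothetical helper, not part of SphinxTrain.
    """
    fileids = [line.strip() for line in fileid_lines if line.strip()]
    problems = []
    seen = []
    # Each transcription line is expected to end in "(FILEID)".
    tail = re.compile(r"\(([^()\s]+)\)\s*$")
    for lineno, line in enumerate(transcript_lines, 1):
        if not line.strip():
            continue
        match = tail.search(line)
        if match is None:
            problems.append(f"transcription line {lineno}: no (FILEID) at end")
            continue
        seen.append(match.group(1))
    known = set(fileids)
    transcribed = set(seen)
    # File ids that never show up in the transcription file.
    for fid in fileids:
        if fid not in transcribed:
            problems.append(f"{fid}: no transcription")
    # Transcriptions that refer to a file id that is not listed.
    for fid in seen:
        if fid not in known:
            problems.append(f"{fid}: transcription has no entry in fileids")
    return problems
```

To use it, open etc/time.fileids and etc/time.transcription, pass the two
open files as the arguments, and print whatever comes back. An empty list
means the two files agree.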

Create a dictionary with (only works for US English)

   bin/make_dict etc/time.transcription

This will create the files etc/word.known and etc/word.unknown. Check them,
and once you are happy,

   mv etc/word.known etc/time.dic
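
Part of checking the dictionary can be done mechanically. The sketch below
is not part of SphinxTrain and the function names are made up. It assumes
the dictionary uses the same "WORD PHONE PHONE ..." layout as the filler
file, and that alternate pronunciations, if any, are marked WORD(2),
WORD(3) and so on. It reports phones that are not in etc/time.phone and
transcription words that have no entry in either etc/time.dic or
etc/time.filler.

```python
import re


def load_pronunciations(lines):
    """Parse 'WORD PHONE PHONE ...' lines into {word: [list of phone lists]}."""
    entries = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        # Assumption: alternate pronunciations are written WORD(2), WORD(3), ...
        word = re.sub(r"\(\d+\)$", "", parts[0])
        entries.setdefault(word, []).append(parts[1:])
    return entries


def check_dictionary(phone_lines, dict_lines, filler_lines, transcript_lines):
    """Return a list of problems (hypothetical helper, not part of SphinxTrain)."""
    phones = {line.strip() for line in phone_lines if line.strip()}
    words = load_pronunciations(dict_lines)
    fillers = load_pronunciations(filler_lines)
    problems = []
    # Every pronunciation should use only phones from the phone list.
    for name, table in (("dictionary", words), ("filler", fillers)):
        for word, prons in sorted(table.items()):
            for pron in prons:
                bad = [p for p in pron if p not in phones]
                if bad:
                    problems.append(
                        f"{name} entry {word}: unknown phones {' '.join(bad)}"
                    )
    # Every transcription word should have a dictionary or filler entry.
    for lineno, line in enumerate(transcript_lines, 1):
        tokens = line.split()
        # Drop the trailing (FILEID) token before looking words up.
        if tokens and tokens[-1].startswith("(") and tokens[-1].endswith(")"):
            tokens = tokens[:-1]
        for token in tokens:
            if token not in words and token not in fillers:
                problems.append(
                    f"transcription line {lineno}: {token} "
                    "not in dictionary or filler list"
                )
    return problems
```

Pass it the four open files (phone list, dictionary, filler list,
transcription) and print the result. It says nothing about whether a
pronunciation is right, only whether it is spelled with known phones.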

Make the melcep feature files

   bin/make_feats etc/time.fileids

Now we can start on the basic perl scripts. Their
results will be put in perl_time.html, which you can view as things
progress.  Note the scripts aren't guaranteed, and problems do occur,
though the error message often does point at the actual problem.

There are a number of larger choices in building models. One is whether to
build continuous or semi-continuous models.  Only semi-continuous models
can be used by Sphinx2.

Verify the setup (doesn't do enough checking, but may help)
   ./scripts_pl/00.verify/verify_all.pl 
Vector quantization (can take several minutes)
   ./scripts_pl/01.vector_quantize/slave.VQ.pl
Initial BW (Baum-Welch) training, then the remaining steps in order
   ./scripts_pl/02.ci_schmm/slave_convg.pl
   ./scripts_pl/03.makeuntiedmdef/make_untied_mdef.pl
   ./scripts_pl/04.cd_schmm_untied/slave_convg.pl
   ./scripts_pl/05.buildtrees/make_questions.pl
   ./scripts_pl/05.buildtrees/slave.treebuilder.pl
   ./scripts_pl/06.prunetree/slave.state-tie-er.pl
   ./scripts_pl/07.cd-schmm/slave_convg.pl
   ./scripts_pl/08.deleted-interpolation/deleted_interpolation.pl
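
If you would rather not start each step by hand, the steps can be run in
order from a small driver. This is only a sketch: run_steps is a made-up
name, and it assumes that it is started from the training directory and
that each script returns a non-zero exit status when it fails, which has
not been checked for every script.

```python
import subprocess

# Step order copied from the list above; paths are relative to the
# training directory.
STEPS = [
    "./scripts_pl/00.verify/verify_all.pl",
    "./scripts_pl/01.vector_quantize/slave.VQ.pl",
    "./scripts_pl/02.ci_schmm/slave_convg.pl",
    "./scripts_pl/03.makeuntiedmdef/make_untied_mdef.pl",
    "./scripts_pl/04.cd_schmm_untied/slave_convg.pl",
    "./scripts_pl/05.buildtrees/make_questions.pl",
    "./scripts_pl/05.buildtrees/slave.treebuilder.pl",
    "./scripts_pl/06.prunetree/slave.state-tie-er.pl",
    "./scripts_pl/07.cd-schmm/slave_convg.pl",
    "./scripts_pl/08.deleted-interpolation/deleted_interpolation.pl",
]


def run_steps(steps, run=subprocess.call):
    """Run each step in order; return the first step that fails, or None.

    Hypothetical helper. Assumes a non-zero exit status means failure.
    """
    for step in steps:
        print("running", step)
        status = run([step])
        if status != 0:
            print("failed with exit status", status)
            return step
    return None
```

Calling run_steps(STEPS) stops at the first failing step, so a step can be
fixed and the run restarted from there by passing a slice of STEPS.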
     
===========================================================================
Chart of training process 


                         Training chart for the
                         sphinx2  trainer
                        =========================
                                OBSOLETE
               (The sphinx2 trainer is no longer used at CMU)
                We build sphinx3 semi-continuous models and convert them


                         Training chart for the
                         sphinx3  trainer
                        =========================
                             type of model
                                   |
                    ----------------------------------
                    |                                |
               continuous                      semi-continuous
                    |                                |
                    |                         vector-quantization
                    |                                |
                    ----------------------------------
                                   |...make ci mdef
                                   |...flat_initialize CI models
                             training CI models
                                   |...make cd untied mdef
                                   |...initialize
                                   |
                             training CD untied models
                                   |
                                   |
                                   |
                             decision tree building
                                   |...prune trees
                                   |...tie states
                                   |...make cd tied mdef
                             training CD tied models
                                   |
                                   |
recursive            ----------------------------------
gaussian splitting.. |                                |
                 continuous models              semi-continuous models
                     |                                |
                     |                                | 
                -----------                           |
                |         |                    deleted interpolation
          decode with   ADAPT                         |
          sphinx3                                     |---ADAPT
          decoder                                     |     |
                                                ----------------
                          make cd tied mdef ... | .............|
                          with decode dict and  |           convert to
                          pruned trees          |           sphinx2
                                         decode with           |
                                         sphinx3               |
                                         decoder               |     
                                                               |
                                                            decode with
                                                            sphinx2
                                                            decoder
                                                  (currently open source
                                                   and restricted to
                                                   working with sampling
                                                   rates 8 kHz and 16 kHz.
                                                   Once the s3 trainer is
                                                   released, this will have
                                                   to change to allow
                                                   people who train with
                                                   different sampling rates
                                                   to use this decoder)

  

======================================================================

