Creating an Initial Environment

Before you can start training or recognition with Janus, you need to set up an initial environment of architecture description files and other necessary files. If you have looked at the "what does Janus need" page, you know which kinds of environment files are needed. You have to provide all of them except the architecture description files and the weight files. The weight files are generated when Janus trains, and if you just want a standard architecture, you can use ready-to-run Tcl scripts that create all the needed description files, as described in the following sections.

The default architecture is a single-stream Gaussian mixture system with three different states per allophone; it is initially fully continuous and context-independent. The silence phone is modeled with a single state. Noise phones are usually also modeled with a single state, because we don't expect any learnable temporal structure in noises.
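To make the default architecture concrete, here is a small Python sketch that derives the list of context-independent models from a phone set: three segments per ordinary phone and a single middle segment for silence and noise phones. It is not Janus code; the function `model_names` and its `single_state` parameter are hypothetical, and the segment suffixes -b, -m, -e follow the naming used in the A, B, S example on this page.

```python
from typing import Iterable, List, Set

# Hypothetical helper, not part of Janus: mirrors the naming scheme A-b, A-m, A-e.
SEGMENTS = ("b", "m", "e")  # beginning, middle, ending segment


def model_names(phones: Iterable[str], single_state: Set[str]) -> List[str]:
    """Return the context-independent model names for a phone set.

    Ordinary phones get three segments (-b, -m, -e); phones listed in
    single_state (silence, noises) get only a middle segment (-m).
    """
    names: List[str] = []
    for phone in phones:
        segments = ("m",) if phone in single_state else SEGMENTS
        names.extend(f"{phone}-{seg}" for seg in segments)
    return names


if __name__ == "__main__":
    models = model_names(["A", "B", "S"], single_state={"S"})
    print(models)       # ['A-b', 'A-m', 'A-e', 'B-b', 'B-m', 'B-e', 'S-m']
    print(len(models))  # 7
```

Noise phones would simply be added to `single_state` alongside the silence phone.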

The Janus-Made Description Files

All Janus description files are human-readable and human-editable. In general, you can create them with a text editor or UNIX text tools (sed, awk, etc.), but unless you need a very special kind of recognizer, you can let Janus create the needed description files itself, giving it only a few necessary details.

At this point we will not discuss the format, purpose, or usage of the description files. If you are interested in the theoretical background of, e.g., decision trees or Gaussian mixtures, consult good articles or books on these topics. If you are interested in implementation or usage details, refer to the Janus manual. This page only describes how to get a first initial environment.

For an initial environment we want Janus to create a codebook description file, a distribution description file, and a distribution tree. In the simple context-independent case, this means that we have as many codebooks as distributions, and the same number of leaf nodes in the distribution tree. The tree has three root nodes: one for the beginning allophone segments, one for the middle segments, and one for the ending segments. So, imagine we have a phoneme set of three phonemes, one of which is a silence phone: A, B, and S. Then we would have seven distinct acoustic models (which we often, incorrectly, call senones), namely three A segments (A-b, A-m, and A-e), three B segments (B-b, B-m, and B-e), and one S segment (S-m). Our codebook description file would thus have a codebook for each of these seven models, the distribution description file would have seven distributions, each defined over its corresponding codebook, and the distribution tree would be organized as follows.

There are three different node types in the tree. The root nodes have no real question, and the same successor for all theoretical answers to their dummy question. Below them come nodes with questions, each of which asks whether the phone at position 0 (i.e., the central phone, which in the context-independent case is the only phone) is one particular phoneme: A, B, or S. The yes-successors are always leaf nodes without questions and without successors; they only hold the name of the model to be used. The no-successors continue to walk through the phoneme set until no phoneme is left, so the last question node has no no-successor. Such a tree is easy to build incrementally by adding single models to it. Should we decide that we also want a C-b model, we could simply traverse the tree starting at node Root-b, following the no-successors until there is no further no-successor. There we can insert a new question node "0=C?" whose yes-successor would be the leaf node pointing to model C-b, and whose no-successor would be empty, ready for another model to be added.
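The tree structure and the incremental insertion can be illustrated with a short Python sketch. It is not Janus code: the `Node` class, the `add_model` and `chain` functions, and the question-node names are hypothetical. A root node's yes- and no-successors point to the same first question node, standing in for "the same successor for all answers to the dummy question"; `add_model` follows the no-successors to the end of the chain and appends a new question node whose yes-successor is the leaf for the new model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """One node of a context-independent distribution tree (illustration only)."""
    name: str
    question: Optional[str] = None  # e.g. "0=A"; None for root and leaf nodes
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    model: Optional[str] = None     # set only on leaf nodes


def add_model(root: Node, phone: str, model: str) -> None:
    """Append a question node "0=<phone>" at the end of the root's no-chain."""
    leaf = Node(name=model, model=model)
    new = Node(name=f"{root.name}({phone})", question=f"0={phone}", yes=leaf)
    if root.yes is None:
        # Dummy root question: every answer leads to the same successor.
        root.yes = root.no = new
        return
    node = root.yes
    while node.no is not None:  # follow no-successors to the end of the chain
        node = node.no
    node.no = new


def chain(root: Node) -> List[str]:
    """Return "question -> model" strings along the root's no-chain."""
    out: List[str] = []
    node = root.yes
    while node is not None:
        out.append(f"{node.question} -> {node.yes.model}")
        node = node.no
    return out


if __name__ == "__main__":
    roots = {seg: Node(name=f"Root-{seg}") for seg in ("b", "m", "e")}
    for model in ["A-b", "A-m", "A-e", "B-b", "B-m", "B-e", "S-m"]:
        phone, seg = model.rsplit("-", 1)
        add_model(roots[seg], phone, model)
    add_model(roots["b"], "C", "C-b")  # the incremental C-b example
    print(chain(roots["b"]))  # ['0=A -> A-b', '0=B -> B-b', '0=C -> C-b']
    print(chain(roots["m"]))  # ['0=A -> A-m', '0=B -> B-m', '0=S -> S-m']
```

Adding a model only touches the end of one chain, which is why models can be added one at a time without rebuilding the tree.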

The procedure to create an initial environment is simple:

For each element in a list of models, add the model to the distribution tree, and create a same-named distribution and a codebook. Once all models have been added and all distributions and codebooks have been created, write the description files.
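The two-phase procedure (first add every model, then write the files) can be sketched in Python as below. This is only an illustration, not the initCI or addModel scripts: the class `CIEnvironment`, the function `init_ci`, the output file names, and their contents are hypothetical placeholders rather than Janus description-file formats. It also assumes that each codebook is named after its model. Each root's chain of no-successors is represented here simply as an ordered list of (question, model) pairs.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List, Tuple


@dataclass
class CIEnvironment:
    """Illustrative stand-in for what initCI/addModel build (not Janus objects)."""
    # Root node name -> ordered (question, model) pairs; the list order plays
    # the role of the chain of no-successors below that root.
    tree: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)
    # Distribution name -> name of the codebook it is defined over.
    distributions: Dict[str, str] = field(default_factory=dict)
    codebooks: List[str] = field(default_factory=list)

    def add_model(self, model: str) -> None:
        phone, segment = model.rsplit("-", 1)  # "A-b" -> ("A", "b")
        self.tree.setdefault(f"Root-{segment}", []).append((f"0={phone}", model))
        self.codebooks.append(model)           # assumption: codebook named after the model
        self.distributions[model] = model      # same-named distribution over that codebook

    def render(self) -> Dict[str, str]:
        """File name -> contents. Names and syntax are placeholders, not Janus formats."""
        tree_lines = [f"{root} {q} {m}" for root, pairs in self.tree.items() for q, m in pairs]
        dist_lines = [f"{d} {cb}" for d, cb in self.distributions.items()]
        return {
            "codebooks.txt": "\n".join(self.codebooks) + "\n",
            "distributions.txt": "\n".join(dist_lines) + "\n",
            "tree.txt": "\n".join(tree_lines) + "\n",
        }

    def write(self, directory: Path) -> None:
        directory.mkdir(parents=True, exist_ok=True)
        for name, text in self.render().items():
            (directory / name).write_text(text)


def init_ci(models: List[str], directory: Path) -> CIEnvironment:
    env = CIEnvironment()
    for model in models:  # first add every model ...
        env.add_model(model)
    env.write(directory)  # ... and only then write the description files
    return env


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env = init_ci(["A-b", "A-m", "A-e", "B-b", "B-m", "B-e", "S-m"], Path(tmp))
        print(len(env.codebooks), len(env.distributions))  # 7 7
        print(env.tree["Root-m"])  # [('0=A', 'A-m'), ('0=B', 'B-m'), ('0=S', 'S-m')]
```

Writing only after the loop means the files always describe a complete, consistent set of models.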

You can have a look at the initCI and the addModel scripts for more details about this procedure.

The Self-Made Description Files

If you think a bit about how Tcl and Janus work, you will easily see that it is quite possible to build an initial system without using any description files. Janus can create their contents on the fly, and some files are so small that their contents could be defined as string constants in the script that Janus executes. So why do we still use such files? There are at least two reasons. One is speed: loading a saved file is in most cases significantly faster than creating the information on the fly. The other is uniformity. When a system has developed into a highly complicated context-dependent recognizer, the description files are still human-readable, but not easily readable. They tend to become so large that they can't seriously be defined as constants in scripts, and if you want to use the same kind of scripts for simple as well as complex systems, the best way is to use description files for all of them.

The description files that are usually created manually are the transition model set, the topology set, the topology tree, a list of phonemes, and a feature description file.

In most cases these files can be taken from another existing system. The do-it-yourself pages on this topic show some examples.

Want Something Different?

If you want something different, e.g. multiple streams or context-dependent models from the start, or uncommon naming of distributions, codebooks, or tree nodes, take the initCI and addModel scripts and modify them to suit your needs.