The default architecture is a single-stream Gaussian mixture system with three states per allophone; it is initially fully continuous and context-independent. The silence phone is modeled with a single state. Noise phones are usually also modeled with a single state, because we don't expect any learnable temporal structure in noise.
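As an illustration, the default topology can be thought of as a rule that expands a phone set into a list of context-independent acoustic models. The following is a minimal Python sketch, not Janus code: the function name `expand_models`, its parameters, and the reuse of the `-b`/`-m`/`-e` suffixes from the A/B/S example on this page are assumptions made for illustration.

```python
# Hypothetical helper (not part of Janus): expand a phone set into
# context-independent acoustic models following the default topology:
# three states (-b, -m, -e) per phone, one middle state for silence/noise.

SEGMENTS_FULL = ("b", "m", "e")  # beginning, middle, ending segment
SEGMENTS_SINGLE = ("m",)         # single-state topology


def expand_models(phones, single_state_phones):
    """Return model names such as 'A-b', in phone order."""
    models = []
    for phone in phones:
        if phone in single_state_phones:
            segments = SEGMENTS_SINGLE
        else:
            segments = SEGMENTS_FULL
        models.extend(f"{phone}-{seg}" for seg in segments)
    return models


if __name__ == "__main__":
    models = expand_models(["A", "B", "S"], single_state_phones={"S"})
    print(models)       # ['A-b', 'A-m', 'A-e', 'B-b', 'B-m', 'B-e', 'S-m']
    print(len(models))  # 7
```

Treating silence and noise as a set of single-state exceptions keeps the rule for ordinary phones uniform.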
At this point we will not discuss the format, purpose, or usage of the description files. If you are interested in the theoretical background of, e.g., decision trees or Gaussian mixtures, read some good articles or books on these topics. If you are interested in implementation or usage details, refer to the Janus manual. On this page we only describe how to get a first initial environment.
For an initial environment we want Janus to create a codebook description file, a distribution description file, and a distribution tree. In the simple context-independent case this means that we have as many codebooks as distributions, and the same number of leaf nodes in the distribution tree. The tree has three root nodes: one for the beginning allophone segments, one for the middle segments, and one for the ending segments. For example, imagine a phoneme set of three phonemes, one of which is a silence phone: A, B and S. Then we would have seven distinct acoustic models (which we often, incorrectly, call senones): three A segments (A-b, A-m, and A-e), three B segments (B-b, B-m, and B-e), and one S segment (S-m). Our codebook description file would therefore have a codebook for each of these seven models, the distribution description file would have seven distributions, each defined over its corresponding codebook, and the distribution tree would look like this:
You can see that there are three different node types in the tree. The root nodes have no real question, and they have the same successor for all theoretically possible answers to their dummy question. Next come the question nodes, each of which asks whether the phone at position 0 (i.e. the central phone, which in the context-independent case is the only phone) is one particular phoneme: A, B, or S. The yes-successors are always leaf nodes without questions and without successors; they only hold the name of the model to be used. The no-successors continue through the phoneme set until no phoneme is left, at which point no no-successor is defined. It is easy to build such a tree incrementally by adding one model at a time. Should we decide that we also want a C-b model, we could simply traverse the tree starting at node Root-b, following the no-successors until there is no further no-successor. There we would insert a new question node "0=C?", whose yes-successor would be a leaf node pointing to model C-b and whose no-successor would be empty, leaving it ready for another model to be added.
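The three node types and the incremental insertion along the no-chain can be sketched in a few lines of Python. This is a hypothetical in-memory model meant only to make the traversal concrete; the classes `Node` and `DistribTree` and their methods are assumptions for illustration and do not correspond to the Janus API or its tree file format.

```python
# Hypothetical sketch (not the Janus API) of a context-independent
# distribution tree: one root per segment, a chain of "0=<phone>" question
# nodes linked by no-successors, and leaves that only carry a model name.
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    question: Optional[str] = None  # question nodes: phone tested at position 0
    yes: Optional["Node"] = None    # question nodes: leaf used if the answer is yes
    no: Optional["Node"] = None     # question nodes: next question (None = end)
    succ: Optional["Node"] = None   # root nodes: single successor for any answer
    model: Optional[str] = None     # leaf nodes: name of the acoustic model


class DistribTree:
    def __init__(self, segments=("b", "m", "e")):
        self.roots: Dict[str, Node] = {s: Node(name=f"Root-{s}") for s in segments}

    def add_model(self, phone: str, segment: str) -> None:
        """Append a question node '0=phone' at the end of the no-chain."""
        model = f"{phone}-{segment}"
        leaf = Node(name=model, model=model)
        question = Node(name=f"{model}-q", question=phone, yes=leaf)
        root = self.roots[segment]
        if root.succ is None:          # first model for this segment
            root.succ = question
            return
        node = root.succ
        while node.no is not None:     # follow no-successors to the end
            node = node.no
        node.no = question

    def lookup(self, phone: str, segment: str) -> Optional[str]:
        """Walk the chain until a question matches the central phone."""
        node = self.roots[segment].succ
        while node is not None:
            if node.question == phone:
                return node.yes.model
            node = node.no
        return None

    def questions(self, segment: str) -> List[str]:
        """List the questions along the no-chain of one root."""
        result, node = [], self.roots[segment].succ
        while node is not None:
            result.append(f"0={node.question}")
            node = node.no
        return result


if __name__ == "__main__":
    tree = DistribTree()
    for model in ["A-b", "A-m", "A-e", "B-b", "B-m", "B-e", "S-m"]:
        phone, segment = model.split("-")
        tree.add_model(phone, segment)
    tree.add_model("C", "b")           # incremental extension with C-b
    print(tree.questions("b"))         # ['0=A', '0=B', '0=C']
    print(tree.lookup("C", "b"))       # C-b
    print(tree.lookup("S", "b"))       # None
```

Because every new model is appended at the end of its root's no-chain, adding a model never touches existing leaves. The trade-off is that a lookup is linear in the number of phonemes, which is acceptable for an initial context-independent system.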
The procedure to create an initial environment is simple:
For each element in a list of models, add the model to the distribution tree and create a same-named distribution and codebook. Once all models have been added and all distributions and codebooks created, write the description files.
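The loop-then-write structure of this procedure is shown below as a rough Python sketch. It is not the initCI or addModel script. The output file names (`codebookSet.desc`, `distribSet.desc`, `distribTree.desc`) and their line formats are hypothetical, simplified placeholders rather than the real Janus description-file formats, which are documented in the Janus manual.

```python
# Hypothetical sketch of the initial-environment procedure (not initCI/addModel).
# For each model: (1) append it to the distribution tree, (2) create a
# same-named codebook, (3) create a same-named distribution over it.
# Only after the loop are the description files written.
# File names and line formats are simplified placeholders, NOT Janus formats.
import tempfile
from pathlib import Path


def build_environment(models):
    codebooks, distribs = [], []
    chains = {}  # segment -> [(phone, model), ...], i.e. each root's no-chain
    for model in models:
        phone, segment = model.rsplit("-", 1)
        chains.setdefault(segment, []).append((phone, model))  # add to tree
        codebooks.append(model)              # codebook named like the model
        distribs.append((model, model))      # (distribution, its codebook)
    return codebooks, distribs, chains


def tree_lines(chains):
    lines = []
    for segment, entries in chains.items():
        lines.append(f"Root-{segment} succ={entries[0][1]}-q")  # dummy question
        for i, (phone, model) in enumerate(entries):
            no = f"{entries[i + 1][1]}-q" if i + 1 < len(entries) else "-"
            lines.append(f"{model}-q 0={phone} yes={model} no={no}")
            lines.append(f"{model} model={model}")  # leaf node
    return lines


def write_description_files(models, outdir):
    codebooks, distribs, chains = build_environment(models)
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "codebookSet.desc").write_text("\n".join(codebooks) + "\n")
    (out / "distribSet.desc").write_text(
        "\n".join(f"{d} {cb}" for d, cb in distribs) + "\n")
    (out / "distribTree.desc").write_text("\n".join(tree_lines(chains)) + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_description_files(
            ["A-b", "A-m", "A-e", "B-b", "B-m", "B-e", "S-m"], tmp)
        print((Path(tmp) / "distribTree.desc").read_text())
```

Writing the files only after the loop ensures that the codebook, distribution, and tree files always describe the same, complete set of models.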
You can have a look at the initCI and the addModel scripts for more details about this procedure.
The description files that are usually created manually are the transition model set, the topology set, the topology tree, a list of phonemes, and a feature description file.
In most cases these files can be taken from another existing system. The do-it-yourself pages on this topic show some examples.