The Senones Module - Overview

This module implements the object classes SenoneSet and Senone. A single senone cannot be created from Tcl (this wouldn't make much sense anyway). A senone set uses generic lists to maintain its senones. It is possible to add or delete senones at runtime, but you should not do this unless you really know what you are doing, because many other modules rely on the SenoneSet object they are using to remain unchanged. Besides maintaining a set of senones, the senones module performs several other important tasks, namely the computation of HMM emission probabilities, the accumulation of training data, and the updating (optimization) of a system's acoustic parameters. The senones module hides how these tasks are performed from the rest of the system. It is also the senones module's job to make sure that the scores for the current utterance are computable (i.e. to inform the feature module to compute the needed features). Since there are many very different ways to compute HMM emission probabilities (e.g. Gaussian mixtures, neural nets, hybrids, etc.), the senones module refers to a generic score computer. Whenever a new way of computing scores is added to JANUS, it must conform to the definition of a score computer.

Please note that we are misusing the term "senone". When we talk about a senone, we don't always mean what Mei-Yuh Hwang meant in her PhD thesis. Originally the term "senone" denoted a generalized subtriphone. In JANUS we call all atomic acoustic units senones, even if they are not generalized (e.g. in context-independent systems or in unclustered context-dependent systems). For us, a senone is the smallest speech unit for which we can compute HMM emission probabilities.

What Is a Senone?

A senone is modeled by a set of streams and their corresponding stream weights; that is, the HMM emission probability for a senone in a given frame is the weighted sum of the outputs of any number of streams. (If you consider the output of a stream to be a log probability, then a weighted sum of log probabilities with multiplicative weights corresponds to a weighted product of probabilities with exponential weights.) The internal representation of a senone thus consists of three equally sized arrays: stream identifiers, stream weights, and class indices. When a score is computed, the class index for a stream is given to the stream's score computer; the return value is what we called the stream's output. With Gaussian mixtures, a class index would be the index of a distribution. With a neural net, this index would probably be the index of an output node of the net, or some subnet identifier.
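The representation described above can be sketched in C as follows. This is a hedged illustration only, not the JANUS data structures: all type and field names (SenoneSketch, StreamScoreFn, demoLogProb, etc.) are invented for this example.

```c
#include <math.h>

/* Illustrative sketch (not the JANUS API): a senone holds three
   equally sized arrays -- stream score functions, class indices,
   and stream weights. */
typedef double (*StreamScoreFn)(int classIdx, int frameIdx);

typedef struct {
  int            streamN;   /* number of streams                   */
  StreamScoreFn *scoreFn;   /* one score computer entry per stream */
  int           *classIdx;  /* class index within each stream      */
  double        *weight;    /* multiplicative stream weights       */
} SenoneSketch;

/* If each stream output is a log probability, this weighted sum of
   log probabilities equals the log of a weighted product of
   probabilities with the weights acting as exponents. */
double senoneScore(const SenoneSketch *sn, int frameIdx) {
  double score = 0.0;
  for (int i = 0; i < sn->streamN; i++)
    score += sn->weight[i] * sn->scoreFn[i](sn->classIdx[i], frameIdx);
  return score;
}

/* Toy stream output for demonstration: the log probability depends
   only on the class index, not on the frame. */
double demoLogProb(int classIdx, int frameIdx) {
  (void)frameIdx;
  return -(double)(classIdx + 1);
}
```

With two streams weighted 0.7 and 0.3 and class indices 0 and 1, the toy function yields a senone score of 0.7·(−1) + 0.3·(−2) = −1.3.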

Usage of the Senones Module

This section does not give you details about the Tcl methods and their syntax. Instead, it explains what can be done with senones and what this is good for. Follow the links at the end of this page for more details about Tcl syntax, scripts, etc.

Creating a SenoneSet

Usually you will create a SenoneSet object from Tcl somewhere in your training or testing script. Then you will read a senones description file to fill the contents of the SenoneSet object. After that it should be usable. When creating a SenoneSet, you must also supply a score computer for every stream. In the case of Gaussian mixtures you would use a DistribSet object as a score computer. Of course, you can use the same score computer for different streams.

Preparing for Score Computation

It is the senones module's job to prepare the system for a new utterance. Although it is possible for the user to inform the feature module manually about a new utterance, in order to trigger the computation and creation (preprocessing) of the needed features, this is not recommended for two reasons: a) the user must know how to trigger the feature computation, and b) there might be many FeatureSet objects, possibly interchangeable with each other, and the user knows neither which FeatureSet is to be used nor which Feature subobjects are needed. Making the feature module compute unneeded features can not only waste CPU time and memory, but also cause unwanted behaviour. You can do it nevertheless if you want to do something special, e.g. if you want the feature module to ignore the feature-file information from the task's database and force it to use some other file. Other similar situations exist. Especially when you want to experiment with preprocessing techniques or just look at some features, you will prefer to trigger the feature computation yourself, manually.
Therefore the senones module provides a function (snsFeatEval()) and a method (featEval) to which you can pass an utterance's entry from the task database. This entry will be passed to all the FeatureSet objects that could be involved in score computation, together with a list of their respective Feature subobjects that will be accessed during score computation. After this, HMM emission probabilities can be computed. So calling snsFeatEval can be read as: "please prepare to compute scores for the utterance with the given description".

Accumulating Training Data

The accumulation of training data is done by calling a SenoneSet method (accu), giving as argument the path that was created by a forced alignment (or from labels). The senones module will then call all of the used score computers for each of the path's cells. It doesn't care what the score computers actually do: a Gaussian mixture score computer will accumulate counts, means, sums of squares, etc.; a neural net will compute an error function and accumulate backprop data. The accumulation is weighted by a training factor. This factor itself is a product of three factors: a) a user-supplied training factor (+1.0 for regular training, -1.0 for negative or corrective training), b) the gamma value from the alignment path (which is always 1.0 in Viterbi paths), and c) the stream weight (which is effective only if the same class of a score computer is used in different streams).
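The three-way product above is simple enough to write out. This is a hedged sketch under the stated assumptions; the function name accuFactor is invented for illustration and is not part of JANUS.

```c
/* Illustrative sketch (not the JANUS implementation): the effective
   accumulation weight is the product of the user-supplied training
   factor, the path gamma value, and the stream weight. */
double accuFactor(double userFactor, double gamma, double streamWeight) {
  return userFactor * gamma * streamWeight;
}
```

For a regular Viterbi training pass (user factor +1.0, gamma 1.0) with a stream weight of 0.5, the accumulation is weighted by 0.5; corrective training (user factor -1.0) flips the sign.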

Updating Acoustic Parameters

Well, that's easy. Just call the SenoneSet's update method. The senones module will then inform all involved score computers about the update command. After that, it doesn't care what the score computers actually do. All this hiding of score computer functions by the senones module is there to allow uniform training scripts for any kind of score computer. The JANUS designers' goal was to be able to plug any score computer in or out at will, without having to modify anything else in the system (well, yes, you still have to inform the senones module about them).

Definition of a Score Computer

A score computer doesn't actually exist as such; there is no module or object with that name. There might be distributions, or backprop nets, or whatever, but not a mere score computer. Any object class that is to be usable as a score computer must be defined as a structure whose first fields follow an exact specification. These fields contain function pointers for computing a score, accumulating training data, updating parameters, returning a list of used features, returning the number of currently available frames, and others. Have a look at the C source code documentation for further details.
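The convention can be sketched as a C struct of function pointers. This is a hedged illustration only: the field names and signatures below are invented for this example, and the real layout is specified in the C source code documentation.

```c
#include <stddef.h>

/* Illustrative sketch (not the JANUS definition): any object usable as
   a score computer begins with a fixed table of function pointers that
   the senones module calls through. */
typedef struct ScoreComputer_S {
  double (*score)  (void *self, int classIdx, int frameIdx); /* emission score   */
  void   (*accu)   (void *self, int classIdx, int frameIdx,
                    double factor);                          /* accumulate data  */
  void   (*update) (void *self);                             /* optimize params  */
  int    (*frameN) (void *self);                             /* available frames */
} ScoreComputer;

/* A trivial example implementation: a "score computer" that always
   returns the same score, regardless of class and frame. */
double constScore(void *self, int classIdx, int frameIdx) {
  (void)self; (void)classIdx; (void)frameIdx;
  return -42.0;
}
```

Because the senones module only ever calls through these pointers, a Gaussian mixture set and a neural net are interchangeable as long as both fill in the table.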

Further information about the module: