Next: STRESS ASSIGNMENT Up: Issues in Building General Previous: LETTER-PHONE ALIGNMENT


Once an alignment is found we can train a phone prediction model. In our work we have used decision tree technology [3], as we feel it is simple and produces compact models; we also feel that other learning techniques would not produce significantly better results.

For each letter in the alphabet of the language we trained a CART tree, given the letter context (three letters on either side), to predict epsilon, a phone, or a double phone from the aligned data. A single tree could be built without any significant difference in accuracy, but building separate trees is faster and allows for parallelization.
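The per-letter training instances described above can be sketched as follows. This is an illustrative reconstruction, not the actual system's code: the padding symbol, label conventions, and function names are assumptions.

```python
from collections import defaultdict

# Illustrative sketch: build per-letter training instances from aligned
# data, where each letter of a word is labeled with epsilon, a phone,
# or a double phone (here written "k-s"). Names are assumptions.

PAD = "#"          # out-of-word filler for the context window
CONTEXT = 3        # letters of context on either side

def instances(word, labels):
    """Yield (letter, context-features, label) triples for one word."""
    padded = [PAD] * CONTEXT + list(word) + [PAD] * CONTEXT
    for i, letter in enumerate(word):
        ctx = padded[i:i + 2 * CONTEXT + 1]   # 3 left + letter + 3 right
        yield letter, ctx, labels[i]

# Group instances by letter: one decision tree is trained per letter,
# which is what makes the training trivially parallelizable.
by_letter = defaultdict(list)
for letter, ctx, label in instances("box", ["b", "aa", "k-s"]):
    by_letter[letter].append((ctx, label))
```

Each tree then only ever sees the contexts of its own letter, so the trees can be built independently on separate machines.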

We split the data into training and test sets by removing every tenth word from the lexicon. The data set thus contains only one occurrence of each word, so word frequency is ignored. Another factor is that, as these lexicons usually contain many morphological variants, it is likely that a similar word or words will appear in the training set.
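The split described above amounts to the following, sketched here with an illustrative function name and toy word list:

```python
# Sketch of the train/test split: every tenth entry of the lexicon is
# held out for testing; the rest is used for training.

def split_lexicon(entries):
    test = entries[9::10]                               # every tenth word
    train = [e for i, e in enumerate(entries) if (i + 1) % 10 != 0]
    return train, test

train, test = split_lexicon([f"word{i}" for i in range(20)])
```

Because the split is by dictionary entry rather than by running text, common and rare words are weighted equally in the reported scores.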

We removed short words (under four letters) from the training and test sets, as these are typically function words, which in general may have non-standard pronunciations, or abbreviations (e.g. ``aaa'' as /t r ih p ah l ey/) whose spelling has little or no relationship to their pronunciation. Also, where part of speech information was available, we removed all non-content words. The reasoning is that unknown words are typically not the most common words, and in general unknown words will have more standard pronunciations rather than idiosyncratic ones.
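The pruning described above can be sketched as a simple filter. The part-of-speech tag set here is an assumption for illustration; the actual lexicons use their own tag inventories.

```python
# Sketch of the data pruning: drop words under four letters, and, when
# part-of-speech tags are available, drop non-content (function) words.
# The FUNCTION_POS tag set is an illustrative assumption.

FUNCTION_POS = {"det", "prep", "conj", "pron", "aux"}

def keep(word, pos=None):
    """Return True if the entry should stay in the train/test data."""
    if len(word) < 4:                  # short words: function words, abbrevs
        return False
    if pos is not None and pos in FUNCTION_POS:
        return False
    return True

entries = [("the", "det"), ("cat", None), ("catalogue", "noun")]
kept = [w for w, p in entries if keep(w, p)]
```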

We have so far tried this technique on four lexicons: the Oxford Advanced Learners' Dictionary of Contemporary English (OALD) (British English) [10], CMUDICT (US English) [4], BRULEX (French) [5] and the German Celex lexicon [1].

Lexicon    Letters correct  Words correct
OALD       95.80%           74.56%
CMUDICT    91.99%           57.80%
BRULEX     99.00%           93.03%
DE-CELEX   98.79%           89.38%
CMUDICT, although also English, does not achieve results as good as OALD's, as it contains many more ``foreign'' words, particularly names, which are much harder to predict without higher-level information (such as ethnic origin).

The above results are the best achieved after testing various parameters in the CART building process. In particular, we varied the ``stop'' value, which specifies the minimum number of examples that must be present in the training set before a question is hypothesized to distinguish the group. Normally, the smaller the stop value, the more over-trained the models may become. However, the following table shows the results for OALD, tested on held-out data, as the stop value changes:

Stop  Letters correct  Words correct  Model size
8     92.89%           59.63%          9884
6     93.41%           61.65%         12782
5     93.70%           63.15%         14968
4     94.06%           65.17%         17948
3     94.36%           67.19%         22912
2     94.86%           69.36%         30368
1     95.80%           74.56%         39500
As the stop value is reduced, the size of the model increases; the model size is the total number of questions and leaf nodes in the generated CART trees. However, finer splitting appears always to help here: even with a stop value of 1 the model is not over-trained.
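How the stop value gates tree growth can be illustrated with a toy model: a node is split only while it still holds at least `stop` training examples. This is a deliberately simplified sketch of the parameter's effect on model size, not the actual CART builder, which chooses splits by querying the letter context.

```python
# Toy illustration of a "stop" value: a node splits only if it contains
# at least `stop` examples; otherwise it becomes a leaf. The returned
# count is the total number of nodes (questions plus leaves).

def grow(examples, stop):
    if len(examples) < stop or len(examples) <= 1:
        return 1                       # leaf: too few examples to split
    mid = len(examples) // 2           # toy split: halve the data
    return 1 + grow(examples[:mid], stop) + grow(examples[mid:], stop)

# A smaller stop value yields a larger tree on the same data.
small_stop = grow(list(range(64)), stop=8)
large_stop = grow(list(range(64)), stop=32)
```

The same trade-off drives the table above: lowering the stop value grows the trees and, for this data, keeps improving held-out accuracy.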

Note that comparisons with other LTS training techniques are not easy: when the train/test sets differ, and when the domains differ, no direct comparison is possible. For example, if we remove proper names from the OALD and train and test on the remainder, our word-correct score rises to 80%. However, the above results compare favorably with other systems using similar data sets (e.g. [8]).

Alan W Black