t2p: Text-to-Phoneme Converter Builder
t2p is a public domain package in Perl for building grapheme-to-phoneme rules from pronunciation dictionaries. In other words, it builds letter-to-sound rules for pronouncing words given a set of example pronunciations, like the CMU Pronouncing Dictionary.
s/($text)/speech $1/eg;
t2p takes in a pronunciation dictionary, such as the CMU Pronouncing Dictionary, and builds decision trees that model the words.
Here's what the CMU dictionary looks like (excerpt from version 0.6d):
...
LEX L EH1 K S
LEXICAL L EH1 K S IH0 K AH0 L
LEXICOGRAPHER L EH2 K S IH0 K AA1 G R AH0 F ER0
LEXICON L EH1 K S IH0 K AA2 N
LEXIE L EH1 K S IY0
LEXINE L EH1 K S AY0 N
LEXINGTON L EH1 K S IH0 NG T AH0 N
LEXIS L EH1 K S IH2 S
LEXMARK L EH1 K S M AA2 R K
LEXUS L EH1 K S AH0 S
...
It's a list of words and their associated phonemes, in order. t2p takes all the words together, finds the rule that makes the best predictive split of the data, and then repeats that on each resulting subset until it has built a tree of decisions.
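The splitting step can be sketched roughly as follows. The text above does not say how t2p scores a split, so this sketch assumes information gain (reduction in entropy), a common choice for decision trees. It also assumes each letter has already been paired with the phoneme it produces. The names `entropy` and `best_split` and the six hand-made training rows are hypothetical and are not taken from t2p.

```perl
use strict;
use warnings;

# Shannon entropy (in bits) of the phoneme labels in a list of rows.
sub entropy {
    my @rows = @_;
    return 0 unless @rows;
    my %count;
    $count{ $_->{phone} }++ for @rows;
    my $h = 0;
    for my $n (values %count) {
        my $p = $n / scalar(@rows);
        $h -= $p * log($p) / log(2);
    }
    return $h;
}

# Try every test of the form ($att{$attr} eq $value) and return the one
# whose yes/no partition lowers the entropy the most.
sub best_split {
    my ($rows, @attrs) = @_;
    my $base = entropy(@$rows);
    my ($best_gain, $best_attr, $best_value) = (0);
    for my $attr (@attrs) {
        my %seen;
        for my $value (grep { !$seen{$_}++ } map { $_->{$attr} } @$rows) {
            my @yes = grep { $_->{$attr} eq $value } @$rows;
            my @no  = grep { $_->{$attr} ne $value } @$rows;
            my $after = (scalar(@yes) * entropy(@yes)
                       + scalar(@no)  * entropy(@no)) / scalar(@$rows);
            my $gain = $base - $after;
            ($best_gain, $best_attr, $best_value) = ($gain, $attr, $value)
                if $gain > $best_gain;
        }
    }
    return ($best_attr, $best_value, $best_gain);
}

# Hypothetical rows: the letter C in six words, with its left (L1) and
# right (R1) neighbors and the phoneme it should produce.
my @rows = (
    { L => 'C', L1 => '-', R1 => 'A', phone => 'K' },   # CAT
    { L => 'C', L1 => '-', R1 => 'E', phone => 'S' },   # CELL
    { L => 'C', L1 => '-', R1 => 'O', phone => 'K' },   # COT
    { L => 'C', L1 => 'I', R1 => 'E', phone => 'S' },   # ICE
    { L => 'C', L1 => 'A', R1 => 'E', phone => 'S' },   # ACE
    { L => 'C', L1 => 'R', R1 => '-', phone => 'K' },   # ARC
);

my ($attr, $value, $gain) = best_split(\@rows, qw(L1 R1));
printf "split on \$att{'%s'} eq '%s' (gain %.2f bits)\n", $attr, $value, $gain;
```

On these rows the test on the right-hand neighbor being 'E' separates the S cases from the K cases completely, so it is chosen first. A full tree builder would then call `best_split` again on the "yes" and "no" subsets until each subset is pure or no test helps.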
The resulting Perl code looks something like:
    if ($att{'L'} eq 'H') {
        if ($att{'L1'} eq 'A') {
            if ($att{'R1'} eq 'A') {
                if ($att{'L3'} eq 'G') {
                    if ($att{'R3'} eq '-') {
                        return 'HH';
                    }
                    return '_';
                }
                if ($att{'L3'} eq 'H') {
                    return '_'; # unique at depth 4
                }
                if ($att{'L3'} eq 'U') {
                    if ($att{'L2'} eq 'J') {
                        return 'AE'; # unique at depth 5
                    }
                    return 'HH';
                }
                ...
where L is the letter itself, L1 is the first letter to the left, R1 is the first letter to the right, and so on. The return value is the output of the transducer for each letter, given its context. Thus, it's a context-sensitive rewrite system for a grammar of limited depth.
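To make the attribute naming concrete, here is a minimal sketch of how the %att hash for one letter might be filled in, and how a rule function of the kind shown above could be applied across a word. It assumes a window of three letters on each side, '-' as padding beyond the word boundary (suggested by the `$att{'R3'} eq '-'` test in the excerpt, but not confirmed there), and '_' meaning "emit nothing for this letter". The names `attributes_for`, `transduce`, and `toy_rule` are hypothetical; `toy_rule` is a hand-written stand-in, not output from t2p, and it leaves out stress digits as the excerpt's return values do.

```perl
use strict;
use warnings;

# Build the context attributes for the letter at position $i of $word:
# L is the letter itself, L1..Ln the letters to its left, R1..Rn the
# letters to its right. Positions outside the word become '-' (assumed).
sub attributes_for {
    my ($word, $i, $width) = @_;
    my @letters = split //, uc $word;
    my %att = (L => $letters[$i]);
    for my $d (1 .. $width) {
        my ($l, $r) = ($i - $d, $i + $d);
        $att{"L$d"} = $l >= 0         ? $letters[$l] : '-';
        $att{"R$d"} = $r <= $#letters ? $letters[$r] : '-';
    }
    return %att;
}

# Run a rule function over every letter and collect its outputs,
# skipping '_' (assumed to mean "no phoneme for this letter").
sub transduce {
    my ($rule, $word) = @_;
    my @phones;
    for my $i (0 .. length($word) - 1) {
        my %att = attributes_for($word, $i, 3);
        my $phone = $rule->(%att);
        push @phones, $phone unless $phone eq '_';
    }
    return @phones;
}

# Hand-written stand-in for a generated tree; it knows just enough
# letters to handle LEXIE from the dictionary sample.
sub toy_rule {
    my %att = @_;
    if ($att{'L'} eq 'E') {
        return '_' if $att{'R1'} eq '-';    # word-final E is silent
        return 'EH';
    }
    return 'L'   if $att{'L'} eq 'L';
    return 'K S' if $att{'L'} eq 'X';       # one letter, two phonemes
    return 'IY'  if $att{'L'} eq 'I';
    return '_';
}

print join(' ', transduce(\&toy_rule, 'LEXIE')), "\n";
```

The two E's in LEXIE get different outputs only because their right-hand context differs, which is the point of conditioning each letter on its neighbors.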
In collaboration with Alan Black and Vincent Pagel, we have made a number of these packages. Results of similar form are used in Festival, a free, source-available system from the University of Edinburgh, and in MBRDICO, the MBROLA dictionary compression package. The MBRDICO code produces much smaller results in C; this package is a Perl implementation used to build the base.
The t2p Perl code is available as a tarred, gzipped file from http://www.cs.cmu.edu/~lenzo/t2p/code
"s/($text)/speech $1/eg;" in The Perl Journal, number 12, Winter 1998.