How to use the text normalization tools
---------------------------------------

1. Filter the input text using one of the filter_* programs.
This transforms each text into a bare-bones SGML representation
based on paragraph units.

2. Run normalize.
This does the basic speaking-out transformation and sentence
segmentation.  Verbalized punctuation is assumed.  We try to follow
the style used by Doug Paul, though our knowledge of the details comes
essentially from examining his processed corpus, not from actual
specifications.
There are differences between the two translators, though on the whole
the results are comparable; we put effort into getting somewhat
different things right, and the careful reader will note systematic
differences between the two texts.  Nevertheless the two sets of
results appear to be compatible, as might be inferred from our ongoing
language-modeling work.  A vocabulary file containing strings that
should be treated as integral words can optionally be specified; words
in all caps that are not on this list are spelled out.

3. Remove verbalized punctuation using strip_vp.
This step is optional.
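
The all-caps rule from step 2 can be sketched roughly as follows.  This
is only an illustration in Python, not the code that normalize actually
uses: the function name, the whitespace tokenization, the output format
(one letter per token) and the sample vocabulary are all assumptions.

```python
def spell_out_unknown_caps(tokens, vocab):
    """Spell out all-caps tokens that are not listed in vocab.

    Hypothetical helper: the real normalize program may tokenize its
    input and format spelled-out letters differently.
    """
    out = []
    for tok in tokens:
        if tok.isalpha() and tok.isupper() and tok not in vocab:
            # Unknown all-caps word: emit it one letter at a time.
            out.extend(tok)
        else:
            # Listed in the vocabulary, or not all caps: keep it whole.
            out.append(tok)
    return out


# Made-up vocabulary for illustration; in practice the list would come
# from a vocabulary file passed to normalize.
vocab = {"NASA", "NATO"}
tokens = "NASA hired IBM staff".split()
print(" ".join(spell_out_unknown_caps(tokens, vocab)))
# prints: NASA hired I B M staff
```

Here NASA is kept as an integral word because it is on the list, while
IBM is not and is therefore spelled out.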

Typical invocation:

filter_sgml < input_text_file | normalize -v cmudict_vocab > output_file


Notes:

C translations of the MITalk code: /afs/cs/user/alex/speak/mitalk/src/
