Newsgroups: comp.speech
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!news.mathworks.com!zombie.ncsc.mil!news.duke.edu!news-feed-1.peachnet.edu!gatech!swrinde!ihnp4.ucsd.edu!nmt.edu!cww
From: cww@nmt.edu (Colin Wightman)
Subject: Re: Word timing marks for continuous speech?
Message-ID: <1995Feb28.160829.25787@nmt.edu>
Sender: news@nmt.edu
Nntp-Posting-Host: kiwi.nmt.edu
Organization: New Mexico Tech EE/Physics Department
References: <87220001@hpcc01.corp.hp.com> <meb.203.0037B078@teleport.com>
Date: Tue, 28 Feb 1995 16:08:29 GMT
Lines: 15

Generating an accurate time-alignment between a recorded speech
utterance and its transcription (which is known a priori) is a
standard problem that everybody runs into eventually. Conceptually, it
is a much easier problem than general recognition since the search
space is so much smaller, but in practice there are a number of issues
that make it tricky. The easiest way to do alignments is to
let somebody else worry about it and buy a commercial software package
that does this. Entropic sells the "Aligner" package, which provides
both batch and interactive tools for doing alignments and also
produces a full time-aligned phonetic transcription. If
you really want to do this yourself, try using a normal HMM-based
recognizer but constrain the word grammar to allow only the correct
transcription. 
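
To make the do-it-yourself route a bit more concrete, here is a rough
sketch of what the constrained search boils down to. It is not how any
particular recognizer or the Entropic package does it, just a toy
Viterbi pass. It assumes you already have per-frame log-likelihoods for
each unit (word or phone) of the known transcription, in order, and
that each unit is a single state that can only repeat or hand off to
the next unit. The names forced_align and unit_boundaries are made up
for this example.

```python
import math


def forced_align(loglik):
    """Align frames to a known, ordered sequence of units.

    loglik[t][s] is the (assumed precomputed) log-likelihood of frame t
    under unit s of the transcription. Returns one unit index per frame.
    """
    num_frames = len(loglik)
    num_units = len(loglik[0])
    if num_frames < num_units:
        raise ValueError("need at least one frame per unit")

    neg_inf = -math.inf
    score = [[neg_inf] * num_units for _ in range(num_frames)]
    back = [[0] * num_units for _ in range(num_frames)]

    # The path must start in the first unit.
    score[0][0] = loglik[0][0]

    for t in range(1, num_frames):
        for s in range(num_units):
            stay = score[t - 1][s]
            advance = score[t - 1][s - 1] if s > 0 else neg_inf
            # Only two moves are allowed: stay in the unit or step forward.
            if advance > stay:
                score[t][s] = advance + loglik[t][s]
                back[t][s] = s - 1
            else:
                score[t][s] = stay + loglik[t][s]
                back[t][s] = s

    # The path must end in the last unit; trace back from there.
    path = [num_units - 1]
    for t in range(num_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def unit_boundaries(path):
    """Return the first frame index of each unit along the path."""
    return [t for t, s in enumerate(path) if t == 0 or s != path[t - 1]]
```

Multiply the frame indices from unit_boundaries by your frame step
(10 ms is a common choice, but that is an assumption about your front
end) to get timing marks. A real system would use multi-state models
per phone, transition probabilities, and optional silence between
words, which is where much of the practical trickiness comes from.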


