Newsgroups: comp.speech
Path: pavo.csi.cam.ac.uk!doc.ic.ac.uk!warwick!uknet!pipex!uunet!zib-berlin.de!news.th-darmstadt.de!fauern!lrz-muenchen.de!ue701af.ppp.lrz-muenchen.de!user
From: draxler@cis.uni-muenchen.de (Christoph Draxler)
Subject: Re: extracting words from TIMIT
Message-ID: <draxler-040294002600@ue701af.ppp.lrz-muenchen.de>
Followup-To: comp.speech
Sender: news@news.lrz-muenchen.de (Mr. News)
Organization: CIS University of Munich
References: <9401291513.AA26464@dcs.shef.ac.uk>
Date: Thu, 3 Feb 1994 23:26:00 GMT
Lines: 61

In article <9401291513.AA26464@dcs.shef.ac.uk>, M.Crawford@dcs.shef.ac.uk
(Malcolm Crawford) wrote:

> Does anybody have s/w for extracting from the TIMIT database waveforms  
> (or simply start and end sample numbers) for "individual" words?

I know very little about what the TIMIT database actually contains,
but I can tell you how we extract the signal fragments corresponding
to words (in fact, the smallest addressable unit is a single phone)
in the PhonDat database of spoken German.

The PhonDat database contains data on four levels of representation:

   - orthography
   - citation form
   - phonetic transcription
   - signal

The orthographic and citation form representations are independent
of the utterance actually produced by speakers; the phonetic 
transcription maps a signal fragment to a phonetic symbol in IPA 
notation, e.g.

   segment(13375, 2783, [501, 304, 502])

meaning that an accented long a (given as a list of IPA numbers)
is associated with a signal fragment 2783 samples long, beginning
at sample number 13375 (...in the transcription file xyz).
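
As a rough illustration of what such an entry implies for the signal
(this is not the PhonDat code, only a sketch in Python), the start
sample and length can be turned into an end sample and a slice of the
waveform. The names Segment, sample_range, cut and span are made up
for this example, and it assumes that sample numbers are zero-based
indices into an in-memory array, which the entry itself does not say.

```python
from typing import NamedTuple, Sequence, Tuple


class Segment(NamedTuple):
    """One transcription entry: segment(Start, Length, IpaNumbers)."""
    start: int              # first sample of the fragment (assumed zero-based)
    length: int             # number of samples in the fragment
    ipa: Tuple[int, ...]    # IPA numbers describing the phone


def sample_range(seg: Segment) -> Tuple[int, int]:
    """Return (first, last) sample numbers of a segment, last inclusive."""
    return seg.start, seg.start + seg.length - 1


def cut(signal: Sequence[int], seg: Segment) -> Sequence[int]:
    """Return the samples of the signal that belong to the segment."""
    return signal[seg.start:seg.start + seg.length]


def span(segments: Sequence[Segment]) -> Tuple[int, int]:
    """Return (first, last) sample numbers covered by consecutive segments,
    e.g. all the phones that make up one word."""
    first, _ = sample_range(segments[0])
    _, last = sample_range(segments[-1])
    return first, last


# The entry from the transcription example above.
seg = Segment(13375, 2783, (501, 304, 502))

# Stand-in for real signal data: each sample holds its own index.
signal = list(range(20000))

fragment = cut(signal, seg)
print(sample_range(seg), len(fragment))
```

A word is then simply the span from the first sample of its first
phone to the last sample of its last phone, which is what span()
computes for a list of consecutive segments.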

The data of the upper three levels of representation is held in a 
Prolog database; access to the signal is possible by formulating a
database query in Prolog using the toolbox of "predefined query
predicates".

Instead of going into detail here, I'll give you an example:

  ?- display_data("'a,plosive,fricative",Cnt).

is a predefined query predicate which takes as its first argument
a search pattern (here the accented a followed by some plosive and
some fricative) and returns in its second argument the number of
signal fragments found. The list of all signal fragments is written
into a file which can be accessed by other applications (e.g. signal
processing or display).
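
The general idea of such a query (match a sequence of phone classes
against consecutive segments, count the hits, write them out) can be
sketched in Python as follows. This is not how display_data is
implemented; the function names, the (start, length, label) tuples,
the toy data and the output format of one "start length" pair per line
are all assumptions made for the illustration.

```python
import io
from typing import List, Sequence, TextIO, Tuple

# Hypothetical simplified segment: (start sample, length, phone class label).
LabelledSegment = Tuple[int, int, str]


def find_sequences(segments: Sequence[LabelledSegment],
                   pattern: Sequence[str]) -> List[Tuple[int, int]]:
    """Return (start, length) of every run of consecutive segments whose
    labels equal the pattern, element by element."""
    n = len(pattern)
    hits = []
    for i in range(len(segments) - n + 1):
        window = segments[i:i + n]
        if all(seg[2] == want for seg, want in zip(window, pattern)):
            first = window[0][0]
            last_start, last_length, _ = window[-1]
            hits.append((first, last_start + last_length - first))
    return hits


def write_hits(hits: Sequence[Tuple[int, int]], out: TextIO) -> int:
    """Write one 'start length' line per hit and return the number of hits."""
    for start, length in hits:
        out.write(f"{start} {length}\n")
    return len(hits)


# Toy data, invented for this example only.
segments = [
    (0, 100, "'a"), (100, 50, "plosive"), (150, 80, "fricative"),
    (230, 120, "nasal"),
    (350, 90, "'a"), (440, 40, "plosive"), (480, 70, "fricative"),
]

out = io.StringIO()   # stands in for the result file
count = write_hits(find_sequences(segments, ["'a", "plosive", "fricative"]), out)
print(count)
print(out.getvalue(), end="")
```

Another program can then read the start/length pairs back and cut the
corresponding samples out of the signal file.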

If you are interested, have a look at our Eurospeech 93 paper,
"Prolog tools for accessing the PhonDat database of spoken German",
by C. Draxler, B. Eisen and H.G. Tillmann.

If you need further information, send me an e-mail.

Christoph

-- 
------------------------------------------------------------
Christoph Draxler
CIS Centre for Information and Language Processing
Ludwig-Maximilians-University Munich   Tel: +49 +89 211 0664
Wagmuellerstr. 23                      Fax: +49 +89 211 0674
D 80538 Munich                   draxler@cis.uni-muenchen.de
------------------------------------------------------------
