Newsgroups: comp.speech
Path: pavo.csi.cam.ac.uk!doc.ic.ac.uk!decwrl!sdd.hp.com!saimiri.primate.wisc.edu!caen!destroyer!fmsrl7!lynx.unm.edu!umn.edu!math.fu-berlin.de!fauern!lrz-muenchen.de!mac_server.cis.uni-muenchen.de!user
From: draxler@cis.uni-muenchen.de (Christoph Draxler)
Subject: Re: Phonetic Alphabets
Message-ID: <draxler-260293162516@mac_server.cis.uni-muenchen.de>
Followup-To: comp.speech
Sender: news@news.lrz-muenchen.de (Mr. News)
Organization: CIS University of Munich
References: <sandra.730726768@argon>
Date: Fri, 26 Feb 1993 15:32:37 GMT
Lines: 89

In article <sandra.730726768@argon>, sandra@argon (Sandra Swagten) wrote:

> At SPEX we are currently trying to build up a speech archive.  In this
> archive we also want to store transcriptions.  The problem with
> transcriptions is that people deliver them in different alphabets.
> We have the following questions:
>     - Which alphabet would you suggest we should use in this archive?
>       IPA seems to be the most extensive alphabet.
>     - Is there a (standard) computer representation available for IPA?
>     - Are there any  mapping tables, methods, programmes to convert the
>       different alphabets (COST-CPA, SAMPA, ...) into IPA?

Hi Sandra,

we have a similar problem here, and we solved it by using
Prolog to represent mapping tables, e.g.

   sampa_ipa(a,304).

meaning that the SAMPA symbol "a" is represented by IPA number 304
[Esling 90].

For specific machines, you simply create similar tables, e.g.

   ipa_mac(304,222).

meaning that IPA number 304 is represented as character code
222 on the Macintosh.
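Because each table is a binary relation, converting from SAMPA to a machine's character codes is simply the composition of two tables. A minimal sketch, using only the two example facts above (the predicate name sampa_mac/2 is hypothetical):

```prolog
% Example facts, as given above.
sampa_ipa(a, 304).     % SAMPA "a" -> IPA number 304
ipa_mac(304, 222).     % IPA number 304 -> Macintosh character code 222

% Hypothetical composed relation: SAMPA symbol -> Macintosh code.
% It is just a join of the two tables on the IPA number.
sampa_mac(Sampa, MacCode) :-
    sampa_ipa(Sampa, IpaNo),
    ipa_mac(IpaNo, MacCode).
```

With these clauses, ?- sampa_mac(a, C). answers C = 222, and ?- sampa_mac(S, 222). answers S = a, so the composed relation also runs in both directions.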

A nice property of this approach using binary relations is that
these relations are in third normal form, and that 

A very promising coding scheme is Unicode ([UniCode 92]). The
Unicode Consortium is trying to create a global character set,
extending ASCII, with two bytes per character, and a section of it is
reserved for the IPA symbols. However, Unicode is not yet implemented
on any computer, although most major manufacturers have agreed that
such a global character set is necessary. Further information can be
found in Byte, July 1991 (if I remember correctly).
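Once Unicode is available, it would fit the same scheme: one more table from IPA numbers to Unicode code points, plus a rule that turns a whole SAMPA transcription (a list of symbols) into a list of codes. A sketch under stated assumptions: the predicates ipa_unicode/2, sampa_codes/2 and sampa_atom/2 are hypothetical, the IPA number 134 for the esh symbol is an assumption to be checked against [Esling 90], and the Prolog system is assumed to support Unicode atoms.

```prolog
% Hypothetical table: IPA number -> Unicode code point (ISO 0x syntax).
ipa_unicode(304, 0x0061).   % a   LATIN SMALL LETTER A
ipa_unicode(134, 0x0283).   % esh LATIN SMALL LETTER ESH (IPA no. assumed)

% SAMPA table entries used by this sketch.
sampa_ipa(a,   304).
sampa_ipa('S', 134).        % SAMPA "S" is the esh symbol

% sampa_codes(+SampaSymbols, -CodePoints):
% map each SAMPA symbol to its Unicode code point via its IPA number.
sampa_codes([], []).
sampa_codes([S|Ss], [C|Cs]) :-
    sampa_ipa(S, I),
    ipa_unicode(I, C),
    sampa_codes(Ss, Cs).

% sampa_atom(+SampaSymbols, -Atom): the transcription as one atom.
sampa_atom(Symbols, Atom) :-
    sampa_codes(Symbols, Codes),
    atom_codes(Atom, Codes).
```

For example, ?- sampa_codes(['S', a], Cs). answers Cs = [643, 97], i.e. U+0283 followed by U+0061; sampa_atom/2 packs such a list into a single atom.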


Why Prolog? 
===========
Mapping tables are indexed automatically by the Prolog system
(typically on the first argument), making lookup very fast.
Furthermore, the tables are readable by humans...and queries may be
run forward and backward, i.e.

if you know the SAMPA symbol, you can get the IPA number,

   ?- sampa_ipa(a,X).
   X = 304

if you know the IPA number, you can retrieve the SAMPA symbol,

   ?- sampa_ipa(X,304).
   X = a

if you know both, you can check whether the relation holds, and

   ?- sampa_ipa(a,304).
   yes

if you know neither, you can retrieve all pairs one after the
other...

   ?- sampa_ipa(X,Y).
   X = b, Y = 102;
   X = d, Y = 104;

   etc.

...and extensional definitions can be mixed freely
with intensional definitions (i.e. computation rules can be
combined with direct mappings).
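A minimal sketch of such mixing (an illustration, not a description of our actual tables): the direct mappings stay in sampa_ipa/2, while SAMPA long vowels such as "a:" are computed by a rule instead of being listed. The predicate sampa_ipa_seq/2 is hypothetical, and the IPA number 503 for the length mark is an assumption to be checked against [Esling 90].

```prolog
% Extensional part: direct mappings (illustrative values).
sampa_ipa(a,   304).
sampa_ipa(':', 503).   % length mark; IPA number assumed

% sampa_ipa_seq(?Symbol, ?IpaNumbers)
% A symbol found in the table maps to a one-element list.
sampa_ipa_seq(Symbol, [I]) :-
    sampa_ipa(Symbol, I).
% Intensional part: a long vowel such as 'a:' is not stored; it is
% computed as the base symbol followed by the length mark.
% The table lookups come first so that atom_concat/3 is always called
% with Base bound, which keeps the rule usable in both directions.
sampa_ipa_seq(Symbol, [I, L]) :-
    sampa_ipa(Base, I),
    Base \== ':',               % the length mark is not a base symbol
    sampa_ipa(':', L),
    atom_concat(Base, ':', Symbol).
```

Here ?- sampa_ipa_seq('a:', X). answers X = [304, 503], and the backward query ?- sampa_ipa_seq(S, [304, 503]). answers S = 'a:'.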

Furthermore, most Prolog systems have a foreign language interface
to C, so binding the mapping tables into applications should be easy.


Christoph


------------------------------------------------------------
Christoph Draxler
CIS Centre for Information and Language Processing
Ludwig-Maximilians-University Munich   Tel: +49 +89 36 40 72
Leopoldstr. 139                        Fax: +49 +89 361 6199
D 8000 Munich 40                 draxler@cis.uni-muenchen.de
------------------------------------------------------------
