Newsgroups: sci.lang,sci.lang.translation,soc.culture.russian,alt.english.usage,comp.lang.perl.misc
From: Stijn van Dongen <stijnvd@cwi.nl>
Subject: transliteration library
Content-Type: text/plain; charset=us-ascii
Message-ID: <32FF4BA8.41C6@cwi.nl>
Sender: news@cwi.nl (The Daily Dross)
Nntp-Posting-Host: aak.cwi.nl
Content-Transfer-Encoding: 7bit
Organization: Center for Mathematics and Computer Science
Mime-Version: 1.0
Date: Mon, 10 Feb 1997 16:24:08 GMT
X-Mailer: Mozilla 3.01 (X11; I; IRIX 6.3 IP32)

Hi,

        I recently wrote a Perl 5 library that, for example, lets you
 search for the different transliterations that correspond to the same
 source string.
 ______________________________________________
 There is no user interface, only some documentation. If you want to
 experiment with it, you will have to invest time and *know Perl 5*.
 If you want very specific tools, you will have to build them yourself.
 However, the basic parsing tools (and more) have all been written.

 In principle, both the source alphabet and the target alphabet can be
 any collections of strings. For each source morpheme and each
 transliteration system, one or more rewrite rules may be defined that
 map the source morpheme to a target morpheme. Each rewrite rule is
 labeled with the system or systems from which it is derived.
 Example for Cyrillic-Latin transliteration:


# 27th character:

        ':s' => {       Eng => { 1 => 'shch' },
                        Fr  => { 1 => 'chtch' },
                        Ger => { 1 => 'schtsch' },
                        Hol => { 1 => 'sjtsj' },
        },

# 6th character:

        'e' =>  {       Eng => { 1 => 'e', 2 => 'ye' },
                        Fr  => { 1 => 'e' },
                        Ger => { 1 => 'je', 2 => 'e' },
                        Hol => { 1 => 'je', 2 => 'e', 3 => 'jo' },
        },

# suffix:

        'i-n-$' => {    Sufx => { 1 => 'ine', 2 => 'ien' },
        },
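
 To make the shape of such a rule table concrete, here is a minimal
 sketch that enumerates every target string a single system can produce
 for a list of source morphemes. The table holds only the two letter
 entries shown above. The helper name expand() is hypothetical and not
 taken from the library, and reading the numeric keys as a preference
 order is an assumption.

```perl
use strict;
use warnings;

# Rule table in the shape shown above:
# source morpheme => { system => { number => target morpheme } }.
my %rules = (
    ':s' => {
        Eng => { 1 => 'shch' },
        Fr  => { 1 => 'chtch' },
        Ger => { 1 => 'schtsch' },
        Hol => { 1 => 'sjtsj' },
    },
    'e' => {
        Eng => { 1 => 'e', 2 => 'ye' },
        Fr  => { 1 => 'e' },
        Ger => { 1 => 'je', 2 => 'e' },
        Hol => { 1 => 'je', 2 => 'e', 3 => 'jo' },
    },
);

# expand() is a hypothetical helper, not part of the library.
# It returns every target string that one system can produce for a list
# of source morphemes, or nothing if some morpheme has no rule.
sub expand {
    my ($system, @morphemes) = @_;
    my @results = ('');
    for my $m (@morphemes) {
        return unless exists $rules{$m} && exists $rules{$m}{$system};
        my $alts = $rules{$m}{$system};
        # Assumption: the numeric keys give the order of preference.
        my @targets = map { $alts->{$_} } sort { $a <=> $b } keys %$alts;
        my @next;
        for my $head (@results) {
            push @next, map { $head . $_ } @targets;
        }
        @results = @next;
    }
    return @results;
}

for my $system (qw(Eng Fr Ger Hol)) {
    print "$system: ", join(', ', expand($system, ':s', 'e')), "\n";
}
```

 With these two entries, the Eng line lists shche and shchye, one string
 per alternative for the second morpheme.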

 Given a list of source strings in source notation and a list of target
 strings, the library can search for relations between them, that is,
 which target strings can be derived from which source parse:

[l-o-b-a-x-;e-v-s-k-i-j]                # source string
        [l|o|b|a|x|;e|v|s|k|i-j-$]      # source parse
                [Lobachevski]           # target string (parse omitted here)
                [Lobachevsky]
        [l|o|b|a|x|;e|v|s|k|i|j]
                [Lobachevskii]
                [Lobachevskij]
[x-i-n-;c-i-n]
        [x|i|n|;c|i-n-$]
                [Chintchine]
                [Khinchine]
                [Khintchine]
        [x|i|n|;c|i|n]
                [Chin\v{c}in]
                [Chintchin]
                [Chintschin]
                [Hin\v{c}in]
                [Hincin]
                [Khinchin]
                [Khintchin]
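
 One way such a relation search could work is sketched here; this is
 not the library's actual code. The sketch splits a source string on
 hyphens, adds a second parse when the final tokens form a known suffix
 key, and then tests each target against a regular expression built
 from the alternatives of every morpheme. The function names parses()
 and matches() are hypothetical. Only the 'i-n-$' entry comes from this
 post; the alternatives for x, i, n and ;c are guesses inferred from the
 sample output, with all systems merged into one flat list.

```perl
use strict;
use warnings;

# Hypothetical rules: source morpheme => list of possible renderings.
my %rules = (
    'x'     => [qw(kh ch h)],
    'i'     => [qw(i)],
    'n'     => [qw(n)],
    ';c'    => [qw(ch tch tsch c)],
    'i-n-$' => [qw(ine ien)],     # suffix rule from the post
);

# Return all parses of a source string as array refs of morphemes:
# the plain split, plus one parse per suffix key that fits the end.
sub parses {
    my ($source) = @_;
    my @tokens = split /-/, $source;
    my @parses = ([@tokens]);
    for my $len (2 .. $#tokens) {
        my $key = join('-', @tokens[-$len .. -1], '$');
        next unless exists $rules{$key};
        push @parses, [ @tokens[0 .. $#tokens - $len], $key ];
    }
    return @parses;
}

# True if the target can be produced from the parse by picking one
# rendering per morpheme (case is ignored).
sub matches {
    my ($parse, $target) = @_;
    my $re = '';
    for my $m (@$parse) {
        return 0 unless exists $rules{$m};
        $re .= '(?:' . join('|', map { quotemeta($_) } @{ $rules{$m} }) . ')';
    }
    return $target =~ /\A$re\z/i ? 1 : 0;
}

my $source  = 'x-i-n-;c-i-n';
my @targets = qw(Khinchin Khintchine Chintschin Lobachevsky);
for my $parse (parses($source)) {
    print '[', join('|', @$parse), "]\n";
    print "\t[$_]\n" for grep { matches($parse, $_) } @targets;
}
```

 Letting the regex engine backtrack over the alternatives avoids having
 to segment the target string explicitly, which matters because
 renderings such as ch and c overlap.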

 It is also possible to list the language scores for a given
 source-target pair. The scores are based on weights associated with
 the source morphemes; these weights have to be supplied by the user:
        chebyshev
                Eng     9
                Fr      3
                Hol     3
                Ger     1
        tchebycheff
                Fr      9
                Eng     1
                Ger     1
                Hol     1
        tschebyschef
                Ger     7
                Spell   2
                Eng     1
                Fr      1
                Hol     1
        tchebyshev
                Eng     6
                Fr      6
                Hol     3
                Ger     1
        tschebychef
                Fr      4
                Ger     4
                Spell   2
                Eng     1
                Hol     1
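
 The scoring formula itself is not spelled out in this post, so the
 following is only a sketch of one plausible scheme: for every source
 morpheme, each system that lists the observed target morpheme earns
 that morpheme's weight. The helper score(), the aligned-pair input
 format and the weights are all hypothetical; the rules are the two
 letter entries shown earlier in this post. It is not meant to
 reproduce the numbers above.

```perl
use strict;
use warnings;

# The two letter entries from the rule table earlier in this post.
my %rules = (
    ':s' => {
        Eng => { 1 => 'shch' },
        Fr  => { 1 => 'chtch' },
        Ger => { 1 => 'schtsch' },
        Hol => { 1 => 'sjtsj' },
    },
    'e' => {
        Eng => { 1 => 'e', 2 => 'ye' },
        Fr  => { 1 => 'e' },
        Ger => { 1 => 'je', 2 => 'e' },
        Hol => { 1 => 'je', 2 => 'e', 3 => 'jo' },
    },
);

# Hypothetical user-supplied weights: a distinctive morpheme counts more.
my %weight = ( ':s' => 4, 'e' => 1 );

# score() is an illustrative helper, not the library's scoring routine.
# $pairs is an aligned parse: a list of [source, target] morpheme pairs.
# It returns a hash: system => summed weight of the morphemes whose
# observed rendering that system lists.
sub score {
    my ($pairs) = @_;
    my %score;
    for my $pair (@$pairs) {
        my ($src, $tgt) = @$pair;
        next unless exists $rules{$src};
        my $w = exists $weight{$src} ? $weight{$src} : 1;
        for my $system (keys %{ $rules{$src} }) {
            $score{$system} += 0;    # list the system even if it scores 0
            my @alts = values %{ $rules{$src}{$system} };
            $score{$system} += $w if grep { $_ eq $tgt } @alts;
        }
    }
    return %score;
}

my %score = score([ [ ':s', 'shch' ], [ 'e', 'e' ] ]);
for my $system (sort { $score{$b} <=> $score{$a} || $a cmp $b } keys %score) {
    print "\t$system\t$score{$system}\n";
}
```

 For the pair shch + e this prints Eng with 5 and Fr, Ger and Hol with 1
 each, since only the English rule produces shch while all four systems
 allow e.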


 If you are interested, you can follow up on this post (but please do
 not cross-post), mail me at stijnvd@cwi.nl, or grab the uuencoded tar
 bundle TRANSLITERATION.tar.uu from http://www.cwi.nl/ftp/stijnvd/.


        Kind regards,

                Stijn van Dongen
