From: awb@itl.atr.co.jp (Alan W Black)
Newsgroups: comp.ai.nat-lang
Subject: Re: computational morphologies
Date: Fri, 16 Jul 1993 00:11:37 GMT
Organization: ATR Interpreting Telecommunications Research Labs., Japan
In-Reply-To: jahargra@dante.nmsu.edu's message of 15 Jul 1993 01:48:54 GMT
References: <222d26INNich@dns1.NMSU.Edu>

In article <222d26INNich@dns1.NMSU.Edu> jahargra@dante.nmsu.edu (HARGRAVE III) writes:
|Greetings,
|
|   I'm looking for information on the following topics:
|
|1. What is the current state of the art in computational morphology?
|I'm familiar with the KIMMO-like systems. Any newer alternatives? I
|have seen some work on syllable-based morphologies and I would like to
|know more.
|

I personally feel that the KIMMO two-level finite-state transducer model works well for many languages. It has been applied to a number of quite different languages with considerable success. The problem with the original Koskenniemi model is that it offers only finite-state morphosyntax (which morphemes may combine), alongside finite-state morphographemics (the spelling changes that occur when they combine); these two aspects of finite-stateness are often confused. Finite-state morphosyntax means that the only constraint on combination is the set of continuation classes specified for each morpheme class.
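To make the distinction concrete, here is a minimal sketch of finite-state morphosyntax in the KIMMO style, assuming a toy English lexicon. The class names (`Root`, `NounSuffix`, `VerbSuffix`, `End`) and the `analyses` function are hypothetical. Each morpheme lists the continuation classes that may follow it, and that list is the only morphosyntactic constraint. The sketch works on lexical forms with `+` as a morpheme boundary; the morphographemic half (relating lexical `fox+s` to surface `foxes`) is deliberately left out.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: continuation class -> entries of
# (lexical form, gloss, classes that may follow this morpheme).
LEXICON: Dict[str, List[Tuple[str, str, List[str]]]] = {
    "Root": [("fox", "fox", ["NounSuffix"]),
             ("walk", "walk", ["VerbSuffix"])],
    "NounSuffix": [("", "", ["End"]),          # bare stem
                   ("+s", "+PL", ["End"])],
    "VerbSuffix": [("", "", ["End"]),
                   ("+s", "+3SG", ["End"]),
                   ("+ed", "+PAST", ["End"])],
    "End": [],                                 # word must be finished here
}


def analyses(word: str, cls: str = "Root") -> List[str]:
    """Return a gloss for every path through the continuation classes that
    spells out `word` exactly. Only lexical forms are handled: no spelling
    (morphographemic) rules are applied."""
    if cls == "End":
        # Succeed only if every character of the word has been consumed.
        return [""] if word == "" else []
    results: List[str] = []
    for form, gloss, next_classes in LEXICON[cls]:
        if word.startswith(form):
            remainder = word[len(form):]
            for nxt in next_classes:
                for rest in analyses(remainder, nxt):
                    results.append(gloss + rest)
    return results
```

With this lexicon `analyses("fox+s")` yields `["fox+PL"]`, while `analyses("fox+ed")` yields nothing, because `+ed` is only reachable through `VerbSuffix`. Any finer restriction (say, excluding mass nouns from plural `+s`) would have to be encoded as yet another sublexicon.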
The work I was involved in allowed morphosyntax to be specified as a feature grammar, which offers much more power and makes it much easier to specify more complex morphosyntax (that is, which morphemes can be joined). Our work is described in the MIT Press book Computational Morphology, and code is available by anonymous ftp from scott.cogsci.ed.ac.uk[129.215.144.3]:/pub/phonology/tools/MAP/MAP3.1.tar.Z

But enough of the plug; let me try to answer your question. A number of possible extensions to Koskenniemi-type models have been discussed, as opposed to entirely different paradigms (see the next paragraph). Still within KIMMO-like systems, some make the symbols of the transducers non-atomic and include features. That is, instead of breaking a word into its individual characters, we break it down into a sequence of feature structures, one for each character (or phoneme). Morphological rules then act on these feature descriptions of the components of morphemes rather than on simple atomic symbols. This should make phonological rules easier to specify. Work by Trost (see refs below) is along those lines. Making the "characters" more complex means less need for diacritics (and for tricks such as using upper and lower case to make distinctions between items that really require a more general descriptive formalism).

Higher-level rule formalisms than the Koskenniemi context-sensitive rewrite rules have also been proposed, including one by myself (Black 87), which was later extended for use in phonology by Pulman and Hepple (Cambridge Computing Laboratory, UK), though I don't have the full reference for that. Proposals for extending the two-level model to Semitic languages such as Arabic have also been made, typically by increasing the number of levels (though formally these can still be viewed as finite-state transducers); Kay's work (Kay 87) is an example. But all of the above are really still Koskenniemi-like systems.
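By contrast, a feature-based morphosyntax states combination constraints as feature values that must unify. The following is a minimal sketch under simplifying assumptions: flat (non-recursive) feature dictionaries, a single binary rule Word -> Stem Suffix, and a hypothetical toy lexicon (`STEMS`, `SUFFIXES`, `unify` and `parse` are illustrative names). It is not the grammar formalism of the MAP system, which is considerably richer.

```python
from typing import Dict, List, Optional, Tuple

Features = Dict[str, str]


def unify(a: Features, b: Features) -> Optional[Features]:
    """Flat unification: merge two feature dicts, or return None if they
    assign different values to the same feature."""
    merged = dict(a)
    for key, value in b.items():
        if key in merged and merged[key] != value:
            return None
        merged[key] = value
    return merged


# Hypothetical stem lexicon: form -> possible readings.
STEMS: Dict[str, List[Features]] = {
    "fox": [{"cat": "noun", "count": "yes"}],
    "rice": [{"cat": "noun", "count": "no"}],
    "walk": [{"cat": "verb"}, {"cat": "noun", "count": "yes"}],
    "read": [{"cat": "verb"}],
}

# Hypothetical suffix lexicon: form -> readings, each a pair of
# (features the stem must unify with, features the suffix contributes).
SUFFIXES: Dict[str, List[Tuple[Features, Features]]] = {
    "s": [({"cat": "noun", "count": "yes"}, {"num": "pl"}),
          ({"cat": "verb"}, {"form": "pres3sg"})],
    "ed": [({"cat": "verb"}, {"form": "past"})],
    "able": [({"cat": "verb"}, {"cat": "adj"})],
}


def parse(stem: str, suffix: str) -> List[Features]:
    """Apply Word -> Stem Suffix: the stem's features must unify with the
    suffix's requirements; the suffix's own features then override
    (the suffix acts as the head of the word)."""
    words: List[Features] = []
    for stem_feats in STEMS.get(stem, []):
        for needs, adds in SUFFIXES.get(suffix, []):
            combined = unify(stem_feats, needs)
            if combined is None:
                continue  # this suffix reading cannot attach to this stem reading
            word = dict(combined)
            word.update(adds)
            words.append(word)
    return words
```

Here the count/mass restriction on plural `-s` is a single feature value rather than a separate continuation class, and a stem that left `count` unspecified would still be accepted, since unification fails only on conflicting values. `parse("walk", "s")` returns both a verb and a noun reading because `walk` is listed in both categories, while `parse("rice", "s")` returns none.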
An alternative model is the work on what is called paradigmatic morphology. It concentrates on describing classes and inheritance between classes (mostly for morphosyntax). Such systems are described in Jo Calder's PhD thesis (Calder 89) and used in Lynn Cahill's system MOLUSC (Cahill 90). The idea differs from the morphosyntax in both the KIMMO system and our own, and is an interesting way to try to deal systematically with morphology in languages such as Latin. (There was a student at Edinburgh looking at this with respect to Slavic languages, but I'm not sure of the current state of that work.)

|2. Any work done on morphology induction? Either rule-based or
|statistical.
|

Yes, there is work on this, though I'm not very familiar with it. An Edinburgh MPhil some years ago by Andy Golding offered such a system (see refs below). I am aware of later systems but unfortunately don't have the references.

|   My intent is to implement a generic morphology workstation
|where users can interactively develop high-coverage morphologies
|for use in other systems. This is part of my master's project.
|

Sounds interesting. The problem I find with English morphology is that it can be viewed either as very simple, in which case almost any system can do a reasonable job on it, or as very complex, if you wish to include various aspects of derivational morphology (how far do you go? do you wish to decompose "microbiology" into micro-bio-ology?) and to handle all exceptions with general rules rather than just treating them as exceptions. Other languages offer much more interesting cases.
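The paradigmatic approach mentioned above organises inflection as classes with inheritance between them. As a rough illustration only (plain Python class inheritance, not Calder's formalism or MOLUSC), the sketch below models three Latin noun classes: each class lists only the endings it defines itself and inherits the rest from more general classes, with the most specific definition winning. Endings are written without vowel-length marks, and the hierarchy (`Paradigm`, `FirstDeclension`, `SecondDeclension`, `SecondDeclensionNeuter`) is a hypothetical choice for the example.

```python
from typing import Dict, Tuple

Cell = Tuple[str, str]  # (case, number)


class Paradigm:
    # Endings shared by both declensions modelled here:
    # dative and ablative plural in -is.
    OWN: Dict[Cell, str] = {("dat", "pl"): "is", ("abl", "pl"): "is"}

    def __init__(self, stem: str) -> None:
        self.stem = stem

    def ending(self, case: str, number: str) -> str:
        # Walk from the most specific class to the most general; the first
        # class whose own table defines the cell wins (default inheritance).
        for cls in type(self).__mro__:
            own = cls.__dict__.get("OWN", {})
            if (case, number) in own:
                return own[(case, number)]
        raise KeyError((case, number))

    def form(self, case: str, number: str) -> str:
        return self.stem + self.ending(case, number)


class FirstDeclension(Paradigm):       # e.g. ros-a
    OWN = {("nom", "sg"): "a", ("gen", "sg"): "ae", ("dat", "sg"): "ae",
           ("acc", "sg"): "am", ("abl", "sg"): "a",
           ("nom", "pl"): "ae", ("gen", "pl"): "arum", ("acc", "pl"): "as"}


class SecondDeclension(Paradigm):      # masculine, e.g. domin-us
    OWN = {("nom", "sg"): "us", ("gen", "sg"): "i", ("dat", "sg"): "o",
           ("acc", "sg"): "um", ("abl", "sg"): "o",
           ("nom", "pl"): "i", ("gen", "pl"): "orum", ("acc", "pl"): "os"}


class SecondDeclensionNeuter(SecondDeclension):  # e.g. bell-um
    # Neuters override only the nominative (and accusative plural), so that
    # nominative and accusative coincide: -um singular, -a plural.
    OWN = {("nom", "sg"): "um", ("nom", "pl"): "a", ("acc", "pl"): "a"}
```

For example, `SecondDeclensionNeuter("bell").form("nom", "pl")` gives `bella`, while its genitive singular `belli` comes unchanged from the masculine class and its dative plural `bellis` from the root of the hierarchy.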
Good luck,

Alan

--
Alan W Black, ATR Interpreting Telecommunications Laboratories
2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-02, Japan
email: awb@itl.atr.co.jp
tel: (+81) 7749 5 1314
fax: (+81) 7749 5 1308

@inproceedings(trost90,
  key = "Trost",
  author = "Trost, H.",
  title = "The Application of Two Level Morphology to non-concatenative {G}erman morphology",
  booktitle = "Proceedings of 13th International Conference on Computational Linguistics",
  year = 1990,
  pages = "371-376"
)

@inproceedings(black87,
  key = "Black et al.",
  author = "Black, A. and Ritchie, G. and Pulman, S. and Russell, G.",
  title = "Formalisms for Morphographemic Description",
  booktitle = "Proceedings of 3rd Conference of the European Chapter of the Association for Computational Linguistics",
  year = 1987,
  pages = "11-18"
)

@inproceedings(kay87,
  key = "Kay",
  author = "Kay, M.",
  title = "Nonconcatenative Finite-State Morphology",
  booktitle = "Proceedings of 3rd Conference of the European Chapter of the Association for Computational Linguistics",
  year = 1987,
  pages = "2-10"
)

@inproceedings(calder89,
  key = "Calder",
  author = "Calder, J.",
  title = "Paradigmatic Morphology",
  booktitle = "Proceedings of 4th Conference of the European Chapter of the Association for Computational Linguistics",
  year = 1989,
  address = "Manchester",
  pages = "58-65"
)

@inproceedings(cahill90a,
  key = "Cahill",
  author = "Cahill, L.J.",
  title = "Syllable-based morphology",
  booktitle = "Proceedings of 13th International Conference on Computational Linguistics",
  year = 1990,
  address = "Helsinki",
  pages = "48-53"
)