Newsgroups: comp.ai.nat-lang
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!news.mathworks.com!hookup!news.moneng.mei.com!howland.reston.ans.net!news.sprintlink.net!siemens!ellen
From: ellen@scr.siemens.com (Ellen Voorhees)
Subject: Re: A universal System for Representing Ideas
Message-ID: <CyJx5A.3sv@scr.siemens.com>
Originator: ellen@sol.scr.siemens.com
Sender: news@scr.siemens.com (NeTnEwS)
Nntp-Posting-Host: sol.scr.siemens.com
Organization: Siemens Corporate Research, Princeton NJ
References: <3878rq$rue@mango.aloha.com> <TED.94Oct21084025@ilios.crl.nmsu.edu> <TED.94Oct29092506@ilios.crl.nmsu.edu>
Distribution: usa
Date: Mon, 31 Oct 1994 19:07:58 GMT
Lines: 48


In article <TED.94Oct29092506@ilios.crl.nmsu.edu>, ted@crl.nmsu.edu (Ted Dunning) writes:
>
>In article <X4941028122431@FOGHORN> Milind_S_Pandit@ccm.jf.intel.com (Milind S. Pandit) writes:
>
>   > overall, this sentence has 1,682,601,984 combinations of word senses.
>   > resolving these senses is highly non-trivial and the best programs for
>   > sense tagging only get about 70% correct on a very limited set of test
>   > examples.
>
>   Can you provide references to such programs?  Do they use semantic
>   knowledge, or merely part-of-speech?
>
>actually i don't have any references handy.
>
>some of the programs work via handcrafted rules which look for context
>cues. 
>
>the best programs weight statistical evidence from several sources.
>rebecca bruce (from our lab :-) had a very nice paper in this last
>years acl.  perhaps i can convince her to put it on the comp-lg
>archive.
>
>
My colleagues and I have devised statistical classifiers that resolve word
senses by combining a global, topical component with a local, syntactic
component.  We call the combination of global and local information a
contextual representation, because it characterizes the contexts in which a
sense can be used.  Tests using six senses of the highly ambiguous noun
`line' and four senses of the verb `serve' show that the classifiers
correctly identify known senses over 70% of the time.  Later modifications
to the classifier (as yet unpublished) have improved the accuracy to just
over 80%.
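
To make the global-plus-local idea concrete, here is a rough sketch in
Python.  It is not the classifier from our papers: it is a toy naive Bayes
model, and the class name, the three sense labels, the window size, and the
training sentences are all invented for illustration.  Topical evidence is
approximated by every other word in the sentence, and local evidence by the
words immediately next to the target.

```python
import math
from collections import Counter, defaultdict


class ContextualSenseClassifier:
    """Toy naive Bayes sense classifier (illustrative only)."""

    def __init__(self, target, window=1, alpha=1.0):
        self.target = target      # the ambiguous word, e.g. "line"
        self.window = window      # how many neighbours count as local context
        self.alpha = alpha        # add-alpha smoothing constant
        self.sense_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def _features(self, tokens):
        i = tokens.index(self.target)
        # Global, topical evidence: every other word, position ignored.
        feats = [("topic", w) for j, w in enumerate(tokens) if j != i]
        # Local evidence: neighbouring words tagged with their offset.
        for off in range(-self.window, self.window + 1):
            j = i + off
            if off != 0 and 0 <= j < len(tokens):
                feats.append(("local", off, tokens[j]))
        return feats

    def train(self, examples):
        for tokens, sense in examples:
            self.sense_counts[sense] += 1
            for f in self._features(tokens):
                self.feature_counts[sense][f] += 1
                self.vocab.add(f)

    def classify(self, tokens):
        total = sum(self.sense_counts.values())
        # Features never seen in training carry no information; skip them.
        feats = [f for f in self._features(tokens) if f in self.vocab]
        best, best_score = None, -math.inf
        for sense, n in self.sense_counts.items():
            counts = self.feature_counts[sense]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n / total)  # log prior of the sense
            for f in feats:
                score += math.log((counts[f] + self.alpha) / denom)
            if score > best_score:
                best, best_score = sense, score
        return best


# Invented training sentences for three hypothetical senses of "line".
examples = [
    ("she waited in line at the bank".split(), "queue"),
    ("people stood in line for tickets".split(), "queue"),
    ("the phone line was busy".split(), "phone"),
    ("he installed a second phone line".split(), "phone"),
    ("the actor forgot his line".split(), "text"),
    ("she read the first line of the poem".split(), "text"),
]

clf = ContextualSenseClassifier("line")
clf.train(examples)
print(clf.classify("customers stood in line at the store".split()))
```

With data this small the result shows only how the two kinds of evidence
are pooled into one score per sense; it says nothing about the accuracy
figures quoted above.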

Details of the work are in the papers:

Corpus-Based Statistical Sense Resolution.
      Claudia Leacock, Geoffrey Towell, and Ellen M. Voorhees.
      Proceedings of the 1993 ARPA Workshop on Human Language Technology.
      Plainsboro, NJ. March, 1993.

Towards Building Contextual Representations of Word Senses Using
      Statistical Models.
      Claudia Leacock, Geoffrey Towell, and Ellen Voorhees.
      SIGLEX Workshop: Acquisition of Lexical Knowledge from Text.
      ACL, June, 1993.

Ellen Voorhees
Siemens Corporate Research, Inc.
ellen@scr.siemens.com
