This paper defines a new machine learning problem to which standard machine learning algorithms cannot easily be applied. The problem occurs in the domain of lexical acquisition: because words are both ambiguous and synonymous, standard induction techniques are difficult to apply to learning a lexicon. Additionally, negative examples are typically unavailable or difficult to construct in this domain. One approach to solving the lexical acquisition problem is presented, along with preliminary experimental results on an artificial corpus. Future work includes extending the algorithm and testing it on a more realistic corpus.
Ph.D. proposal
Building accurate and efficient natural language processing (NLP)
systems is an important and difficult problem. There has been
increasing interest in automating this process. The lexicon, or the
mapping from words to meanings, is one component that is typically
difficult to update and that changes from one domain to the next.
Therefore, automating the acquisition of the lexicon is an important
task in automating the acquisition of NLP systems. This proposal
describes a system, WOLFIE (WOrd Learning From Interpreted Examples),
that learns a lexicon from input consisting of sentences paired with
representations of their meanings. Preliminary experimental results
show that this system can learn correct and useful mappings. The
correctness is evaluated by comparing a known lexicon to one learned
from the training input. The usefulness is evaluated by examining the
effect of using the lexicon learned by WOLFIE to assist a parser
acquisition system, where previously this lexicon had to be
hand-built. Future work in the form of extensions to the algorithm,
further evaluation, and possible applications is discussed.
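To make the learning setting described above concrete, the following is a minimal Python sketch of the input and output involved. The sentences, the meaning-representation syntax, and the lexicon_accuracy helper are hypothetical stand-ins, not WOLFIE's actual representation language or evaluation code.

    # Hypothetical sketch of the WOLFIE learning setting. The input is
    # sentences paired with meaning representations; the output is a lexicon
    # mapping words to fragments of those representations. The representation
    # syntax here is illustrative only.
    training_pairs = [
        ("the man ate the pasta",
         "ingest(agent:person(sex:male), patient:food(type:pasta))"),
        ("the man ate the cheese",
         "ingest(agent:person(sex:male), patient:food(type:cheese))"),
    ]

    # A learned lexicon maps each word to a fragment of the representation.
    learned_lexicon = {
        "man": "person(sex:male)",
        "ate": "ingest(agent:_, patient:_)",
    }

    def lexicon_accuracy(learned, known):
        """Correctness as described above: compare the learned lexicon to a
        known one, here as the fraction of matching word-meaning pairs."""
        hits = sum(1 for w, m in learned.items() if known.get(w) == m)
        return hits / len(learned) if learned else 0.0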
To appear in the Proceedings of the Fifth International Workshop on Inductive Logic Programming.
This paper presents a method for learning logic programs
without explicit negative examples by exploiting an assumption of
output completeness. A mode declaration is supplied for the
target predicate and each training input is assumed to be accompanied
by all of its legal outputs. Any other outputs generated by an
incomplete program implicitly represent negative examples; however,
large numbers of ground negative examples never need to be generated.
This method has been incorporated into two ILP systems, CHILLIN and
IFOIL, both of which use intensional background knowledge. Tests on
two natural language acquisition tasks, case-role mapping and
past-tense learning, illustrate the advantages of the approach.
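To illustrate the idea, here is a small Python sketch of how implicit negatives arise under output completeness, using the past-tense task mentioned above. The data and the stand-in program are hypothetical, and the real systems operate on logic programs rather than Python functions.

    # For a target predicate past(Verb, PastForm) with mode past(+, -), each
    # training input verb is assumed to come with ALL of its legal outputs.
    complete_outputs = {
        "walk":  {"walked"},
        "sleep": {"slept"},
    }

    def current_program(verb):
        # Stand-in for an incomplete learned program: naively append "ed".
        return {verb + "ed"}

    def implicit_negatives(verb):
        """Any output the program derives beyond the known legal outputs is a
        negative example; no exhaustive set of ground negatives is built."""
        return current_program(verb) - complete_outputs[verb]

    print(implicit_negatives("walk"))   # set()       -- program correct here
    print(implicit_negatives("sleep"))  # {'sleeped'} -- an implicit negative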
A system, WOLFIE, that acquires a mapping of words to their semantic
representations is presented, and a preliminary evaluation is performed.
Tree least general generalizations (TLGGs) of the representations of input
sentences are computed to help determine the representations
of individual words in those sentences. The best guess for the meaning
of a word is the TLGG that overlaps with the highest percentage of the
representations of the sentences in which that word appears.
Some promising experimental results on a non-artificial data
set are presented.
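The selection criterion can be sketched in Python as follows. The overlap test and the sample representations below are simplified stand-ins: the real computation is over representation trees, and the TLGG construction itself is elided.

    def overlaps(candidate, representation):
        # Stand-in overlap test: substring containment on a string-encoded
        # representation; the real test matches subtrees of representations.
        return candidate in representation

    def best_meaning(candidate_tlggs, representations):
        """Pick the TLGG covering the highest fraction of the
        representations of the sentences containing the word."""
        def coverage(c):
            return sum(overlaps(c, r) for r in representations) / len(representations)
        return max(candidate_tlggs, key=coverage)

    # Representations of two sentences containing the word "man":
    reps = ["ingest(person, food(pasta))", "ingest(person, food(cheese))"]
    print(best_meaning(["person", "food(pasta)"], reps))  # 'person': covers 2/2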
Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), Seattle, WA, July 1994.
A new inductive learning system, LAB (Learning for ABduction), is
presented which acquires abductive rules from a set of training examples.
The goal is to find a small knowledge base which, when used abductively,
diagnoses the training examples correctly and generalizes well to unseen
examples. This contrasts with past systems that inductively learn rules that
are used deductively. Each training example is associated with potentially
multiple categories (disorders), instead of just one as in typical learning
systems. LAB uses a simple hill-climbing algorithm to efficiently
build a rule base for a set-covering abductive system. LAB has been
experimentally evaluated and compared to other learning systems and an expert
knowledge base in the domain of diagnosing brain damage due to stroke.
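What "used abductively" means for a set-covering system can be sketched as follows. The disorders, manifestations, and greedy cover below are hypothetical illustrations, not LAB's actual rule base or diagnostic procedure.

    # Each rule says a disorder explains a set of manifestations; a diagnosis
    # is a set of disorders that together cover all observed manifestations.
    explains = {
        "left_mca_stroke":  {"right_arm_weakness", "aphasia"},
        "right_mca_stroke": {"left_arm_weakness", "neglect"},
    }

    def diagnose(manifestations):
        """Greedy set cover: repeatedly add the disorder explaining the most
        still-uncovered manifestations."""
        uncovered, diagnosis = set(manifestations), set()
        while uncovered:
            disorder = max(explains, key=lambda d: len(explains[d] & uncovered))
            if not explains[disorder] & uncovered:
                break  # nothing in the rule base explains the remainder
            diagnosis.add(disorder)
            uncovered -= explains[disorder]
        return diagnosis

    print(diagnose({"aphasia", "right_arm_weakness"}))  # {'left_mca_stroke'}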
M.A. Thesis, Department of Computer Sciences, University of Texas at Austin, July 1993.
A new system for learning by induction, called LAB, is presented. LAB
(Learning for ABduction) learns abductive rules based on a set of training
examples. Our goal is to find a small knowledge base which, when used
abductively, diagnoses the training examples correctly, in addition to
generalizing well to unseen examples. This is in contrast to past systems,
which inductively learn rules that are used deductively. Abduction is
particularly well suited to diagnosis, in which we are given a set of symptoms
(manifestations) and we want our output to be a set of disorders which explain
why the manifestations are present. Each training example is associated with
potentially multiple categories, instead of just one as in typical
learning systems. Building the knowledge base requires a choice among
multiple possibilities, and the number of possibilities grows exponentially
with the number of training examples. One method of choosing the best
knowledge base is described and implemented. The final system is
experimentally evaluated, using data from the domain of diagnosing brain damage
due to stroke. It is compared to other learning systems and a knowledge base
produced by an expert. The results are promising: the learned rule base is
simpler than both the expert knowledge base and the rules learned by one of
the other systems, and its accuracy in predicting which areas are damaged is
higher than that of all the other systems as well as the expert
knowledge base.
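The hill-climbing search described above can be sketched in Python as follows. The scoring function is assumed to measure training diagnostic accuracy when the candidate rule base is used abductively (as in the set-covering sketch earlier); both it and the candidate-rule set are hypothetical stand-ins for LAB's.

    def hill_climb(candidate_rules, score):
        """Greedy search over rule bases: rather than enumerating the
        exponentially many subsets of candidate rules, repeatedly add the
        single rule that most improves the score, stopping at a local
        maximum. `score` maps a rule base to its training diagnostic
        accuracy when the rule base is used abductively."""
        rule_base = frozenset()
        best = score(rule_base)
        while True:
            remaining = candidate_rules - set(rule_base)
            if not remaining:
                return rule_base
            top_rule = max(remaining, key=lambda r: score(rule_base | {r}))
            top_score = score(rule_base | {top_rule})
            if top_score <= best:
                return rule_base  # no single addition improves accuracy
            rule_base, best = rule_base | {top_rule}, top_score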