Newsgroups: comp.lang.prolog
Path: cantaloupe.srv.cs.cmu.edu!rochester!cornellcs!newsfeed.cit.cornell.edu!newsstand.cit.cornell.edu!news.kei.com!newsfeed.internetmci.com!in2.uu.net!allegra!alice!pereira
From: pereira@alta.research.att.com (Fernando Pereira)
Subject: Re: Inductive Logic Programm, EBG and Uncertainty
In-Reply-To: ries+@CS.CMU.EDU's message of 6 Nov 1995 18:43:33 GMT
X-Nntp-Posting-Host: alta.research.att.com
Message-ID: <PEREIRA.95Nov7210647@alta.research.att.com>
Sender: usenet@research.att.com (netnews <9149-80593> 0112740)
Organization: AT&T Bell Laboratories
References: <47ll0l$qqb@cantaloupe.srv.cs.cmu.edu>
Date: Wed, 8 Nov 1995 02:06:47 GMT
Lines: 47

In article <47ll0l$qqb@cantaloupe.srv.cs.cmu.edu> ries+@CS.CMU.EDU (Klaus Ries) writes:
   Dear lang.prolog'ers,
   there is by now so much work on learning with clause logics that I am
   really curious whether any of it has been extended to a logic that
   contains uncertainty, e.g. in the guise of probabilities or in the
   guise of fuzzy logic.

   My interest is especially in restricted forms of logic that allow
   induction over HUGE databases. Since my research objective is to find
   better statistical language models, and a model that just uses the
   last two words to estimate the next is by far the best, I wonder
   whether logical structures could contribute here.
I don't know the origin of this folk pseudo-result that "trigrams are
best" among all current statistical language models in existence, but
it's just not true: 4- and 5-grams perform significantly better than
trigrams in some speech recognition tasks (see the results of last
year's ARPA NAB tests, especially the AT&T entry, which used 5-grams).
But I take it that your real interest is in moving beyond n-grams to
models with more structure.
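For concreteness, here is a toy sketch (in Python, purely illustrative)
of how n-gram estimates of any order come straight out of counts. This
is plain maximum likelihood with no smoothing; any real system, trigram
or 5-gram, lives or dies by its smoothing/backoff scheme, which is
omitted here:

```python
from collections import defaultdict

def ngram_counts(tokens, n):
    """Count n-grams and their (n-1)-gram contexts in a token list."""
    counts = defaultdict(int)
    context_counts = defaultdict(int)
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        counts[gram] += 1
        context_counts[gram[:-1]] += 1
    return counts, context_counts

def mle_prob(counts, context_counts, gram):
    """Maximum-likelihood estimate of P(w_n | w_1 ... w_{n-1})."""
    c = counts.get(gram, 0)
    ctx = context_counts.get(gram[:-1], 0)
    return c / ctx if ctx else 0.0

# Tiny corpus: "the cat" is followed by "sat" once and "ran" once,
# so each continuation gets probability 1/2 under a trigram model.
tokens = "the cat sat on the mat the cat ran".split()
counts, ctx = ngram_counts(tokens, 3)
p = mle_prob(counts, ctx, ("the", "cat", "sat"))  # 0.5
```

Replacing n=3 with n=5 in the same code is all a 5-gram model is; the
hard part, sparse counts, is exactly what smoothing and (perhaps) more
structured models are meant to address.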

A first question you may want to consider is whether the applications
you are interested in require models that output probability estimates
for configurations/analyses, or instead just need to make some
decision (e.g. does a noun phrase start or end between these two
consecutive words?). In the latter case, you don't need a
probabilistic model (although a probabilistic model may be worth
considering), and existing concept learning frameworks may be
applicable. See for example the work of Ray Mooney (UT Austin),
specifically in NLP applications, of William Cohen (AT&T), or, further
from logic programming, of Eric Brill (Johns Hopkins).
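To make the decision-vs.-probability distinction concrete, here is a toy
sketch (invented tag set and rule, not taken from any of the systems
mentioned above): a decision-only boundary detector just maps a local
context to yes/no, with no probability estimate anywhere, which is why
concept learners that output rules or classifiers suffice for it:

```python
def np_starts_between(prev_tag, next_tag):
    """Toy decision rule: does a noun phrase start between two
    consecutive words, given their part-of-speech tags?
    The tag names and the rule itself are invented for illustration."""
    # Guess that an NP starts when a determiner, adjective, or noun
    # follows a verb or the beginning of the sentence ("BOS").
    return prev_tag in {"VB", "BOS"} and next_tag in {"DT", "JJ", "NN"}

np_starts_between("VB", "DT")   # True:  "saw | the ..."
np_starts_between("NN", "VB")   # False: "dog ran ..."
```

A concept learner would induce rules of this shape from labeled
examples; a probabilistic model would instead have to estimate a
calibrated P(boundary | context), a strictly harder target.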
   One possibility that has been explored is to partition the history of
   the text using decision trees (e.g. with ID3, CART, etc.). Since
   decision trees cannot even handle two independent features without
   duplicating subtrees, I wonder whether a restricted form of logic --
   say, some "FUZZY-Probabilistic-Datalog without function symbols" :) --
   has been explored before to inductively derive stochastic models.
Daphne Koller (Stanford) has suggested connections between her earlier
work on probabilistic logics and Bayesian networks. I don't know how
far that work has gone, though.
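One simple semantics people have considered for a probabilistic Datalog
of the sort you joke about is to attach probabilities to ground facts,
treat them as independent events, and define the probability of a
derived atom as the probability that at least one derivation succeeds.
A toy sketch (the relation names and numbers are invented; this is one
possible semantics, not a description of Koller's work):

```python
# Ground facts with attached probabilities, assumed independent.
facts = {
    ("parent", "ann", "bob"): 0.9,
    ("parent", "bob", "carol"): 0.8,
}

def prob_grandparent(facts, x, z):
    """P(grandparent(x, z)) under the rule
       grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
    Each middle element y gives one derivation with probability
    P(parent(x,y)) * P(parent(y,z)); the query succeeds if at least
    one derivation does, so multiply the failure probabilities."""
    mids = {a[2] for a in facts if a[0] == "parent" and a[1] == x}
    p_no_derivation = 1.0
    for y in mids:
        p_path = (facts.get(("parent", x, y), 0.0)
                  * facts.get(("parent", y, z), 0.0))
        p_no_derivation *= 1.0 - p_path
    return 1.0 - p_no_derivation

p = prob_grandparent(facts, "ann", "carol")  # 0.9 * 0.8 = 0.72
```

The independence assumption is of course the weak point: derivations
that share facts are not independent, and handling that correctly is
where such semantics get subtle.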

-- 
Fernando Pereira
2B-441, AT&T Bell Laboratories
600 Mountain Ave, Murray Hill, NJ 07974-0636
pereira@research.att.com
