Newsgroups: comp.ai.nat-lang
Path: cantaloupe.srv.cs.cmu.edu!das-news.harvard.edu!news2.near.net!MathWorks.Com!europa.eng.gtefsd.com!howland.reston.ans.net!EU.net!ieunet!news.ul.ie!ul.ie!mcelligotta
From: mcelligotta@ul.ie (Annette McElligott)
Subject: summary of query regarding statistical analysis of text
Message-ID: <mcelligotta.196.052C96EF@ul.ie>
Lines: 101
Sender: usenet@ul.ie
Nntp-Posting-Host: 136.201.8.26
Organization: University of Limerick, Ireland
X-Newsreader: Trumpet for Windows [Version 1.0 Rev B]
Date: Tue, 6 Sep 1994 13:02:03 GMT

To all who replied to my query regarding the application of statistical and quantitative analysis of text, a sincere thank you.

The following is a summary of these responses.

Annette McElligott.
========================================================================

From:	jost@itd.nrl.navy.mil (Jost)

You should certainly take a look at E. Charniak's _Statistical Language Learning_, published this year by the MIT Press; it is a very good introduction.

BBN, a company in Cambridge, Mass, USA, does a lot of this; if you contact bates@bbn.com (Lyn Bates) or weischedel@bbn.com (Ralph Weischedel) they might be able to give you some information.

The "old" information theory stuff (Shannon, Weaver, Wiener, etc.) is worth looking at also.

I work in this field...it is very interesting, and you can get some fairly spectacular results.

========================================================================
From:	oli@ims.uni-stuttgart.de (Oliver Christ)

Did you have a look at Charniak's book, which was recently published? It's ``Statistical Language Learning'', by Eugene Charniak, Bradford Book/MIT Press, 208 pp, 80 illus, $25.00.

Prof. Koehler, Univ of Trier, Germany, recently started a journal, ``Journal on Quantitative Linguistics'', I think, published by Swets & Zeitlinger (but I'm not sure about that). 

A good list of references can be found in the special issues about corpus processing of Computational Linguistics (19:1 and 19:2).

Then, there's Black/Garside/Leech: ``Statistically-driven computer grammars of English: the IBM/Lancaster approach'', Rodopi, 93.

Perhaps you may also want to have a look at a small corpora-related bibliography I put on our ftp server (ftp.ims.uni-stuttgart.de:/pub/corpora/corpora.bib); there are some titles on statistical NLP included.

========================================================================
From:	lapalut@sadhu.inria.fr (Lapalut)



========================================================================
From:	mike@odyssey.ucc.ie (Mike McElligott)

I presume that you are familiar with the vector space approach, etc., since that seems to be a favourite of Dr. Sutcliffe.  You might find the following two articles interesting.

"USING STATISTICS IN LEXICAL ANALYSIS" Kenneth Church et al., in Lexical Acquisition, Lawrence Erlbaum, 1991, pp 115-164.

"USING STATISTICAL METHODS TO IMPROVE KNOWLEDGE-BASED NEWS CATEGORIZATION" Paul S. Jacobs, IEEE Expert April 1993 pp 13-23.

========================================================================
From:	et@cogsci.edinburgh.ac.uk (et)

You might be best off to just order the recently published bibliography of the field.





========================================================================
From:	juola@taylor.cs.Colorado.EDU (Patrick Juola)

The journal Computational Linguistics devoted two special issues last year to the subject.  It's a hot enough topic that any major CL conference typically has a few papers on it.

========================================================================
From:	melamed@unagi.cis.upenn.edu (melamed)

For the past 4 or 5 years, proceedings of the ACL, EACL, COLING and MT conferences have contained a great many papers on empirical CL.  There have also been two AAAI Symposia on the topic in 1992, and the proceedings thereof might help you.

Zelig Harris started the field here at Penn in the '50s.  If you want to read classics, you could start with his books.

Much of the current empirical CL research is still coming out of Penn.  Recently, there have been two influential theses published in this field at Penn, one by Eric Brill, the other by Philip Resnik.  These are available as tech. reports.

========================================================================
From:	mock@hyperion.cs.ucdavis.edu (Kenrick Mock)

There are a couple of recent AAAI workshops on statistical NLP.  You could get the proceedings from the AAAI folks for around $30 a pop.  I believe Eugene Charniak also has a book out, entitled "Statistical Language Learning".

========================================================================
From:	powers@ist.flinders.edu.au (David Powers)

There are relevant (new) ACL SIGs, SIGNLL (Natural Language Learning) and SIGTEXT (or is it SIGCORPORA).

Eugene Charniak has just brought out a book through MIT Press, Statistical Language Learning - which is a good introductory text.

The conference NeMLaP is being held in the UK in September and has a large focus in this area.

Computational Linguistics has something relevant in most issues, see e.g. March 94, December 92 (I don't have my 93 issues here at the moment.  I think there is a special issue in the pipeline).

Then there is language learning.  I have edited several proceedings; one that concentrates on statistical approaches is the 92 SHOE proceedings (Extraction of Hierarchical Structure) from ITK, KUB, Tilburg, Holland (Dfl 15, email walter@kub.nl).

There are email lists, empiricists and ai-stats, that are highly relevant.

========================================================================
From:	schuetze@parc.xerox.com (Hinrich Schuetze)

A good book to start with would be Charniak's Statistical Language Learning which just came out.

========================================================================
From:	Kaori_Shima@SONORA.MT.CS.CMU.EDU (Kaori_Shima)

"Self-organized Language Modeling for Speech Recognition" by F. Jelinek has been recommended to me by several people.  The paper talks about the basics of language modeling for the speech recognition task.

========================================================================

