From miles@minster.york.ac.uk Tue Feb 1 21:44:36 EST 1994
Article: 1149 of comp.ai.nat-lang
Xref: glinda.oz.cs.cmu.edu comp.ai.nat-lang:1149
Path: honeydew.srv.cs.cmu.edu!fs7.ece.cmu.edu!europa.eng.gtefsd.com!howland.reston.ans.net!pipex!uknet!yorkohm!minster!miles
From: miles@minster.york.ac.uk
Newsgroups: comp.ai.nat-lang
Subject: Re: robust parsing
Message-ID: <760098770.17211@minster.york.ac.uk>
Date: 1 Feb 1994 10:32:50 GMT
References: <2ij8pbEbcr@uni-erlangen.de>
Reply-To: miles@minster.york.ac.uk (miles)
Followup-To:
Distribution: world
Organization: Department of Computer Science, University of York, England
Lines: 116
Keywords:

In article <2ij8pbEbcr@uni-erlangen.de> mela@linguistik.uni-erlangen.de (Manuela Boros) writes:
>
>I am looking for literature on robust parsing of natural
>language, either in spoken or written form.
>- What kind of mistakes are frequently, how can they be
>  detected and how can they be automatically repaired?
>- Which grammar types have been found to be especially
>  suited for this kind of problem?
>- Who in computational linguistics is working on this?
>
>Many thanks in advance.
>
>---------------
>Manuela Boros
>mela@linguistik.uni-erlangen.de

There is a lot of literature on robust parsing of natural language.

For corrective approaches, look at Computational Linguistics, volume 9.
These systems try to map a sentence that the grammar cannot generate
into one that it can.

For grammar learning approaches, look at the corpus linguistics
literature for stochastic approaches, or at the machine learning
literature for their symbolic cousins.

Here are a few examples:

@inproceedings{Bris92,
  author    = "Ted Briscoe and Nick Waegner",
  title     = "Robust {S}tochastic {P}arsing {U}sing the {I}nside-{O}utside {A}lgorithm",
  booktitle = "Proceedings of the workshop on statistically-based techniques in Natural Language Processing, San Jose, California",
  pages     = "not known",
  note      = "To appear",
  year      = 1992}

@book{Gars87,
  editor    = "R. Garside and G. Leech and G. Sampson",
  title     = "The {C}omputational {A}nalysis of {E}nglish: {A} {C}orpus-based Approach",
  publisher = "Longman",
  year      = 1987}

@book{Blac93,
  editor    = "Ezra Black and Roger Garside and Geoffrey Leech",
  title     = "Statistically driven computer grammars of {E}nglish: the {IBM}-{L}ancaster approach",
  publisher = "Rodopi",
  year      = 1993}

@techreport{Sout91,
  author      = "C. Souter and T. O'Donoghue and E. S. Atwell",
  title       = "Training {P}arsers with {P}arsed {C}orpora",
  institution = "University of Leeds School of Computer Studies",
  type        = "Report",
  number      = "91.2",
  year        = 1991}

@techreport{Sout92,
  author      = "C. Souter and E. S. Atwell",
  title       = "A {R}ichly {A}nnotated {C}orpus for {P}robabilistic {P}arsing",
  institution = "University of Leeds School of Computer Studies",
  type        = "Report",
  number      = "92.13",
  year        = 1992}

@inproceedings{Osbo93a,
  author    = "Miles Osborne and Derek Bridge",
  title     = "Learning unification-based grammars and the treatment of undergeneration",
  booktitle = "Workshop on Machine Learning Techniques and Text Analysis, Vienna, Austria",
  year      = 1993}

@inproceedings{Osbo93b,
  author    = "Miles Osborne and Derek Bridge",
  title     = "Inductive and deductive grammar learning: dealing with incomplete theories",
  booktitle = "Grammatical Inference Colloquium, Essex University",
  year      = 1993}

@inproceedings{Carr92,
  author    = "Glenn Carroll and Eugene Charniak",
  title     = "Two {E}xperiments on {L}earning {P}robabilistic {D}ependency {G}rammars from {C}orpora",
  booktitle = "{AAAI}-92 {W}orkshop {P}rogram: {S}tatistically-{B}ased {NLP} {T}echniques, San Jose, California",
  year      = 1992}

@inproceedings{Bril92,
  author    = "Eric Brill and David Magerman and Mitchell Marcus and Beatrice Santorini",
  title     = "Deducing {L}inguistic {S}tructure from the {S}tatistics of {L}arge {C}orpora",
  booktitle = "{AAAI}-92 {W}orkshop {P}rogram: {S}tatistically-{B}ased {NLP} {T}echniques, San Jose, California",
  year      = 1992}

@book{Berw85,
  author    = "Berwick, Robert C.",
  title     = "The acquisition of syntactic knowledge",
  publisher = "MIT Press",
  year      = 1985}

My thesis (currently being written!) deals with grammar learning for
natural language (end of plug).

Hope this helps

Miles
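
A footnote on the corrective approaches mentioned above: the idea of mapping a sentence the grammar cannot generate into one that it can may be easier to see in a small sketch. What follows is only a toy illustration, not the method of any of the systems cited. The grammar, the lexicon, and the function names (recognises, single_edits, repair) are all made up for the example, and the repair strategy (try every single-word deletion, substitution, or insertion until the grammar accepts) is just one simple assumption about how a correction might be searched for.

```python
from itertools import product

# Toy grammar in Chomsky normal form (hypothetical, for illustration only).
BINARY = {            # (B, C) -> {A} for each rule A -> B C
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
LEXICON = {           # word -> set of preterminal categories
    "the": {"Det"}, "a": {"Det"},
    "dog": {"N"}, "cat": {"N"},
    "sees": {"V"}, "chases": {"V"},
}


def recognises(words):
    """CYK recognition: True if the toy grammar generates the word list."""
    n = len(words)
    if n == 0:
        return False
    # chart[i][j] holds the categories that span words[i..j] inclusive.
    chart = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        chart[i][i] = set(LEXICON.get(w, ()))   # unknown words get no category
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):
                for b, c in product(chart[i][k], chart[k + 1][j]):
                    chart[i][j] |= BINARY.get((b, c), set())
    return "S" in chart[0][n - 1]


def single_edits(words):
    """Yield every word list one deletion, substitution, or insertion away."""
    vocab = sorted(LEXICON)
    for i in range(len(words)):
        yield words[:i] + words[i + 1:]                  # deletion
        for v in vocab:
            if v != words[i]:
                yield words[:i] + [v] + words[i + 1:]    # substitution
    for i in range(len(words) + 1):
        for v in vocab:
            yield words[:i] + [v] + words[i:]            # insertion


def repair(words):
    """Return the input if it parses, else the first single-edit variant
    that parses, else None."""
    if recognises(words):
        return words
    for candidate in single_edits(words):
        if recognises(candidate):
            return candidate
    return None


if __name__ == "__main__":
    print(repair("the dog sees cat".split()))
```

With this toy grammar, "the dog sees cat" is rejected because "cat" has no determiner, and the search settles on a variant with a determiner inserted. A real system would rank competing repairs rather than take the first one found, and would need more than single-word edits.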