From corpora-request@uib.no Fri May 28 11:57:25 1993
Date: Fri, 28 May 93 09:57:25 +0200
From: ide@grtc.cnrs-mrs.fr (Nancy Ide)
To: corpora@hd.uib.no, tei-l@uicvm.BITNET, lexical@NMSU.Edu,
Subject: For publication: Text Software Initiative

The Text Software Initiative
----------------------------

An international effort to promote the development and use of free text software

The widespread availability of large amounts of electronic text and linguistic data in recent years has dramatically increased the need for generally available, flexible text software. Commercial software for text analysis and manipulation covers only a fraction of research needs, and it is often expensive and hard to adapt or extend to fit a particular research problem. Software developed by individual researchers and labs is often experimental and hard to get, hard to install, under-documented, and sometimes unreliable. Above all, most of this software is incompatible.

As a result, it is not at all uncommon for researchers to develop tailor-made systems that replicate much of the functionality of other systems and in turn create programs that cannot be re-used by others, and so on in an endless software waste cycle. The reusability of data is a much-discussed topic these days; similarly, we need "software reusability", to avoid the re-inventing of the wheel characteristic of much language-analytic research in the past three decades.

The Text Software Initiative (TSI) is committed to solving this problem by working to

  o establish and publish guidelines and standards for the development of text software;
  o promulgate and coordinate the development of free TSI-conformant software.
The scope of the TSI covers all areas of analysis and manipulation of all kinds of texts (written or spoken, mono-lingual or multi-lingual parallel, etc.), including markup of physical and logical text features, linguistic analysis and annotation, browsing and retrieval, statistical analysis, and other text-related tasks in research in computational linguistics, humanities computing, terminology and lexicography, speech, etc.

The TSI software development effort is distributed, that is, anyone can contribute on a voluntary basis. This means that tools will be developed according to the contributors' priorities; however, the TSI is ultimately working towards the development of a comprehensive text handling system.

To ensure software compatibility and reusability and enable distributed development, the TSI undertakes to:

  o design and publish program interface conventions
  o determine and publish guidelines for programming style and documentation
  o stress separation of code and linguistic data to ensure (natural) language independence
  o emphasize breaking high-level text-handling tasks into more primitive, reusable functions
  o provide a library of primitive text-handling tools
  o maintain a task list and set priorities
  o circulate information such as progress reports, revisions to the standard, availability of new software, etc.
  o set up a mechanism for testing and evaluation
  o maintain mailing lists for comments, bug reports, suggestions, etc.

The TSI works in cooperation with other standardization groups, notably the Text Encoding Initiative and the Expert Advisory Group on Language Engineering Standards (EAGLES).

All TSI software is free in the sense defined in the Free Software Foundation's General Public License, which guarantees the freedom to copy, redistribute, and modify software, and protects this freedom by requiring those who pass on the software to include the rights to further redistribute it and to see and change the code.
Distribution of TSI software is accomplished in cooperation with other dissemination groups such as the Free Software Foundation, RELATOR, and the Linguistic Data Consortium. The TSI does not provide technical support, but organizes a network of voluntary consultants and support people.

PROJECT COORDINATORS

Nancy Ide, Vassar College, Poughkeepsie, New York, USA
ide@cs.vassar.edu

Jean Veronis, Universite de Provence/CNRS, Aix-en-Provence, France
veronis@grtc.cnrs-mrs.fr

GENERAL ADVISORY BOARD

Susan Armstrong, ISSCO, Geneva
Mark Liberman, Linguistic Data Consortium, University of Pennsylvania
Makoto Nagao, Kyoto University
Mark Olsen, ARTFL Project, University of Chicago
Richard Stallman, Free Software Foundation, Cambridge, Massachusetts
Donald Walker, Bellcore, Morristown, New Jersey
Antonio Zampolli, Istituto di Linguistica Computazionale, Pisa

The TSI also includes a TECHNICAL ADVISORY BOARD of software developers.

From corpora-request@uib.no Fri May 28 13:19:41 1993
Date: Fri, 28 May 1993 11:19:41 +0200
From: Kirsti Rye Ramberg
To: corpora
Subject: Corpora of contemporary written German.

A postgraduate student at the Department of Germanic Languages and Literature wants to use electronic corpora of contemporary written German. Originally she wanted to use the Mannheimer Korpus from the Oxford Text Archive. Unfortunately, this corpus is available only to people within Oxford University. Does anyone know of other corpora of contemporary written German?

Kirsti Rye Ramberg
Faculty of Arts
University of Trondheim
7055 Dragvoll

From corpora-request@uib.no Fri May 28 12:36:44 1993
To: corpora@hd.uib.no
From: "H.P. Houtzagers"
Date: 28 May 93 11:24:56 MET-1
Subject: Russian corpora

I am trying to gather information on Russian corpora. Can anyone tell me whether a list of Russian corpora exists? I would be grateful for any information you can give me.
Peter Houtzagers From corpora-request@uib.no Fri May 28 22:00:47 1993 Date: Fri, 28 May 93 20:00:47 +0200 To: corpora@hd.uib.no From: ursula.doleschal@wu-wien.ac.at Subject: query: German, German- English, and German-Russian dictionaries I am posting this on behalf of my colleague Sergej Krylov: I would like to find out if there exist anywhere computerized dictionaries (German monolingual, German-English and German-Russian). I am interested in freeware as well as shareware or commercial versions. Ursula Doleschal Tel.: ++43-1-31336 4115 Inst. f. Slawische Sprachen Fax: ++43-1-31336 744 Wirtschaftsuniv. Wien Augasse 9 Austria 1090 Wien Europe From corpora-request@uib.no Fri May 28 09:19:00 1993 Date: Fri, 28 May 93 13:19 EDT From: lewis@research.att.com (David Lewis) To: corpora@hd.uib.no, empiricists@csli.stanford.edu, linguist@tamvm1.tamu.edu Subject: major resources and lists of resources for text processing Hello, As part of a tutorial I'm presenting on natural language processing for information retrieval, I would like to prepare two brief lists: 1. A list of "lists" of resources for natural language processing of text. I count electronic mailing lists, and archives of those lists, as such lists of lists when they frequently discuss language processing resources. I will also list network servers (FTP archives, Bitnet LISTSERVs, etc.) as "lists" but even better would be to be able to provide pointers to regularly maintained files on such archives. My current list is the following: Free electronic mailing lists: NL-KR IRLIST LINGUIST EMPIRICISTS CORPORA-LIST LN HUMANIST Commercial electronic mailing lists: Computists Communique Regularly Maintained Lists of Resources: /pub/catalog at anonymous FTP site clr.nmsu.edu Suggestions are welcome. This list of "lists" is intended to be a one-shot effort that will become immediately out-of-date, but will have pointed a group of people to maintained resources with a longer expected lifetime. 2. 
A second list of the 10 to 20 most useful, easily available, non-commercial resources for content-based processing of large (at least megabyte scale) bodies of text, and how to get them. (If it takes more than 50 words to describe how to get them, they fail on "easily available".) I want to include only resources that are robust, available at nominal cost (at least to academics), and have actually been successfully used by multiple research or application groups. Some of the ones I intend to include on the list based on personal experience are:

  --Treebank corpus (tagged corpus)
  --PC-Kimmo and Englex (morphological analyzer)
  --Perl (programming language)
  --SMART (information retrieval system)
  --JUMAN (Japanese morphological analyzer / segmenter)

I welcome suggested additions to this list, from resource developers and particularly from resource users based on your personal experience. Please do not be offended if you are a resource developer and I do not choose to list your resource. Brevity is my goal and there will be significant omissions.

I will post the resulting two lists back to all newsgroups that this query is going to, so there's no need to write me asking for copies. These two lists will be one shots that will not be maintained, and will hopefully be soon forgotten in favor of better, permanently maintained lists.

Many thanks, David Lewis

David D. Lewis
AT&T Bell Laboratories            email: lewis@research.att.com
600 Mountain Ave.; Room 2C-408    ph. 908-582-3976
Murray Hill, NJ 07974; USA        dept. fax. 908-582-7550

From corpora-request@uib.no Tue Jun 1 18:08:05 1993
Date: Tue, 1 Jun 1993 16:08:05 +0200
From: Inge de Munnink
To: corplst@hd.uib.no
Subject: mailing list

Hello,

I would like to receive the corpora mailing list at this address, and possibly the linguistics one as well.

Regards, Inge

From corpora-request@uib.no Thu Jun 3 11:27:23 1993
From: Adrian Tulloch (MS Research Fellowship)
To: CORPORA@hd.uib.no
Date: Thu, 3 Jun 93 18:27:23 PDT
Subject: Research Unit Head

I have posted this on behalf of Professor Gledhill; please send any email directly to him (vanceg@microsoft.com).

20th MAY 1993 - POSITION AVAILABLE, AUSTRALIA

RESEARCH UNIT HEAD
NATURAL LANGUAGE UNDERSTANDING

The Microsoft Institute of Advanced Software Technology is a 'not for profit' subsidiary of Microsoft Corporation established in Sydney, Australia. The Microsoft Institute is formally affiliated with Macquarie University and, as a result, has an active basic research program.

The Microsoft Institute is seeking to appoint a scientist in Computational Linguistics to lead an active research program that will make an original contribution in this field. The successful applicant will have an appropriate research PhD in computer-based Natural Language Understanding and will have established a presence in their field through publication and conference presentation. In particular, they will be energetic and committed to building a research unit of international reputation.

The Unit is already established in Sydney with a team of four outstanding PhD students and appropriate software and equipment; it now needs a scientist with vision and direction to unify and direct the research. The position offers the opportunity to collaborate directly with the Natural Language Group in the Microsoft research laboratories in Seattle and contribute to fundamental work that should influence the direction of the field.

Appointments to the position of Unit Head will be for an initial period of three years. A salary package of around $A75,000 will be provided, which is equivalent to that of a full Chair in an Australian University.
Applications including a CV and the names of three relevant referees should be sent to:

Professor Vance Gledhill,
Director, Microsoft Institute
65 Epping Road
North Ryde
New South Wales 2113
AUSTRALIA

or emailed to: vanceg@microsoft.com

From corpora-request@uib.no Fri Jun 4 13:51:37 1993
Date: Fri, 4 Jun 93 12:45:18 BST
From: GTMY0413@vme.gla.ac.uk
To: CORPORA@hd.uib.no

Dear Sir,

I wish to obtain information about the availability of Modern English corpora.

Yours faithfully,
Jamal Ardehali

From corpora-request@uib.no Fri Jun 4 15:13:42 1993
Date: Fri, 4 Jun 93 12:48:56 BST
From: GTMY0413@vme.gla.ac.uk
Subject: MODERN ENGLISH CORPORA
To: CORPLST@hd.uib.no

Please send me any information on the availability of Modern English Corpora.

Jamal Ardehali, University of Glasgow

From corpora-request@uib.no Mon Jun 7 10:59:39 1993
Date: Mon, 7 Jun 93 10:41 MET
From: SCHOLTES@alf.let.uva.nl
Subject: PhD Dissertation available
To: arras@icase.edu, cdlee@moose.cs.indiana.edu, cdohert@ccvax.ucd.ie,

===================================================================
As I had to disappoint many people because I ran out of copies in the first batch, a high-quality reprint has been made.

........REPRINT........

Ph.D. DISSERTATION AVAILABLE
on
Neural Networks, Natural Language Processing, Information Retrieval

292 pages and over 350 references
===================================================================

A copy of the dissertation "Neural Networks in Natural Language Processing and Information Retrieval" by Johannes C. Scholtes can be obtained at cost price, including fast airmail delivery, for US$ 25,-.

Payment by major credit cards (VISA, AMEX, MC, Diners) is accepted and encouraged. Please include Name on Card, Number and Exp. Date. Your credit card will be charged Dfl. 47,50.

Within Europe one can also send a Euro-Cheque for Dfl. 47,50 to (include the 4 or 5 digit number on the back of the cheque!):

University of Amsterdam
J.C. Scholtes
Dufaystraat 1
1075 GR Amsterdam
The Netherlands
scholtes@alf.let.uva.nl

Do not forget to mention a surface shipping address. Please allow 2-4 weeks for delivery.

Abstract

1.0 Machine Intelligence

For over fifty years the two main directions in machine intelligence (MI), neural networks (NN) and artificial intelligence (AI), have been studied by various persons with many different backgrounds. NN and AI seemed to conflict with many of the traditional sciences as well as with each other. The lack of a long research history and well defined foundations has always been an obstacle for the general acceptance of machine intelligence by other fields.

At the same time, traditional schools of science such as mathematics and physics developed their own tradition of new or "intelligent" algorithms. Progress made in the field of statistical reestimation techniques such as the Hidden Markov Models (HMM) started a new phase in speech recognition. Another application of the progress of mathematics can be found in the use of the Kalman filter in the interpretation of sonar and radar signals. Many more examples of such "intelligent" algorithms can be found in the statistical classification and filtering techniques of the study of pattern recognition (PR).

Here, the field of neural networks is studied with that of pattern recognition in mind. Although only global qualitative comparisons are made, the importance of the relation between them is not to be underestimated. In addition it is argued that neural networks do indeed add something to the fields of MI and PR, instead of competing or conflicting with them.

2.0 Natural Language Processing

The study of natural language processing (NLP) has existed even longer than that of MI. As early as the beginning of this century people tried to analyse human language with machines. However, serious efforts had to wait until the development of the digital computer in the 1940s, and even then, the possibilities were limited.
For over 40 years, symbolic AI has been the most important approach in the study of NLP. That this has not always been the case may be concluded from the early work on NLP by Harris. As a matter of fact, Chomsky's Syntactic Structures was an attack on the lack of structural properties in the mathematical methods used in those days. But, as the latter's work remained the standard in NLP, the former has been forgotten completely until recently.

As the scientific community in NLP devoted all its attention to the symbolic AI-like theories, the only useful practical implementations of NLP systems were those that were based on statistics rather than on linguistics. As a result, more and more scientists are redirecting their attention towards the statistical techniques available in NLP. The field of connectionist NLP can be considered as a special case of these mathematical methods in NLP.

More than one reason can be given to explain this turn in approach. On the one hand, many problems in NLP have never been addressed properly by symbolic AI. Some examples are robust behavior in noisy environments, disambiguation driven by different kinds of knowledge, commonsense generalizations, and learning (or training) abilities. On the other hand, mathematical methods have become much stronger and more sensitive to specific properties of language such as hierarchical structures. Last but not least, the relatively high degree of success of mathematical techniques in commercial NLP systems might have set the trend towards the implementation of simple, but straightforward algorithms.

In this study, the implementation of hierarchical structures and semantical features in mathematical objects such as vectors and matrices is given much attention. These vectors can then be used in models such as neural networks, but also in sequential statistical procedures implementing similar characteristics.
3.0 Information Retrieval

The study of information retrieval (IR) was traditionally related to libraries on the one hand and military applications on the other. However, as PCs grew more popular, most common users lost track of the data they had produced over the last couple of years. This, together with the introduction of various "small platform" computer programs, made the field of IR relevant to ordinary users. However, most of these systems still use techniques that were developed more than thirty years ago and that implement nothing more than a global surface analysis of the textual (layout) properties. No deep structure whatsoever is incorporated in the decision whether or not to retrieve a text.

There is one large dilemma in IR research. On the one hand, the data collections are so incredibly large that any method other than a global surface analysis would fail. On the other hand, such a global analysis could never implement a contextually sensitive method to restrict the number of possible candidates returned by the retrieval system. As a result, all methods that use some linguistic knowledge exist only in laboratories and not in the real world. Conversely, all methods that are used in the real world are based on technological achievements from twenty to thirty years ago.

Therefore, the field of information retrieval would be greatly indebted to a method that could incorporate more context without slowing down. As computers are only capable of processing numbers within reasonable time limits, such a method should be based on vectors of numbers rather than on symbol manipulations. This is exactly where the challenge is: on the one hand keep up the speed, and on the other hand incorporate more context. If possible, the data representation of the contextual information must not be restricted to a single type of medium.
It should be possible to incorporate symbolic language as well as sound, pictures and video concurrently in the retrieval phase, although one does not know exactly how yet... Here, the emphasis is more on real-time filtering of large amounts of dynamic data than on document retrieval from large (static) data bases. By incorporating more contextual information, it should be possible to implement a model that can process large amounts of unstructured text without providing the end-user with an overkill of information. 4.0 The Combination As this study is a very multi-disciplinary one, the risk exists that it remains restricted to a surface discussion of many different problems without analyzing one in depth. To avoid this, some central themes, applications and tools are chosen. The themes in this work are self- organization, distributed data representations and context. The applications are NLP and IR, the tools are (variants of) Kohonen feature maps, a well known model from neural network research. Self-organization and context are more related to each other than one may suspect. First, without the proper natural context, self-organization shall not be possible. Next, self-organization enables one to discover contextual relations that were not known before. Distributed data representation may solve many of the unsolved problems in NLP and IR by introducing a powerful and efficient knowledge integration and generalization tool. However, distributed data representation and self-organization trigger new problems that should be solved in an elegant manner. Both NLP and IR work on symbolic language. Both have properties in common but both focus on different features of language. In NLP hierarchical structures and semantical features are important. In IR the amount of data sets the limitations of the methods used. However, as computers grow more powerful and the data sets get larger and larger, both approaches get more and more common ground. 
By using the same models on both applications, a better understanding of both may be obtained. Both neural networks and statistics would be able to implement self-organization, distributed data and context in the same manner. In this thesis, the emphasis is on Kohonen feature maps rather than on statistics. However, it may be possible to implement many of the techniques used with regular sequential mathematical algorithms. So, the true aim of this work can be formulated as the understanding of self-organization, distributed data representation, and context in NLP and IR, by in depth analysis of Kohonen feature maps. ============================================================================== From corpora-request@uib.no Wed Jun 9 01:58:00 1993 Date: Wed, 9 Jun 93 11:58:00 -1000 To: corpora@hd.uib.no From: awaywood@christ.acu.edu.au Subject: anxiety words-frequency of usage I am a Clinical Psychologist, sending a message via Andrew Waywood, looking at anxiety in children. As part of a series of studies, I am wanting to establish a list of words which are anxiety provoking for children at a variety of ages. At the very least I would like to find words which are anxiety provoking for both 8 and 14 year olds. I will need about 10 words relating to social threat, 10 relating to physical threat, 20 positive words but on the same physical and social themes, and 20 entirely neutral words. It is important that all the words are used as often in the English language, that the words can be matched with those of the other types (that is, anxiety Vs. positive Vs. neutral) , and that all would be easily understood by the average 8 year old. If someone can direct me to references which could answer these questions, I would be most appreciative. Please contact Simon Kennedy on: skennedy@christ.acu.edu.au Andrew Waywood awaywood@christ.acu.edu.au Australian Catholic University Christ Campus PO Box 213, Oakleigh, 3166. Australia. 
From corpora-request@uib.no Thu Jun 10 17:02:33 1993 Date: Thu, 10 Jun 1993 15:02:33 +0200 From: Knut Hofland To: corpora Subject: ICAME Journal no. 17, now out ICAME Journal no. 17 has now been printed and is sent to the subscribers. Information on how to subscribe can be fetched by sending the following line to FILESERV@HD.UIB.NO send icame icame.journal Back issues are available, for a table of contents, send the line send icame icame.journal.85-92.toc to the same address. Contents ICAME Journal No. 17, April 1993 Articles: Roger Garside: The marking of cohesive relationships: Tools for the construction of a large bank of anaphoric data p. 5 Junsaku Nakamura: Quantitative comparison of modals in the Brown and LOB corpora p. 29 Jacques Noel: Adjectives and nouns with reported clauses p. 49 Raymond Hickey: Corpus data processing with Lexa p. 73 Steve Fligelstone: Some reflections on the question of teaching, from a corpus linguistics perspective p. 97 Reviews: Jan Aarts and Willem Meijs (eds.): Theory and practice in corpus linguistics (Graeme Kennedy) p. 111 John Sinclair: Corpus, concordance, collocation (Kay Wikberg) p. 114 Shorter notices: David Tiomajou: Designing a corpus of Cameroonian English p. 119 Geoffrey Sampson: The Susanne Corpus p. 125 Anna-Brita Stenstrom and Leiv Egil Breivik: The Bergen Corpus of London Teenager Language p. 128 Christian Mair: Thirteenth ICAME Conference p. 129 Merja Kyto, Matti Rissanen, and Susan Wright: The First International Colloquium on English Diachronic Corpora p. 132 ICAME services: Knut Hofland: The CORPORA distribution list p. 138 Knut Hofland: ICAME file servers p. 139 Texts available through ICAME p. 142 Programs available through ICAME p. 145 The ICAME CD-ROM p. 145 Conditions on the use of ICAME corpus material p. 146 Information for contributors p. 147 From corpora-request@uib.no Sat Jun 12 00:42:36 1993 From: j.guy@trl.oz.au (Jacques Guy) Subject: Verb Phrase (from V. 
Ooi)
To: corpora@hd.uib.no
Date: Fri, 11 Jun 1993 14:42:36 +1000 (EST)

Vincent Ooi asks:

Recently, a Chomskyan linguist asked me to find out whether the following verb phrases (or similar ones), found in many descriptive grammar books, actually do occur in natural language (as exemplified by text corpora): 'might have been being', as in 'This building might have been being built'. or 'He had been being interviewed'. or 'He will have been being examined'.

As I don't have access to any large corpus at the moment, do you folks know of such an occurring pattern in the corpora you are familiar with? If so, what are these sentences and their frequency of occurrence?

If this pattern does not occur (or rarely), I guess the Chomskyan linguist has a point regarding the inadequacy of corpora and the indispensability of arm-chair examples.

--------end of quote----------------------------------------

Your Chomskyan colleague will be elated to learn that my corpus contains exactly one of each of those verb phrases, as it consists of the contents of my e-mail.

There was a very good chapter on Chomskyan linguistics in a monograph published about 15 to 20 years ago at Uppsala University. It was called something like "Fantastic Linguistics" or "Fantasy in Linguistics". Chomskyan linguistics has this overwhelming virtue that, being a clownishly distorted model of language, the study and research of its inadequacies, quirks, and general bone-headedness makes it an inexhaustible source of seminal papers ("seminal" as in "semen", as in "wanking"). Does a powerful lot of good to your list of publications. And don't forget to kowtow to the Great Man twenty times at least in the bibliography.
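
The verb-group pattern asked about in this thread, (modal) + have + been + being + past participle, can be approximated with a surface pattern match over raw text. The following is a minimal Python sketch, not the method used by any of the posters; the function name, the word lists, and the suffix-based participle test are all assumptions made for illustration.

```python
import re

# Word lists below are illustrative assumptions, not a complete grammar.
MODALS = r"(?:will|would|shall|should|may|might|can|could|must)"
HAVE = r"(?:have|has|had|having)"
# Crude participle test: a word ending in -ed/-en, or one of a few
# irregular forms. A tagged corpus would do this far more reliably.
PARTICIPLE = r"(?:\w+(?:ed|en)|built|sent|made|spent|done|seen|told|held)"

# Optional modal, a form of "have", then "been being" plus a participle.
PATTERN = re.compile(
    rf"\b(?:{MODALS}\s+)?{HAVE}\s+been\s+being\s+{PARTICIPLE}\b",
    re.IGNORECASE,
)


def find_perfect_progressive_passives(text):
    """Return every matched verb group in `text`, in order of occurrence."""
    return [m.group(0) for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    sample = (
        "This building might have been being built. "
        "He had been being interviewed. "
        "He will have been being examined."
    )
    for hit in find_perfect_progressive_passives(sample):
        print(hit)
```

A surface match of this kind cannot tell a true perfect progressive passive from a copula followed by a gerund: a string such as "his worst punishment has been being sent home" also matches, so every hit still has to be read and parsed by hand before it is counted.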
From corpora-request@uib.no Fri Jun 11 07:43:31 1993
From: ellooiby@nusunix2.nus.sg (Vincent Ooi)
Subject: Verb Phrase
To: corpora@hd.uib.no
Date: Fri, 11 Jun 93 11:05:06 WST

Hi folks,

Recently, a Chomskyan linguist asked me to find out whether the following verb phrases (or similar ones), found in many descriptive grammar books, actually do occur in natural language (as exemplified by text corpora): 'might have been being', as in 'This building might have been being built'. or 'He had been being interviewed'. or 'He will have been being examined'. In each case, the pattern is (modal) + have-perfective + be-progressive + be-passive.

As I don't have access to any large corpus at the moment, do you folks know of such an occurring pattern in the corpora you are familiar with? If so, what are these sentences and their frequency of occurrence?

If this pattern does not occur (or rarely), I guess the Chomskyan linguist has a point regarding the inadequacy of corpora and the indispensability of arm-chair examples.

Thanks and regards,
Vincent

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
|Vincent B Y Ooi                      |
|National University of Singapore     |
|Dept of English Language & Literature|
|Kent Ridge, SINGAPORE 0511           |
|INTERNET: ellooiby@nusunix.nus.sg    |
|BITNET: ellooiby@nusvm.bitnet        |
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

From corpora-request@uib.no Fri Jun 11 10:02:58 1993
From: Jem Clear
Date: Fri, 11 Jun 93 08:56:47 BST
To: corpora@hd.uib.no

Vincent Ooi asks:

>... the pattern is (modal) + have-perfective + be-progressive
>+ be-passive.
>
>As I don't have access to any large corpus at the moment, do you folks know of
>such an occurring pattern in the corpora you are familiar with? If so,
>what are these sentences and their frequency of occurrence?

I have just searched 120m words of varied modern (mainly post-1990) British (75%) and US (25%) English texts which is part of our "Bank of English" at COBUILD. I've got the following four instances.
(The first is interesting in that it occurs as part of a discussion about natural-sounding English!) Jem Clear COBUILD Westmere 50 Edgbaston Park Road Birmingham B15 2RX UK jem@cobuild.collins.co.uk -------- ...with be and be no. . . I I can't get a I can't get a sensible-sounding sentence including be and be. [laughs] Erm [pause] Oh yes you can. . Is is always being Mm. Is always and has been being as well. . Mm is always being followed followed that's right yes you can. . Yes. Yes. . Yes you can. . Yeah. . So you would Mm. [pause] I mean in a sense it is it is... {casual speech} -------- Husband John, 41, an aircraft engineer, immediately suggested they should take their children Matthew, 20, Jayne, 13, and Michael, 11, on holiday to Florida. . Claire, 37, of Crawley, Surrey, has been being playing RICH since it first started but never dreamed she would win. . We have been to Florida before and it was brilliant," she said. . Thanks to this fabulous prize, we can enjoy it all over again. {low-brow daily newspaper} -------- ... he could coach in the National Basketball Association, I would think, be a professional. Kirkpatrick: Well, of course. Siegel: What--what's the allure of college ball, do you think, for him? Kirkpatrick: The allure has been being in, probably, Las Vegas for the last 15 years. He's a millionaire many many times over. He doesn't pay for hardly anything the way most of us know daily expenses. {broadcast radio} -------- There's a step change in the rate of growth of the money supply. Now what will happen to the rate of change in prices. Well initially little little increase but initially people will not have realized that erm the money supply has been being spent so fast and erm initially erm inventories will be strongly run down and there'll be a erm [pause] prices won't adjust immediately. {interview} From corpora-request@uib.no Fri Jun 11 03:28:22 1993 Date: Fri, 11 Jun 93 07:28:22 -0400 From: "Hoyt N. 
Duggan" To: Vincent Ooi , corpora@hd.uib.no Subject: Re: Verb Phrase See Randolph Quirk, Sidney Greenbaum, Geoffrey Leech, & Jan Svartvik, _A Comprehensive Grammar of the English Language_ (London & NY: Longman, 1985) 151. Dug Duggan hnd@virginia.edu From corpora-request@uib.no Fri Jun 11 06:17:43 1993 From: jorn@chinet.com (Jorn Barger) Subject: Will have been being To: corpora@hd.uib.no Date: Fri, 11 Jun 1993 11:17:43 -0500 (CDT) Around 1968, Christopher Cerf (later of Sesame-Street fame) wrote a humor book called "The World's Largest Cheese" that included a short adventure story written entirely with the "will have been being" verb form. (This book is a lost gem, imho...) Of course, it was a little self-conscious ;^) jorn@chinet.com From corpora-request@uib.no Fri Jun 11 11:39:00 1993 Date: Fri, 11 Jun 93 16:39 EST From: KROVETZ@cs.umass.EDU Subject: Verb Phrase To: corpora@hd.uib.NO I found one instance of the pattern in a collection of legal text (about 300 MB). The sentence is: `Ayers states that he has never received a written reprimand for sleeping but rather his worst punishment has been being sent home for the night.' Bob krovetz@cs.umass.edu From corpora-request@uib.no Sat Jun 12 03:43:18 1993 Date: Sat, 12 Jun 93 00:43:18 +0300 From: jslindst@waltari.Helsinki.FI (Jouko Lindstedt) To: KROVETZ@cs.umass.EDU Subject: Verb Phrase > I found one instance of the pattern in a collection of legal text > (about 300 MB). The sentence is: `Ayers states that he has never > received a written reprimand for sleeping but rather his worst > punishment has been being sent home for the night.' This is actually not an instance of the same pattern -- it is not the punishment that has been sent home! ("Has been being sent" is not a complex verb form here.) 
Jouko Lindstedt Institutum Slavicum, Universitas Helsingiensis ---------------------------------------------------------------------- Department of Slavonic Languages, University of Helsinki or letters: P.O.Box 4, 00014 University of Helsinki, Finland fax: +358-0-1912974 ---------------------------------------------------------------------- From corpora-request@uib.no Sat Jun 12 06:50:43 1993 Date: Sat, 12 Jun 1993 14:45:56 +1000 From: bert peeters To: corpora@hd.uib.no Subject: Verb phrase In reply to Bob Krovetz (Hi, Bob...): The sentence is: "Ayers states that he has never received a written reprimand for sleeping but rather his worst punishment has been being sent home for the night.' This is not a good example: if you parse it properly ("his worst punishment has been / being sent home for the night") you will see why. --------------------------------------------------------------------- Dr Bert Peeters Tel: +61 02 202344 Department of Modern Languages 002 202344 University of Tasmania at Hobart Fax: 002 207813 GPO Box 252C Bert.Peeters@modlang.utas.edu.au Hobart TAS 7001 Australia From corpora-request@uib.no Sat Jun 12 10:43:05 1993 From: ellooiby@nusunix2.nus.sg (Vincent Ooi) Subject: Verb Phrase - More responses To: corpora@hd.uib.no Date: Sat, 12 Jun 93 16:37:56 WST Thanks to you folks who've responded (so far) to my query re the occurrence of the full verb phrase pattern. More, from Cathy Ball: (edited)....suppose you get access to the LOB corpus and don't actually find any examples. What would that mean? It could mean that the construction you're looking for is rare. The LOB corpus is only a million or so words, which is really not very large compared to say the number of words produced by a given speaker in his/her lifetime. So not to find something in a given corpus just means that we really need many more huge corpora, as I think most people would already agree. 
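Cathy Ball's point about corpus size can be made concrete with a rough calculation. The sketch below assumes that occurrences follow a Poisson model, which is a simplification, and borrows as an illustrative rate the figure reported elsewhere in this thread (two hits for "been being" in roughly half a billion words). The function name prob_no_hits is hypothetical.

```python
import math


def prob_no_hits(rate_per_word: float, corpus_words: int) -> float:
    """Probability of zero occurrences in a corpus, under a Poisson model."""
    expected_hits = rate_per_word * corpus_words
    return math.exp(-expected_hits)


# Illustrative rate only: two hits in about half a billion words.
rate = 2 / 500_000_000

# LOB is about a million words.
lob_words = 1_000_000

print(f"Expected hits in LOB: {rate * lob_words:.4f}")
print(f"Chance of finding none: {prob_no_hits(rate, lob_words):.3f}")
```

Under these assumptions a million-word corpus would return no hits well over 99% of the time, so an empty search by itself says very little about whether the construction exists.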
------ Ken Church: I searched nearly half a billion words of text (e.g., AP news, Hansards, Shakespeare, Bible, Wall Street Journal, Brown Corpus, etc.) for "been being" and found only two examples, both in DOE (Department of Energy) abstracts. In my own opinion, this type of language is extremely marked. It is probably only acceptable in a certain kind of very pompous scientific gobbledygook that might be grammatical, but it certainly can't be called good writing. DOE 16808897: The thermal effects , accompaning the deformation process , have [[ been being ]] observed for a long time by the experimentalists and still remain the subject of a large number of publications . DOE 20996252: In order to heighten the performance of turbocharger of the passenger car 's engine , the development to have the turbocharger made of ceramics had [[ been being ]] made . ------ ted@nmsu.edu: i would have taken it just the other way. if i search decades of the wall street journal and never find such a misbegotten sentence, then that rather says something strong about the goofy linguistic approach which assigns equal plausibility to your examples. (btw, i was just talking on the phone and my girlfriend said that she 'might have been getting sick'. is that close enough?) From corpora-request@uib.no Sat Jun 12 16:42:11 1993 Date: Sat, 12 Jun 93 14:42:11 +0200 From: LEUSCHNE To: corpora@hd.uib.no Subject: Verb Phrase I have just joined the list, so I don't know what went on before, but from the (first) messages I received, it seems that the following examples might be interesting too (they have not been found electronically, I'm afraid :-)) ---------------- 1. Work in this aspect of English syntax, which is referred to here under the general heading of 'thematic organization', HAS BEEN BEING PURSUED by linguists in the United States and in Europe for some considerable time, ... (Halliday, 1967, in some article or book on Clause, p. 1) 2. She doesn't trust us.
I SHALL always BE BEING PUSHED away from him by her. (Galsworthy, quoted by Korsakov, 1969, a book on tenses, if I remember correctly, p. 144 - he said that this was the only example on 50 000 pages of text) 3. A group of teachers on a refresher course once ... noticed that there was a blank in the grammar we were using, against the Passive of the Perfect Continuous Tenses; after some discussion we accepted the verdict, as we could not think, on the spur of the moment, of any possibility of or the need for saying: "I have been being taken." The next morning, while driving in the office car to the course, I said that the car was making a rather peculiar noise. "Yes", answered the friend who was driving, "the car SHOULD HAVE BEEN BEING REPAIRED all this week, but it was needed for this course." The situation called for the pattern, and it emerged without reflection ... (Billows, F.L (1961), The techniques of language teaching, London: Longmans, p. 178) If precise bibliographic citations should be being needed, I can look them up. ___________________________________ And what do you make of the following? Everyone likes to convert people to something they like. Sam was no exception. She was being laughing and loving and little-girlish. I was a sucker for erudite little girls. (Deighton, Funeral, p. 587) ............................................................ : Burkhard Leuschner : : Paedagogische Hochschule, Schwaebisch Gmuend, Germany : : INTERNET: BITNET: : : Burkhard.Leuschner@extern.uni-ulm.de Leuschne@dulruu51 : ............................................................ 
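Searches like the ones reported in this thread can be approximated with a regular expression. What follows is a minimal Python sketch, assuming plain untagged text; the pattern and the name find_candidates are illustrative and are not the tools the contributors actually used. It deliberately over-generates: a string such as "has been being sent home" matches even though, as was pointed out earlier in the thread, it is not an instance of the complex verb form, so every hit still needs checking by hand.

```python
import re

# "be" or "been", then "being", then the next word (a candidate participle).
PATTERN = re.compile(r"\b(be|been)\s+being\s+(\w+)", re.IGNORECASE)


def find_candidates(text):
    """Return (matched string, character offset) pairs for candidate hits."""
    return [(m.group(0), m.start()) for m in PATTERN.finditer(text)]


samples = [
    "The car should have been being repaired all this week.",
    "I shall always be being pushed away from him by her.",
    # Matches the pattern, but is not the complex verb form.
    "His worst punishment has been being sent home for the night.",
]

for sentence in samples:
    for hit, offset in find_candidates(sentence):
        print(offset, hit)
```

A tagged corpus would allow a tighter query (for example, requiring a past participle tag on the last word), at the cost of depending on one particular tag set.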
From corpora-request@uib.no Sat Jun 12 18:24:01 1993 Date: Sat, 12 Jun 1993 16:24:01 +0200 From: Pieter de Haan To: Pieter de Haan , Subject: ICAME XIII Proceedings The following volume has just been published: ENGLISH LANGUAGE CORPORA: DESIGN, ANALYSIS AND EXPLOITATION Edited by Jan Aarts, Pieter de Haan and Nelleke Oostdijk Published by Rodopi, Amsterdam - Atlanta, GA ISBN: 90-5183-517-5 Price: Hfl 100,-/US$ 58.80 The volume can be ordered directly from the publishers: USA/Canada: USA Phone: (404) 523-1964 Fax: (404) 522-7116 All other countries: NL Phone: + 31 20 622 75 07 Fax: + 31 20 638 09 48 ----------------------- This volume contains a selection of the papers read at the Thirteenth Conference on the Use of Computer Corpora in English Language Research (ICAME 13), which was held at Nijmegen, the Netherlands, in June 1992. The selection has been made so as to represent the three major activities involved in the use of computer corpora in linguistic research: the design and compilation of corpora, their grammatical analysis and their subsequent exploitation. Things are moving fast in the discipline of corpus linguistics. This volume therefore follows developments as closely as possible; not only in the chronological sense of publishing an account of the most recent research efforts but also in the sense of paying close attention to the day-to-day practice of corpus linguistic research. 
---------------- Contents Preface i I Compilation and design A supplement to the Helsinki Corpus of English Texts: The Corpus of Early American English 3 Merja Kyto The Early Modern English Renaissance Dictionaries Corpus 11 Ian Lancashire In search of history: English language in the eighteenth century 25 Susan Wright Electronic language: A new variety of English 41 Milena Collot and Nancy Belmore International Corpus of Learner English 57 Sylviane Granger Building a corpus of the English of computer science 73 Fang Cheng-yu Encoding the British National Corpus 79 Gavin Burnage and Dominic Dunlop From dirty data to clean language 97 Susan Blackwell The machine-readable Spoken English Corpus 107 Gerry Knowles II Analysis Progress in UCREL research: Improving corpus annotation practices 123 Elizabeth Eyes and Geoffrey Leech Towards a syntactic database: The TOSCA analysis system 145 Hans van Halteren and Nelleke Oostdijk A customized grammar workbench 163 Mark-Jan Nederhof and Kees Koster Undergeneration and robust parsing 181 Ted Briscoe and Nick Waegner Towards a standard format for parsed corpora 197 Clive Souter III Exploitation An object-oriented design for a Corpus Utility Program 215 Akiva Quinn Recurrent verb-complement constructions in the London-Lund Corpus 227 Bengt Altenberg Corpus evidence on some points of usage 247 Pam Peters Idiomaticity in English NPs 257 Henk Barkema A word in time: First findings from the investigation of dynamic text 279 Antoinette Renouf Issues of large-scale collocational analysis 289 Alex Collier Analyzing nominal compounds with the help of a computerized lexical knowledge system 299 Willem Meijs ------------------- From corpora-request@uib.no Sat Jun 12 18:24:26 1993 Date: Sat, 12 Jun 1993 16:24:26 +0200 From: Henry Kucera To: corpora@x400.hd.uib.no Subject: Re: Verb Phrase All complex verb phrases are discussed (and statistics given) in the analysis of the Brown corpus which can be found in the last chapter of Francis & 
Kucera, Frequency Analysis of English Usage, Boston, 1982. While various long verb groups appear, the longest passive one (e.g. might have been being considered) does not occur at all. Hoping that this is not bad news, regards Henry Kucera From corpora-request@uib.no Sat Jun 12 06:48:20 1993 From: bro@elm.circa.ufl.edu (John Bro) Subject: plausible vs possible VPs To: corpora@hd.uib.no (Corpora List) Date: Sat, 12 Jun 93 10:48:20 EDT > i would have taken it just the other way. if i search decades of the > wall street journal and never find such a misbegotten sentence, then > that rather says something strong about the goofy linguistic > approach which assigns equal plausibility to your examples. ^^^^^^^^^^^^^^^^^^ I'm surprised at this last statement (as well as at Jacques Guy's vehemence in a previous posting). Generative syntax (especially of the Chomskyan variety) says nothing whatsoever about equal "plausibility" of such sentences, but only about the *possibility* of them. Whether the conditions under which such sentences are appropriate arise frequently or not is not the domain of the theory. All it seeks to explain are the (innate) grammatical principles and constraints that permit or block such structures. Furthermore, it is often precisely those structures for which little or no positive evidence is available that are expected to be the most revealing of known but *unlearned* principles. I don't have a corpus example to provide, but I can cook up an example where it seems perfectly felicitous, in which case a generative theory must be able to produce it. A: Why didn't my email to C ever get there? B: Well, the system went down yesterday and it MIGHT HAVE BEEN BEING SENT just as the machine crashed. I dunno, does this sound like misbegotten gobbledygook to you?
-- John -- ============================================================= John Bro | bro@elm.circa.ufl.edu Linguistics | bro@oak.circa.ufl.edu University of Florida | bro@ufoak.bitnet Gainesville, Fl 32611 | bro@reef.cis.ufl.edu From corpora-request@uib.no Sat Jun 12 19:00:43 1993 Date: Sat, 12 Jun 1993 17:00:43 +0200 From: ted To: HENRY@BROWNVM.brown.edu Subject: Verb Phrase > All complex verb phrases are discussed (and statistics given) in the > analysis of the Brown corpus which can be found in the last chapter of > Francis & Kucera, Frequency Analysis of English Usage, Boston, 1982. > While various long verb groups appear, the longest passive one (e.g. > might have been being considered) does not occur at all. > Hoping that this is not bad news, regards given that the brown corpus is relatively minute (less than 1 % of other commonly available corpora) it isn't surprising that many linguistic phenomena are not observed. From corpora-request@uib.no Sat Jun 12 19:39:21 1993 To: jslindst@waltari.Helsinki.FI (Jouko Lindstedt) Subject: Re: Verb Phrase Date: Sat, 12 Jun 1993 23:39:21 EDT From: David Graff From corpora-request@uib.no Mon Jun 14 13:37:57 1993 Date: Mon, 14 Jun 1993 11:37:57 +0200 From: CORPORA list To: corpora@hd.uib.no Subject: Have been being inundated with have been being sentences ********************* Text Corpora List: Addresses *************************** CORPORA@HD.UIB.NO for messages to the list CORPORA-REQUEST@HD.UIB.NO for messages to list administrator FILESERV@HD.UIB.NO for requests to file server (try sending HELP) LISTSERV@UIB.NO for automatic (un)subscribing (not a fully LISTSERV!) SUB corpora firstname lastname UNSUB corpora ****************************************************************************** Send-date: Sat, 12 Jun 1993 12:52:20 UTC-0400 From: (Robert A Amsler) Subject: Have been being inundated with have been being sentences Gentlemen...
(and Gentlewomen), I could see the conflict coming when the gauntlet was thrown down about the significance that could be attached to either no-occurrence or low-frequency of occurrence of some particular syntactic form in the available corpora. Aspersions as to the value of corpora, or the value of the grammatical form itself were immediately conjured up in defense of why things appeared one way or the other. We should be more humble at this stage of our knowledge. We don't know how `representative' any corpus is; and in fact there are great difficulties in even characterizing the parameters along which the corpora represent. Spoken language vs. technical language vs. scholarly text vs. newspaper text; text which has been edited by a given publication for stylistic conformance, for example, might not be representative of anything other than how acceptable some forms are to a certain group of managing editors of a publication. What we have are sufficient examples to satisfy the existence proof stage for the construction and at least some indication that it isn't all that common. We have hypotheses as to why it isn't more common (i.e. could be regarded as ungrammatical or at least uneasily grammatical by some fluent speakers; might be systematically eliminated by editorial grammarians seeking a more populist style; etc.) What impresses me most is the speed whereby the first examples were found. Clearly corpus-based linguistics passed one test; that it has now reached the stage where the existence proof for unusual or hypothetical formations can be tested quickly by those with access to such resources. I think if ever there was proof of a paradigm shift, this was it. It is important to incorporate into one's speculations about language the fact that now it is readily determined if and what examples exist. 
Non-existence doesn't yet seem to imply much because we can't establish any claims of representativeness of what we search or how much we find; but some positive occurrences do make a difference and one which needs to be included in theoretical judgement. From corpora-request@uib.no Mon Jun 14 14:40:03 1993 Date: Mon, 14 Jun 1993 15:23 IST From: Ron Kuzar Subject: Basic Info To: CORPORA Dear CORPORA netters, I am new to this list and have been (being) observing it now quietly for several weeks, in order to try and figure out what I can passively learn without bothering you, people. However, I have realized that I need some more active help. I am mainly interested in syntax, and would like to introduce computer based corpus work into my syntax course next year. This would, obviously, involve getting hold of some corpus/corpora and some toolkit(s). These are my questions: (a) Tagging 1. Are there corpora available in both tagged and untagged forms? 2. How theoretically-specific is tagging? 3. Are texts tagged for classification categories or functional ones, or both in the same project? (b) Availability 4. Are corpora and tools available as free-ware? Where? 5. Is there a difference in the quality/size between commercial and free software? (c) Hardware 6. If one was about to buy a (personal) computer now, with corpus work in mind, what would be the wildest hardware fantasy, taking foreseeable future developments of the field in mind? What would be the priorities in cutting down this dream to fit budget restrictions? (d) Bibliography 7. Is there an introduction into corpus work in electronic form? 8. Any other bibliographical suggestions? Knowing so little about it, I would like to get answers also to my unasked questions. 
Since this information may be repetitious on one hand, and trivial for many netters on the other hand, but also valuable to novices like me, I suggest that contributors send their answers to my private address, and I will summarize them in a couple of weeks, as is customary on other lists. I will credit the contributors, unless instructed otherwise. Thanks Ron Kuzar ---- soukr at hujivm1.bitnet From corpora-request@uib.no Mon Jun 14 05:41:13 1993 To: Ron Kuzar Subject: Re: Basic Info Date: Mon, 14 Jun 1993 09:41:13 EDT From: David Graff Hello, Christian, I believe the discs in this first shipment contain no spontaneous speech. The design of the file structure is such that spontaneous dictation (to be done by journalists only) is to be stored within directories named "si_tr_jd", while readings of sentences by journalists is to be kept in directories named "si_tr_j"; only the latter appears on disc 13-7.1. As for the difference between short-term and long-term speakers, I don't have the exact specifications in front of me, but my recollection is that the "short-term" set was intended to get relatively small samples from a large quantity of speakers (total of 80 hours from 200 people), while the "long-term" set obtains large amounts of readings from a smaller number of speakers (total of 65 hours from 25 people). Apart from this, there is no intentional systematic difference that I am aware of. Best regards, Dave Graff From corpora-request@uib.no Mon Jun 14 05:44:25 1993 To: Ron Kuzar Subject: Re: Basic Info Date: Mon, 14 Jun 1993 09:44:25 EDT From: David Graff Ron, I just realized that I sent you a message intended for someone else... My mouse input was working faster than expected. Please ignore the previous message, and accept my apologies for the noise. Dave Graff From corpora-request@uib.no Mon Jun 14 05:30:44 1993 From: andras Subject: Re: Verb Phrase (from V. 
Ooi) To: j.guy@trl.oz.au (Jacques Guy) Date: Mon, 14 Jun 1993 12:30:44 -0700 (PDT) Jacques Guy writes: > Chomskyan linguistics has this overwhelming > virtue that being a clownishly distorted model of language Jacques don't do this. If an SIL-type asked the question you would probably not sneer at their clownishly distorted model of the world. Chomskyan or not, s/he is doing the Right Thing when he asks whether certain constructions are actually attested. As it happens, the construction type the inquiries are about are mentioned in a lot of anti- or even pre-Chomskyan grammars of English, so this is not the time to start a religious war. Andras From corpora-request@uib.no Tue Jun 15 17:07:39 1993 Date: Tue, 15 Jun 93 15:07:39 +0200 To: corpora@hd.uib.no From: ursula.doleschal@wu-wien.ac.at Subject: Linguistic Data on Diskette Service I would like to advertise the following possibility of publication on behalf of Ulrich Lueders, linguist and publisher (but without e-mail): LDDS-Linguistic Data on Diskette Service a new service by LINCOM Europa LINCOM Europa offers a new service to linguistic researchers. Linguistic data of the following kind are published and distributed on diskette: + word lists + data for comparative language studies + dictionaries + bibliographical surveys + encyclopedical collections + Linguistic doctoral dissertations and M.A. theses of all linguistic disciplines are distributed on diskette by Linguistic Data on Diskette Service, too. The advantage of this service is the faster access to hitherto unpublished material and facilitation of discussion. Data in all languages are accepted for publication on diskette! Send information on your data on diskette to: LINCOM EUROPA, P.O. Box 1316, D-8044 (from July 1993 on D-85703) Unterschleissheim/Muenchen Germany. Fax: ++49-89-314 89 09 Ursula Doleschal Tel.: ++43-1-31336 4115 Inst. f. Slawische Sprachen Fax: ++43-1-31336 744 Wirtschaftsuniv. 
Wien Augasse 9 Austria 1090 Wien Europe From corpora-request@uib.no Tue Jun 15 17:37:57 1993 Date: Tue, 15 Jun 1993 15:37:57 +0200 From: jibagbee To: CORPORA@hd.uib.no Subject: Query: Linguistic Tools for French Hello everybody, I am researching the use of semantic information for automatic error correction. My previous work has been done for Basque, and I would like to investigate the matter for French, but I need some data and tools. I would like to know about the availability of the following: 1) tagged corpora which could be 2) a syntactically analyzed corpus in the style of TREEBANK for English OR a robust parser which performs superficial syntactic analysis (to be used with the corpora) I will appreciate any information which could lead me to such data and tools, as well as information about price (if any) of licence, etc. Thanks in advance, Eneko Agirre Informatika Fakultatea Basque Country University jibagbee@si.ehu.es From corpora-request@uib.no Tue Jun 15 21:40:18 1993 Date: Tue, 15 Jun 1993 19:40:18 +0200 From: " (Beatrice Santorini)" To: " (corpora list)" Subject: scanning technology i am looking to buy a scanner to read texts in german and yiddish (which is written in the hebrew alphabet) and would appreciate any and all information and tips. the scanner needs to be compatible with a mac IIci and/or a sun sparc station. please reply to me privately---i'll be happy to summarize for the list. thanks. beatrice santorini From corpora-request@uib.no Tue Jun 15 12:00:26 1993 Date: Tue, 15 Jun 1993 16:00:26 -0400 (EDT) From: Cathy Ball Subject: Mean clause lengths for LOB? To: corpora@hd.uib.no I'm investigating the reliability of frequency studies of syntactic phenomena that are based on number of words rather than number of clauses in a text. Can anyone point me to an analysis of the LOB or London-Lund Corpus in terms of mean clause length by text type? I do have the figures for the Brown Corpus ... Thanks.
-- Cathy Ball (Georgetown) From corpora-request@uib.no Thu Jun 17 17:41:12 1993 From: Antoinette Renouf Date: Thu, 17 Jun 93 16:11:07 BST To: corplst@hd.uib.no Subject: Re: EFL corpora & tagging Dear Sir or Madam Please tell me how to get a copy of your concordancer. Thank you Antoinette Renouf From corpora-request@uib.no Tue Jun 22 11:30:29 1993 From: "Henry S. Thompson" Date: Tue, 22 Jun 93 10:11:15 BST To: lou@vax.ox.ac.uk Subject: Re: Help: German/English corpora? oops, previous message meant for your correspondent, to whom I've now sent a copy. From corpora-request@uib.no Sat Jun 22 17:48:00 1993 Date: 22 Jun 93 17:48 GMT From: D1634@applelink.apple.com (Circle Noetic Svc, A Nizhnikov,PAS) Subject: Solipsism To: CORPORA@HD.UIB.NO I hope my request for the following information is not ill-placed but I am hoping that there are some philosopher/linguist subscribers who can help. A friend of mine is writing a book and is looking for the origin of the word "Solipsism". My limited knowledge of the word is that it is a philosophy term which states that existence is limited to the human mind's thoughts. Thinking is it - we know nothing other than what we think. I believe René Descartes (17th C. French philosopher??) did the thinking and work on this idea/concept, but he didn't have the label 'solipsism'. Who chose the label "Solipsism"? What other background and origin does the word have? Any help you can offer would be gratefully acknowledged. Please send your responses to my e-mail address at D1634@Applelink.Apple.Com@INTERNET# and in a couple of weeks I will post a summary of the information I receive. Sincerely, Gillian Smith From corpora-request@uib.no Tue Jun 22 13:36:45 1993 Date: Tue, 22 Jun 1993 17:36:45 -0400 From: haraldra@sbcs.sunysb.edu (Harald Rau) To: corpora@hd.uib.no Subject: Sentence rejecter I work on a system which is supposed to transmit English text. Due to noise the receiver might not be able to determine the intended words clearly.
It might be left with a few (<10) suggestions per word slot. So the number of possible word sequences might be big for long sentences although only a few could be grammatically correct sentences. I am looking for a sentence rejecter/recognizer that would reduce the number of sequences that have to be considered. Any help you could offer would be gratefully acknowledged. Please send responses to haraldra@sbcs.sunysb.edu Harald Rau Department of Computer Science SUNY at Stony Brook Stony Brook, NY 11794-4000 From corpora-request@uib.no Tue Jun 22 17:36:01 1993 Date: Tue, 22 Jun 93 17:36:01 GMT-0600 From: bruce@ludwig.pmad.uic.edu (Bruce Lambert) To: D1634@applelink.apple.com (Circle Noetic Svc, A Nizhnikov,PAS), Subject: Re: Solipsism Hi, About the origin of the word 'solipsism' the Webster's Ninth New Collegiate Dictionary says: n[L solus (alone) + ipse (self)] (1874): a theory holding that the self can know nothing but its own modifications and that the self is the only existent thing. Hope that helps. Bruce Lambert From corpora-request@uib.no Tue Jun 22 18:36:43 1993 From: jorn@chinet.com (Jorn Barger) Subject: Corpora of Usenet postings? To: corpora@hd.uib.no Date: Tue, 22 Jun 1993 23:36:43 -0500 (CDT) Is anyone grabbing a day's worth of Usenet, now and then, for research purposes? I'd be very interested in a word-frequency analysis. Usenet culture is evolving so fast that one could graph the rise and fall of slang... Also, someone mentioned that the Icon language is available free for Macintosh somewhere. Can someone give me specific address and directory (and name) info, so I can ftpmail it to myself? (Incidentally, the immediate stimulus for this message was seeing a series of typos where people dropped the letter "r", which I read somewhere was a sign of repressed aggression, and I was wondering how easy it would be to see if 'r' is dropped most frequently...
;^) jorn@chinet.com From corpora-request@uib.no Tue Jun 22 19:55:00 1993 Date: Wed, 23 Jun 1993 00:55 EST From: "Tanto, quanto--Inigo de Loyola (1491-1556)" To: corpora@hd.uib.no set listserv@hd.uid.no repro From corpora-request@uib.no Tue Jun 22 20:46:00 1993 Date: Wed, 23 Jun 1993 01:46 EST From: "Tanto, quanto--Inigo de Loyola (1491-1556)" To: corpora@hd.uib.no Hello! I found the following explanation of the term 'solipsism': "SOLIPSISM: (Lat. solus, alone + ipse, self) From corpora-request@uib.no Sun Jun 23 04:45:56 1993 Date: 23 Jun 93 08:45:56 EDT From: Malcolm.Brown@Dartmouth.EDU (Malcolm Brown) Subject: Re: Solipsism To: corpora@hd.uib.no Just for the record, here's the entry from the OED. solipsism (ˈsɒlɪpsɪz(ə)m). Metaph. [f. L. sōl-us alone + ipse self.] The view or theory that self is the only object of real knowledge or the only thing really existent. Also, = egoism 1, and in weakened sense. 1874 A. C. Fraser Sel. from Berkeley 47 Ueberweg suggests that Berkeley's reasoning implies that we can know only our own notions of what we call other spirits-thus leading, by a reductio ad absurdum, to Egoism or Solipsism. A. 1881 A. Barratt Phys. Metempiric (1883) 25 At any rate, Solipsism, if not inconceivable, is in the highest degree incredible. 1884 Contemp. Rev. Feb. 294 As long as we confine ourselves to the world given in experience..we must profess solipsism. 1895 Month May 27 Under pain of `solipsism', of being shut up within our own subjectivity. 1978 Poetry Aug. 298 The deep underlying motive of Mark Strand's poetry is solipsism or loneliness of the individual imagination. Hence solip'sismal a. 1892 G. M. McCrie Miss Naden's World-Scheme 28 The existence of `other selves', being secondarily inferred, in no way touches the prime fact of solipsismal monism.
From corpora-request@uib.no Wed Jun 23 17:22:46 1993 Date: Wed, 23 Jun 1993 16:22:46 +0100 To: jorn@chinet.com (Jorn Barger) From: eytan@dpt-info.u-strasbg.fr (Michel Eytan, LILoL) Subject: Re: Corpora of Usenet postings? [...] > >Also, someone mentioned that the Icon language is available free for >Macintosh somewhere. Can someone give me specific address and directory >(and name) info, so I can ftpmail it to myself? > [...] >jorn@chinet.com Reproduced (without permission) from a response I got to a query: >We have 2 flavors available for anonymous ftp from cs.arizona.edu, in the >/icon/packages/macintosh directory: > > source and executables for the Macintosh Programmers Workbench environment > (this is version 8.8 of Icon) > > source and executables for the stand-alone Macintosh (version 8.0 of Icon) > >There is also a commercial package for the Mac called ProIcon, available from > > Catspaw, Inc. > POBox 1123 > Salida, CO 81201-1123 > U.S.A. > > voice: (719) 539-3884 > fax: (719) 539-4830 > > >If you don't have direct FTP access, we have an ftpmail server. Send email >to ftpmail@cs.arizona.edu, with the word > >help > >in the body of the message to receive instructions on using it to access >our ftp area. > > > Cliff Hathaway, Icon Project > Dept. of Computer Science (602)621-4291 > University of Arizona cliff@cs.arizona.edu (internet) > Tucson, Ariz. 85721 {cmcl2,noao,uunet}!arizona!cliff (uucp) -- Michel Eytan, Lab Info, Log & Lang eytan@dpt-info.u-strasbg.fr Dpt Info, U Strasbourg II V: +33 88 41 74 29 22 rue Descartes, 67084 Strasbourg FR F: +33 88 41 74 40 From corpora-request@uib.no Thu Jun 24 07:41:12 1993 Date: Thu, 24 Jun 1993 08:23 IST From: Ron Kuzar Subject: Rerun of Query To: CORPORA I know this feeling:'there is a trivial question on the list, I am sure that there will be enough replies without my own input, so why bother?' Well... this was *not* the case with my query, so here it is again. 
B.t.w., in response to one answer, I am not a computer person, and I am not interested in producing new parsers etc., just to be a knowledgable user of the corpora and the tools. Original message follows: Dear CORPORA netters, I am new to this list and have been (being) observing it now quietly for several weeks, in order to try and figure out what I can passively learn without bothering you, people. However, I have realized that I need some more active help. I am mainly interested in syntax, and would like to introduce computer based corpus work into my syntax course next year. This would, obviously, involve getting hold of some corpus/corpora and some toolkit(s). These are my questions: (a) Tagging 1. Are there corpora available in both tagged and untagged forms? 2. How theoretically-specific is tagging? 3. Are texts tagged for classification categories or functional ones, or both in the same project? (b) Availability 4. Are corpora and tools available as free-ware? Where? 5. Is there a difference in the quality/size between commercial and free software? (c) Hardware 6. If one was about to buy a (personal) computer now, with corpus work in mind, what would be the wildest hardware fantasy, taking foreseeable future developments of the field in mind? What would be the priorities in cutting down this dream to fit budget restrictions? (d) Bibliography 7. Is there an introduction into corpus work in electronic form? 8. Any other bibliographical suggestions? Knowing so little about it, I would like to get answers also to my unasked questions. Since this information may be repetitious on one hand, and trivial for many netters on the other hand, but also valuable to novices like me, I suggest that contributors send their answers to my private address, and I will summarize them in a couple of weeks, as is customary on other lists. I will credit the contributors, unless instructed otherwise. 
Thanks Ron Kuzar ---- soukr at hujivm1.bitnet  From corpora-request@uib.no Sat Jun 26 04:21:29 1993 Date: Sat, 26 Jun 1993 02:21:29 +0200 From: " (Mari Ostendorf)" To: corpora@x400.hd.uib.no Subject: speech corpora I am compiling information on transcribed speech corpora that are currently available, and it was suggested that I try this mailing list. Please let me know if you have any information about corpora or other places I could contact. Thanks, Mari Ostendorf From corpora-request@uib.no Sat Jun 26 08:20:37 1993 Date: Sat, 26 Jun 1993 09:08 IST From: Ron Kuzar Subject: Summary: general info To: CORPORA Dear CORPORA netters, I have posted the following query twice, and got 8 responses. There is tons of information here. I suggest that if people would like to further discuss the issues brought up here, that they do it straight to the list, since this will be more of the nature of discussion, not just information. If any further information appears in my account, I will post a supplementary summary. Since nobody asked to remain anonymous, I have left the 'from' line of each message. On behalf of those who benefit from this info I would like to express our appreciation to those who invested their time in enriching their colleagues knowledge. Thank you. Ron Kuzar ================== original message ===================== Dear CORPORA netters, I am new to this list and have been (being) observing it now quietly for several weeks, in order to try and figure out what I can passively learn without bothering you, people. However, I have realized that I need some more active help. I am mainly interested in syntax, and would like to introduce computer based corpus work into my syntax course next year. This would, obviously, involve getting hold of some corpus/corpora and some toolkit(s). These are my questions: (a) Tagging 1. Are there corpora available in both tagged and untagged forms? 2. How theoretically-specific is tagging? 3. 
Are texts tagged for classification categories or functional ones, or both in the same project? (b) Availability 4. Are corpora and tools available as free-ware? Where? 5. Is there a difference in the quality/size between commercial and free software? (c) Hardware 6. If one was about to buy a (personal) computer now, with corpus work in mind, what would be the wildest hardware fantasy, taking foreseeable future developments of the field in mind? What would be the priorities in cutting down this dream to fit budget restrictions? (d) Bibliography 7. Is there an introduction into corpus work in electronic form? 8. Any other bibliographical suggestions? Knowing so little about it, I would like to get answers also to my unasked questions. Since this information may be repetitious on one hand, and trivial for many netters on the other hand, but also valuable to novices like me, I suggest that contributors send their answers to my private address, and I will summarize them in a couple of weeks, as is customary on other lists. I will credit the contributors, unless instructed otherwise. Thanks Ron Kuzar ---- soukr at hujivm1.bitnet =============== response #1 ======================= From: "Richard L. Goerwitz" If you are going to purchase a computer for corpus work, I'd say that it depends wholly on what you want to do. Are you a canned application user or do you program? If you program, do you prefer a windowing en- vironment or are you able to use command-line interfaces as well? What languages do you program in? Are you in a networked environment, or are you planning on keeping this machine isolated at home? Lotsa questions. -Richard ============== response #2 ====================== From: Cathy Ball Dear Ron, Hope you get many useful responses to your query ... I just taught a seminar on computational tools and corpus analysis, and I could send you some of the materials, but it would be tough to summarize in one message! 
You might look into Judith Klavans' tutorial at the summer LSA/ACL if you're out that way ... in the meantime, not a bad place to start is with the Oxford Companion to the English Language, s.v. 'corpus'. -- Cathy Ball (Georgetown)
============= response #3 ==========================
From: Dr Lindsay Evett Dear Ron, the best place to start is by getting in touch with: Humanistisk Datasenter, Norwegian Computing Centre for the Humanities, Harald Harfagresgt. 31, N-5007 Bergen, Norway They have many Corpora available at roughly cost price, including tagged and untagged versions of the LOB, and a parsed corpus. The Oxford Text Archive are also a very good source; can't remember their email address for the minute, if someone else doesn't send it to you, get back to me. I assume you've seen R. Garside, G. Leech and G. Sampson (Eds) "The Computational Analysis of English: A Corpus-Based Approach" Longman, 1987 Also well worth a look at is the thesis of one of my students: F. G. Keenan, "Large Vocabulary Syntactic Analysis for Text Recognition", Nottingham Trent University, 1993, which will be on the shelves very soon, and contains an awk program for mapping between tag sets. Lindsay ps The Norway place also produces the ICAME Journal which is very useful
=============== response #4 ============================
From: Oliver Christ Hi, in reply to your question to corpora@hd.uib.no. I'm quite new to the field also, so I can only partially answer your questions (and not in depth, I suppose). Some of your questions should go into an FAQ, but until now no one has written one. I send my answers to you directly, since they are surely only partial. >>>>> On Thu, 24 Jun 1993 08:23 IST, Ron Kuzar said: Ron> (a) Tagging Ron> 1. Are there corpora available in both tagged and untagged forms? In principle you can generate the non-tagged version of a corpus with simple tools like awk, sed, grep, perl etc.
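
The tag stripping mentioned above could be sketched roughly as follows. This is only an illustration in Python rather than awk or perl, and it assumes a hypothetical layout in which every token is written as word, separator, tag (for example The_AT or time/NN); real corpora such as LOB or the Penn Treebank have their own conventions, so the separator and the function name strip_tags are assumptions, not part of any distributed tool.

```python
def strip_tags(line: str, sep: str = "_") -> str:
    """Remove the trailing tag from each whitespace-separated token.

    Assumed (hypothetical) token format: word + sep + TAG.
    Only the last separator in a token is treated as the tag boundary,
    so words that themselves contain the separator are preserved.
    """
    words = []
    for token in line.split():
        word, found, _tag = token.rpartition(sep)
        # Keep the token unchanged if it has no separator or no word part.
        words.append(word if found and word else token)
    return " ".join(words)


if __name__ == "__main__":
    print(strip_tags("The_AT dog_NN barks_VBZ ._."))
    print(strip_tags("time/NN flies/VBZ", sep="/"))
```

Going the other way (adding tags to raw text) needs a tagger; removing them is a purely mechanical step like this one.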
There are several tagged corpora, for example the PENN TREEBANK on the ACL/DCI CD ROM, the Susanne Corpus (you may get it from ftp.uu.net) and some others, I think. To get very much text in short time, ftp to ftp.uu.net and look in the etext directories (several GByte). Ron> 2. How theoretically-specific is tagging? That depends on the corpus, the tagging method and on the intended use of the tagged corpus. Look for example at the texts in pub/treebank/parsing and pub/treebank/postexts/guide-to-tagging.ps on linc.cis.upenn.edu (anonymous ftp). Ron> 3. Are texts tagged for classification categories or functional ones, or both Ron> in the same project? I think both, but this heavily varies between the corpora. Ron> 4. Are corpora and tools available as free-ware? Where? Mainly via anonymous ftp to hosts all over the world, also on nora.hd.uib.no. Very much corpus material is also on the ACL/DCI CD ROM (contact Mark Liberman [myl@unagi.cis.upenn.edu], he is responsible, as far as I remember). It costs about $25. In the next few months a CD with European corpora is about to be published, contact Susan Warwick for details [susan@divsun.unige.ch]. Ron> 5. Is there a difference in the quality/size between commercial and free Ron> software? I don't know very much about commercial corpus tools. There are some commercial query/concordancing tools (PET from UWaterloo, for example), but due to our limited budgets, we don't have any. Ron> 6. If one was about to buy a (personal) computer now, with corpus work in Ron> mind, what would be the wildest hardware fantasy, taking foreseeable Ron> future developments of the field in mind? What would be the priorities in Ron> cutting down this dream to fit budget restrictions? In my eyes, a personal computer is a too limited architecture for corpus processing, since you often will need several gigabytes of external (harddisk) storage and some 16-64 Mbytes of internal RAM. That leads quickly to Unix ... (Sun Sparcstations or compatibles). 
Additionally, you will need a couple of MIPS to get through your texts... Ron> 7. Is there an introduction into corpus work in electronic form? The field is still too young, and I don't know of a published introduction to this area (if you get any answers to this question, please tell me!). I'll include a small bibliography (incomplete!) at the end of this mail. Ron> 8. Any other bibliographical suggestions? See above. Ron> I will summarize them in a couple of weeks, as is customary on other lists. Yes, please! Hope this helps you a little bit, Greetings, Oli --------------------------------------------------------------------------- Oliver Christ Institute for Natural Language Processing, University of Stuttgart, Germany oli@ims.uni-stuttgart.de/christ@is.informatik.uni-stuttgart.de --------------------------------------------------------------------------- ====================== BIBLIOGRAPHY, incomplete!!! ================ % -------------------------------------------------------------------- % % IMS Corpus BibTeX-Database % % Matthias Heyn % Oliver Christ % % Changes, errors and requests for new entries should be % sent to oli at ims.uni-stuttgart.de or heyn at ims.uni-stuttgart.de % % Last Modified: Wed Jan 20 14:25:27 1993 (oli) % % -------------------------------------------------------------------- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% some definitions @string{and = " and "} @string{etal = " and others"} @string{toappear = "(to appear)"} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% authors @string{church = "Church, Kenneth W."} @string{gale = "Gale, William A."} @string{hanks = "Hanks, Patrick"} @string{hindle = "Hindle, Donald M."} @string{liberman = "Liberman, Mark Y."} @string{rooth = "Rooth, Mats"} @string{yarowsky = "Yarowsky, David"} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% institutions @string{att = "AT\&T Bell Laboratories, Murray Hill, NJ, USA"} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Conference Proceedings @Proceedings{ACL28, title = "28th Annual 
Meeting of the Association for Computational Linguistics. Proceedings of the Conference. 6--9 June 1990. University of Pittsburgh, Pennsylvania, USA.", year = 1990, key = "Proc. 28th ACL (1990)" } @Proceedings{ACL29, title = "29th Annual Meeting of the Association for Computational Linguistics. Proceedings of the Conference. 18--21 June 1991. University of California, Berkeley, California, USA.", year = 1991, key = "Proc. 29th ACL (1991)" } @Proceedings{NOED:90, title = "Proceedings of the Sixth Annual Conference of the UW Centre for the New OED and Text Research -- Electronic Text Research", year = 1990, organization = "University of Waterloo, Canada", key = "Proc. 6th NOED (1990)", } @Proceedings{NOED:91, title = "Proceedings of the Seventh Annual Conference of the UW Centre for the New OED and Text Research -- Using Corpora", year = 1991, organization = "University of Waterloo, Canada", key = "Proc. 7th NOED (1991)", } @Proceedings{NOED:92, title = "Proceedings of the Eighth Annual Conference of the UW Centre for the New OED and Text Research -- Screening Words: User Interfaces for Text", year = 1992, organization = "University of Waterloo, Canada", key = "Proc. 8th NOED (1992)", month = "October 18--20" } %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Beginning of Bibliography @Unpublished{Amsler-Tompa:xx, author = "Amsler, Robert A. and Tompa, Frank W.", title = "An {SGML}-based Standard for English Monolingual Dictionaries", note = "Centre for the New Oxford English Dictionary" } @InProceedings{Atkins:92, author= "Atkins, B.T. Sue", title = "Tools for computer-aided corpus lexicography: the Hector Project", crossref = "Complex:92", pages = "3--59" } @Article{Bahl-etal:89, author = "Bahl, Lalit R. and Brown, Peter F. and Souza, Peter V. 
and Mercer, Robert L.", title = "A Tree-Based Statistical Language Model for Natural Language Speech Recognition", journal = "IEEE Transactions on Acoustics, Speech and Signal Processing", year = 1989, volume = 37, number = 7, month = jul } @InProceedings{Bahl-etal:90, author = "Bahl, L.B. and Brown, P.F. and de Souza, P.V. and Mercer, R.L.", title = "A tree-based statistical language model for natural language speech recognition.", booktitle = "Readings in Speech Recognition", year = 1990, editor = "Waibel, A. and Lee, K.-F.", pages = "507--514", publisher = "Morgan Kaufman", address = "San Mateo CA" } @InProceedings{Blasi-Koch:92, author = "Bl{\"a}si, Christoph and Koch, Detlev", title = "Dictionary Entry Parsing Using Standard Methods", crossref = "Complex:92", pages = "62--79" } @InProceedings{Brent:91, author = "Brent, Michael R.", title = "Automatic Acquisition of Subcategorization Frames from Untagged Text", crossref = "ACL29", pages = "209--213", } @InProceedings{Brill:92, author = "Brill, Eric", title = "A Simple Rule-Based Part of Speech Tagger", crossref = "3CANLP", pages = "152--155", } @InProceedings{Brown-Lai-Mercer:91, author = "Brown, Peter and Lai, Jennifer and Mercer, Robert", title = "Aligning Sentences in Parallel Corpora", crossref = "ACL29", pages = "169--176" } @Booklet{Brown-etal:91, author = "Brown, Peter F. and {Della Pietra}, Stephen A. and {Della Pietra}, Vincent J. and Mercer, Robert L.", title = "The Mathematics of Machine Translation: Parameter Estimation", month = may, year = 1991 } @InProceedings{Brown-etal:91a, author = "Brown, Peter and Della Pietra, Stephen and Della Pietra, Vincent and Mercer, Robert", title = "Word Sense Disambiguation using Statistical Methods", crossref = "ACL29", pages = "265--270" } @Article{Brown-etal:91b, author = "Brown, P. and Cocke, J. and Della Pietra, V. and Jelinek, J. and Lafferty, J. and Mercer, R. and Roossin, P. 
", title = "A Statistical Approach to Machine Translation", journal = "Computational Linguistics", year = 1991, volume = 16, pages = "79--85", note = "-FEHLT-" } @Article{Brown-etal:92, author = "Brown, Peter F. and {Della Pietra}, Vincent J. and Mercer, Robert L. and {Della Pietra}, Stephen A. and Lai, Jennifer C.", title = "An Estimate of an Upper Bound for the Entropy of English", year = 1992, journal = "ACL?" } @TechReport{Burnard:91, author = "Burnard, Lou", title = "What is {SGML} and how does it help?", institution = "Text Encoding Initiative (TEI)", year = 1991, note = "TEI Document EDW25" } @TechReport{Catizone-Russell-Warwick:91, author = "Catizone, Roberta and Russell, Graham and Warwick-Armstrong, Susan", title = "Identifying Word Correspondences in Parallel Texts", institution = "AT\&T Bell Laboratories", year = 1991, type = "Technical Memorandum", number = "20878,60011" } @InCollection{Catizone-Russell-Warwick:92, author = "Catizone, Roberta and Russell, Graham and Warwick-Armstrong, Susan", title = "Deriving Translation Data from Bilingual Texts", booktitle = "Lexical Acquisition. Using on-line Resources to Build a Lexicon", publisher = "Lawrence Erlbaum", year = "1992 ???", editor = "Zernik", pages = "???" 
} @InProceedings{Church-Gale:91, author = church # and # gale, title = "Concordances for Parallel Text", crossref = "NOED:91", pages = "40--62", } @Article{Church-Hanks:90, author = church # and # hanks, title = "Word Association Norms, Mutual Information, and Lexicography", journal = "Computational Linguistics", year = 1990, volume = 16, number = 1, pages = "22--29" } @Article{Church-Liberman:xx, author = church # and # liberman, title = "A status report on the {ACL/DCI}" } @TechReport{Church-etal:91, author = church # and # gale # and # hanks # and # hindle, title = "Using statistics in lexical analysis", year = 1991, institution = "AT\&T Bell Laboratories", type = "Technical Memorandum", number = "60011,20878", anote = "a0068" } @InProceedings{Church:88, author = church, title = "A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text", booktitle = "2nd Conference on Applied Natural Language Processing", year = 1988, pages = "136--143" } @Proceedings{Complex:92, title = "{COMPLEX}92. Proceedings of the 2nd International Conference on Computational Lexicography, Budapest", year = 1992, key = "Proc. 2nd COMPLEX (1992)", OPTeditor = "Kiefer, Ference and Pajzs, J\'{u}lia" } @InProceedings{Copestake:92, author = "Copestake, Ann", title = "The {ACQUILEX} {LKB}: Representation Issues in Semi-Automatic Acquisition of Large Lexicons", crossref = "3CANLP", pages = "88--95", } @InProceedings{Cutting-etal:92, author = "Cutting, Doug and Kupiec, Julian and Pederson, Jan and Sibun, Penelope", title = "A Practical Part-of-Speech Tagger", crossref = "3CANLP", pages = "133--140", } @Article{DeRose:88, author = "DeRose, Steven", title = "Grammatical Category Disambiguation by Statistical Optimization", journal = "Computational Linguistics", year = 1988, volume = 14, number = 1, pages = "31--39", note = "-FEHLT-", } @InProceedings{Dumitrescu:92, author = "Dumitrescu, Cristian", title = "Paradigmatic Morphology. 
Morphology Modeling and Lexicon Design with MORPHO-2", crossref = "Euralex:92", pages = "203--212" } @TechReport{Dunlop:92, author = "Dominic Dunlop", title = "The Relationship Between the {TEI}.2 Header and the {BNC} Corpus and Text Headers", institution = "Text Encoding Initiative (TEI)", year = 1992, note = "TEI Document TGCW34" } @Proceedings{Euralex:92, title = "EURALEX '92 Proceedings I-II. Papers submitted to the 5th EURALEX International Congress on Lexicography in Tampere, Finland. (Tampere: Tampereen Yliopisto) 1992, two volumes.", year = 1992, key = "Proc. 5th EURALEX (1992)", OPTeditor = "{Tommola, Hannu and Varantola, Krista and Salmi-Tolonen, Tarjan and Schopp, J{\"u}rgen}" } @InCollection{Fillmore-Atkins:92, author = "Fillmore, Charles J. and Atkins, B. T. S.", title = "Starting where the dictionaries stop: the challenge of corpus lexicography", year = 1992, booktitle = "Computational Approaches to the Lexicon", publisher = "Oxford University Press, Oxford, UK", editor = "B. T. S. Atkins and Antonio Zampolli" } @InProceedings{Fontenelle:92, author = "Fontenelle, Thierry", title = "Collocation acquisition from a corpus or from a dictionary: a comparison", crossref = "Euralex:92", pages = "222--228" } @TechReport{Gale-Church-Yarowsky:92, author = gale # and # church # and # yarowsky, title = "A method for disambiguating word senses in a large corpus", institution = att, year = 1992, number = "Statistical Research Reports No. 104", month = mar } @InProceedings{Gale-Church:91a, author = "Gale, William A. 
and Church, Kenneth W.", title = "A Program for Aligning Sentences in Bilingual Corpora", crossref = "ACL29", pages = "177--184", } @InProceedings{Glassman-etal:92, author = "Glassman, Lucille and Grinberg, Dennis and Hibbard, Cynthia and Meehan, James and Reid, Loretta Guarino and Leunen, Mary-Claire", title = "Hector: Connecting Words with Definitions", crossref = "NOED:92", pages = "38--73" } @Misc{Heid-Martin-Posch:91, author = "Heid, Ulrich and Martin, Willy and Posch, Ilse", title = "Feasibility of standards for collocational description of lexical items", year = 1991, month = jan, note = "Universit{\"a}t Stuttgart, Vrije Universiteit Amsterdam = {\sc Eurotra}-7 Study, document DOC-9/4" } @InProceedings{Heid:92, author = "Heid, Ulrich", title = "Notes on the use of lexical functions for the description of collocations in an {NLP} lexicon", crossref = {MTT:92}, pages = "217--229" } @InProceedings{Heid-Heyn-Christ:92, author = "Heid, Ulrich and Heyn, Matthias and Christ, Oliver", title = "Extracting linguistic information from machine-readable versions of traditional dictionaries -- a metalexicographic method and some tools", booktitle = "Proceedings of COMPLEX-92, Conference on Computational Lexicography an Text Research", year = 1992, address = "Budapest", month = oct } @InProceedings{Heylen-etal:92, author = "{Heylen, Dirk and Humphreys, R. Lee and Warwick-Armstrong, Susan and Calzolari, Nicoletta and Murison-Bowie, Simon}", title = "Collocations and the Lexicalisation of Semantic Operations. {L}exical Functions for Multilingual Lexicons", crossref = "MTT:92", pages = "173--187" } @Book{Heyn:92, author = "Heyn, Matthias", title = "{Zur Wieder\-ver\-wendung ma\-schinen\-lesbarer W\"orterb\"ucher. 
Eine com\-puter\-unterst\"utzte meta\-lexi\-co\-gra\-phische Stu\-die am Beispiel der elek\-tro\-ni\-schen Edition des Oxford Advanced Learner's Dictionary of Current English}", publisher = "Niemeyer", year = 1992, series = "Lexicographica Series Maior 45", address = "{T\"ubingen}", commentnote = "CLEX" } @InProceedings{Hindle-Rooth:90, author = hindle # and # rooth, title = "Structural Ambiguity and Lexical Relations", booktitle = "Proceedings of the DARPA Speech and Natural Language Workshop, June 1990", year = 1990 } @TechReport{Hindle:90a, author = hindle, title = "A Parser for Text Corpora", year = 1990, institution = "AT\&T Bell Laboratories", type = "Technical Memorandum", number = "311401-3399", anote = "a0067" } @InProceedings{Hindle:90b, author = hindle, title = "Noun Classification from Predicate-Argument Structures", crossref = {ACL28}, pages = "268--275" } @InCollection{Jelinek:85, author = "Jelinek, F.", title = "Markov source modeling of text generation", booktitle = "Impact of Processing Techniques on Communication", publisher = "Skwirzinski, F. K.", year = 1985 } @InProceedings{Karlson:92, author = "Karlsson, Fred", title = "Lexicography and Corpus Linguistics", crossref = "Euralex:92", pages = "1--32" } @Unpublished{Kay-Roescheisen:88, author = "Kay, Martin and R{\"o}scheisen, Martin", title = "Text-Translation Alignment", organization = "Xerox Palo Alto Research Center", year = 1988 } @InProceedings{Klavans-Tzoukermann:90, author = "Klavans, Judith L. and Tzoukermann, Evelyne", title = "Linking Bilingual Corpora and Machine Readable Dictionaries with the BICORD System", crossref = "NOED:90", pages = "19--31" } @TechReport{Krovetz:xx, author = "Robert Krovetz", title = "Panel on Corpus Linguistics and Information Retrieval", institution = "Computer Science Dept., Univ. of Massachusetts", year = "??" } @Proceedings{LSKR:91, title = "Lexical Semantics and Knowledge Representation. 
Proceedings of a Workshop Sponsored by the Special Interest Group on the Lexicon of the Association for Computational Linguistics. 17 June 1991. Berkeley, California", year = 1991, OPTeditor = "Pustejovsky, James and Bergler, Sabine", key = "Proc. Lexical Semantics Wrkshp (1991)", organization = "ACL", } @Proceedings{MTT:92, title = "International Workshop on The Meaning-Text-Theory", year = 1992, OPTeditor = "Haenelt, Karin and Wanner, Leo", number = 671, series = "Arbeitspapiere der GMD", organization = "{Gesellschaft f{\"u}r Mathematik und Datenverarbeitung mbH}", address = "Darmstadt", key = "Proc. Intl. Workshop MTT (1992)", month = jul, } @Proceedings{3CANLP, title = "Third Conference on Applied Natural Language Processing. Association for Computational Linguistics. Proceedings of the Conference. 31 March -- 3 April 1992. Trento, Italy", year = 1992, key = "Proc. 3rd ANLP (1992)", } @InProceedings{Paulussen-Martin:92, author = "Paulussen, Hans and Martin, Willy", title = "{DILEMMA-2}: A Lemmatizer-Tagger for Medical Abstracts", crossref = "3CANLP", pages = "141--146", } @InProceedings{Picchi-Peters-Marinai:92, author = "Picchi, Eugenio and Peters, Carol and Marinai, Elisabetta", title = "The Pisa Lexicographic Workstation: The Bilingual Components", booktitle = "Euralex:92", year = 92, pages = "278--285" } @InProceedings{Sanfilippo-Poznanski:92, author = "Sanfilippo, Antonio and Pozna{\'n}ski, Victor", title = "The Acquisition of Lexical Knowledge from Combined Machine-Readable Dictionary Sources", crossref = "3CANLP", pages = "90--87", } @InProceedings{Sekine-etal:92, author = "Sekine, Satoshi and Carroll, Jeremy J. and Ananiadou, Sofia and Tsujii, Jun'ichi", title = "Automatic Learning for Semantic Collocation", crossref = "3CANLP", pages = "104--110" } @incollection{Sinclair:87, author = "Sinclair, J. McH.", title = "Collocation: A Progress Report", booktitle = "Language Topics. 
An international collection of papers by colleagues, students an admirers of Professor Michael Hallidays to honour on his retirement.", publisher = "Steele, Ross and Threadgold, Terry", year = 1987 } @Book{Sinclair:91, author = "Sinclair, John", title = "Corpus, Concordance, Collocation", publisher = "Oxford University Press", year = 1991 } @InProceedings{Smadja-McKeown:90, author = "Smadja, Frank A. and McKeown, Kathleen R.", title = "Automatically Extracting and Representing Collocations for Language Generation", crossref = {ACL28}, pages = "252--259" } @Article{Smadja:89, author = "Smadja, F. A.", title = "Lexical Co-occurrence: The Missing Link", journal = "Literary and Linguistic Computing", year = 1989, volume = 4, number = 3, pages = "163--169" } @InProceedings{Smadja:91, author = "Smadja, Frank A.", title = "From N-Grams to Collocations. An Evaluation of {XTRACT}", crossref = "ACL29", pages = "279--284" } @Unpublished{Smith:90, author = "Smith, Joan M.", title = "{SGML} Products and Services", note = "(European Edition)", year = 1990, month = nov } @incollection{Steele-Meyer:92, author = "Steele, James and Meyer, Ingrid", title = "Lexical Functions in an Explantory Combinatorial Dictionary: Kinds, Descriptions, and English Examples", booktitle = "Meaning-Text-Theory. Linguistics, Lexicography, and Implications", publisher = "Steele, James", year = 1992 } @InProceedings{Warwick-Hajic-Russell:90, author = "Warwick, S. and Hajic J. and Russell, G.", title = "Searching on Tagged Corpora. Linguistically Motivated Concordance Analysis", crossref = "NOED:90", pages = "10--18" } @InProceedings{Warwick-Russell:90, author = "Warwick, Susan and Russell, G.", title = "Bilingual Concordancing and Bilingual Lexicography", crossref = "Euralex:92" } @InProceedings{Winarski-Warwick-Hajic:92, author = "Winarski, A. and Warwick-Armstrong, S. 
and Haji{\v{c}}", title = "Tagging and Alignment of Parallel Texts: Current Status of {BCP}", crossref = "3CANLP", pages = "227--228", } @InProceedings{Wouden:92, author = "van der Wouden, Ton", title = "Prolegomena to a multilingual description of collocations", crossref = "Euralex:92", pages = "449--456" } @InProceedings{deMarcken:90, author = "de Marcken, C.G.", title = "Parsing the LOB Corpus", crossref = {ACL28}, pages = "136--143" }
=========== response #5 ======================================
From: "Steve Fligelstone" Dear Ron, I suspect that the reason people didn't answer your query was not so much that they regarded it as trivial, as that your query seemed to necessitate an awful lot of work on someone's part - anyway, as I am not in a position to do an awful lot of work, here is a very brief response: (*) (a) Tagging (*) 1. Are there corpora available in both tagged and untagged forms? yes - LOB is one (1 million words) (*) 2. How theoretically-specific is tagging? not very (*) 3. Are texts tagged for classification categories or functional ones, or (*) both in the same project? Probably both - depends what you mean by function - (*) (b) Availability (*) 4. Are corpora and tools available as free-ware? Where? Yes - TACT and CLAN are two PC programs that cost nothing or nearly nothing. Your first stop has to be ICAME (address given below) - they distribute corpora, programs, documents, bibliographies, a journal, and this mailing list. (*) 5. Is there a difference in the quality/size between commercial and free (*) software? Not systematically. TACT has its fans, so does WordCruncher, which costs. CLAN is a boring, non-interactive program - very useful for batch processing though. LMC and MicroConcord are two other proprietary packages which are aimed more at teaching use. LMC is great but it can only handle small files (< 50,000 words) - I haven't really used MicroConcord but my first impressions were that it was a bit slow and dull.
For Macs there is something called Free Text Browser (also available through ICAME). Free Text is also available as a UNIX program, albeit with a less attractive interface, and the CLAN sources are said to compile under UNIX using the GCC compiler. (*) (c) Hardware (*) 6. If one was about to buy a (personal) computer now, with corpus work in (*) mind, what would be the wildest hardware fantasy, taking foreseeable (*) future developments of the field in mind? What would be the priorities in (*) cutting down this dream to fit budget restrictions?
For personal use:
  66 MHz 486DX2 Processor
  VESA Local Bus with Graphics Accelerator (S3 or Weitek) and Caching Hard Disk Controller
  16 Megabytes of RAM
  400 Megabytes of Filestore
  Tower case (they are quieter than desktops and have lots of expansion capabilities)
  CD-ROM Drive is useful
For Departmental use: Go for a network - store all your corpora on the server and let the connected PCs access them - it's like having a huge hard disk on every machine. We do it in Lancaster and it works fine.
For money saving:- The above configuration is to some extent "future-proofed": it may seem excessive now but in a year or two it won't. If you just want to get started and think about upgrading in a year or two, go for any 486, even a 486SX (unless you plan to do lots of Statistics or CAD - in which case avoid the SX as you need the maths co-processor that the SX lacks), halve the RAM (don't go lower than 8 if you plan to use the machine to run Windows, and Word-Processing and other general-purpose packages), halve the filestore, and forget about the Local Bus and the other "go-faster" bits. This would probably halve the cost of the machine at least. If you are really strapped for cash, you can get away with a 386 and even less RAM. The point is that I have specified machines above to act as good, fast, Windows machines. In fact, most Concordancing software is still not Windows-based, and doesn't need all that power.
However, if you don't have a machine capable of running Windows well, you are likely to feel disadvantaged very quickly if you intend to use other kinds of software. (*) (d) Bibliography (*) 7. Is there an introduction into corpus work in electronic form? See the Altenberg Bibliography (from ICAME) (*) 8. Any other bibliographical suggestions? (*) Best research surveys are the Rodopi and Mouton anthologies based mainly on proceedings from ICAME conferences. Also: J Sinclair: Corpus, Concordance, Collocation (Oxford) C Butler (Ed) 1992: Computers and Written Texts, Chapter 5: Computers and Corpus Analysis (by me and Geoff Leech) - Blackwell Garside, Sampson and Leech: The Computational Analysis of English: a corpus based approach 1987 Longman (*) Knowing so little about it, I would like to get answers also to my unasked (*) questions. (*) ICAME distribute English Langauge corpora individually, but they now also have a CD-ROM which for a few hundred dollars/pounds gives you a lot of corpora for your money - it includes also the TACT program and the WordCruncher View program. You can copy any or all of these on to a hard disk, though I don't know what the licensing implications of this would be. Best Wishes Steve Fligelstone *======================================================= *= Steve Fligelstone (Research Associate) Linguistics Department *= Unit for Computer Research Bowland College *= on the English Language (UCREL) Lancaster University *= GB-Lancaster LA1 4YT *= email:eia002@uk.ac.lancs.cent1 Tel: (+524) 65201 x3025 *= eia002@lancaster.ac.uk FAX: (0524) 843085 * Address for ICAME: Knut Hofland Norwegian Computing Centre for the Humanities Harald Haarfagres gt. 31 N-5007 Bergen, Norway They also have a file server - send a HELP message to FILESERV@hd.uib.no ========= response #6 ===================================== From: (a) Tagging 1. Are there corpora available in both tagged and untagged forms? in english, yes. 
in japanese or spanish, you could produce your own by running taggers which are available and then correcting (or not) the output. 2. How theoretically-specific is tagging? about as theoretically-specific as nouns and verbs. if you believe in words, and believe the segmentation of the tagger, then there is generally a reasonable interpretation of the tags in any orthodoxy you care to pick. the tagger probably can't make some distinctions you would like to make, and there are probably some distinctions you don't care about, but otherwise, the tags should be of the sort that you can shoehorn into your particular religion. 3. Are texts tagged for classification categories or functional ones, or both in the same project? (b) Availability care to explain this? the texts which were used as training data for the construe system are available via ftp. these texts are categorized according to the reuters news story classification. is this the sort of categorization you meant? you may also be able to get access to the muc and tipster texts which have some text category markings. 4. Are corpora and tools available as free-ware? Where? not really. since the corpora are copyrighted, you need to sign a license agreement at the least. the software for pos tagging and such is generally freeware or nearly such. 5. Is there a difference in the quality/size between commercial and free software? there really isn't any commercial software for corpus analysis. (c) Hardware 6. If one was about to buy a (personal) computer now, with corpus work in mind, what would be the wildest hardware fantasy, taking foreseeable future developments of the field in mind? What would be the priorities in cutting down this dream to fit budget restrictions? this depends on your budget restrictions. my own preference would be for a powerful workstation, not a pc. if you absolutely have to have a pc, then you probably have strong opinions about the flavor, too. 
assuming you want an ibm compatible, then you should get a fast 486 with gobs of memory and gobs of disk and cdrom reader (preferably a carousel). you should then run a free version of unix so that you can effectively use all that memory and processor. by gobs of memory, i mean 16MB or more, and by gobs of disk, i mean 1GB or more. that is, assuming you really mean to do corpus work. for much of the work, you will need much more than 1GB of disk (and for some, you will need much more than 16MB of memory). for simple things, the 16MB/1GB combination will suffice. you should also make yourself familiar with some programming language that will help you work with texts. awk or perl would suffice. icon would probably be better. (d) Bibliography 7. Is there an introduction into corpus work in electronic form? not that i know of.
============ response #7 ===============================
From: SKIESLING@GUVAX.BITNET I think your problem is that you asked too many questions. What you ask could fill at least one - if not more - graduate courses. In fact, I just took one with Cathy Ball this spring here at Georgetown. It looks as if you want us on the list to do a lot of research for you. It's all out there, just look for it. Here are a couple of starting points: Aijmer, Karin and Bengt Altenberg, eds. 1991. English Corpus Linguistics. London: Longman. Johansson, Stig and Anna-Brita Stenstrom. 1991. English Computer Corpora: Selected Papers and Research Guide. You might also try the MLA bibliography under corpora or corpus and linguistics, likewise in your library's card catalog. Happy hunting! Scott
=========== response #8 ================================
From: "Evan L. Antworth 214/709-3346" For starters, get hold of the book "The Humanities Computing Yearbook, 1989-90", ed. Ian Lancashire, Oxford, Clarendon Press, 1991. It is very expensive, so try the library. It will answer many of your questions.
--Evan

=========== end summary ===============================

From corpora-request@uib.no Mon Jun 28 14:24:28 1993
Date: Mon, 28 Jun 93 12:24:28 +0200
From: Oliver Christ
To: CORPORA@hd.uib.no
Subject: corpora.bib

Now that Ron Kuzar has posted to the net the (still rudimentary) corpus bibliography I sent him, I think it would be a good idea if this bibliography were 'moderated', in the sense that there is a central depository for it and someone who puts in new entries. I would volunteer to do this and to put the bibliography on our ftp server here (ftp.ims.uni-stuttgart.de) in the directory pub/corpora as corpora.bib. New entries sent to me will be added to it.

Please try to send any contributions to the bibliography in bibtex format, since converting other formats to bibtex takes lots of time (which I don't have). Please don't cross-send bibliography-related messages to the net; I will summarize relevant changes in this newsgroup.

In the same directory, a file 'corpus-bib.dvi' can be found -- this is a more readable (and printable) version of the bibliography (BTW: sometimes, 'ls' doesn't show directories on our server, so use 'dir' instead).

The labels for the entries are generally constructed out of the list of authors separated with slashes ('/'), and the year of publication appended with a colon (':'). If there are multiple entries with the same label, lowercase letters are appended to the year field.

The e-mail address in the header of the bibliography should read 'oli@ims.uni-stuttgart.de', but bibtex doesn't like at-signs anywhere other than at the beginning of entries.
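
[Editorial illustration: the labelling scheme described above could be sketched roughly as follows. This is only a minimal sketch, not the procedure actually used for corpora.bib. The function names `base_label` and `make_labels` are hypothetical, and two points are assumptions rather than anything stated in the message: that "authors" means surnames, and that every entry in a colliding group (including the first) receives a letter.]

```python
from collections import Counter, defaultdict
from string import ascii_lowercase


def base_label(surnames, year):
    """Join author surnames with '/' and append ':' plus the year."""
    return "/".join(surnames) + ":" + str(year)


def make_labels(entries):
    """Return one label per (surnames, year) pair, in input order.

    Labels that would collide get 'a', 'b', ... appended to the year.
    Assumption: at most 26 entries share the same base label.
    """
    bases = [base_label(surnames, year) for surnames, year in entries]
    totals = Counter(bases)      # how often each base label occurs
    seen = defaultdict(int)      # how many of each base were already emitted
    labels = []
    for base in bases:
        if totals[base] == 1:
            labels.append(base)
        else:
            labels.append(base + ascii_lowercase[seen[base]])
            seen[base] += 1
    return labels


if __name__ == "__main__":
    sample = [
        (["Church", "Gale"], 1991),
        (["Church"], 1991),
        (["Church"], 1991),
        (["Brown"], 1990),
    ]
    for label in make_labels(sample):
        print(label)
```

[Counting the base labels before assigning suffixes keeps unique labels free of a trailing letter; a different convention, such as leaving the first colliding entry unsuffixed, would need only a small change in the loop.]
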
Best regards,
Oli
---------------------------------------------------------------------------
Oliver Christ
Institute for Natural Language Processing, University of Stuttgart, Germany
oli@ims.uni-stuttgart.de
---------------------------------------------------------------------------

From corpora-request@uib.no Mon Jun 28 15:56:51 1993
From: miles@minster.york.ac.uk
Date: Mon, 28 Jun 93 14:42:06
To: CORPORA@hd.uib.no
Subject: corpora.bib

People should also be aware of ICAME's bibliography (fafsrv@nobergen), which seems to be what Oliver Christ is duplicating.

Miles

From corpora-request@uib.no Mon Jun 28 20:17:15 1993
Date: Mon, 28 Jun 93 18:17:15 +0200
From: Oliver Christ
To: miles@minster.york.ac.uk
Subject: corpora.bib

>>>>> On Mon, 28 Jun 93 14:42:06, miles@minster.york.ac.uk said:

miles> People should also be aware of ICAME's bibliography (fafsrv@nobergen)
miles> which seems to be what Oliver Christ is duplicating.

The ICAME bibliography is really fine (and huge), but unfortunately not in BibTeX format (or did I miss one of the files there?). I don't know how many people use bibtex, but I always use it for my bibliographies. In my eyes, it is too much work to convert a bibliography as huge as ICAME's to bibtex, so I think I (we?) have to start a bibtex database with the relevant citations from scratch.

I don't think that I'm duplicating the ICAME bib -- if I wanted to, I'd use cp.

Regards,
Oli

From corpora-request@uib.no Tue Jun 29 02:53:03 1993
From: wuzhibia@iscs.nus.sg (Wu Zhibiao)
Subject: Re: corpora.bib
To: oli@ims.uni-stuttgart.de
Date: Tue, 29 Jun 93 8:44:42 WST

I have a set of corpora.bib entries in bibtex format. I would like to contribute them to the net.

best
zhibiao
---------------------------------------------------------------
Zhibiao Wu, Dept. of Information System & Computer Science, National University of Singapore, Republic of Singapore,0511.
Tel:(65) 772-2767 Fax:(65) 7794580 Email:wuzhibia@iscs.nus.sg
---------------------------------------------------------------

@book{aarts:corpus1, address = "Amsterdam", editor = "Jan Aarts and Willem Meijs", publisher = "Rodopi B.V.", title = "Corpus Linguistics I", year = "1984" }
@book{aarts:corpus2, address = "Amsterdam", editor = "Jan Aarts and Willem Meijs", publisher = "Rodopi B.V.", title = "Corpus Linguistics II", year = "1984" }
@inproceedings{an:ap, author = "Peter Anick and James Pustejovsky", booktitle = "Proceedings of 13th International Conference on Computational Linguistics", title = "An Application of Lexical Semantics to Knowledge Acquisition from Corpora", year = "1990" }
@techreport{atwell:tppc, author = "E. S. Atwell and T. F. O'Donoghue and C. Souter", institution = "School of Computer Studies, The University of Leeds", number = "91.20", title = "Training Parsers with Parsed Corpora", year = "1991" }
@book{bacchus:representing, author = "Fahiem Bacchus", publisher = "MIT Press", title = "Representing and Reasoning with Probabilistic Knowledge", year = "1990" }
@book{barwise:situations, author = "Jon Barwise and John Perry", publisher = "MIT Press", title = "Situations and Attitudes", year = "1983" }
@inproceedings{bl:pr, author = "E. Black and S. Abney and D. Flickinger and C. Gdaniec and R. Grishman and P. Harrison and D. Hindle and R. Ingria and F. Jelinek and J. Klavans and M. Liberman and M. Marcus and S. Roukos and S. T. Strzalkowski", booktitle = "Proceedings of the 1990 DARPA Speech and Natural Language Workshop", title = "A Procedure for Quantitatively Comparing the Syntactic Coverage of English Grammars", year = "1990" }
@inproceedings{black:hbg, author = "E. Black and F. Jelinek and J. Lafferty and D. Magerman and R. Mercer and S.
Roukos", booktitle = "Proceedings of the 1992 DARPA Speech and Natural Language Workshop", note = "to appear", title = "Towards History-based Grammars: Using Richer Models for Probabilistic Parsing", year = "1992" }
@inproceedings{br:au, author = "Michael R. Brent and Robert C. Berwick", booktitle = "Proceedings of the 1991 DARPA Speech and Natural Language Workshop", month = "February", title = "Automatic Acquisition of Subcategorization Frames from Tagged Text", year = "1991", address = "Asilomar, California" }
@book{brady:computational, editor = "Michael Brady and Robert C. Berwick", publisher = "MIT Press", title = "Computational Models of Discourse", year = "1983" }
@unpublished{brown:class, author = "P. Brown and V. J. Della Pietra and P. V. deSouza and J. C. Lai and R. L. Mercer", month = "December", note = "manuscript", title = "Class-based N-gram Models of Natural Language", year = "1990" }
@article{brown:mt, author = "P. Brown and J. Cocke and S. Della Pietra and V. Della Pietra and F. Jelinek and J. Lafferty and R. Mercer and P. Roossin", journal = "Computational Linguistics", month = "June", number = "2", title = "A Statistical Approach to Machine Translation", volume = "16", year = "1990" }
@unpublished{brown:pe, author = "P. Brown and S. Della Pietra and V. Della Pietra and R. Mercer", month = "May", note = "to appear", title = "The Mathematics of Machine Translation: Parameter Estimation", year = "1991" }
@book{chomsky:56, author = "Chomsky, Noam", publisher = "Mouton", title = "Syntactic Structures", year = "1957" }
@book{chomsky:65, author = "Chomsky, Noam", publisher = "MIT Press", title = "Aspects of the Theory of Syntax", year = "1965" }
@inproceedings{church:88, author = "Church, K.", booktitle = "Proceedings of Second Conference on Applied Natural Language Processing", title = "A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Texts", year = "1988", address = "Austin, Texas" }
@article{church:good, author = "Church, K.
and Gale, W.", journal = "Computer Speech and Language", number = "1", title = "A Comparison of the Enhanced Good-Turing and Deleted Estimation Methods for Estimating Probabilities of English Bigrams", volume = "5", year = "1991" }
@incollection{church:lexical, author = "Church, K. and Hanks, P. and Hindle, D. and Gale, W.", booktitle = "Lexical Acquisition: Using on-line Resources to Build a Lexicon", editor = "Zernik", publisher = "Lawrence Erlbaum", title = "Using Statistics in Lexical Analysis", year = "1991" }
@inproceedings{church:opport, author = "Church, K.", address = "Seattle, Washington", booktitle = "Proceedings of 23rd Symposium on the Interface, Computing Science and Statistics", month = "April", title = "Some Statistical Opportunities in Speech and Language", year = "1991" }
@inproceedings{church:para, author = "Church, K.", address = "Oxford, England", booktitle = "Proceedings of Seventh Annual Conference of the UW Centre for the New OED and Text Research", title = "Concordances for Parallel Text", year = "1991" }
@inproceedings{church:pwa, author = "K. Church and W. Gale and P. Hanks and D. Hindle", booktitle = "Proceedings of the 1989 DARPA Speech and Natural Language Workshop", title = "Parsing, Word Associations and Typical Predicate-Argument Relations", year = "1989" }
@article{church:review, author = "Church, K.", journal = "Computational Linguistics", number = "1", title = "Review of Aarts, J., and Meijs, W. (eds.)
`Theory and Practice in Corpus Linguistics'", volume = "17", year = "1991" }
@incollection{church:suggestion, author = "Church, K.", booktitle = "Sbornik praci: In Honor of Henry Kucera", editor = "Simmons", note = "to appear", publisher = "Michigan Slavic Studies", title = "Current Practice in Part of Speech Tagging and Suggestions for the Future", year = "1992" }
@article{co:pa, author = "Covington, M.A.", journal = "Computational Linguistics", month = "December", number = "4", pages = "234--236", title = "Parsing Discontinuous Constituents in Dependency Grammar", volume = "16", year = "1990" }
@book{cohen:introduction, address = "New York", editor = "Daniel I. A. Cohen", publisher = "John Wiley \& Sons, Inc.", title = "Introduction To Computer Theory", year = "1986" }
@book{cr:la, author = "Alan Cruttenden", publisher = "Manchester University Press", title = "Language in Infancy and Childhood", year = "1979" }
@book{degroot:pr, author = "Morris H. DeGroot", publisher = "Addison Wesley", title = "Probability and Statistics", year = "1975" }
@article{dempster:em, author = "A. P. Dempster and N. M. Laird and D. B. Rubin", journal = "Journal of the Royal Statistical Society", title = "Maximum Likelihood from Incomplete Data via the EM Algorithm", volume = "B. 39", year = "1977" }
@article{derose:88, author = "Steven J. DeRose", journal = "Computational Linguistics", month = "Winter", number = "1", title = "Grammatical Category Disambiguation by Statistical Optimization", volume = "14", year = "1988" }
@book{devlin:log, author = "Keith Devlin", publisher = "Cambridge University Press", title = "Logic And Information", year = "1991" }
@book{dillon:semantics, author = "George L. Dillon", publisher = "Prentice-Hall", title = "Introduction to Contemporary Linguistic Semantics", year = "1977" }
@book{frederking:integrated, author = "Robert E.
Frederking", publisher = "Kluwer Academic Publishers", title = "Integrated Natural Language Dialogue: A Computational Model", year = "1988" }
@inproceedings{gale:poor, author = "William A. Gale and Kenneth W. Church", booktitle = "Proceedings of the 1990 DARPA Speech and Natural Language Workshop", title = "Poor Estimates of Context are Worse than None", year = "1990" }
@book{garside:com, author = "Garside, Roger and Leech, Geoffrey and Sampson, Geoffrey", publisher = "Longman", title = "The Computational Analysis of English", year = "1987" }
@article{go:ad, author = "A. L. Gorin and S. E. Levinson and A. N. Gertner and E. Goldman", journal = "Computer Speech and Language", pages = "101--132", title = "Adaptive Acquisition of Language", volume = "5", year = "1991" }
@inproceedings{hi:st, author = "Donald Hindle and Mats Rooth", booktitle = "Proceedings of the 1990 DARPA Speech and Natural Language Workshop", title = "Structural Ambiguity and Lexical Relations", year = "1990" }
@book{hu:word, author = "Hudson, R.", publisher = "Blackwell", title = "Word Grammar", year = "1984" }
@book{hy:la, author = "Nina M. Hyams", publisher = "D. Reidel", title = "Language Acquisition and the Theory of Parameters", year = "1986" }
@incollection{jelinek:80, author = "F. Jelinek and R. L. Mercer", booktitle = "Pattern Recognition in Practice", editor = "E.S. Gelsema and L.N. Kanal", publisher = "North-Holland", title = "Interpolated Estimation of Markov Source Parameters from Sparse Data", year = "1980" }
@incollection{jelinek:85, author = "F. Jelinek", booktitle = "Readings in Speech Recognition", editor = "Alex Waibel and Kai-Fu Lee", publisher = "Morgan Kaufmann", title = "Self-Organized Language Modeling for Speech Recognition", year = "1990" }
@techreport{jelinek:90, author = "F. Jelinek and J. D. Lafferty and R. L. Mercer", institution = "IBM Research Division, T. J.
Watson Research Center", number = "RC 16374(\#72684)", title = "Basic Methods of Probabilistic Context Free Grammars", type = "Computer Science", year = "1990" }
@book{kanal:uncertainty, editor = "L. N. Kanal and J. F. Lemmer", publisher = "North-Holland", title = "Uncertainty in Artificial Intelligence", year = "1986" }
@inproceedings{kup:darpa, author = "Julian Kupiec", booktitle = "Proceedings of the 1991 DARPA Speech and Natural Language Workshop", month = "February", title = "A Trellis-Based Algorithm For Estimating the Parameters of Hidden Stochastic Context-Free Grammar", year = "1991", address = "Asilomar, California" }
@article{la:io, author = "Lari, K. and Young, S. J.", journal = "Computer Speech and Language", pages = "237--257", title = "Applications of Stochastic Context-free Grammars using the Inside-Outside Algorithm", volume = "5", year = "1991" }
@book{levine:neural, author = "D. S. Levine", address = "Hillsdale, New Jersey", publisher = "Lawrence Erlbaum Associates", title = "Neural \& Cognitive Modeling", year = "1991" }
@article{machova:review, author = "Svatava Machova", journal = "Computational Linguistics", number = "1", pages = "108--111", title = "Book Review: Meaning-Text Theory", volume = "18", year = "1992" }
@unpublished{magerman:everything, author = "David M. Magerman", address = "magerman@csli.stanford.edu", note = "manuscript", title = "Everything You Always Wanted to Know About Probability Theory, But Were Afraid to Ask", year = "1991" }
@inproceedings{magerman:mu, author = "Magerman, D. M. and Marcus, M. P.", booktitle = "Proceedings of Eighth National Conference on Artificial Intelligence", month = "August", title = "Parsing a Natural Language Using Mutual Information Statistics", year = "1990", address = "Boston, Massachusetts" }
@inproceedings{magerman:pearl, author = "Magerman, D. M. and Marcus, M.
P.", booktitle = "Proceedings of European ACL", month = "April", title = "Pearl: A Probabilistic Chart Parser", year = "1991", address = "Berlin, Germany" }
@inproceedings{magerman:picky, author = "Magerman, D. M. and C. Weir", booktitle = "Proceedings of the 1992 DARPA Speech and Natural Language Workshop", note = "to appear", title = "Probabilistic Prediction and Picky Chart Parser", year = "1992" }
@book{mckeown:text, author = "Kathleen R. McKeown", publisher = "Cambridge University Press", title = "Text Generation", year = "1985" }
@book{me:de, author = "Igor A. Mel'\v{c}uk", publisher = "State University of New York Press", title = "Dependency Syntax: Theory and Practice", year = "1988" }
@book{me:surface, author = "Igor A. Mel'\v{c}uk and Nikolaj V. Pertsov", address = "Amsterdam", publisher = "John Benjamins Publishing Company", title = "Surface Syntax of English: A Formal Model within the Meaning-Text Framework", year = "1987" }
@book{meijs:corpus, address = "Amsterdam", editor = "Willem Meijs", publisher = "Rodopi B. V.", title = "Corpus Linguistics and Beyond", year = "1987" }
@unpublished{miles:review, author = "Miles Osborne", month = "December", note = "manuscript", title = "A Review of Parsing Paradigms for Natural Language", year = "1991" }
@book{moore:cog, address = "New York", editor = "Timothy E. Moore", publisher = "Academic Press", title = "Cognitive Development and the Acquisition of Language", year = "1973" }
@book{obermeier:natural, author = "Klaus K. Obermeier", publisher = "Ellis Horwood Limited", title = "Natural Language Processing Technologies in Artificial Intelligence", year = "1989" }
@phdthesis{od:phd, author = "Tim F.
O'Donoghue", note = "forthcoming", school = "School of Computer Studies, University of Leeds", title = "Reversing the Process of Generation in Systemic Grammar", year = "1992" }
@inproceedings{odonoghue:acl91, author = "O'Donoghue, Tim F.", booktitle = "Proceedings of the ACL SIG Workshop on Reversible Grammar in Natural Language Processing", month = "June", title = "A Semantic Interpreter for Systemic Grammars", year = "1991" }
@incollection{odonoghue:acl91-book, author = "O'Donoghue, Tim F.", crossref = "strzalkowski:acl91-book", title = "Semantic Interpretation in a Systemic Grammar" }
@techreport{odonoghue:epow, author = "Tim F. O'Donoghue", institution = "School of Computer Studies, The University of Leeds", number = "91.11", title = "EPOW: The Edited Polytechnic of Wales Corpus", year = "1991" }
@techreport{odonoghue:vsp, author = "Tim F. O'Donoghue", institution = "School of Computer Studies, The University of Leeds", number = "91.15", title = "The Vertical Strip Parser: A Lazy Approach to Parsing", year = "1991" }
@book{oe:ca, author = "Richard T. Oehrle and Emmon Bach and Deirdre Wheeler", publisher = "D. Reidel Publishing Company", title = "Categorial Grammars and Natural Language Structures", year = "1988" }
@book{palmer:semantic, author = "Martha S. Palmer", publisher = "Cambridge University Press", title = "Semantic Processing for Finite Domains", year = "1990" }
@book{pollard:information, author = "Carl Pollard and Ivan A. Sag", publisher = "CSLI, Leland Stanford Junior University", series = "CSLI Lecture Notes", title = "Information-based Syntax and Semantics", volume = "13", year = "1987" }
@incollection{rabiner:tutor, author = "Rabiner, L. R.", address = "San Mateo, California", booktitle = "Readings in Speech Recognition", editor = "Waibel, A. and Lee, K.
F.", publisher = "Morgan Kaufmann", title = "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition", year = "1990" }
@book{rao:65, author = "Rao, C.R.", address = "New York", publisher = "Wiley", title = "Linear Statistical Inference and Its Applications", year = "1965" }
@book{shachter:uncertainty, author = "R. D. Shachter and T. S. Levitt and L. N. Kanal and J. F. Lemmer", publisher = "North-Holland", title = "Uncertainty in Artificial Intelligence 4", year = "1990" }
@book{shannon:communication, author = "Shannon, C.E. and Weaver, W.", address = "Urbana", publisher = "University of Illinois Press", title = "The Mathematical Theory of Communication", year = "1949" }
@techreport{souter:communal, author = "Clive Souter and Tim O'Donoghue", institution = "School of Computer Studies, The University of Leeds", number = "90.2", title = "Probabilistic Parsing in the COMMUNAL project", year = "1990" }
@techreport{souter:sfgc, author = "Clive Souter", institution = "School of Computer Studies, The University of Leeds", number = "89.12", title = "Systemic-Functional Grammars and Corpora", year = "1989" }
@book{sowa:conceptual, author = "J. F. Sowa", publisher = "Addison-Wesley", title = "Conceptual Structures: Information Processing in Mind and Machine", year = "1984" }
@book{steele:meaning, editor = "James Steele", publisher = "University of Ottawa Press", title = "Meaning-Text Theory: Linguistics, Lexicography, and Implications", year = "1990" }
@book{strzalkowski:acl91-book, editor = "Strzalkowski, Tomek", note = "To appear", publisher = "Kluwer Academic", title = "Reversible Grammar in Natural Language Processing", year = "1992" }
@book{te:59, author = "Tesni\`{e}re, Lucien", publisher = "Klincksieck", title = "\'{E}l\'{e}ments de la Syntaxe Structurale", year = "1959" }
@book{tomita:parsing, editor = "Masaru Tomita", publisher = "Kluwer Academic", title = "Current Issues in Parsing Technology", year = "1991" }
@book{trivedi:pr, author = "Kishor S.
Trivedi", publisher = "Prentice-Hall Inc.", title = "Probability \& Statistics with Reliability, Queuing, and Computer Science Applications", year = "1982" }
@book{wilks:theoretical, address = "New Jersey", editor = "Yorick Wilks", publisher = "Lawrence Erlbaum Associates, Inc.", title = "Theoretical Issues in Natural Language Processing", year = "1989" }
@book{woods:statistics, author = "Anthony Woods and Paul Fletcher and Arthur Hughes", publisher = "Cambridge University Press", title = "Statistics in Language Studies", year = "1986" }
@book{zipf, author = "Zipf, George K.", publisher = "George Routledge", title = "The Psycho-biology of Language: An Introduction to Dynamic Philology", year = "1936" }
@article{zu:notes, author = "Job M. van Zuijlen", journal = "BSO/Research, Utrecht, The Netherlands", month = "July", title = "Notes on a Probabilistic Parsing Experiment", year = "1990" }
@article{zu:pr, author = "Job M. van Zuijlen", journal = "BSO/Research, Utrecht, The Netherlands", month = "June", title = "Probabilistic Methods in Dependency Grammar Parsing", year = "1989" }
@article{zu:sa, author = "Job M. van Zuijlen", journal = "BSO/Research, Utrecht, The Netherlands", month = "February", title = "The Application of Simulated Annealing in Dependency Grammar Parsing", year = "1989" }
@article{zu:ssn, author = "Job M. van Zuijlen", journal = "BSO/Research, Utrecht, The Netherlands", month = "November", title = "A Technique For The Compact Representation of Multiple Analyses in Dependency Grammar", year = "1988" }

From corpora-request@uib.no Tue Jun 29 07:35:36 1993
From: wuzhibia@iscs.nus.sg (Wu Zhibiao)
Subject: Re: corpora.bib
To: oli@ims.uni-stuttgart.de
Date: Tue, 29 Jun 93 8:44:42 WST

This came straight to me, but was intended for the whole list. Perhaps it is worth mentioning that the 'reply' function on this list sends messages to the original sender, not to the list.
Ron Kuzar

========= message starts here ======================

I have a set of corpora.bib entries in bibtex format. I would like to contribute them to the net.

best
zhibiao
---------------------------------------------------------------
Zhibiao Wu, Dept. of Information System & Computer Science, National University of Singapore, Republic of Singapore,0511.
Tel:(65) 772-2767 Fax:(65) 7794580 Email:wuzhibia@iscs.nus.sg
---------------------------------------------------------------

@book{aarts:corpus1, address = "Amsterdam", editor = "Jan Aarts and Willem Meijs", publisher = "Rodopi B.V.", title = "Corpus Linguistics I", year = "1984" }
@book{aarts:corpus2, address = "Amsterdam", editor = "Jan Aarts and Willem Meijs", publisher = "Rodopi B.V.", title = "Corpus Linguistics II", year = "1984" }
@inproceedings{an:ap, author = "Peter Anick and James Pustejovsky", booktitle = "Proceedings of 13th International Conference on Computational Linguistics", title = "An Application of Lexical Semantics to Knowledge Acquisition from Corpora", year = "1990" }
@techreport{atwell:tppc, author = "E. S. Atwell and T. F. O'Donoghue and C. Souter", institution = "School of Computer Studies, The University of Leeds", number = "91.20", title = "Training Parsers with Parsed Corpora", year = "1991" }
@book{bacchus:representing, author = "Fahiem Bacchus", publisher = "MIT Press", title = "Representing and Reasoning with Probabilistic Knowledge", year = "1990" }
@book{barwise:situations, author = "Jon Barwise and John Perry", publisher = "MIT Press", title = "Situations and Attitudes", year = "1983" }
@inproceedings{bl:pr, author = "E. Black and S. Abney and D. Flickinger and C. Gdaniec and R. Grishman and P. Harrison and D. Hindle and R. Ingria and F. Jelinek and J. Klavans and M. Liberman and M. Marcus and S. Roukos and S. T.
Strzalkowski", booktitle = "Proceedings of the 1990 DARPA Speech and Natural Language Workshop", title = "A Procedure for Quantitatively Comparing the Syntactic Coverage of English Grammars", year = "1990" }
@inproceedings{black:hbg, author = "E. Black and F. Jelinek and J. Lafferty and D. Magerman and R. Mercer and S. Roukos", booktitle = "Proceedings of the 1992 DARPA Speech and Natural Language Workshop", note = "to appear", title = "Towards History-based Grammars: Using Richer Models for Probabilistic Parsing", year = "1992" }
@inproceedings{br:au, author = "Michael R. Brent and Robert C. Berwick", booktitle = "Proceedings of the 1991 DARPA Speech and Natural Language Workshop", month = "February", title = "Automatic Acquisition of Subcategorization Frames from Tagged Text", year = "1991", address = "Asilomar, California" }
@book{brady:computational, editor = "Michael Brady and Robert C. Berwick", publisher = "MIT Press", title = "Computational Models of Discourse", year = "1983" }
@unpublished{brown:class, author = "P. Brown and V. J. Della Pietra and P. V. deSouza and J. C. Lai and R. L. Mercer", month = "December", note = "manuscript", title = "Class-based N-gram Models of Natural Language", year = "1990" }
@article{brown:mt, author = "P. Brown and J. Cocke and S. Della Pietra and V. Della Pietra and F. Jelinek and J. Lafferty and R. Mercer and P. Roossin", journal = "Computational Linguistics", month = "June", number = "2", title = "A Statistical Approach to Machine Translation", volume = "16", year = "1990" }
@unpublished{brown:pe, author = "P. Brown and S. Della Pietra and V. Della Pietra and R.
Mercer", month = "May", note = "to appear", title = "The Mathematics of Machine Translation: Parameter Estimation", year = "1991" }
@book{chomsky:56, author = "Chomsky, Noam", publisher = "Mouton", title = "Syntactic Structures", year = "1957" }
@book{chomsky:65, author = "Chomsky, Noam", publisher = "MIT Press", title = "Aspects of the Theory of Syntax", year = "1965" }
@inproceedings{church:88, author = "Church, K.", booktitle = "Proceedings of Second Conference on Applied Natural Language Processing", title = "A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Texts", year = "1988", address = "Austin, Texas" }
@article{church:good, author = "Church, K. and Gale, W.", journal = "Computer Speech and Language", number = "1", title = "A Comparison of the Enhanced Good-Turing and Deleted Estimation Methods for Estimating Probabilities of English Bigrams", volume = "5", year = "1991" }
@incollection{church:lexical, author = "Church, K. and Hanks, P. and Hindle, D. and Gale, W.", booktitle = "Lexical Acquisition: Using on-line Resources to Build a Lexicon", editor = "Zernik", publisher = "Lawrence Erlbaum", title = "Using Statistics in Lexical Analysis", year = "1991" }
@inproceedings{church:opport, author = "Church, K.", address = "Seattle, Washington", booktitle = "Proceedings of 23rd Symposium on the Interface, Computing Science and Statistics", month = "April", title = "Some Statistical Opportunities in Speech and Language", year = "1991" }
@inproceedings{church:para, author = "Church, K.", address = "Oxford, England", booktitle = "Proceedings of Seventh Annual Conference of the UW Centre for the New OED and Text Research", title = "Concordances for Parallel Text", year = "1991" }
@inproceedings{church:pwa, author = "K. Church and W. Gale and P. Hanks and D.
Hindle", booktitle = "Proceedings of the 1989 DARPA Speech and Natural Language Workshop", title = "Parsing, Word Associations and Typical Predicate-Argument Relations", year = "1989" }
@article{church:review, author = "Church, K.", journal = "Computational Linguistics", number = "1", title = "Review of Aarts, J., and Meijs, W. (eds.) `Theory and Practice in Corpus Linguistics'", volume = "17", year = "1991" }
@incollection{church:suggestion, author = "Church, K.", booktitle = "Sbornik praci: In Honor of Henry Kucera", editor = "Simmons", note = "to appear", publisher = "Michigan Slavic Studies", title = "Current Practice in Part of Speech Tagging and Suggestions for the Future", year = "1992" }
@article{co:pa, author = "Covington, M.A.", journal = "Computational Linguistics", month = "December", number = "4", pages = "234--236", title = "Parsing Discontinuous Constituents in Dependency Grammar", volume = "16", year = "1990" }
@book{cohen:introduction, address = "New York", editor = "Daniel I. A. Cohen", publisher = "John Wiley \& Sons, Inc.", title = "Introduction To Computer Theory", year = "1986" }
@book{cr:la, author = "Alan Cruttenden", publisher = "Manchester University Press", title = "Language in Infancy and Childhood", year = "1979" }
@book{degroot:pr, author = "Morris H. DeGroot", publisher = "Addison Wesley", title = "Probability and Statistics", year = "1975" }
@article{dempster:em, author = "A. P. Dempster and N. M. Laird and D. B. Rubin", journal = "Journal of the Royal Statistical Society", title = "Maximum Likelihood from Incomplete Data via the EM Algorithm", volume = "B. 39", year = "1977" }
@article{derose:88, author = "Steven J.
DeRose", journal = "Computational Linguistics", month = "Winter", number = "1", title = "Grammatical Category Disambiguation by Statistical Optimization", volume = "14", year = "1988" }
@book{devlin:log, author = "Keith Devlin", publisher = "Cambridge University Press", title = "Logic And Information", year = "1991" }
@book{dillon:semantics, author = "George L. Dillon", publisher = "Prentice-Hall", title = "Introduction to Contemporary Linguistic Semantics", year = "1977" }
@book{frederking:integrated, author = "Robert E. Frederking", publisher = "Kluwer Academic Publishers", title = "Integrated Natural Language Dialogue: A Computational Model", year = "1988" }
@inproceedings{gale:poor, author = "William A. Gale and Kenneth W. Church", booktitle = "Proceedings of the 1990 DARPA Speech and Natural Language Workshop", title = "Poor Estimates of Context are Worse than None", year = "1990" }
@book{garside:com, author = "Garside, Roger and Leech, Geoffrey and Sampson, Geoffrey", publisher = "Longman", title = "The Computational Analysis of English", year = "1987" }
@article{go:ad, author = "A. L. Gorin and S. E. Levinson and A. N. Gertner and E. Goldman", journal = "Computer Speech and Language", pages = "101--132", title = "Adaptive Acquisition of Language", volume = "5", year = "1991" }
@inproceedings{hi:st, author = "Donald Hindle and Mats Rooth", booktitle = "Proceedings of the 1990 DARPA Speech and Natural Language Workshop", title = "Structural Ambiguity and Lexical Relations", year = "1990" }
@book{hu:word, author = "Hudson, R.", publisher = "Blackwell", title = "Word Grammar", year = "1984" }
@book{hy:la, author = "Nina M. Hyams", publisher = "D. Reidel", title = "Language Acquisition and the Theory of Parameters", year = "1986" }
@incollection{jelinek:80, author = "F. Jelinek and R. L. Mercer", booktitle = "Pattern Recognition in Practice", editor = "E.S. Gelsema and L.N.
Kanal", publisher = "North-Holland", title = "Interpolated Estimation of Markov Source Parameters from Sparse Data", year = "1980" }
@incollection{jelinek:85, author = "F. Jelinek", booktitle = "Readings in Speech Recognition", editor = "Alex Waibel and Kai-Fu Lee", publisher = "Morgan Kaufmann", title = "Self-Organized Language Modeling for Speech Recognition", year = "1990" }
@techreport{jelinek:90, author = "F. Jelinek and J. D. Lafferty and R. L. Mercer", institution = "IBM Research Division, T. J. Watson Research Center", number = "RC 16374(\#72684)", title = "Basic Methods of Probabilistic Context Free Grammars", type = "Computer Science", year = "1990" }
@book{kanal:uncertainty, editor = "L. N. Kanal and J. F. Lemmer", publisher = "North-Holland", title = "Uncertainty in Artificial Intelligence", year = "1986" }
@inproceedings{kup:darpa, author = "Julian Kupiec", booktitle = "Proceedings of the 1991 DARPA Speech and Natural Language Workshop", month = "February", title = "A Trellis-Based Algorithm For Estimating the Parameters of Hidden Stochastic Context-Free Grammar", year = "1991", address = "Asilomar, California" }
@article{la:io, author = "Lari, K. and Young, S. J.", journal = "Computer Speech and Language", pages = "237--257", title = "Applications of Stochastic Context-free Grammars using the Inside-Outside Algorithm", volume = "5", year = "1991" }
@book{levine:neural, author = "D. S. Levine", address = "Hillsdale, New Jersey", publisher = "Lawrence Erlbaum Associates", title = "Neural \& Cognitive Modeling", year = "1991" }
@article{machova:review, author = "Svatava Machova", journal = "Computational Linguistics", number = "1", pages = "108--111", title = "Book Review: Meaning-Text Theory", volume = "18", year = "1992" }
@unpublished{magerman:everything, author = "David M.
Magerman",
  address = "magerman@csli.stanford.edu",
  note = "manuscript",
  title = "Everything You Always Wanted to Know About Probability Theory, But Were Afraid to Ask",
  year = "1991"
}

@inproceedings{magerman:mu,
  author = "Magerman, D. M. and Marcus, M. P.",
  address = "Boston, Massachusetts",
  booktitle = "Proceedings of the Eighth National Conference on Artificial Intelligence",
  month = "August",
  title = "Parsing a Natural Language Using Mutual Information Statistics",
  year = "1990"
}

@inproceedings{magerman:pearl,
  author = "Magerman, D. M. and Marcus, M. P.",
  address = "Berlin, Germany",
  booktitle = "Proceedings of European ACL",
  month = "April",
  title = "Pearl: A Probabilistic Chart Parser",
  year = "1991"
}

@inproceedings{magerman:picky,
  author = "Magerman, D. M. and C. Weir",
  booktitle = "Proceedings of the 1992 DARPA Speech and Natural Language Workshop",
  note = "to appear",
  title = "Probabilistic Prediction and Picky Chart Parser",
  year = "1992"
}

@book{mckeown:text,
  author = "Kathleen R. McKeown",
  publisher = "Cambridge University Press",
  title = "Text Generation",
  year = "1985"
}

@book{me:de,
  author = "Igor A. Mel'\v{c}uk",
  publisher = "State University of New York Press",
  title = "Dependency Syntax: Theory and Practice",
  year = "1988"
}

@book{me:surface,
  author = "Igor A. Mel'\v{c}uk and Nikolaj V. Pertsov",
  address = "Amsterdam",
  publisher = "John Benjamins Publishing Company",
  title = "Surface Syntax of English: A Formal Model with the Meaning-Text Framework",
  year = "1987"
}

@book{meijs:corpus,
  address = "Amsterdam",
  editor = "Willem Meijs",
  publisher = "Rodopi B. V.",
  title = "Corpus Linguistics and Beyond",
  year = "1987"
}

@unpublished{miles:review,
  author = "Miles Osborne",
  month = "December",
  note = "manuscript",
  title = "A Review of Parsing Paradigms for Natural Language",
  year = "1991"
}

@book{moore:cog,
  address = "New York",
  editor = "Timothy E.
Moore",
  publisher = "Academic Press",
  title = "Cognitive Development and the Acquisition of Language",
  year = "1973"
}

@book{obermeier:natural,
  author = "Klaus K. Obermeier",
  publisher = "Ellis Horwood Limited",
  title = "Natural Language Processing Technologies in Artificial Intelligence",
  year = "1989"
}

@phdthesis{od:phd,
  author = "Tim F. O'Donoghue",
  note = "forthcoming",
  school = "School of Computer Studies, University of Leeds",
  title = "Reversing the Process of Generation in Systemic Grammar",
  year = "1992"
}

@inproceedings{odonoghue:acl91,
  author = "O'Donoghue, Tim F.",
  booktitle = "Proceedings of the ACL SIG Workshop on Reversible Grammar in Natural Language Processing",
  month = "June",
  title = "A Semantic Interpreter for Systemic Grammars",
  year = "1991"
}

@incollection{odonoghue:acl91-book,
  author = "O'Donoghue, Tim F.",
  crossref = "strzalkowski:acl91-book",
  title = "Semantic Interpretation in a Systemic Grammar"
}

@techreport{odonoghue:epow,
  author = "Tim F. O'Donoghue",
  institution = "School of Computer Studies, The University of Leeds",
  number = "91.11",
  title = "EPOW: The Edited Polytechnic of Wales Corpus",
  year = "1991"
}

@techreport{odonoghue:vsp,
  author = "Tim F. O'Donoghue",
  institution = "School of Computer Studies, The University of Leeds",
  number = "91.15",
  title = "The Vertical Strip Parser: A Lazy Approach to Parsing",
  year = "1991"
}

@book{oe:ca,
  author = "Richard T. Oehrle and Emmon Bach and Deirdre Wheeler",
  publisher = "D. Reidel Publishing Company",
  title = "Categorial Grammars and Natural Language Structures",
  year = "1988"
}

@book{palmer:semantic,
  author = "Martha S. Palmer",
  publisher = "Cambridge University Press",
  title = "Semantic Processing for Finite Domains",
  year = "1990"
}

@book{pollard:information,
  author = "Carl Pollard and Ivan A.
Sag",
  publisher = "CSLI, Leland Stanford Junior University",
  series = "CSLI Lecture Notes",
  title = "Information-based Syntax and Semantics",
  volume = "13",
  year = "1987"
}

@incollection{rabiner:tutor,
  author = "Rabiner, L. R.",
  address = "San Mateo, California",
  booktitle = "Readings in Speech Recognition",
  editor = "Waibel, A. and Lee, K. F.",
  publisher = "Morgan Kaufmann",
  title = "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition",
  year = "1990"
}

@book{rao:65,
  author = "Rao, C. R.",
  address = "New York",
  publisher = "Wiley",
  title = "Linear Statistical Inference and Its Applications",
  year = "1965"
}

@book{shachter:uncertainty,
  author = "R. D. Shachter and T. S. Levitt and L. N. Kanal and J. F. Lemmer",
  publisher = "North-Holland",
  title = "Uncertainty in Artificial Intelligence 4",
  year = "1990"
}

@book{shannon:communication,
  author = "Shannon, C. E. and Weaver, W.",
  address = "Urbana",
  publisher = "University of Illinois Press",
  title = "The Mathematical Theory of Communication",
  year = "1949"
}

@techreport{souter:communal,
  author = "Clive Souter and Tim O'Donoghue",
  institution = "School of Computer Studies, The University of Leeds",
  number = "90.2",
  title = "Probabilistic Parsing in the COMMUNAL project",
  year = "1990"
}

@techreport{souter:sfgc,
  author = "Clive Souter",
  institution = "School of Computer Studies, The University of Leeds",
  number = "89.12",
  title = "Systemic-Functional Grammars and Corpora",
  year = "1989"
}

@book{sowa:conceptual,
  author = "J. F.
Sowa",
  publisher = "Addison-Wesley",
  title = "Conceptual Structures: Information Processing in Mind and Machine",
  year = "1984"
}

@book{steele:meaning,
  editor = "James Steele",
  publisher = "University of Ottawa Press",
  title = "Meaning-Text Theory: Linguistics, Lexicography, and Implications",
  year = "1990"
}

@book{strzalkowski:acl91-book,
  editor = "Strzalkowski, Tomek",
  note = "To appear",
  publisher = "Kluwer Academic",
  title = "Reversible Grammar in Natural Language Processing",
  year = "1992"
}

@book{te:59,
  author = "Tesni\`{e}re, Lucien",
  publisher = "Klincksieck",
  title = "\'{E}l\'{e}ments de syntaxe structurale",
  year = "1959"
}

@book{tomita:parsing,
  editor = "Masaru Tomita",
  publisher = "Kluwer Academic",
  title = "Current Issues in Parsing Technology",
  year = "1991"
}

@book{trivedi:pr,
  author = "Kishor S. Trivedi",
  publisher = "Prentice-Hall Inc.",
  title = "Probability \& Statistics with Reliability, Queuing, and Computer Science Applications",
  year = "1982"
}

@book{wilks:theoretical,
  address = "New Jersey",
  editor = "Yorick Wilks",
  publisher = "Lawrence Erlbaum Associates, Inc.",
  title = "Theoretical Issues in Natural Language Processing",
  year = "1989"
}

@book{woods:statistics,
  author = "Anthony Woods and Paul Fletcher and Arthur Hughes",
  publisher = "Cambridge University Press",
  title = "Statistics in Language Studies",
  year = "1986"
}

@book{zipf,
  author = "Zipf, George K.",
  publisher = "George Routledge",
  title = "The Psycho-Biology of Language: An Introduction to Dynamic Philology",
  year = "1936"
}

@article{zu:notes,
  author = "Job M. van Zuijlen",
  journal = "BSO/Research, Utrecht, The Netherlands",
  month = "July",
  title = "Notes on a Probabilistic Parsing Experiment",
  year = "1990"
}

@article{zu:pr,
  author = "Job M. van Zuijlen",
  journal = "BSO/Research, Utrecht, The Netherlands",
  month = "June",
  title = "Probabilistic Methods in Dependency Grammar Parsing",
  year = "1989"
}

@article{zu:sa,
  author = "Job M.
van Zuijlen",
  journal = "BSO/Research, Utrecht, The Netherlands",
  month = "February",
  title = "The Application of Simulated Annealing in Dependency Grammar Parsing",
  year = "1989"
}

@article{zu:ssn,
  author = "Job M. van Zuijlen",
  journal = "BSO/Research, Utrecht, The Netherlands",
  month = "November",
  title = "A Technique For The Compact Representation of Multiple Analyses in Dependency Grammar",
  year = "1988"
}

From corpora-request@uib.no Tue Jun 29 12:07:39 1993
From: miles@minster.york.ac.uk
Date: Tue, 29 Jun 93 11:00:32
To: corpora@hd.uib.no
Subject: Spoken English Corpus

Hello all.

Sometime next year I will start using the SE Corpus to evaluate my system. I was wondering if anyone out there would like to share their experience of this corpus. I know that people at Leeds and Cambridge have used it already.

Thanks!

Miles

From corpora-request@uib.no Tue Jun 29 14:52:08 1993
Date: Mon, 28 Jun 93 14:42:57 CET
From: "Alois.Pichler"
To: corpora@hd.uib.no
Subject: tagging

Is there a program (for Unix or PC) that automatically tags German adverbs scattered through an ASCII text as local, temporal, etc.? (It is no problem if some adverbs are assigned too many features, including wrong ones; a human looking at the context will select the right ones.) Or, similarly, one that tags German conjunctions?

I would be grateful for any information.

Alois Pichler
The Wittgenstein Archives at the University of Bergen
H. Haarfagresgt. 31
N-5007 Bergen
Norway
e-mail: alois@pc.hd.uib.no

Does anything speak against this? Is the text OK?

From corpora-request@uib.no Wed Jun 30 11:59:23 1993
Date: Wed, 30 Jun 1993 09:59:23 +0200
From: Yen Ketty
To: CORPORA
Subject: Ill-formed corpora

I'd like to learn from other netters how to collect ill-formed (written) corpora, for example the translations and compositions produced in learning a second or foreign language. I'm particularly interested in collecting English native speakers' translations and compositions in Chinese.
If there is a source that I can access, please let me know. Any information will be appreciated.

Ketty Yen
PRC Inc.
McLean, VA
U.S.A.
yen_ketty@po.gis.prc.com

From corpora-request@uib.no Wed Jun 30 18:01:45 1993
Date: Wed, 30 Jun 93 17:57 MET
From: The Centre for Lexical Information
Subject: TEI-guidelines
To: CORPORA@hd.uib.no

Dear readers,

Could anyone point me to the ftp-site for the guidelines of the Text Encoding Initiative, especially with regard to spoken data (if these have been completed yet).

Richard Piepenbrock

CELEX - The Centre for Lexical Information
Max-Planck-Institut fuer Psycholinguistik
Wundtlaan 1
6525 XD NIJMEGEN
The Netherlands

EARN/BITNET: celex@hnympi51
INTERNET: celex@mpi.nl
SURFNET: celex::celexmail
JANET: celex%hnympi51@uk.ac.earn-relay

From corpora-request@uib.no Wed Jun 30 08:48:47 1993
Date: Wed, 30 Jun 93 13:48:47 CDT
From: "C. M. Sperberg-McQueen"
Subject: Re: TEI-guidelines
To: The Centre for Lexical Information

On Wed, 30 Jun 93 17:57 MET Richard Piepenbrock said:
>Dear readers,
>
>Could anyone point me to the ftp-site for the guidelines of the Text
>Encoding Initiative, especially with regard to spoken data (if these have
>been completed yet).

With pleasure. All published drafts of the TEI's Guidelines for Text Encoding and Interchange are available by Listserv from uicvm.uic.edu, or by anonymous ftp from a variety of sites.

The chapter on spoken texts, first published in April 1992, may be retrieved under the file names

   p234.p2x  (an SGML-tagged version of the prose text)
   p234.ref  (an SGML-tagged version of the reference material)
   p234.tex  (a LaTeX version of the prose text only)
   p234.doc  (a 'screen-readable' formatted version of the whole thing)
   p234.ps   (a postscript version)

from the servers listed below.

N.B. this chapter is old and some of its descriptions of its interface to the rest of the TEI scheme are now out of date. It is no longer numbered chapter 34.
Our formatting macros are also better now than they were when this chapter was prepared.

To retrieve the file from the Listserv server in Chicago, send mail to listserv@uicvm.uic.edu (or to listserv@uicvm on Bitnet/Earn) containing one or more lines of the form

   get p234 ps
   get p234 p2x
   get p234 ref

Most TEI drafts are also available from a list server at the University of Goettingen: send mail as described above to listserv@ibm.gwdg.de (or, for EARN sites, Listserv@dgogwdg1).

Anonymous FTP service is generously maintained by friendly individuals at the following sites, in the directories indicated. Thanks to Paul Ellison, Michael Popham, Syun Tutiya, and Erik Naggum.

   sgml1.ex.ac.uk           : tei/drafts
   pine.kuee.kyoto-u.ac.jp  : pub/TEI (get p234.tar.Z)
   ftp.hitachi-sk.co.jp     : pub/doc/TEI (get p234.tar.Z)
   ifi.uio.no               : SGML/TEI

The Text Encoding Initiative is a cooperative undertaking of the Association for Computers and the Humanities (ACH), the Association for Computational Linguistics (ACL), and the Association for Literary and Linguistic Computing (ALLC), to formulate and disseminate guidelines for the encoding and interchange of machine-readable texts intended for literary, linguistic, historical, or other textual research.

The current draft of the TEI Guidelines bears the document number TEI P2 and is being published in fascicles (which is what you will see when you use the ftp servers just named). The TEI Advisory Board has just approved the current cumulative draft; over the course of the summer, some editorial cleanup will take place, followed by publication of the Guidelines (under the number TEI P3, which will no longer be labeled a draft) sometime this fall (probably October/November).

For further information contact either of the editors, or subscribe to the listserv list TEI-L at UICVM. Send mail to listserv@uicvm.uic.edu with the line

   subscribe tei-l J. Doe

substituting your name for that of J. Doe.

-C. M.
Sperberg-McQueen (u35395@uicvm.uic.edu)
Lou Burnard (lou@vax.ox.ac.uk)
Editors, ACH / ACL / ALLC Text Encoding Initiative

From corpora-request@uib.no Fri Jul 2 07:21:01 1993
Date: Fri, 02 Jul 1993 11:21:01 -0400 (EDT)
From: Cathy Ball
Subject: Address for Merja Kyoto
To: corpora@hd.uib.no

Does anyone have the snail-mail and/or e-mail address of Merja Kyoto, University of Helsinki? Thanks ...

-- Cathy Ball (cball@guvax.georgetown.edu)

From corpora-request@uib.no Fri Jul 2 08:26:53 1993
Date: Fri, 02 Jul 1993 12:26:53 -0400 (EDT)
From: Cathy Ball
Subject: Merja Kyto's address - thanks!
To: corpora@hd.uib.no

Thanks very much to the 8 people who responded to my query! I now have 3 different addresses, and I'll try them all:

   mkyto@cc.helsinki.fi
   merja_kyto@helsinki.fi
   mkyto@finuha.bitnet@livid.uib.no

-- Cathy Ball (cball@guvax.georgetown.edu)

From corpora-request@uib.no Sun Jul 4 13:13:27 1993
Date: Sun, 04 Jul 1993 18:13:27 -0500 (EST)
From: "N. Belmore, TESL Centre, Concordia Univ., Montreal H3G 1M8, (514)848-2457 /2450"
Subject: Introduction to corpus linguistics
To: corpora@hd.uib.no

In reply to a query from Ron Kuzar, Oliver Christ says, "The field is still too young, and I don't know of a published introduction to this area."

The field is not, in fact, 'young' though it is certainly suddenly in vogue. A brief introduction will be found in the 1991 volume English Corpus Linguistics published by Longman. The article is by one of the major contributors to corpus linguistics, Geoffrey Leech. The title is 'The State of the Art in Corpus Linguistics.'

For further information about corpus linguistics send a message to FILESERV@HD.UIB.NO with the following lines:

   send icame bibliography.1991
   send icame bibliography.1992

The bibliographies are part of the information available from the Humanistisk Datasenter to which Dr. Lindsay Evett referred in his reply to Ron's query.
To find out more about what is available from the Humanistisk Datasenter, send the following message to the same address:

   send icame file.servers

ICAME is the acronym for the International Computer Archive of Modern English.

Also, there is a database at the University of Montreal which, as of 15 April 1993, reported 1000 entries under the heading Corpus Linguistics and Dialect Study. To find out more about this database, contact

   sabourco@ere.umontreal.ca

For 15 years Sabour and his group have been compiling a bibliographical database on all aspects of computer processing of natural language. It currently has about 67,000 references!

As for hardware, the notion that 'a personal computer is a too limited architecture for corpus processing...' is certainly subject to challenge. Which personal computer? What configuration? Most important, what are you trying to do? As an earlier respondent noted, it's not wise to even think of buying hardware until you have answered that question.

Regards,

Nancy

From corpora-request@uib.no Sun Jul 4 13:18:05 1993
Date: Sun, 04 Jul 1993 18:18:05 -0500 (EST)
From: "N. Belmore, TESL Centre, Concordia Univ., Montreal H3G 1M8, (514)848-2457 /2450"
Subject: Corpus of the English of Chinese learners
To: corpora@hd.uib.no

John Milton and Nandini Chowdhury have been awarded a research grant to compile a corpus of the interlanguage of Hong Kong students who have had about 12 years of instruction in school. For more information, send a message to 'lcjohn@usthk.bitnet' or 'lcjohn@usthk.ust.hk'

From corpora-request@uib.no Mon Jul 5 06:29:07 1993
Date: Mon, 5 Jul 1993 05:29:07 +0100
To: Cathy Ball, corpora@hd.uib.no
From: eytan@dpt-info.u-strasbg.fr (Michel Eytan, LILoL)
Subject: Re: Address for Merja Kyoto

At 11:21 2/07/93 -0400, Cathy Ball wrote:
>Does anyone have the snail-mail and/or e-mail address of Merja Kyoto,
>University of Helsinki? Thanks ...
> -- Cathy Ball (cball@guvax.georgetown.edu)

this is what I get from the Univ.
of Helsinki phone-book (sorry for the lack of accents in my soft):

   Merja Kyto
   Englannin kielen laitos
   Sahkopostiosoitteita:
   Merja.Kyto@Helsinki.FI (VAX: IN%"Merja.Kyto@Helsinki.FI")
   mkyto@hylk.Helsinki.FI (VAX: IN%"mkyto@hylk.Helsinki.FI")

Hope this helps

~=michel
--
Michel Eytan, Lab Info, Log & Lang      eytan@dpt-info.u-strasbg.fr
Dpt Info, U Strasbourg II               V: +33 88 41 74 29
22 rue Descartes, 67084 Strasbourg FR   F: +33 88 41 74 40

From corpora-request@uib.no Mon Jul 5 18:04:40 1993
Date: Mon, 05 Jul 93 16:53:42 BST
From: TOGNINIE@ibm3090.bham.ac.uk
Subject: introduction to corpus linguistics
To: corpora@HD.UIB.NO

In reply to Oliver Christ's remark on corpus linguistics, I would recommend, as a good introduction to the subject, "Corpus, Concordance, Collocation" by John Sinclair, O.U.P. 1991. Also, there is a section of 6 articles, all on corpus-driven studies, in "Text and Technology", ed. by M. Baker, G. Francis and E. Tognini-Bonelli, Benjamins 1993.

From corpora-request@uib.no Fri Jul 9 11:25:15 1993
Date: Fri, 9 Jul 1993 15:25:15 -0400
From: Becky Passonneau
To: corpora@hd.uib.no
Subject: e-mail address query

I'm looking for the e-mail address of Roger Garside, or failing that, a postal address. Can anyone help? Please send replies to becky@cs.columbia.edu

From corpora-request@uib.no Tue Jul 13 12:15:30 1993
Date: Tue, 13 Jul 93 11:08:40 BST
From: GOUTSOSD@ibm3090.bham.ac.uk
Subject: Modern Greek Corpora
To: corpora@HD.UIB.NO

We recently conducted a survey of machine-readable corpora of Modern Greek, whose findings we presented at the recent ACH-ALLC '93 Georgetown Conference, under the title "Towards a corpus of Spoken Modern Greek". We managed to discover some 15 projects currently running but there may still be some people involved in corpus work in Modern Greek who have not been contacted. We intend to publicize our findings and so we'd welcome any suggestion on the enhancement of communication among researchers in this area.
Dionysis Goutsos
Rania Hatzidaki
Philip King
Modern Greek Corpus Initiative

Dionysis Goutsos            Telephone: (UK) 021-4498290
School of English           Fax: (UK) 021-414 5668
University of Birmingham    E-mail: goutsosd@uk.ac.bham
Edgbaston
Birmingham B15 2TT
U.K.

From corpora-request@uib.no Wed Jul 14 14:42:36 1993
Date: Wed, 14 Jul 93 13:18:03 BST
From: dahe@sharp.co.uk (David Elworthy)
To: corpora@hd.uib.no
Subject: Penn postext format

Can anyone help me with interpreting the following bit of formatting in the Penn Treebank postext files? Normally, / marks tags and \ disables the special effect of /. So what does the following mean:

   The/DT \/NN 252\/JJ neutron/NN spectrum/NN
         ^^^^^^^^^^^^^^

-- David Elworthy

From corpora-request@uib.no Thu Jul 15 16:36:32 1993
Date: Thu, 15 Jul 93 14:36:32 +0200
From: veronis@grtc.cnrs-mrs.fr (jean Veronis)
To: corpora@hd.uib.no, linguist@tamsun.tamu.edu, humanist@brownvm.brown.edu,
Subject: CHum special issue

I would like to bring to your attention a recently published special issue of COMPUTERS AND THE HUMANITIES which is of particular interest to readers of this list.

COMPUTERS AND THE HUMANITIES
Volume 26 Nos. 5-6 December 1992

COMMON METHODOLOGIES IN HUMANITIES COMPUTING AND COMPUTATIONAL LINGUISTICS

Guest editors: Nancy Ide and Donald Walker

CONTENTS:

NANCY IDE and DONALD WALKER / Introduction: Common Methodologies in Humanities Computing and Computational Linguistics

DOUGLAS BIBER / The Multi-Dimensional Approach to Linguistic Analyses of Genre Variation: An Overview of Methodology and Findings

HARALD BAAYEN / Statistical Models for Word Frequency Distributions: A Linguistic Evaluation

ADAM KILGARRIFF / Dictionary Word Sense Distinctions: An Enquiry into Their Nature

EVAN L. ANTWORTH / Glossing Text with the PC-KIMMO Morphological Parser

FRANK SMADJA / XTRACT: An Overview

WILLIAM A. GALE, KENNETH W.
CHURCH, and DAVID YAROWSKY / A Method for Disambiguating Word Senses in a Large Corpus

SAM COATES-STEPHENS / The Analysis and Acquisition of Proper Names for the Understanding of Free Text

For information about COMPUTERS AND THE HUMANITIES, contact Kluwer Academic Publishers Group, P.O. Box 322, 3300 AH Dordrecht, The Netherlands, or at P.O. Box 358, Accord Station, Hingham, Massachusetts 02018-0358 USA.

From corpora-request@uib.no Mon Jul 19 06:33:57 1993
Date: Mon, 19 Jul 1993 10:33:57 -0400
From: Keith Miller
To: corpora@hd.uib.no
Subject: Japanese CA tools

Hello all--

Besides JUMAN, is anyone aware of good corpus analysis tools for Japanese? I would be interested in tagging / search software as well as parsers. ftp addresses for the most recent versions of such tools -- including JUMAN -- (or pointers to such addresses) would be most welcome.

Thank you in advance.

-----
Keith J. Miller
millerk@guvax.georgetown.edu

From corpora-request@uib.no Wed Jul 21 12:02:01 1993
Date: Wed, 21 Jul 93 16:02:01 EDT
From: prieto@mbeya.research.att.com (pilar prieto)
To: linguist@tamsun.tamu.edu, corpora@hd.uib.no, prosody@psuvm.psu.edu, empiricists@csli.stanford.edu, humanist@brownvm.brown.edu
Subject: Spanish
Reply-to: prieto@research.att.com

I would appreciate any information or pointers about the existence of Spanish databases with any of the following characteristics:

1/ Spanish text with parts-of-speech labels.
2/ Spanish text with syntactic labels.
3/ Spanish speech (any dialect) with phoneme segmentation.
4/ Spanish speech (any dialect) with some kind of prosodic labelling.

I would also appreciate any information on commercial and non-commercial programs for Spanish text processing:

1/ parts-of-speech taggers
2/ syntactic parsers

Please reply to my address, and I'll summarize the responses to the list.
Thanks in advance,

Pilar Prieto
prieto@research.att.com

From corpora-request@uib.no Thu Jul 22 12:06:00 1993
Date: Thu, 22 Jul 93 16:06 EDT
From: lewis@research.att.com (David Lewis)
To: CORPORA@hd.uib.no
Subject: SIGIR 94 CFP

C A L L   F O R   P A P E R S

17th International Conference on Research and Development in Information Retrieval -- SIGIR'94

Sponsored by Dublin City University in cooperation with ACM, BCS-IRSG, GI, AICA-GLIR, CEPIS-EIRSG and ICS

The SIGIR'94 conference will take place in Dublin, Ireland, from 3rd to 6th July, 1994. This conference is a forum for the exchange of ideas and reporting of work done in areas related to information retrieval and covers information retrieval theory, user interface issues, multimedia, natural language processing, advanced techniques implementations and system issues, networked information retrieval, applications and many other areas.

Program co-chairs are Prof. Keith van Rijsbergen (Glasgow U.) and Prof. W Bruce Croft (UMass).

Contributions to the conference can be in the form of papers, panels, tutorials or workshops. The deadlines for submission are 6 January 1994 (papers) and 14 February 1994 (others). For a copy of the full call for papers send e-mail to sigir-cfp@ca.dcu.ie or contact the conference chair. Full details on submission formats for papers, panels, tutorials or workshops may be obtained by sending e-mail to sigir-format@ca.dcu.ie or contacting the general conference chair. To be added to the mailing list send e-mail to sigir-info@ca.dcu.ie.
Conference Chair
Dr Alan Smeaton                    Tel: +353 - 1 - 7045262
School of Computer Applications    Fax: +353 - 1 - 7045442
Dublin City University,            e-mail: asmeaton@ca.dcu.ie
Glasnevin, Dublin 9, IRELAND

From corpora-request@uib.no Thu Jul 29 17:38:39 1993
Date: Thu, 29 Jul 1993 16:38:39 +0100
From: Lou Burnard
To: CORPORA@HD.UIB.NO
Subject: Oxford Text Archive news

*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*
THE OXFORD TEXT ARCHIVE IS PLEASED TO ANNOUNCE...
*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*

* a new Short List of titles held at Oxford
* 40 titles now available in TEI format for anonymous FTP
* a new FTP service for licensed access via the Internet

It's been a long time since we posted any news of our activities to this or other lists. It's not that we've been inactive -- quite the opposite in fact.

* We have been converting texts to a standard TEI-compatible mark up (with much appreciated help from Jeffrey Triggs at Bellcore, and John Price-Wilkin at Virginia).
* We have been experimenting with ways of saving time and money by using FTP, Gopher, WWW etc to deliver material rather than tapes and disks
* We have been scouring the networks for new material of all kinds
* We have been trying to find some additional and reliable sources of funding, but cannot report much progress. Any philanthropists out there, please form an orderly queue.

***** NEW ACCESSIONS ******

Our latest catalogue lists 1336 titles, in 28 languages. We have about 1.2 Gb of textual data, most of it freely available, some of it restricted in one way or another. We want more. We're particularly interested in scholarly minority-interest material which is not going to turn up on CD-anything in the foreseeable future. We don't charge fees to look after your material, and we keep track of what happens to it.
We do our best to make sure that whatever texts you deposit with us are rendered as future-proof as we can make them but we don't change the information you recorded. We're archivists, not evangelists, for electronic text.

At the same time, now that some kind of standardization is at last beginning to appear, we're eager to show that old wine can be put into new bottles. So you'll find that quite a few texts are now available in more than one form -- both the original, and a "TEI-compatible" form. (When the original form is easily available elsewhere, and particularly when the TEI form has more information in it, then we may well drop the former from the catalogue. But don't worry: it's still in the Archive....)

*********** NEW FTP SERVICES *************

Our ftp address is: ota.ox.ac.uk. You can log on as anonymous, quoting your e-mail address as a password. If you don't know how to use FTP, ask someone at your local computer centre. If someone there runs a Gopher, or WWW server, get them to point the little critter at the following useful files, which you can also download from the above address:

   ota/textarchive.list    our current catalogue
   ota/textarchive.info    information file + order form

There are two classes of texts available from this FTP server

(a) texts which are in TEI format and which we can make freely available (these all appear as category P texts in the shortlist)

(b) texts which are available only under our standard conditions of use, (these all appear as category U or A in the shortlist)

[Just to confuse the issue, there are also texts which appear as category P texts in the Shortlist, because they are freely available, but which we have not yet checked or converted for TEI compatibility, and which are therefore not available from our FTP server, though you may well be able to get them from someone else's. We will distribute them in the same way as (b) class texts if you insist.]
(A) CLASS TEXTS (Freely Available)

You can just download these without formality using standard FTP commands. In some cases there are additional usage constraints, specified in the TEI header. We also hope that you won't redistribute these texts in a mutilated state or without acknowledgment of where you got them from. We can't enforce any of these things, obviously. We think that the Internet is successful because -- and as long as -- people trust each other.

To see what (a) class texts are available now, just take a look in the directory ota. It's arranged, like the ShortList, by language, and within that by Author. There are x texts in there today, and there will be more. Each text has a conformant TEI header, and each text is a legal TEI compatible document, using a special document type definition (dtd), which you can also download from the same directory (look in ota/TEI).

Eventually, there'll be some more introductory stuff on what SGML is, why the TEI is a Good Thing etc etc. Just now, we're working flat out getting the texts in there. Here's the list of what was there when I prepared this note:

Anonymous: Gammer Gurton's Needle
Edgar Rice Burroughs: A Princess of Mars
Wilkie Collins: The Woman in White
Joseph Conrad: Lord Jim; Nigger of the Narcissus
Charles Darwin: Origin of Species
Arthur Conan Doyle: Adventures of Sherlock Holmes; Casebook of Sherlock Holmes; His Last Bow; Memoirs of Sherlock Holmes; Sign of Four; Valley of Fear; Hound of the Baskervilles; Return of Sherlock Holmes; A Study in Scarlet
Henry James: The Europeans; Roderick Hudson; The Watch
Jack London: Klondike Tales; The Sea-Wolf; The Call of the Wild; White Fang
Andrew Marvell: English Poems (1688)
Herman Melville: Moby Dick
John Milton: Paradise Lost
Lucy M.
Montgomery: Anne of Avonlea
William Morris: News from Nowhere
Baroness Orczy: The Scarlet Pimpernel
Bram Stoker: Dracula
Anthony Trollope: Lady Anna; Ayala's Angel; The Eustace Diamonds; Can You Forgive Her?; Phineas Finn; Phineas Redux; Rachel Ray; Dr Wortle's School
Mark Twain: A Connecticut Yankee at the Court of King Arthur
H.G. Wells: The Invisible Man; The War of the Worlds; The Time Machine

(B) CLASS TEXTS (Restricted access)

The majority of texts in the Archive are and always have been held in trust for a Depositor. Rather than keep track of a zillion different contracts with each Depositor, we worked out a single contract which is the basis of our standard user declaration form. It has served to keep us out of the law courts for the last twenty five years, so it can't have been all bad. Because it's a contract, we have to have a signed paper copy of the declaration in our hands before we can issue copies of the texts. Once we have that declaration, we can send you copies of restricted texts, on diskette, cartridge or magnetic tape, or even over the network.

Up till this week, the only way you could get copies of (b) class texts over the network was to tell us an account and password on your machine. We would then bash the files across to you, for free. This was a rather unsatisfactory procedure in several ways: we think we now have a better one.
It's still free and it works like this:

- you send us a signed order form, as usual
- on the order form you specify the password of your choice
- we place copies of the files you ordered in a special directory under ota, access to which requires you to quote both a personal identifier (which we will give you) and the password (which you have told us)
- we send you e-mail giving details of how to access the directory
- you download copies of the files you ordered, using conventional ftp commands
- after a fixed period of time (usually about a week) your personal identifier is removed and the file copies deleted

**********THE DOWN SIDE************

We save until the very end of this note the inevitable piece of bad news. After 25 years, we've been told very firmly that we have to increase our prices to something a bit nearer a realistic level. Not only that, but within the European Community we must charge VAT at 17.5% on every order.

We've taken this opportunity to rethink slightly the way in which we charge. We charge only for material costs, postage and packing on orders for texts sent on magnetic media of various kinds. We have abolished the "per text" fee, and we are no longer insisting on payment in advance. We are still charging over the odds for diskettes because they take us a disproportionate amount of effort to produce.

The cost is worked out as follows:

   Magnetic tape           £50 ($80) each
   DC350 tape cartridge    £30 ($50) each
   Diskette                £20 ($35) each
   Invoicing charge        £10 ($20) payable if order is not prepaid
   Postage surcharge       £10 ($20) for orders outside EC
   Add VAT at 17.5% for orders within EC

We will continue to give an estimate for the cost of any order free of charge. And, of course, if you use our new FTP service, then you don't need to pay us a penny.

We look forward to hearing from you in the new academic year!
Lou Burnard and Alan Morrison

*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*
Oxford Text Archive                     email: archive@ox.ac.uk
Oxford University Computing Services    tel: +44 865 273238
13 Banbury Road, Oxford OX2 6NN, UK     fax: +44 865 273275
*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*

From corpora-request@uib.no Fri Jul 30 22:18:48 1993
From: koehler@utrurt.Uni-Trier.DE (Prof. Dr. Koehler)
Subject: Dateitransfer
To: corpora@hd.uib.no
Date: Fri, 30 Jul 1993 20:18:48 +0200 (MSZ)

Please redistribute this CfP on relevant bulletin boards.

CALL FOR PAPERS

A new journal on QUANTITATIVE LINGUISTICS will be launched in 1994. Authors are invited to submit 4 copies of their article, provided it is within the scope of the journal.

Scope

The Journal of Quantitative Linguistics is a new international forum for the publication and discussion of research on the quantitative characteristics of language and text in an exact mathematical form. This approach, which is of growing interest, opens up important and exciting theoretical perspectives, as well as solutions for a wide range of practical problems such as machine learning or statistical parsing, by introducing into linguistics the methods and models of advanced scientific disciplines such as the natural sciences, economics and psychology.

Specifically, JQL will publish papers on:

1) Observations and descriptions of all aspects of language and text phenomena, including the areas of psycholinguistics, sociolinguistics, dialectology, pragmatics, etc., insofar as they use quantitative mathematical methods (probability theory, stochastic processes, differential and difference equations, fuzzy logic and set theory, function theory etc.), on all levels of linguistic analysis.
2) Applications of methods, models, or findings from quantitative linguistics to problems of natural language processing, machine translation, language teaching, documentation and information retrieval.
3) Methodological problems of linguistic measurement, model construction, sampling and test theory.
4) Epistemological issues such as explanation of language and text phenomena, contributions to theory construction, systems theory, philosophy of science.

Audience

The Journal of Quantitative Linguistics will be important reading for all researchers in the following disciplines who are interested in quantitative methods and observations: linguistics, mathematics, statistics, artificial intelligence, cognitive science, and stylistics.

The Journal is edited by Reinhard Koehler, Linguistische Datenverarbeitung, University of Trier, Germany, and will initially have 3 issues a year.

Associate Editors:
    Gabriel Altmann, Bochum
    Sheila Embleton, York

Assistant Editor:
    Peter Schmidt, Trier

Editorial Board:
    Harald Baayen, Nijmegen
    Kenneth Church, Murray Hill, NJ
    Jacques Guy, Clayton (Australia)
    Christian Delcourt, Liege
    Luděk Hřebíček, Prague
    Tatsuo Miyajima, Osaka
    Hiroshi Nakano, Tokyo
    Rajmond G. Piotrowski, St. Petersburg
    Anatolij A. Polikarpov, Moscow
    Roland Posner, Berlin
    Burghard Rieger, Trier
    Jadwiga Sambor, Warsaw
    Pauli Saukkonen, Oulu/Helsinki
    Royal Skousen, Provo, Utah
    Philippe Thoiron, Lyon

How to Submit

Send 4 copies to one of these addresses:

    The Journal of Quantitative Linguistics
    Editorial Office
    Swets & Zeitlinger / SPS
    P.O. Box 825
    2160 SZ Lisse
    The Netherlands

or

    The Journal of Quantitative Linguistics
    Editorial Office
    Swets & Zeitlinger / SPS
    P.O. Box 613
    ROYERSFORD, PA 19468
    U.S.A.
For further information please contact by e-mail:

    koehler@ldv01.uni-trier.de or Scrivy@swets.nl

----------------------------------------------------------------------------

From corpora-request@uib.no Sat Jul 31 03:16:43 1993
Date: Sat, 31 Jul 93 09:16:43 CST
From: rocltsh@iis.sinica.edu.tw (ROCLING)
To: corpora@hd.uib.no
Subject: PACFoCoL/ROCLING

Pacific Asia Conference On Formal and Computational Linguistics
(PACFoCoL I)
Taipei, August 30-31, 1993

Registration Form

Name:
Date of Birth:
Gender:
Country of Residence:
Passport #:
Institution:
Address:
E-mail Address:
Tel:
Fax number:

Registration Fees for PACFoCoL:

*Payment before 8-14-93 for those who register at the same time for ROCLING VI (see reverse side for additional ROCLING registration and hotel fees): ___ US$ 30

*Payment before 8/14 (for those attending PACFoCoL only): ___ US$ 50

PACFoCoL Conference Site: Academic Activity Center, Academia Sinica, Taipei

Accommodations: Academic Activity Center, Academia Sinica

Do you need our help in booking a room at Academic Activity Center?
Yes _________ No __________

Single or Double Room? __________________

Check in Date _____________ Check out Date _____________

Daily charge for a single room is roughly US $25 and for a double room is US $32. Please inform us before August 14 if you need our help to book a room. You will need to pay for your hotel fees in Taiwan dollars before checking-out of the hotel.

Please fill out this form, as well as the reverse side if you are also attending ROCLING VI and enclose a check of registration fees payable to the Computational Linguistics Society of R.O.C. and return it to the following address before 8/14/93:

    Miss Shu-Hui Tsai
    ROCLING
    Institute of Information Science
    Academia Sinica, Nankang
    Taipei 115, Taiwan, R.O.C.

If you have any questions, please contact Miss Shu-Hui Tsai:

    Tel & Fax: 886-2-7881638
    E-mail address: rocltsh@iis.sinica.edu.tw

PLEASE SEE THE FOLLOWING PAGE FOR ROCLING VI REGISTRATION AND PAYMENT

R.O.C.
Computational Linguistics Conference VI
ROCLING VI
Hsitou National Park
September 2-4, 1993
Theme: Computational Semantics/Corpus Linguistics

Registration Form

Name:
Date of Birth:
Gender:
Country of Residence:
Passport Number:
Institute Affiliation:
Mailing Address:
Email Address:
Telephone:
Fax Number:

REGISTRATION FEES AND HOTEL FEES MUST BE PAID BY MAIL BEFORE AUGUST 14, 1993:

Registration Fees:
    Member of ROC Comp. Ling. Society: ___ US $110
    1 year membership fee: ___ US $50
        (includes monthly newsletter and reduced fees on activities)
    Non-member: ___ US $140

Registration fees include round-trip transportation between Academia Sinica and the conference site at Hsi-tou National Park, park entrance fees, conference proceedings, and all meals. THE BUS WILL LEAVE PROMPTLY AT 7AM ON 9/2 FROM THE FRONT DOOR OF THE ACTIVITY CENTER AT ACADEMIA SINICA.

Hotel Fees for 2 nights (9/2 and 9/3) at Hsitou Park:
    Double Room: ___ US $50 per person
        (please specify name of person you wish to share with: _______________________________________)
    Single Room: ___ US $60

Total: PACFoCoL Registration (from reverse side) + ROCLING VI Registration + (ROCLING Membership Fee) + Hotel Fees for Hsitou Park: US $____
(check payment information on reverse side)

PLEASE FILL OUT PACFoCoL REGISTRATION ON THE PRECEDING PAGE

PACFOCOL I/ROCLING VI
Optional Activities on Sept. 1, 1993

Morning Activities:

A. 9:30-12:00 Visiting the Chinese Knowledge Information Processing (CKIP) Group of the Institute of Information Science at Academia Sinica. (Classical & Modern Chinese Corpora, Electronic Dictionary, Chinese Parser, etc.)

B. 9:30-12:00 A guided tour of National Palace Museum in Taipei.

Afternoon Activities:

C. 3:00-5:00 Visiting the Behavior Design Corporation in Hsin-chu Science-based Industrial Park. (Demonstration of BDC EC MT System, D-Top Bilingual Desktop Publishing [W/ KWIC, Bilingual Dict.], Grammar Checker, On-line OCR, Personal Manager, etc.)
Visitors will stay overnight in Hsin-chu and will board the bus going to ROCLING in Hsin-chu.

D. 3:00-5:00 A guided tour of the Museum of the Institute of History and Philology, Academia Sinica. (Museum not usually open to the public and can be visited by appointment only.)

Notes:

1. The above activities are free. However, participants are responsible for their own expenses, such as food, transportation and accommodation.
2. Spaces for the two visiting tours (A&C) are limited and will be allotted in the order of registration.
3. Unless indicated otherwise, we will book a hotel at Hsin-chu for the night of Sept. 1 for participants of C.

_____________________________________________________________

Please fill out the following form and return it with the registration form by August 14.

I would like to attend the following activities
[] one activity (please circle one) A, B, C, D
[] two activities (please circle one) AC, AD, BC, BD

Signature _______________________

Pacific Asia Conference on Formal and Computational Linguistics

1993/8/30

 8:30 -  9:00  Registration
 9:00 -  9:10  Opening Remarks
 9:10 - 10:10  Invited Speaker
               John Nerbonne (Groningen U.)
                   "Constraint-Based Semantics"
10:10 - 11:10  Invited Speaker
               Mary Dalrymple (Xerox PARC)
                   "Reciprocals and the Syntax-semantics Interface"
11:10 - 11:30  Tea Break
11:30 - 12:30  Session 1
               1. Chih-Chen Jane Tang (Academia Sinica)
                   "On the Distribution and Transportability of Adjuncts in Chinese"
               2. Kathleen Ahrens (U. C. San Diego)
                   "Additional Evidence for LF: Wh-words in Mandarin Chinese"
               3. Yu-Fang Wang (Taiwan Normal U.)
                   "The Chinese NPI Renhe In Contexts With Negative Values"
12:30 - 13:30  Lunch
13:30 - 14:30  Session 2
               4. Chunyu Kit (Carnegie Mellon U.)
                   "Description of Chinese intransitive verbs and adjuncts within the LFG formalism"
               5. Jong-Bok Kim (Stanford U.)
                   "A Constraint-Based and Lexical Approach to Korean Verb Inflections"
               6. Hongming Zhang (U. of Singapore)
                   "The Syntactic Condition of Taiwanese Tone Sandhi"
14:30 - 14:50  Tea Break
14:50 - 15:50  Session 3
               7. Yoichi Uetake (Tokyo U.)
                   "Two Formal Representations of The Thematic-Rhematic Structure of Sentences"
               8. Michiko Nakano (Waseda U.)
                   "Cognitive Semantics and A Boolean Valued Model"
               9. Paul Horng Jyh Wu (U. of Singapore)
                   "Toward Integrating Concept Hierarchies"
15:50 - 16:10  Tea Break
16:10 - 17:10  Session 4
               10. Ruslan Mitkov (U. of Science Malaysia)
                   "Automatic Abstracting in a Limited Domain"
               11. H.C. Ho, Benjamin K. T'sou, Terence Y.W. Chan, Tom B.Y. Lai, Suen Caesar Lun (City Polytechnic of H.K.)
                   "Using Syntactic Markers and Semantic Frame Knowledge Representation in Automatic Chinese Text Abstraction"
               12. Yuangshan Chuang (U. of Illinois)
                   "A Quantitative Corpus Analysis of Word Frequency and Part of Speech in The English Textbooks Used in Senior High Schools in Taiwan"
18:00          Dinner

1993/8/31

 9:00 - 10:00  Invited Speaker
               Mark Liberman (U. of Penn.)
                   "Is Syntax Hard to Learn?"
10:00 - 11:00  Kenneth Church (AT&T Bell Labs)
                   "Aligning Parallel Texts: Do Methods Developed for English-French Generalize to Asian Languages?"
11:00 - 11:20  Tea Break
11:20 - 12:20  Session 5
               13. Hiroto Ohnishi (Toyo Women's U.), Seiki Akama (Teikyo U.)
                   "Intensional Contexts and Common Knowledge"
               14. Akira Ishikawa (Sophia U.)
                   "Dynamic Temporal Reasoning in Japanese"
               15. Ryooya Okabe (Sophia U.)
                   "On Flattening Categories in Categorial Grammar"
12:20 - 13:30  Lunch
13:30 - 14:50  Session 6
               16. Cheng-Hui Liu (Academia Sinica)
                   "On Transitivity in Pre-Qin Chinese--The Application of Computational Corpus in Historical Chinese Syntax"
               17. Benjamin K T'sou (City Polytechnic of H.K.)
                   "The Pragmatics of Bargaining in Chinese: A Computational Model"
               18. Ching-Yu Chen, Shu-Fen Tseng, Keh-jiann Chen, Chu-Ren Huang (Academia Sinica)
                   "Some Distributional Properties of Mandarin Chinese -- A Study based on the Academia Sinica Corpus"
               19. Ruslan Mitkov
                   "A Knowledge-Based and Sublanguage-Oriented (U.
of Science Approach for Anaphora Resolution" Malaysia)
14:50 - 15:20  Tea Break
15:20 - 16:20  Session 7
               20. Claire Chang (Cheng Chi U.)
                   "Complex Stative Construction: Resultative or Descriptive?"
               21. Shen-Min Chang (Tsing Hua U.)
                   "V+qi(lai) Compounds in Mandarin Chinese"
               22. Zhao-Ming Gao, Chu-Ren Huang, Chih-Chen Jane Tang (Academia Sinica)
                   "On the Syntactic Structure of Evaluative V-qilai Construction in Chinese"
16:20 - 16:40  Tea Break
16:40 - 17:40  Invited Speaker
               Ting-Chi Tang (Tsing Hua U.)
                   "A 'Theta-Grid' approach to a Contrastive Analysis of English, Chinese and Japanese"

Alternate Papers:
               1. One-Soon Her
                   "LECS: An LFG-Based Unification Grammar Formalism for Natural Language Processing"
               2. Yong Kui Zhang, James R. Cowie
                   "Using Cluster Analysis for Processing English Texts"
               3. Hui-i Amy Kung
                   "A Small-Clause Analysis for the Mandarin Double Object Construction"
               4. Hui-Chuan Hsu
                   "Imperative Forms in Sediq: A Perspective from the Morphemic Plane Hypothesis"
               5. Wen-Jen Wei
                   "Quantifier Phrase Incorporation in Chinese"

From corpora-request@uib.no Mon Aug 2 19:11:44 1993
Date: Mon, 2 Aug 1993 17:11:44 +0200
From: " (02-Aug-1993 1654)"
To: corpora@x400.hd.uib.no
Subject: Looking for medical texts

I am looking for text corpora in the domain of medicine to test some software we've been developing. I have heard about a corpus of radiology reports that is available for research; it is mentioned on the DCI CD ROM. Does anybody have any information (a contact) for this or other corpora?

If availability of such corpora is restricted by e.g. privacy restrictions, then perhaps "anonymized" versions could be produced?

Pim.

------------------------------------------------------------------
Pim van der Eijk                   eijk@cecamo.enet.dec.com
Cooperative Engineering Center     cecamo::eijk
Digital Equipment Corporation      Tel: +31-20-5866021
Kabelweg 21 1014 BA Amsterdam      Fax: +31-20-6824772
The Netherlands

From corpora-request@uib.no Tue Aug 3 10:14:36 1993
Date: Tue, 3 Aug 93 16:14:36 CST
From: rocltsh@iis.sinica.edu.tw (ROCLING)
To: corpora@hd.uib.no
Subject: PACFoCoL/ROCLING

Pacific Asia Conference on Formal and Computational Linguistics
Program

1993/8/30

 8:30 -  9:00  Registration
 9:00 -  9:10  Opening Remarks
 9:10 - 10:10  Invited Speaker
               John Nerbonne (Groningen U.)
                   "Constraint-Based Semantics"
10:10 - 11:10  Invited Speaker
               Mary Dalrymple (Xerox PARC)
                   "Reciprocals and the Syntax-semantics Interface"
11:10 - 11:30  Tea Break
11:30 - 12:30  Session 1
               1. Chih-Chen Jane Tang (Academia Sinica)
                   "On the Distribution and Transportability of Adjuncts in Chinese"
               2. Kathleen Ahrens (U. C. San Diego)
                   "Additional Evidence for LF: Wh-words in Mandarin Chinese"
               3. Yu-Fang Wang (Taiwan Normal U.)
                   "The Chinese NPI Renhe In Contexts With
                   Negative Values"
12:30 - 13:30  Lunch
13:30 - 14:30  Session 2
               4. Chunyu Kit (Carnegie Mellon U.)
                   "Description of Chinese intransitive verbs and adjuncts within the LFG formalism"
               5. Jong-Bok Kim (Stanford U.)
                   "A Constraint-Based and Lexical Approach to Korean Verb Inflections"
               6. Hongming Zhang (U. of Singapore)
                   "The Syntactic Condition of Taiwanese Tone Sandhi"
14:30 - 14:50  Tea Break
14:50 - 15:50  Session 3
               7. Yoichi Uetake (Tokyo U.)
                   "Two Formal Representations of The Thematic-Rhematic Structure of Sentences"
               8. Michiko Nakano (Waseda U.)
                   "Cognitive Semantics and A Boolean Valued Model"
               9. Paul Horng Jyh Wu (U. of Singapore)
                   "Toward Integrating Concept Hierarchies"
15:50 - 16:10  Tea Break
16:10 - 17:10  Session 4
               10. Ruslan Mitkov (U. of Science Malaysia)
                   "Automatic Abstracting in a Limited Domain"
               11. H.C. Ho, Benjamin K. T'sou, Terence Y.W. Chan, Tom B.Y. Lai, Suen Caesar Lun (City Polytechnic of H.K.)
                   "Using Syntactic Markers and Semantic Frame Knowledge Representation in Automatic Chinese Text Abstraction"
               12. Yuangshan Chuang (U. of Illinois)
                   "A Quantitative Corpus Analysis of Word Frequency and Part of Speech in The English Textbooks Used in Senior High Schools in Taiwan"
18:00          Dinner

1993/8/31

 9:00 - 10:00  Invited Speaker
               Mark Liberman (U. of Penn.)
                   "Is Syntax Hard to Learn?"
10:00 - 11:00  Kenneth Church (AT&T Bell Labs)
                   "Aligning Parallel Texts: Do Methods Developed for English-French Generalize to Asian Languages?"
11:00 - 11:20  Tea Break
11:20 - 12:20  Session 5
               13. Hiroto Ohnishi (Toyo Women's U.), Seiki Akama (Teikyo U.)
                   "Intensional Contexts and Common Knowledge"
               14. Akira Ishikawa (Sophia U.)
                   "Dynamic Temporal Reasoning in Japanese"
               15. Ryooya Okabe (Sophia U.)
                   "On Flattening Categories in Categorial Grammar"
12:20 - 13:30  Lunch
13:30 - 14:50  Session 6
               16. Cheng-Hui Liu (Academia Sinica)
                   "On Transitivity in Pre-Qin Chinese--The Application of Computational Corpus in Historical Chinese Syntax"
               17. Benjamin K T'sou (City Polytechnic of H.K.)
                   "The Pragmatics of Bargaining in Chinese: A Computational Model"
               18. Ching-Yu Chen, Shu-Fen Tseng, Keh-jiann Chen, Chu-Ren Huang (Academia Sinica)
                   "Some Distributional Properties of Mandarin Chinese -- A Study based on the Academia Sinica Corpus"
               19. Ruslan Mitkov (U. of Science Malaysia)
                   "A Knowledge-Based and Sublanguage-Oriented Approach for Anaphora Resolution"
14:50 - 15:20  Tea Break
15:20 - 16:20  Session 7
               20. Claire Chang (Cheng Chi U.)
                   "Complex Stative Construction: Resultative or Descriptive?"
               21. Shen-Min Chang (Tsing Hua U.)
                   "V+qi(lai) Compounds in Mandarin Chinese"
               22. Zhao-Ming Gao, Chu-Ren Huang, Chih-Chen Jane Tang (Academia Sinica)
                   "On the Syntactic Structure of Evaluative V-qilai Construction in Chinese"
16:20 - 16:40  Tea Break
16:40 - 17:40  Invited Speaker
               Ting-Chi Tang (Tsing Hua U.)
                   "A 'Theta-Grid' approach to a Contrastive Analysis of English, Chinese and Japanese"

Alternate Papers:
               1. One-Soon Her
                   "LECS: An LFG-Based Unification Grammar Formalism for Natural Language Processing"
               2. Yong Kui Zhang, James R. Cowie
                   "Using Cluster Analysis for Processing English Texts"
               3. Hui-i Amy Kung
                   "A Small-Clause Analysis for the Mandarin Double Object Construction"
               4. Hui-Chuan Hsu
                   "Imperative Forms in Sediq: A Perspective from the Morphemic Plane Hypothesis"
               5. Wen-Jen Wei
                   "Quantifier Phrase Incorporation in Chinese"

Pacific Asia Conference On Formal and Computational Linguistics
(PACFoCoL I)
Taipei, August 30-31, 1993

Registration Form

Name:
Date of Birth:
Gender:
Country of Residence:
Passport #:
Institution:
Address:
E-mail Address:
Tel:
Fax number:

Registration Fees for PACFoCoL:

*Payment before 8-14-93 for those who register at the same time for ROCLING VI (see reverse side for additional ROCLING registration and hotel fees): ___ US$ 30

*Payment before 8/14 (for those attending PACFoCoL only): ___ US$ 50

PACFoCoL Conference Site: Academic Activity Center, Academia Sinica, Taipei

Accommodations: Academic Activity Center, Academia Sinica

Do you need our help in booking a room at Academic Activity Center?
Yes _________ No __________

Single or Double Room? __________________

Check in Date _____________ Check out Date _____________

Daily charge for a single room is roughly US $25 and for a double room is US $32. Please inform us before August 14 if you need our help to book a room. You will need to pay for your hotel fees in Taiwan dollars before checking-out of the hotel.

Please fill out this form, as well as the reverse side if you are also attending ROCLING VI and enclose a check of registration fees payable to the Computational Linguistics Society of R.O.C. and return it to the following address before 8/14/93:

    Miss Shu-Hui Tsai
    ROCLING
    Institute of Information Science
    Academia Sinica, Nankang
    Taipei 115, Taiwan, R.O.C.

If you have any questions, please contact Miss Shu-Hui Tsai:

    Tel & Fax: 886-2-7881638
    E-mail address: rocltsh@iis.sinica.edu.tw

PLEASE SEE THE FOLLOWING PAGE FOR ROCLING VI REGISTRATION AND PAYMENT

R.O.C. Computational Linguistics Conference VI (ROCLING VI)
Program Sep 2. 1993
=================================================================
 6:30          Get on bus at Taiwan U.
 7:00          Get on bus at Activity Center, Academia Sinica
 8:30          Get on bus at Tsing-Hua U.
   |
               Check in
12:00          Lunch
14:00 - 15:00  Session Chair: Chu-Ren Huang
               Invited Speaker John Nerbonne
                   "Software for Applied Semantics"
15:00 - 15:30  Tea Break
15:30 - 16:30  Session Chair: Chun-Sheng Chang
               Invited Speaker Mary Dalrymple
                   "The Resource Logic of Complex Predicate Interpretation"
16:30 - 17:00  Tea Break

Session I      Session Chair: Hsin-Hsi Chen
17:00 - 17:30  Ming-Yu Lin, Tung-Hui Chiang, Keh-Yih Su
                   "A Preliminary Study On Unknown Word Problem In Chinese Word Segmentation"
17:30 - 18:30  Tsai-Yen Peng, Chun-Sheng Chang
                   "A Study on Chinese Lexical Ambiguity, Word Segmentation and Tagging"
18:30          Dinner
19:30          Board of Directors' Meeting

R.O.C. Computational Linguistics Conference VI (ROCLING VI)
Program Sep 3. 1993
=================================================================
 7:30 -  8:00  Breakfast
 8:30 -  9:30  Session Chair: Keh-Jiann Chen
               Invited Speaker Kenneth Church
                   "Part of Speech Tagging and Suggestions for the Future"
 9:30 - 10:30  Session Chair: Keh-Yih Su
               Invited Speaker Mark Liberman
                   T.B.A.
10:30 - 11:00  Tea Break

Session II     Session Chair: Hsiao-Chuan Wang
11:00 - 11:30  Hsing-Ming Wang, Yuan-Chen Chang, Ling-Shan Li
                   "An Algorithm for Automatically Selecting Phonetically Balanced Mandarin Sentences from Chinese Corpus"
11:30 - 12:00  Sung-Chien Lin, Li-Feng Chien, Keh-Jiann Chen, Ling-Shan Li
                   "A Study of Word-Class Bigram Approach to Linguistic Decoding in Mandarin Speech Recognition"
12:00          Lunch

Session III    Session Chair: Hsi-Chien Li
13:30 - 14:00  Yu-Hsi Li, Hsien-Hsi Chen
                   "A Storage Reduction Method for Corpus-Based Language Models"
14:00 - 14:30  Ming-Wen Wu, Keh-Yih Su
                   "Corpus-based Automatic Rule Selection In Designing A Grammar Checker"
14:30 - 15:00  Chao-Huang Chang
                   "Automatic Clustering of Chinese Characters and Words"
15:00 - 15:30  Tea Break
15:30 - 16:30  CKIP
                   "Segmentation Criterion and Matching Tagset for Mandarin Chinese"
16:30 - 17:30  Business Meeting
18:00          Banquet

R.O.C. Computational Linguistics Conference VI (ROCLING VI)
Program Sep 4. 1993
=================================================================
 7:30 -  8:00  Breakfast

Session IV     Session Chair: Feng-Wen Su
 8:30 -  8:45  Zhibiao Wu, Loke Soo Hsu, Martha Palmer, Chew Lin Tan
                   "Developing a Chinese Module in UNITRAN"
 8:45 -  9:00  Audy Wong Man Hon, Suen Caesar Lun
                   "Fawrmt: With Special Emphasis On Grammar Designs and Partitioned Parsing"
 9:00 -  9:30  Yun-Yen Yang, Keh-Jiann Chen, Ching-Chun Hsieh, Shu-Mei Chen
                   "A Study of Document Auto-Classification in Mandarin Chinese"
 9:30 - 10:00  Tea Break

Session V      Session Chair: Wei-Chuan Li
10:00 - 10:30  Hsin-Hsi Chen, Kuang-hua Chen
                   "A Probabilistic Chunker"
10:30 - 11:00  Shih-Ping Wang, Keh-Yih Su
                   "Corpus-based Automatic Rule Selection In Designing A Grammar Checker"
11:00 - 11:15  Feng-Wen Su
                   "Toward Discourse-guided Thetagrid Chart Parsing for Mandarin Chinese--A Preliminary Report"
11:30          Lunch
               Return to Academia Sinica

R.O.C.
Computational Linguistics Conference VI
ROCLING VI
Hsitou National Park
September 2-4, 1993
Theme: Computational Semantics/Corpus Linguistics

Registration Form

Name:
Date of Birth:
Gender:
Country of Residence:
Passport Number:
Institute Affiliation:
Mailing Address:
Email Address:
Telephone:
Fax Number:

REGISTRATION FEES AND HOTEL FEES MUST BE PAID BY MAIL BEFORE AUGUST 14, 1993:

Registration Fees:
    Member of ROC Comp. Ling. Society: ___ US $110
    1 year membership fee: ___ US $50
        (includes monthly newsletter and reduced fees on activities)
    Non-member: ___ US $140

Registration fees include round-trip transportation between Academia Sinica and the conference site at Hsi-tou National Park, park entrance fees, conference proceedings, and all meals. THE BUS WILL LEAVE PROMPTLY AT 7AM ON 9/2 FROM THE FRONT DOOR OF THE ACTIVITY CENTER AT ACADEMIA SINICA.

Hotel Fees for 2 nights (9/2 and 9/3) at Hsitou Park:
    Double Room: ___ US $50 per person
        (please specify name of person you wish to share with: _______________________________________)
    Single Room: ___ US $60

Total: PACFoCoL Registration (from reverse side) + ROCLING VI Registration + (ROCLING Membership Fee) + Hotel Fees for Hsitou Park: US $____
(check payment information on reverse side)

PLEASE FILL OUT PACFoCoL REGISTRATION ON THE PRECEDING PAGE

PACFOCOL I/ROCLING VI
Optional Activities on Sept. 1, 1993

Morning Activities:

A. 9:30-12:00 Visiting the Chinese Knowledge Information Processing (CKIP) Group of the Institute of Information Science at Academia Sinica. (Classical & Modern Chinese Corpora, Electronic Dictionary, Chinese Parser, etc.)

B. 9:30-12:00 A guided tour of National Palace Museum in Taipei.

Afternoon Activities:

C. 3:00-5:00 Visiting the Behavior Design Corporation in Hsin-chu Science-based Industrial Park. (Demonstration of BDC EC MT System, D-Top Bilingual Desktop Publishing [W/ KWIC, Bilingual Dict.], Grammar Checker, On-line OCR, Personal Manager, etc.)
Visitors will stay overnight in Hsin-chu and will board the bus going to ROCLING in Hsin-chu.
D. 3:00-5:00 A guided tour of the Museum of the Institute of History and Philology, Academia Sinica. (Museum not usually open to the public and can be visited by appointment only.)

Notes:
1. The above activities are free. However, participants are responsible for their own expenses, such as food, transportation and accommodation.
2. Spaces for the two visiting tours (A&C) are limited and will be allotted in the order of registration.
3. Unless indicated otherwise, we will book a hotel at Hsin-chu for the night of Sept.1 for participants of C.
_____________________________________________________________
Please fill out the following form and return it with the registration form by August 14.

I would like to attend the following activities
[] one activity (please circle one) A, B, C, D
[] two activities (please circle one) AC, AD, BC, BD

Signature _______________________

From corpora-request@uib.no Tue Aug 3 06:53:42 1993
Date: Tue, 3 Aug 1993 11:53:42 -0500
From: Yuangshan Chuang
To: corpora@hd.uib.no, rocltsh@iis.sinica.edu.tw
Subject: Re: PACFoCoL/ROCLING

From corpora-request@uib.no Thu Aug 5 09:55:18 1993
Date: Thu, 05 Aug 93 09:43:14 MEZ
From: Maria Strobel
Subject: german speech database
To: corpora@hd.uib.no

Is there any german speech database which can be used for collecting samples of both isolated and connected words? I am especially interested in recorded telephone talks. Please reply to me directly and I'll post a summary to the list.

Maria Strobel

From corpora-request@uib.no Fri Aug 6 05:45:02 1993
Date: Fri, 6 Aug 93 11:45:02 CST
From: lsc@speech.ee.ntu.edu.tw
To: corpora@hd.uib.no
Subject: Reguest Mandarin spoken language corpus

Is there any Mandarin Spoken Language Corpus which can be used for collecting samples of Chinese daily used words ? I am interested in large corpus NLP; however, I has only some written language corpus.
If you have any spoken language corpus, please reply to me.

Sung-Chien Lin
e-mail address: lsc@speech.ee.ntu.edu.tw
address: Rm. #529, Dept. of Elec. Eng.
         National Taiwan University
         1, sec. 4, Roosevelt Rd., Taipei, Taiwan, 10764 R.O.C.

From corpora-request@uib.no Mon Aug 9 04:56:12 1993
Date: Mon, 9 Aug 1993 09:56:12 -0500
From: Yuangshan Chuang
To: corpora@hd.uib.no, rocltsh@iis.sinica.edu.tw, ycg9915@uxa.cso.uiuc.edu
Subject: Re: PACFoCoL/ROCLING

From corpora-request@uib.no Tue Aug 10 03:20:29 1993
From: rocltsh@iis.sinica.edu.tw
Subject: Returned mail: User unknown (fwd)
To: corpora@hd.uib.no
Date: Tue, 10 Aug 93 9:11:31 EAT

> Dear Sir,
> We received two empty e-mails from the address ycg9915@uxa.cso.uiuc.edu.
> We do not understand what is wrong with the communication.
> If you can contact the owner of the address, please let him know.
> Thank you for your kindness.
> ROCLING
>

From corpora-request@uib.no Tue Aug 10 08:48:27 1993
Date: Tue, 10 Aug 93 8:48:27 GMT
From: Rainer Hoch
To: corpora@hd.uib.no
Subject: subscription

subscribe corpora Rainer Hoch

From corpora-request@uib.no Thu Aug 12 10:47:59 1993
Date: Thu, 12 Aug 93 10:43:04 MET DST
From: Magnus Merkel
To: corpora@hd.uib.no
Subject: Design of Bilingual Corpora

I am interested in any material regarding how to design a large bilingual corpus. We are just starting a project where we are going to build a parallel corpus of English-Swedish texts in the field of technical documentation. There is an article in The Journal of Literary and Linguistic Computing Vol. 7 No. 1, 1992 by Sue Atkins and Jeremy Clear entitled "Corpus Design Criteria", where they have a thorough discussion about setting up corpora in general, but not how to deal with parallel (bilingual) corpora. Any hints?

Magnus Merkel
Dept. of Computer and Information Science
Linkoping University, Sweden

From corpora-request@uib.no Mon Aug 12 05:55:12 1993
Date: 12 Aug 1993 10:55:12 -0500 (CDT)
From: B217RMJ@UTARLG.UTA.EDU
Subject: 1974 LSA
To: corpora@hd.uib.no

Does anyone out there have any idea how I could get a copy of an unpublished paper from the 1974 Summer LSA Meeting? The paper was entitled something like "A Functional Affinity between 'IF' clauses and relatives on generic head". Anything on the function of relatives would be appreciated.

Thanks
e-mail address B217RMJ@utarlg.uta.edu

From corpora-request@uib.no Fri Aug 13 14:17:37 1993
Date: Fri, 13 Aug 1993 13:17:37 +0100
From: S.J.Yates@open.ac.uk (Simeon J. Yates)
To: corpora@hd.uib.no
Subject: Software...

In one sense this is not strictly a corpus linguistic request, but I could think of few other places to ask the question. I will shortly begin work at a communications studies institute and we will be dealing with a number of very large corpora of videotaped material (i.e. news reports etc.). Does anyone know of ethnographic, or similar, software, running on any system (PC, Apple, Sun, VAX), that could cope with digitised video images? I realise that many Mac Hypercard based systems could, but not to the level I might require. If anyone can think of a better list or someone to contact please let me know.

Simeon Yates
CITE
IET
Open University
Walton Hall
Milton Keynes, England

From corpora-request@uib.no Sat Aug 14 07:12:00 1993
Date: Fri, 13 Aug 1993 23:12 +0800
From: B096770@vax.csc.cuhk.hk
Subject: US parser 'challenge'
To: corpora@hd.uib.no

At a recent conference on corpus linguistics in Spain (early Aug) Ed Hovy mentioned that there's some kind of yearly challenge for (syntactic, I think) parsers. Apparently 'contestants' get a series of sentences or whatever in a sealed brown paper envelope (or equivalent) and they have to submit their machine parses.
The winner this year was apparently a parser which parses on the basis of a simple two-word window. Does anyone have any information on the 'contest' etc, and especially this year's 'winner'? Failing that, does anyone have Ed Hovy's email address, or was anyone at the conference and knows the references etc?

Regards
Dave Coniam
Chinese University of Hong Kong
email: b096770@cucsc.bitnet

From corpora-request@uib.no Fri Aug 13 19:24:00 1993
Date: Fri, 13 Aug 93 19:24 GMT
From: ENG0997@v2.qub.ac.uk
To: CORPORA@HD.UIB.NO
Subject: Scottish Data

Scottish Data

For a paper I've to give next month, I'm trying to compile as complete a list as possible of all machine-readable data from Scotland that exists, no matter for what purpose it was originally gathered. I'm thinking especially of transcriptions of Scottish speakers including digitisations, no matter what their Scottish accent or whether you would call it Scots, English or anything in between. I'm also thinking of written material originating in Scotland by Scottish people, not necessarily fictional writing, but any kind of writing.

If you've compiled anything of this sort - or know of any such compilations, please let me know. Please assume I don't know about it so that you will get in touch! I will be most grateful for your assistance. I look forward to hearing from whoever ...

With thanks again,
John Kirk
The Queen's University of Belfast
Email: ENG0997@QUB.AC.UK

From corpora-request@uib.no Fri Aug 13 05:56:04 1993
Date: Fri, 13 Aug 93 12:56:04 -0700
From: edwards@cogsci.Berkeley.EDU (Jane A. Edwards)
To: S.J.Yates@open.ac.uk, corpora@hd.uib.no
Subject: Re: Software for videotape-transcript analysis

You might be interested in Lois Bloom's system (Columbia University, Teachers' College) - devised for an Apple for microanalysis of the interplay of gesture and spoken language in child language.
It is described in a chapter in Edwards & Lampert (1993) "Talking Data: Transcription and Coding in Discourse Research", or contact her directly at: lmbloom@cutcv2.bitnet

-Jane Edwards

From corpora-request@uib.no Sat Aug 14 08:07:46 1993
Date: Sat, 14 Aug 1993 13:59:54 HKT
From: "lcjohn@usthk.ust.hk"
Subject: Re: Software...
To: corpora@hd.uib.no

Simeon Yates asked about software to cope with digitised video images. You might try GAINMOMENTUM. I've just heard of it, so can't tell you much: it's in the category of multimedia database systems. It's VERY expensive and may be more than you need. Published by SYBASE:

Sybase, Inc.
1870 Embarcadero Road,
Palo Alto, California 94303-1108
ph (415) 813-1800 / (800) 232-4246

From corpora-request@uib.no Wed Aug 18 13:09:14 1993
From: Steve Fligelstone
Date: Wed, 18 Aug 93 12:09:14 +0100
To: corpora@hd.uib.no
Subject: TALC94 Call for papers

CALL FOR PAPERS

TEACHING AND LANGUAGE CORPORA 94
Lancaster University
11 - 13th April 1994

AIMS OF THE CONFERENCE

While the use of computer text corpora in research is well established, they are now being used increasingly for teaching purposes. This includes the use of corpus data to inform and create teaching materials; it also includes the direct exploration of corpora by students, both in the study of linguistics and in the study of foreign languages. We would like to bring together researchers and teachers who are involved in such work in order to encourage an international exchange of experience and expertise.

We intend to keep costs to a minimum, and hope to be able to charge attendees no more than £50 a day, including accommodation and food.
Papers are invited on the following topics: * the uses of corpora in the teaching of linguistics * the uses of corpora in the teaching of foreign languages * software for the use of corpora in teaching * corpus annotation * issues concerning funding and resourcing * availability of corpora * necessary skills (teacher and learner) for exploiting corpora * applications of spoken corpora * computational linguistics By "corpora" we mean: * corpora of written and spoken language * multi-lingual (e.g. parallel or translation) corpora FORMAT FOR SUBMISSION The conference will be composed of oral and poster presentations, as well as some discussion groups. A paper should last for 30 minutes including questions, and a poster presentation may consist of up to eight sheets of A3, including the title page. If you wish to offer a presentation of either kind, please submit an abstract by no later than 30th September 1993. Abstracts should be no more than 300 words long. It would also help us if you would complete the attached reply form and return it as soon as possible. All papers accepted for the conference will be reviewed and considered for the conference proceedings, which we intend to publish. Any papers selected for publication should then be submitted in machine-readable form, either ASCII or word-processor (preferably Word or WordPerfect). Email submission of abstracts is encouraged. ADDRESS FOR SUBMISSIONS Surface Mail: TALC 94, Department of Linguistics, Lancaster University, Bailrigg, Lancaster, LA1 4YT, U.K.
E-mail: talc94@uk.ac.lancaster _____________________________________________________________________ REPLY FORM * Name: * Address: * I shall/shall not be attending the TALC94 Conference * I would like to present a paper, with the following (provisional) title: * I would like to display a poster presentation with the following (provisional) title: _____________________________________________________________________ From corpora-request@uib.no Sun Aug 18 12:00:52 1993 To: CORPORA@hd.uib.no From: JARDEHAL@human.gla.ac.uk Date: 18 Aug 93 12:00:52 GMT Subject: The British Press I'm interested in any information on the availability IN A MACHINE-READABLE FORM of the following materials: 1. any 1960 or 1961 issue (or parts thereof) of any British "quality" national newspaper (e.g. The Times or The Guardian) 2. any British radio or TV news or current affairs broadcast (or parts thereof) in the periods 1960-61 and 1990-to date. Jamal Ardehali, University of Glasgow. From corpora-request@uib.no Sun Aug 18 12:31:13 1993 To: CORPLST@hd.uib.no From: JARDEHAL@human.gla.ac.uk Date: 18 Aug 93 12:31:13 GMT Subject: The British Press I'm interested in any information on the availability IN A MACHINE-READABLE FORM of the following materials: 1. any 1960 or 1961 issue (or parts thereof) of any British "quality" national newspaper (e.g. The Times or The Guardian) 2. any British radio or TV news or current affairs broadcast (or parts thereof) in the periods 1960-61 and 1990-to date. Jamal Ardehali, University of Glasgow.