Return-Path:
Received: from CS.CMU.EDU by A.GP.CS.CMU.EDU id aa15247; 14 Oct 93 20:08:41 EDT
Received: from Csli.Stanford.EDU by CS.CMU.EDU id aa19315; 14 Oct 93 20:08:13 EDT
Received: from localhost.Stanford.EDU by CSLI.Stanford.EDU (4.1/25-CSLI-eef) id AA29810; Thu, 14 Oct 93 15:18:11 PDT
Message-Id: <9310142218.AA29810@CSLI.Stanford.EDU>
From: asmeaton@compapp.dcu.ie (by way of yarowsky@unagi.cis.upenn.edu)
Subject: TREC-2 report and TREC-3 call for participation
To: empiricists@CSLI.Stanford.EDU
Date: Thu, 14 Oct 1993 15:18:10 -0700
Sender: roscheis@CSLI.Stanford.EDU

Report on TREC-2 (Text REtrieval Conference)
30 August - 2 September, Gaithersburg, USA
by The TREC-2 Program Committee

INTRODUCTION

As part of an effort to encourage research into text retrieval from large and diverse document collections, the first Text REtrieval Conference (TREC-1) was held in Gaithersburg, Md., in 1992. This forum provided researchers with a large collection of textual materials, queries and associated relevance judgements, and a uniform scoring procedure. The conference, co-sponsored by the U.S. Advanced Research Projects Agency (ARPA) and the U.S. National Institute of Standards and Technology (NIST), was a benchmarking exercise which gauged the relative effectiveness of many different approaches to indexing and retrieving large volumes of text.

A second conference/workshop (TREC-2) was held in early September 1993 and was the culmination of experimental runs carried out at over 31 information retrieval research sites around the world. A call for participation in TREC-2 was drafted and circulated (mostly electronically) during November and December 1992, with December 5th, 1992 as the closing date for declaring intent to participate. A total of 39 groups submitted an initial request to participate.
The program committee divided these into 20 full participants; the remainder were offered participation in the benchmarking, but with a poster rather than a paper presentation at the workshop. Of the 20 full participants selected, 19 made presentations. There were also 4 presentations from TIPSTER groups (University of Massachusetts, Syracuse University, HNC and BBN), some of which also took part in the TREC-2 benchmarking. Five posters at the workshop represented groups who had also run the TREC-2 benchmark.

The 19 full TREC-2 participants and the TIPSTER groups, with their respective approaches to information retrieval, were (U.S. unless stated):

Bellcore - SMART document preprocessing and Latent Semantic Indexing
Carnegie Mellon University - NLP-based indexing
CITRI, Royal Melbourne Institute of Technology (Australia) - document structure and efficiency issues
City University, London (UK) - variant of probabilistic model and probabilistic weighting functions
Cornell University - Vector Space Model/SMART system
Environmental Research Institute of Michigan - n-gram indexing/retrieval
GE Research and Development Center - building complex Boolean queries
HNC - TIPSTER group learning a reduced dimensionality index space
Institute for Decision System Research - Bayesian networks
New York University - NLP-based indexing by word pairs
Queens College (CUNY) - variant of probabilistic model using PIRCS system and spreading activation
Rutgers University - combination of results of different retrieval strategies
Siemens Corporate Research Inc. - SMART retrieval & query expansion using WordNet
Swiss Federal Institute of Technology (ETH) (Switzerland) - efficient implementation of RSV metric
Thinking Machines Corporation - various vector space model experiments, concerned with efficiency of execution
TRW Systems Development Division - hardware filtering
Universitaet Dortmund (Germany) & Cornell University - variant of probabilistic model with learning of parameter values
University of California, Berkeley - variant of probabilistic model with logistic regression for probability estimates
University of Massachusetts - TIPSTER group using a Bayesian Inference Network approach
Verity, Inc. - machine learning for TOPIC IR system
VPI&SU - combining results of multiple searches

Poster groups included UCLA, ConQuest Software, Mead Data Central, PRC, University of Central Florida, University of Illinois at Chicago, Systems Environment Corporation, Advanced Decision Systems and Dalhousie University.

This list of participants represents a good mix of academic and industrial interest in the project and, more importantly, a good mix of approaches to indexing and retrieval. TREC reaches a community of information retrieval researchers and developers and provides an exploratory benchmark for information retrieval techniques, but it is important to realise from the start that TREC-2, like the first TREC in 1992, was not a competition. Indeed, there are so many variables in running the experiment and so many caveats about the evaluation methodologies used that it is very difficult to compare even two systems directly, and impossible to come up with a "ranking" of approaches. Most participants continued to develop and refine their systems after the official deadline for submission of results, and most achieved further improvements in retrieval effectiveness which they presented at the workshop.
Although comparisons across systems are very difficult to interpret, experiments within a given system are quite informative, and many groups reported indexing and retrieval experiments measured against their own baseline performance. Only when NIST publishes the official proceedings in Spring 1994 will we have a stable picture of the performance of the different systems.

LOGISTICS

As with the first TREC, participants in TREC-2 worked with approximately one million documents (2 gigabytes of text), retrieving lists of documents that could be considered relevant to each of 50 topics, in what was called "ad hoc" querying. The second retrieval paradigm simulated a "routing" operation: 50 retrieval topics were known in advance, and new documents were matched against these 50 standing queries. In both cases the "queries" were not really queries at all but carefully honed statements of user need, and thus extensive descriptions of the topic of interest. Participating groups could construct queries completely automatically, formulate them manually, or simulate relevance feedback.

The test data consisted of newspaper stories (Wall Street Journal and San Jose Mercury News), Associated Press newswire articles, U.S. patent applications, articles from the Federal Register and the Ziff database, and documents from the U.S. Department of Energy: all in all, a deliberately heterogeneous mix of document types and lengths. NIST distributed the data, together with some test topics and relevance assessments, and participants installed it at their own research sites. Participating groups fine-tuned their retrieval strategies and were then sent the new topics for ad hoc querying and 1 gigabyte of new test data for the pre-defined routing queries.
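The two paradigms differ mainly in what stays fixed: in ad hoc retrieval the document collection is fixed and new queries arrive, while in routing the queries are fixed and new documents arrive. The Python sketch below illustrates only that distinction and is not any participant's system. The word-overlap scorer, the function names and the toy documents are all hypothetical stand-ins (a real system would use a far better matching function, with term weighting for instance); the default cutoff of 1000 mirrors the number of documents submitted per topic.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A scoring function maps (query text, document text) to a number; higher is better.
Scorer = Callable[[str, str], float]


def overlap_score(query: str, doc: str) -> float:
    """Hypothetical toy scorer: count of distinct query words that occur in the document."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def ad_hoc(query: str, archive: Dict[str, str], score: Scorer, k: int = 1000) -> List[str]:
    """Ad hoc: a new query is run against a fixed archive; return the top k document ids."""
    ranked = sorted(archive, key=lambda doc_id: score(query, archive[doc_id]), reverse=True)
    return ranked[:k]


def routing(queries: Dict[str, str], stream: Iterable[Tuple[str, str]],
            score: Scorer, k: int = 1000) -> Dict[str, List[str]]:
    """Routing: standing queries, fixed in advance, are matched against each incoming document."""
    scored: Dict[str, List[Tuple[float, str]]] = {topic: [] for topic in queries}
    for doc_id, text in stream:                  # documents arrive one at a time
        for topic, query in queries.items():
            scored[topic].append((score(query, text), doc_id))
    # Rank each topic's matches by score; the stable sort keeps arrival order on ties.
    return {
        topic: [doc_id for _, doc_id in sorted(pairs, key=lambda p: p[0], reverse=True)[:k]]
        for topic, pairs in scored.items()
    }


if __name__ == "__main__":
    docs = {
        "d1": "AT&T unveils ISDN services",
        "d2": "Airbus subsidies debated",
        "d3": "AT&T Bell Labs research",
    }
    print(ad_hoc("AT&T research", docs, overlap_score, k=2))                   # ['d3', 'd1']
    print(routing({"t1": "AT&T research"}, docs.items(), overlap_score, k=2))  # {'t1': ['d3', 'd1']}
```

In this sketch the same scoring function serves both tasks; what differs is whether the queries or the documents are known in advance.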

The ranking results from each site were then sent back to NIST, which pooled the rankings and had teams of assessors manually judge the relevance of every document that appeared in the top 100 from at least one site, for each of the 50 ad hoc and 50 routing queries. A total of 41 ad hoc runs (from 25 groups) and 40 routing runs (from 23 groups) were pooled to generate the set for manual relevance assessment. As expected, different systems retrieved different sets of documents in their top rankings, but there was much more overlap between the retrieved sets than in the first TREC, possibly because the TREC-2 systems were better.

Each TREC-2 participant established its own baseline effectiveness using the trial queries and relevance assessments provided at the start of TREC-2, and then either improved on or fell short of that baseline in the official runs. No relevance assessments were available while the official runs were being completed, and there was little time in which to complete them, so there was little scope for tinkering. This ensured that no system had an unfair advantage over any other.

EVALUATION

Evaluation has always been a subject of debate in information retrieval, and TREC offers even more scope for discussion than usual. For the "official" results submitted by each group, NIST calculated a range of performance figures, including averaged recall-precision, recall-fallout, and precision at 5, 10, 15, 20, 30, 100, 200, 500 and 1000 documents. A major improvement in TREC-2 over the first TREC was that groups submitted the top 1000 ranked documents per topic, rather than just the top 200. The way in which evaluation figures were calculated and averaged was also improved.
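The pooling and scoring procedure described above can be illustrated with a short Python sketch. It is not NIST's evaluation software: the data layout (runs as dictionaries mapping a topic id to a ranked list of document ids, judgments as sets of relevant ids), the function names and the toy data are all hypothetical. Average precision is computed in its common non-interpolated form, and, as is usual with pooling, any document that never entered the pool is treated as non-relevant.

```python
from typing import Dict, List, Set

Run = Dict[str, List[str]]    # topic id -> ranked document ids, best first
Qrels = Dict[str, Set[str]]   # topic id -> ids of documents judged relevant


def build_pool(runs: List[Run], depth: int = 100) -> Dict[str, Set[str]]:
    """Per topic, the union of the top `depth` documents from every run.

    Only documents in the pool are shown to the relevance assessors.
    """
    pool: Dict[str, Set[str]] = {}
    for run in runs:
        for topic, ranking in run.items():
            pool.setdefault(topic, set()).update(ranking[:depth])
    return pool


def precision_at(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top k documents that are relevant (divides by k even if fewer were returned)."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision: sum the precision at the rank of each
    relevant document retrieved, then divide by the total number of relevant documents."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:  # anything not judged relevant (including unjudged docs) counts as non-relevant
            hits += 1
            total += hits / rank
    return total / len(relevant)


if __name__ == "__main__":
    run_a: Run = {"101": ["d1", "d2", "d3", "d4"]}
    run_b: Run = {"101": ["d5", "d3", "d1", "d6"]}
    pool = build_pool([run_a, run_b], depth=2)
    print(sorted(pool["101"]))  # ['d1', 'd2', 'd3', 'd5']

    # Suppose the assessors judge d1 and d3 relevant and the rest of the pool non-relevant.
    qrels: Qrels = {"101": {"d1", "d3"}}
    for name, run in (("A", run_a), ("B", run_b)):
        ranking = run["101"]
        print(name, precision_at(ranking, qrels["101"], 2),
              round(average_precision(ranking, qrels["101"]), 3))
```

With a pool depth of 2, the pool for topic 101 is d1, d2, d3 and d5, so d4 and d6 are never judged. Both runs have the same precision at 2 (0.5), yet run A has the higher average precision (about 0.833 against 0.583) because it ranks its relevant documents nearer the top, a difference that a single precision cutoff hides.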

There is a real problem with using the standard information retrieval evaluation measures on something like TREC: looking at averages of averages is very superficial and hides most of what is actually going on with respect to performance. In TREC-2 it was possible to do some failure analysis on the data before the workshop, and this revealed some interesting features. For example, "long" documents were being retrieved by most approaches but were not proving relevant, and systems that yielded poor precision averaged over the 50 topics actually did well, often best, on some of those topics! In fact, for 21 groups there was at least one topic on which their system had the best average precision. The message here is that much work remains to be done on the data generated by the different retrieval approaches to explain some of the results.

THE WORKSHOP

The TREC-2 workshop in September 1993 was open only to participating systems and government sponsors, and it was even more open, sharing and workshop-like than TREC-1. Each participant presented an overview of their system, and performance figures, computed with the evaluation methods outlined earlier, were available for all official runs of all systems. With so many results being presented, there was an element of information overload in trying to digest and assimilate so much raw data. Since most participants presented experimental results obtained literally a couple of days before the workshop, this was really leading-edge stuff.

The approaches to information retrieval represented among the TREC-2 participants can be grouped roughly into probabilistic models and their variants, vector space approaches, NLP-based approaches, Bayesian networks, query expansion and dimensionality reduction, Boolean query construction, combination of the results of different retrieval strategies, and explorations into document structuring,
as well as some outliers such as retrieval using n-grams or word pairs, hardware approaches, and some work on efficiency issues.

Generalising results across systems and approaches is difficult, but some trends have already emerged. Simple systems which do simple things are still doing really well, and the more complex ones are catching up and in some cases surpassing the simple approaches. This was to be expected after the first TREC, where simple systems did well and more complex ones generally did not. Term weighting has also emerged as especially important. There is also a large spread of effectiveness levels among systems.

An irritating aspect of evaluating information retrieval with the standard measures is that they do not show the full power of the systems. A system that retrieves 13 relevant documents in its top 20 from a collection of 1,000,000 really is a good computational tool to have, but recall and precision values hide this fact. The inclusion of recall-fallout tables addresses this somewhat, but these still understate how good the available information retrieval techniques really are.

The overall measurements in TREC-2 show an improvement in effectiveness over the first TREC. Some of this is due to ranking to the top 1000 rather than the top 200, but it may also be because the systems are better. TREC-2 systems may be more finely tuned than before: most TREC-2 participants also took part in TREC-1, so they could anticipate, if not always fully overcome, the engineering problems of wrestling with 2 or 3 gigabytes of text and the associated indexes. TREC-2 seemed to have fewer problems with engineering the volume of data than before, probably because most groups had been through it before.
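The example of 13 relevant documents in the top 20 of a 1,000,000-document collection can be made concrete with a little arithmetic. The Python sketch below computes set-based precision, recall and fallout for it. Recall and fallout need the total number of relevant documents for the topic, which the report does not give, so the value of 100 used here is purely an assumption for illustration, as is the function name. The sketch also computes how many relevant documents a random sample of 20 would be expected to contain.

```python
from typing import Dict


def effectiveness(retrieved: int, relevant_retrieved: int,
                  total_relevant: int, collection_size: int) -> Dict[str, float]:
    """Set-based precision, recall and fallout for one topic at one cutoff."""
    nonrelevant_retrieved = retrieved - relevant_retrieved
    total_nonrelevant = collection_size - total_relevant
    return {
        # fraction of retrieved documents that are relevant
        "precision": relevant_retrieved / retrieved,
        # fraction of all relevant documents that were retrieved
        "recall": relevant_retrieved / total_relevant,
        # fraction of all non-relevant documents that were (wrongly) retrieved
        "fallout": nonrelevant_retrieved / total_nonrelevant,
        # relevant documents expected in a random sample of the same size
        "random_expected_relevant": retrieved * total_relevant / collection_size,
    }


# 13 relevant in the top 20 of 1,000,000 documents; ASSUMED: 100 relevant documents in total.
for name, value in effectiveness(20, 13, 100, 1_000_000).items():
    print(f"{name}: {value:.3g}")
```

Under that assumption the sketch gives a precision of 0.65, a recall of 0.13 and a fallout of roughly 7 in a million, while a random selection of 20 documents would be expected to contain only 0.002 relevant ones. The size of that gap is exactly what a bare precision or recall figure does not convey.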

From the outset, efficiency was never a primary concern in TREC, which benchmarks retrieval effectiveness and is not directly concerned with the engineering aspects of large IR systems. The figures reported for indexing and retrieval show a very wide range of equipment and performance times: from retrieval against 2 gigabytes in less than 5 seconds, with the entire inverted file held in main memory, to retrieval for a single query taking hours of CPU and elapsed time on a PC which decompresses and scans the text as it is read from CD-ROM. The message here is that you don't need massive computing resources to take part in TREC: they help and make things easier, but they are not mandatory. In fact, as described in the accompanying call for TREC-3, there is a category of participation within TREC for computationally intensive approaches which allows a group to use only a subset of the entire collection.

Finally, TREC-2 did not see as much work on sub-document retrieval as expected. This may be because, as in most IR test collections, relevance judgements are dichotomous and do not indicate which PART of a document is relevant. This is being looked at for the next TREC.

WIND UP

At the start of this report we cautioned against making comparisons between the systems and approaches which took part in TREC, because of the number of variables involved. This raises the question of why bother doing TREC if the results of different systems cannot be compared. The answer lies in the objectives of the TREC initiative, which Donna Harman defined as:

1. to increase research in information retrieval carried out on large-scale test collections
2. to provide a forum for communication among academic, industrial and other interested parties
3. to foster the transfer of technology between research laboratories and commercial products
4. to present a state-of-the-art showcase of retrieval methods

Certainly the first and last of these goals have been achieved; the second appears to have been accomplished, and as for the third, only time will tell. Direct comparisons between the systems and approaches in TREC are extremely dodgy, and only broad-brush statements about effectiveness, such as those made in this report, can be made.

So what happens next in TREC? A call for participation in TREC-3 is already available (attached), with proposals for participation due by December 1st, 1993. Data will be distributed in January 1994, results of retrieval runs are due on August 1st, and the workshop is scheduled for early November. In addition to the English texts, there will also be Spanish texts with queries and relevance assessments. A subset of the topics will be much narrower than in previous TRECs, and there will be more emphasis on user interfaces and on issues of query formulation, user models, etc.

At this stage TREC has its own momentum and is having an effect on how information retrieval research is carried out. We can only expect its impact on information retrieval to grow even more in the future.

The full TREC program committee is: Donna Harman, NIST (chair); Chris Buckley, Cornell University; Susan Dumais, Bellcore; Darryl Howard, U.S. Department of Defense; David Lewis, AT&T Bell Labs; Matt Mettler, TRW; John Prange, U.S.
Department of Defense; Alan Smeaton, Dublin City University, Ireland; Richard Tong, Advanced Decision Systems; Steve Walker, City University, UK; Karen Sparck Jones (for TREC-3), Cambridge University, UK.

-------------------------------------------------------------------------------

CALL FOR PARTICIPATION

TEXT RETRIEVAL CONFERENCE
January 1994 - November 1994

Conducted by: National Institute of Standards and Technology (NIST)
Sponsored by: Advanced Research Projects Agency Software and Intelligent Systems Technology Office (ARPA/SISTO)

A new conference for examination of text retrieval methodologies (TREC) was held in November 1992 at Gaithersburg, Md. The goal of this conference was to encourage research in text retrieval from large document collections by providing a large test collection, uniform scoring procedures and a forum for organizations interested in comparing their results. Both ad-hoc queries against archival data collections and routing (filtering or dissemination) queries against incoming data streams were tested. The conference was a workshop open only to the 24 participating systems and government sponsors; however, the proceedings were published by NIST in the spring of 1993. A second workshop (TREC-2) was held in September 1993, with 31 participating systems, and its proceedings are to be published in the spring of 1994.

This announcement serves as a call for participation from groups interested in working in the third year of this workshop (TREC-3). Participants will be expected to work with approximately one million documents (2 gigabytes of data), retrieving lists of documents that could be considered relevant to each of 100 topics (50 routing and 50 ad-hoc topics). NIST will distribute the data and will collect and analyze the results. As before, the workshop will be open only to participating systems and government sponsors. Because of government cutbacks, there will be no financial support for participants this year.

Schedule:

Dec. 1, 1993 -- deadline for participation applications
Jan. 1, 1994 -- acceptances announced, and training data distributed to new participants (including 3 CD-ROMs containing about 3 gigabytes of data, and 150 training topics and relevance judgments)
June 1, 1994 -- test gigabyte of data distributed via CD-ROM, after routing queries have been received at NIST
July 1, 1994 -- 50 new test topics distributed
Aug. 1, 1994 -- results from 50 routing queries and 50 test topics due at NIST
Oct. 1, 1994 -- relevance judgments and individual evaluation scores due back to participants
Nov. 2-4, 1994 -- TREC-3 conference at NIST in Gaithersburg, Md.

Task Description:

Participants will receive 3 gigabytes of data to use for training their systems, including development of appropriate algorithms or knowledge bases. The 150 topics used in the first two TREC workshops, and the relevance judgments for these topics, will also be sent. The topics are in the form of a highly formatted user need statement (see attachment 1). Queries can be constructed either automatically from this topic description or manually.

Two types of retrieval operations will be tested: a routing or filtering operation against new data, and an ad-hoc query operation against archival data. Fifty of the topics initially distributed as training topics (numbers 101-150) will be used by each participating group to create formalized routing or filtering queries, to be run against a new test gigabyte of data (disk 4). Fifty new test topics (151-200) will be used as ad-hoc queries against 2 gigabytes of the training data (disks 2 and 3). Results from both types of queries (routing and ad-hoc) will be submitted to NIST as the top 1000 documents retrieved for each query. Participants creating queries both automatically and manually may submit both sets for evaluation.
Scoring techniques, including traditional recall/precision measures, will be run for all systems, and individual results will be returned to each participant.

Conference Format:

The conference itself will be used as a forum both for presentation of results (including failure analyses and system comparisons) and for longer system presentations describing the retrieval techniques used, experiments run using the data, and other issues of interest to researchers in information retrieval. As there is a limited amount of time for these presentations, the program committee will determine which groups are asked to speak and which groups will present in a poster session. Additionally, some organizations may not wish to describe their proprietary algorithms, and these groups may choose to participate in a different manner (see Category C). To allow the maximum number of participants, the following three categories have been established.

Category A: Full participation

Participants will be expected to work with the full data set and to present full details of system algorithms and of the various experiments run using the data, either in a talk or in a poster session. In addition to algorithms and experiments, some information on time and effort statistics should be provided. This includes time for data preparation (such as indexing, building a manual thesaurus or building a knowledge base), time for construction of manual queries, query execution time, etc. More details on the desired content of the presentation will be provided later.

Category B: Exploratory groups

Because small groups with novel retrieval techniques might like to participate but may have limited research resources, a category has been set up to work with only a subset of the data. This subset will consist of about 1/2 gigabyte of training data (and all training topics) and 1/4 gigabyte of test data (and all test topics).
Participants in this category will be expected to follow the same schedule as Category A, except with less data, and to present full details of system algorithms, experiments, and time and effort statistics, either in a poster session or in a talk.

Category C: Evaluation only

Participants in this category will be expected to work on the full data set, submit results for common scoring and tabulation, and present their results in a poster session, including the time and effort statistics described in Category A. They will not be expected to describe their systems in detail.

Data (Test Collection):

The test collection (documents, topics, and relevance judgments) will be an extension of the collection (English only) used for the ARPA TIPSTER project. The collection is being assembled from Linguistic Data Consortium text, and an LDC User Agreement will be required from all participants. The documents are an assorted collection of newspapers (including the Wall Street Journal), newswires, journals, technical abstracts and email newsgroups. The test set will be of approximately the same composition as the training set, and all documents will be typical of those seen in a real-world situation (i.e., there will not be arcane vocabulary, but there may be missing pieces of text or typographical errors). The documents are in a relatively clean format that is easy to use as is (see attachment 2). Most of the documents will consist of a text section only, with no titles or other categories. The relevance judgments against which each system's output will be scored will be made by experienced relevance assessors, based on the output of all TREC participants, using a pooled relevance methodology.

Response format and submission details

By Dec. 1, 1993, organizations wishing to participate should respond to this call for participation by submitting a summary of their text retrieval approach and a description of their system architecture, not to exceed five pages in total.
The summary should include the strengths and significance of their approach to text retrieval, and highlight differences between their approach and other retrieval approaches. Each organization should indicate the category in which it wishes to participate.

Please indicate clearly the persons responsible for the summary statement and to whom correspondence should be directed. A full postal address, telephone number, and email address should be given. EMAIL IS THE PREFERRED METHOD OF COMMUNICATION, although it is realized that diagrams and figures will need to be sent by regular mail or FAX. It is expected that ALL participants have some access to email, as conference communications will be done via email.

It is highly likely that some Spanish text and topics (approximately 1/4 gigabyte of text and 25 topics) will also be available for retrieval tests. If your organization is interested in trying Spanish (in addition to English), please state this and indicate the availability of at least one person who can read Spanish.

All responses should be submitted by Dec. 1, 1993 to the Program Chair, Donna Harman:

harman@magi.ncsl.nist.gov

or

Donna Harman
NIST, Building 225/A216
Gaithersburg, Md. 20899
FAX: 301-975-2128

AS NOTED ABOVE, EMAIL IS THE DESIRED FORM OF COMMUNICATION.

*****************************************************************************

Any questions about conference participation, response format, etc. should also be sent to the same address.

Selection of participants:

As the goal of TREC is to further research in large-scale text retrieval, the program committee will be looking for as wide a range of text retrieval approaches as possible, and will select the best representatives of these approaches as participants for Categories A and B. Category C participants must be able to demonstrate their ability to work with the full data collection.
The program committee has been chosen from a broad range of information retrieval researchers and government users, and will both select the participants and provide guidance in the planning of the conference.

Program Committee

Donna Harman, NIST, chair
Chris Buckley, Cornell University
Susan Dumais, Bellcore
Darryl Howard, U.S. Department of Defense
David Lewis, AT&T Bell Labs
Matt Mettler, TRW
John Prange, U.S. Department of Defense
Alan Smeaton, Dublin City University, Ireland
Karen Sparck Jones, Cambridge University
Richard Tong, Advanced Decision Systems
Steve Walker, City University, London

----------------------------------------------------------------------------

Attachment 1 -- Sample Topic

Tipster Topic Description
Number: 028
Domain: Science and Technology
Topic: AT&T's Technical Efforts
<desc> Description:
Document must describe AT&T's technical efforts in computers and communications.
<narr> Narrative:
To be relevant, a document must contain information on American Telephone and Telegraph's (AT&T) technical efforts in computers and communications. Examples of relevant subject matter would include: product announcements, releases or cancellations, and discussion of AT&T Bell Labs research. Documents focusing either AT&T's efforts to buy other computer companies or AT&T's legal battles with other organizations, or AT&T's Unix operating system are NOT relevant. For the purposes of this topic the Regional Bell Operating Companies, (RBOC's) or the "Baby Bells" are not considered AT&T.
<con> Concept(s):
1. AT&T, American Telephone and Telegraph
2. 3B-2 minicomputer, AT&T 386 PC
3. AT&T Starlan
4. PBX,
5. Product announcements, product releases
</top>

-------------------------------------------------------------------------------

Attachment 2 -- Sample Document (abridged)

<DOC>
<DOCNO> WSJ880406-0090 </DOCNO>
<HL> AT&T Unveils Services to Upgrade Phone Networks Under Global Plan </HL>
<AUTHOR> Janet Guyon (WSJ Staff) </AUTHOR>
<SO> </SO>
<CO> T </CO>
<IN> TEL </IN>
<DATELINE> NEW YORK </DATELINE>
<TEXT>
American Telephone & Telegraph Co. introduced the first of a new generation of phone services with broad implications for computer and communications equipment markets.
AT&T said it is the first national long-distance carrier to announce prices for specific services under a world-wide standardization plan to upgrade phone networks. By announcing commercial services under the plan, which the industry calls the Integrated Services Digital Network, AT&T will influence evolving communications standards to its advantage, consultants said, just as International Business Machines Corp. has created de facto computer standards favoring its products.
. . .
</TEXT>
</DOC>

Date: Thu, 7 Oct 93 17:18 EDT
From: lewis@research.att.com (David Lewis)
To: nl-kr@cs.rpi.edu
Subject: CFP: Text Retrieval Conference/Dataset/Evaluation (TREC-3)

CALL FOR PARTICIPATION

TEXT RETRIEVAL CONFERENCE
January 1994 - November 1994

Conducted by: National Institute of Standards and Technology (NIST)
Sponsored by: Advanced Research Projects Agency Software and Intelligent Systems Technology Office (ARPA/SISTO)

A new conference for examination of text retrieval methodologies (TREC) was held in November 1992 at Gaithersburg, Md. The goal of this conference was to encourage research in text retrieval from large document collections by providing a large test collection, uniform scoring procedures and a forum for organizations interested in comparing their results. Both ad-hoc queries against archival data collections and routing (filtering or dissemination) queries against incoming data streams were tested.
The conference was a workshop open only to the 24 participating systems and government sponsors; however, the proceedings were published by NIST in the spring of 1993. A second workshop (TREC-2) was held in September 1993, with 31 participating systems, and its proceedings are to be published in the spring of 1994.

This announcement serves as a call for participation from groups interested in working in the third year of this workshop (TREC-3). Participants will be expected to work with approximately one million documents (2 gigabytes of data), retrieving lists of documents that could be considered relevant to each of 100 topics (50 routing and 50 ad-hoc topics). NIST will distribute the data and will collect and analyze the results. As before, the workshop will be open only to participating systems and government sponsors. Because of government cutbacks, there will be no financial support for participants this year.

Schedule:

Dec. 1, 1993 -- deadline for participation applications
Jan. 1, 1994 -- acceptances announced, and training data distributed to new participants (including 3 CD-ROMs containing about 3 gigabytes of data, and 150 training topics and relevance judgments)
June 1, 1994 -- test gigabyte of data distributed via CD-ROM, after routing queries have been received at NIST
July 1, 1994 -- 50 new test topics distributed
Aug. 1, 1994 -- results from 50 routing queries and 50 test topics due at NIST
Oct. 1, 1994 -- relevance judgments and individual evaluation scores due back to participants
Nov. 2-4, 1994 -- TREC-3 conference at NIST in Gaithersburg, Md.

Task Description:

Participants will receive 3 gigabytes of data to use for training their systems, including development of appropriate algorithms or knowledge bases. The 150 topics used in the first two TREC workshops, and the relevance judgments for these topics, will also be sent. The topics are in the form of a highly formatted user need statement (see attachment 1).
Queries can be constructed either automatically from this topic description or manually.

Two types of retrieval operations will be tested: a routing or filtering operation against new data, and an ad-hoc query operation against archival data. Fifty of the topics initially distributed as training topics (numbers 101-150) will be used by each participating group to create formalized routing or filtering queries, to be run against a new test gigabyte of data (disk 4). Fifty new test topics (151-200) will be used as ad-hoc queries against 2 gigabytes of the training data (disks 2 and 3). Results from both types of queries (routing and ad-hoc) will be submitted to NIST as the top 1000 documents retrieved for each query. Participants creating queries both automatically and manually may submit both sets for evaluation. Scoring techniques, including traditional recall/precision measures, will be run for all systems, and individual results will be returned to each participant.

Conference Format:

The conference itself will be used as a forum both for presentation of results (including failure analyses and system comparisons) and for longer system presentations describing the retrieval techniques used, experiments run using the data, and other issues of interest to researchers in information retrieval. As there is a limited amount of time for these presentations, the program committee will determine which groups are asked to speak and which groups will present in a poster session. Additionally, some organizations may not wish to describe their proprietary algorithms, and these groups may choose to participate in a different manner (see Category C). To allow the maximum number of participants, the following three categories have been established.
Category A: Full participation

Participants will be expected to work with the full data set, and to present full details of system algorithms and of the various experiments run using the data, either in a talk or in a poster session. In addition to algorithms and experiments, some time and effort statistics should be provided. These include time for data preparation (such as indexing, building a manual thesaurus, or building a knowledge base), time for construction of manual queries, query execution time, etc. More details on the desired content of the presentation will be provided later.

Category B: Exploratory groups

Because small groups with novel retrieval techniques might like to participate but may have limited research resources, a category has been set up to work with only a subset of the data. This subset will consist of about 1/2 gigabyte of training data (and all training topics), and 1/4 gigabyte of test data (and all test topics). Participants in this category will be expected to follow the same schedule as Category A, except with less data, and will be expected to present full details of system algorithms, experiments, and time and effort statistics, either in a poster session or in a talk.

Category C: Evaluation only

Participants in this category will be expected to work with the full data set, submit results for common scoring and tabulation, and present their results in a poster session, including the time and effort statistics described in Category A. They will not be expected to describe their systems in detail.

Data (Test Collection):

The test collection (documents, topics, and relevance judgments) will be an extension of the collection (English only) used for the ARPA TIPSTER project. The collection is being assembled from Linguistic Data Consortium (LDC) text, and an LDC User Agreement will be required from all participants.
The documents are an assorted collection of newspapers (including the Wall Street Journal), newswires, journals, technical abstracts, and email newsgroups. The test set will be of approximately the same composition as the training set, and all documents will be typical of those seen in a real-world situation (i.e., there will not be arcane vocabulary, but there may be missing pieces of text or typographical errors). The documents are in a relatively clean format that is easy to use as is (see Attachment 2). Most of the documents will consist of a text section only, with no titles or other categories. The relevance judgments against which each system's output will be scored will be made by experienced relevance assessors, based on the output of all TREC participants, using a pooled relevance methodology.

Response format and submission details:

By Dec. 1, 1993, organizations wishing to participate should respond to this call for participation by submitting a summary of their text retrieval approach and a system architecture description, not to exceed five pages in total. The summary should include the strengths and significance of their approach to text retrieval, and highlight differences between their approach and other retrieval approaches. Each organization should indicate in which category it wishes to participate. Please indicate clearly the persons responsible for the summary statement and to whom correspondence should be directed. A full regular address, telephone number, and email address should be given. EMAIL IS THE PREFERRED METHOD OF COMMUNICATION, although it is realized that diagrams and figures will need to be sent by regular mail or FAX. It is expected that ALL participants have some access to email, as conference communications will be done via email.

It is highly likely that some Spanish text and topics (approximately 1/4 gigabyte of text and 25 topics) will also be available for retrieval tests.
If your organization is interested in trying Spanish (in addition to English), please state this and indicate the availability of at least one person who can read Spanish.

All responses should be submitted by Dec. 1, 1993 to the Program Chair, Donna Harman:

harman@magi.ncsl.nist.gov

or

Donna Harman
NIST, Building 225/A216
Gaithersburg, Md. 20899
FAX: 301-975-2128

AS NOTED ABOVE, EMAIL IS THE DESIRED FORM OF COMMUNICATION.

*****************************************************************************

Any questions about conference participation, response format, etc. should also be sent to the same address.

Selection of participants:

As the goal of TREC is to further research in large-scale text retrieval, the program committee will be looking for as wide a range of text retrieval approaches as possible, and will select the best representatives of these approaches as participants for Categories A and B. Category C participants must be able to demonstrate their ability to work with the full data collection. The program committee has been chosen from a broad range of information retrieval researchers and government users, and will both select the participants and provide guidance in the planning of the conference.

Program Committee

Donna Harman, NIST, chair
Chris Buckley, Cornell University
Susan Dumais, Bellcore
Darryl Howard, U.S. Department of Defense
David Lewis, AT&T Bell Labs
Matt Mettler, TRW
John Prange, U.S. Department of Defense
Alan Smeaton, Dublin City University, Ireland
Karen Sparck Jones, Cambridge University
Richard Tong, Advanced Decision Systems
Steve Walker, City University, London

----------------------------------------------------------------------------

Attachment 1 -- Sample Topic

<top>
<head> Tipster Topic Description
<num> Number: 028
<dom> Domain: Science and Technology
<title> Topic: AT&T's Technical Efforts
<desc> Description:
Document must describe AT&T's technical efforts in computers and communications.
<narr> Narrative:
To be relevant, a document must contain information on American Telephone and Telegraph's (AT&T) technical efforts in computers and communications. Examples of relevant subject matter would include: product announcements, releases or cancellations, and discussion of AT&T Bell Labs research. Documents focusing on either AT&T's efforts to buy other computer companies, AT&T's legal battles with other organizations, or AT&T's Unix operating system are NOT relevant. For the purposes of this topic, the Regional Bell Operating Companies (RBOC's) or the "Baby Bells" are not considered AT&T.
<con> Concept(s):
1. AT&T, American Telephone and Telegraph
2. 3B-2 minicomputer, AT&T 386 PC
3. AT&T Starlan
4. PBX
5. Product announcements, product releases
</top>

-------------------------------------------------------------------------------

Attachment 2 -- Sample Document (abridged)

<DOC>
<DOCNO> WSJ880406-0090 </DOCNO>
<HL> AT&T Unveils Services to Upgrade Phone Networks Under Global Plan </HL>
<AUTHOR> Janet Guyon (WSJ Staff) </AUTHOR>
<SO> </SO>
<CO> T </CO>
<IN> TEL </IN>
<DATELINE> NEW YORK </DATELINE>
<TEXT>
American Telephone & Telegraph Co. introduced the first of a new generation of phone services with broad implications for computer and communications equipment markets.
AT&T said it is the first national long-distance carrier to announce prices for specific services under a world-wide standardization plan to upgrade phone networks. By announcing commercial services under the plan, which the industry calls the Integrated Services Digital Network, AT&T will influence evolving communications standards to its advantage, consultants said, just as International Business Machines Corp. has created de facto computer standards favoring its products.
. . .
</TEXT>
</DOC>
-----------------------------------------------------------------------
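The tagged layout of the sample document is regular enough to read with a few lines of code. The following is a minimal Python sketch, assuming that every field is enclosed in a matching pair of upper-case tags as in Attachment 2 and that fields are not nested within one another; the function name `read_docs` is hypothetical, and real collection files may contain irregularities that need more careful handling.

```python
import re

# One <DOC> ... </DOC> record; re.S lets "." span line breaks.
# "<DOC>" does not match "<DOCNO>" because the ">" must follow "DOC".
DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.S)
# A field such as <DOCNO> ... </DOCNO>; \1 requires the matching close tag.
FIELD_RE = re.compile(r"<([A-Z]+)>(.*?)</\1>", re.S)


def read_docs(raw: str):
    """Yield one {tag: text} dict per <DOC>, with whitespace collapsed."""
    for body in DOC_RE.findall(raw):
        yield {tag: " ".join(text.split())
               for tag, text in FIELD_RE.findall(body)}
```

Each record comes back as a dictionary keyed by tag, so `doc["DOCNO"]` gives the document identifier. Because most documents consist of a text section only, callers should use `doc.get("HL", "")` and similar lookups rather than assume every field is present.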