info.ephyra.indexing
Class AQUAINT2Preprocessor

java.lang.Object
  extended by info.ephyra.indexing.AQUAINT2Preprocessor

public class AQUAINT2Preprocessor
extends java.lang.Object

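A preprocessor for the AQUAINT-2 corpus. It adds paragraph tags to documents that lack them and converts the documents to the 'trectext' format required by Indri.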

Version:
2007-07-14
Author:
Nico Schlaefer

Field Summary
private static java.lang.String dir
          Directory of the AQUAINT-2 corpus
 
Constructor Summary
AQUAINT2Preprocessor()
           
 
Method Summary
private static boolean addParagraphTags()
          Adds paragraph tags to documents of type 'multi', 'advis' and 'other'.
private static boolean convertToTrectext()
          Converts the documents to the 'trectext' format required by Indri.
static void main(java.lang.String[] args)
          Entry point of the program.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

dir

private static java.lang.String dir
Directory of the AQUAINT-2 corpus

Constructor Detail

AQUAINT2Preprocessor

public AQUAINT2Preprocessor()
Method Detail

addParagraphTags

private static boolean addParagraphTags()
Adds paragraph tags to documents of type 'multi', 'advis' and 'other'. Documents of type 'story' are usually already tagged.

Returns:
true if and only if the preprocessing was successful
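
The method body is not part of this page, so the following is only a minimal sketch of what adding paragraph tags could look like. It assumes that paragraphs in untagged documents are separated by blank lines and that the tags are <P> and </P>; the class name ParagraphTagSketch and the string-based signature are hypothetical and differ from the private, argument-free method documented above.

```java
public class ParagraphTagSketch {

    /**
     * Wraps each block of text in <P>...</P> tags.
     * Assumption: blocks are separated by one or more blank lines.
     */
    public static String addParagraphTags(String text) {
        StringBuilder out = new StringBuilder();
        // Split on a newline, optional whitespace, and another newline.
        for (String block : text.trim().split("\\n\\s*\\n")) {
            String paragraph = block.trim();
            if (paragraph.isEmpty()) {
                continue; // skip empty blocks, e.g. for empty input
            }
            out.append("<P>\n").append(paragraph).append("\n</P>\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(addParagraphTags("First paragraph.\n\nSecond paragraph."));
    }
}
```

A real preprocessor would apply such a step only to the text body of documents of type 'multi', 'advis' and 'other', and would report failure through the boolean return value.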

convertToTrectext

private static boolean convertToTrectext()
Converts the documents to the 'trectext' format required by Indri.

Returns:
true if and only if the preprocessing was successful
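
As an illustration of the conversion, the sketch below rewrites an opening tag of the form <DOC id="..." type="..."> into a plain <DOC> tag followed by a <DOCNO> element, which is the document layout Indri's 'trectext' parser reads. The input tag shape, the class name TrectextSketch, and the document ID in the usage example are assumptions made for this sketch; the actual conversion performed by this class may involve further steps.

```java
import java.util.regex.Pattern;

public class TrectextSketch {

    // Assumed input shape: <DOC id="SOME_ID" type="story">
    private static final Pattern DOC_TAG =
            Pattern.compile("<DOC\\s+id=\"([^\"]+)\"[^>]*>");

    /**
     * Moves the id attribute of each opening DOC tag into a
     * separate DOCNO element and leaves all other text unchanged.
     */
    public static String toTrectext(String document) {
        return DOC_TAG.matcher(document).replaceAll("<DOC>\n<DOCNO>$1</DOCNO>");
    }

    public static void main(String[] args) {
        // EX_0001 is a made-up document ID used only for illustration.
        String input = "<DOC id=\"EX_0001\" type=\"story\">\n<TEXT>\nHello\n</TEXT>\n</DOC>";
        System.out.println(toTrectext(input));
    }
}
```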

main

public static void main(java.lang.String[] args)

Entry point of the program.

Preprocesses the AQUAINT-2 corpus.

Parameters:
args - argument 1: directory of the AQUAINT-2 corpus
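
The documentation states only that the first argument is the corpus directory. A minimal sketch of how an entry point might validate that argument is shown below; the class name PreprocessorMainSketch, the helper parseCorpusDir, and the usage message are hypothetical, and the order in which the real class runs its two preprocessing steps is not documented here.

```java
public class PreprocessorMainSketch {

    /** Returns the corpus directory from the arguments, or null if it is missing. */
    public static String parseCorpusDir(String[] args) {
        if (args == null || args.length < 1 || args[0].trim().isEmpty()) {
            return null;
        }
        return args[0];
    }

    public static void main(String[] args) {
        String dir = parseCorpusDir(args);
        if (dir == null) {
            System.err.println("Usage: java PreprocessorMainSketch <AQUAINT-2 directory>");
            return;
        }
        // The real class would now run its private preprocessing steps on this directory.
        System.out.println("Preprocessing corpus in " + dir);
    }
}
```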