info.ephyra.search.searchers
Class IndriKM

java.lang.Object
  extended by java.lang.Thread
      extended by info.ephyra.search.searchers.Searcher
          extended by info.ephyra.search.searchers.KnowledgeMiner
              extended by info.ephyra.search.searchers.IndriKM
All Implemented Interfaces:
java.lang.Runnable

public class IndriKM
extends KnowledgeMiner

A KnowledgeMiner that uses the Indri IR system to search a local text corpus. Each search result is a paragraph.

It runs as a separate thread, so several queries can be performed in parallel.

This class extends the class KnowledgeMiner.

Version:
2007-07-26
Author:
Nico Schlaefer

Nested Class Summary
 
Nested classes/interfaces inherited from class java.lang.Thread
java.lang.Thread.State, java.lang.Thread.UncaughtExceptionHandler
 
Field Summary
private static java.lang.String FORBIDDEN_CHAR
          Regular expression that matches characters that cause problems in Indri queries and thus should be removed from query strings.
private  java.lang.String[] indriDirs
          Directories of Indri indices.
private  java.lang.String[] indriUrls
          URLs of Indri servers.
private static int MAX_DOCS
          Maximum number of documents fetched at a time.
private static int MAX_RESULTS_PERQUERY
          Maximum number of search results per query.
private static int MAX_RESULTS_TOTAL
          Maximum total number of search results.
 
Fields inherited from class info.ephyra.search.searchers.KnowledgeMiner
firstResult, maxResults
 
Fields inherited from class info.ephyra.search.searchers.Searcher
query, results
 
Fields inherited from class java.lang.Thread
MAX_PRIORITY, MIN_PRIORITY, NORM_PRIORITY
 
Constructor Summary
IndriKM(java.lang.String[] locations, boolean isServers)
          Creates a new Indri knowledge miner and sets the directories of indices or the URLs of servers.
 
Method Summary
protected  Result[] doSearch()
          Queries the Indri indices or servers and returns an array containing up to MAX_RESULTS_PERQUERY search results.
 KnowledgeMiner getCopy()
          Returns a new instance of IndriKM.
static java.lang.String[][] getIndriIndices()
          Gets a list of all Indri index directories that have been specified with environment variables 'INDRI_INDEX', 'INDRI_INDEX2', 'INDRI_INDEX3' etc.
static java.lang.String[][] getIndriServers()
          Gets a list of all Indri server URLs that have been specified with environment variables 'INDRI_SERVER', 'INDRI_SERVER2', 'INDRI_SERVER3' etc.
protected  int getMaxResultsPerQuery()
          Returns the maximum number of search results per query.
protected  int getMaxResultsTotal()
          Returns the maximum total number of search results.
static java.lang.String transformQueryString(java.lang.String qs)
          Returns a representation of the query string that is suitable for Indri.
 
Methods inherited from class info.ephyra.search.searchers.KnowledgeMiner
getResults, getResults, start, start
 
Methods inherited from class info.ephyra.search.searchers.Searcher
run
 
Methods inherited from class java.lang.Thread
activeCount, checkAccess, countStackFrames, currentThread, destroy, dumpStack, enumerate, getAllStackTraces, getContextClassLoader, getDefaultUncaughtExceptionHandler, getId, getName, getPriority, getStackTrace, getState, getThreadGroup, getUncaughtExceptionHandler, holdsLock, interrupt, interrupted, isAlive, isDaemon, isInterrupted, join, join, join, resume, setContextClassLoader, setDaemon, setDefaultUncaughtExceptionHandler, setName, setPriority, setUncaughtExceptionHandler, sleep, sleep, start, stop, stop, suspend, toString, yield
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

MAX_RESULTS_TOTAL

private static final int MAX_RESULTS_TOTAL
Maximum total number of search results.

See Also:
Constant Field Values

MAX_RESULTS_PERQUERY

private static final int MAX_RESULTS_PERQUERY
Maximum number of search results per query.

See Also:
Constant Field Values

MAX_DOCS

private static final int MAX_DOCS
Maximum number of documents fetched at a time.

See Also:
Constant Field Values

FORBIDDEN_CHAR

private static final java.lang.String FORBIDDEN_CHAR

Regular expression that matches characters that cause problems in Indri queries and thus should be removed from query strings.

Indri allows the following characters:

However, for some of the special characters Indri fails to retrieve results and therefore they are excluded.

See Also:
Constant Field Values

indriDirs

private java.lang.String[] indriDirs
Directories of Indri indices.


indriUrls

private java.lang.String[] indriUrls
URLs of Indri servers.

Constructor Detail

IndriKM

public IndriKM(java.lang.String[] locations,
               boolean isServers)
Creates a new Indri knowledge miner and sets the directories of indices or the URLs of servers.

Parameters:
locations - directories of indices or URLs of servers
isServers - true if and only if locations contains URLs of servers rather than directories of indices
Method Detail

getIndriIndices

public static java.lang.String[][] getIndriIndices()
Gets a list of all Indri index directories that have been specified with environment variables 'INDRI_INDEX', 'INDRI_INDEX2', 'INDRI_INDEX3' etc. One environment variable can specify multiple indices which are queried with the same knowledge miner.

Returns:
Indri index directories grouped by knowledge miners

getIndriServers

public static java.lang.String[][] getIndriServers()
Gets a list of all Indri server URLs that have been specified with environment variables 'INDRI_SERVER', 'INDRI_SERVER2', 'INDRI_SERVER3' etc. One environment variable can specify multiple servers which are queried with the same knowledge miner.

Returns:
Indri server URLs grouped by knowledge miners
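
The naming convention shared by getIndriIndices and getIndriServers (an unnumbered variable followed by numbered ones, each yielding one group of locations) can be illustrated with a minimal sketch. This is not the Ephyra implementation: the class IndriEnvSketch, the method readGroups, the ';' separator, and the example paths are all assumptions made for illustration, and the environment is passed in as a map so the behavior is reproducible.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Ephyra: groups locations by numbered variables. */
class IndriEnvSketch {
    /** Assumed separator between locations inside one variable. */
    static final String SEPARATOR = ";";

    /**
     * Reads baseName, baseName2, baseName3, ... from the given environment and
     * stops at the first missing variable. Each non-empty variable yields one group.
     */
    static String[][] readGroups(Map<String, String> env, String baseName) {
        List<String[]> groups = new ArrayList<>();
        String value = env.get(baseName);
        for (int i = 2; value != null; i++) {
            if (!value.isEmpty()) {
                groups.add(value.split(SEPARATOR));
            }
            value = env.get(baseName + i);
        }
        return groups.toArray(new String[0][]);
    }

    public static void main(String[] args) {
        // Real code would pass System.getenv(); the paths below are made up.
        Map<String, String> env = new LinkedHashMap<>();
        env.put("INDRI_INDEX", "/data/idx/a;/data/idx/b");
        env.put("INDRI_INDEX2", "/data/idx/c");
        for (String[] group : readGroups(env, "INDRI_INDEX")) {
            System.out.println(String.join(", ", group));
        }
    }
}
```

In this sketch each inner array corresponds to one knowledge miner, so the first group would be searched by a single miner over two indices.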

transformQueryString

public static java.lang.String transformQueryString(java.lang.String qs)
Returns a representation of the query string that is suitable for Indri.

Parameters:
qs - query string
Returns:
query string for Indri
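
As a rough illustration of this kind of transformation, the sketch below strips characters from a query string with a regular expression. The actual value of FORBIDDEN_CHAR is not given in this page, so the pattern used here (anything that is not a letter, digit, or whitespace) is an assumption, as are the class and method names.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch, not the Ephyra implementation. */
class QuerySanitizerSketch {
    /** Assumed pattern: any character other than letters, digits and whitespace. */
    static final Pattern FORBIDDEN = Pattern.compile("[^\\p{L}\\p{N}\\s]");

    /** Replaces forbidden characters with spaces, then collapses runs of whitespace. */
    static String sanitize(String qs) {
        String cleaned = FORBIDDEN.matcher(qs).replaceAll(" ");
        return cleaned.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("Who wrote \"Hamlet\"?")); // prints: Who wrote Hamlet
    }
}
```

The sketch substitutes a space instead of deleting the character outright so that neighboring tokens are not fused together; whether the real method does the same is not stated here.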

getMaxResultsTotal

protected int getMaxResultsTotal()
Returns the maximum total number of search results.

Specified by:
getMaxResultsTotal in class KnowledgeMiner
Returns:
maximum total number of search results

getMaxResultsPerQuery

protected int getMaxResultsPerQuery()
Returns the maximum number of search results per query.

Specified by:
getMaxResultsPerQuery in class KnowledgeMiner
Returns:
maximum number of search results per query

doSearch

protected Result[] doSearch()
Queries the Indri indices or servers and returns an array containing up to MAX_RESULTS_PERQUERY search results.

Specified by:
doSearch in class Searcher
Returns:
Indri search results
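
One plausible way the limits MAX_RESULTS_PERQUERY and MAX_DOCS could interact is to cap the ranked hits first and then fetch the corresponding documents in fixed-size batches. The sketch below shows only that batching arithmetic; it is an assumption about the approach, not the documented behavior of doSearch, and BatchFetchSketch and its parameters are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of capped, batched fetching; not the Ephyra implementation. */
class BatchFetchSketch {
    /**
     * Keeps at most maxResults ids and splits them into batches of at most
     * batchSize ids, preserving the original order.
     */
    static List<List<Integer>> batches(List<Integer> ids, int maxResults, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int limit = Math.min(ids.size(), maxResults);
        List<List<Integer>> out = new ArrayList<>();
        for (int from = 0; from < limit; from += batchSize) {
            int to = Math.min(from + batchSize, limit);
            out.add(new ArrayList<>(ids.subList(from, to)));
        }
        return out;
    }
}
```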

getCopy

public KnowledgeMiner getCopy()
Returns a new instance of IndriKM. A new instance is created for each query.

Specified by:
getCopy in class KnowledgeMiner
Returns:
new instance of IndriKM
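
The copy-per-query pattern described above, combined with the fact that each searcher is a thread, can be sketched as follows. The classes here are simplified stand-ins and not the Ephyra classes: MinerSketch, setQuery, getHits, and searchAll are hypothetical names, and the "search" merely echoes the query for each location.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a searcher thread; not the Ephyra classes. */
class MinerSketch extends Thread {
    private final String[] locations;
    private String query;
    private String[] hits = new String[0];

    MinerSketch(String[] locations) {
        this.locations = locations;
    }

    /** Returns a fresh instance that shares the configuration but no query state. */
    MinerSketch getCopy() {
        return new MinerSketch(locations);
    }

    void setQuery(String query) {
        this.query = query;
    }

    @Override
    public void run() {
        // Placeholder for the actual retrieval: one fake hit per location.
        String[] found = new String[locations.length];
        for (int i = 0; i < locations.length; i++) {
            found[i] = locations[i] + ": " + query;
        }
        hits = found;
    }

    String[] getHits() {
        return hits;
    }

    /** Runs each query on its own copy of the prototype and waits for all copies. */
    static List<String[]> searchAll(MinerSketch prototype, String[] queries) {
        List<MinerSketch> workers = new ArrayList<>();
        for (String q : queries) {
            MinerSketch worker = prototype.getCopy();
            worker.setQuery(q); // set before start(), so run() sees it
            worker.start();
            workers.add(worker);
        }
        List<String[]> all = new ArrayList<>();
        try {
            for (MinerSketch worker : workers) {
                worker.join(); // after join(), the worker's hits are visible here
                all.add(worker.getHits());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for searches", e);
        }
        return all;
    }
}
```

Because a Thread object can be started only once, giving every query its own copy avoids both restarting a finished thread and sharing mutable query state between concurrent searches.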