info.ephyra.search.searchers
Class KnowledgeMiner

java.lang.Object
  extended by java.lang.Thread
      extended by info.ephyra.search.searchers.Searcher
          extended by info.ephyra.search.searchers.KnowledgeMiner
All Implemented Interfaces:
java.lang.Runnable
Direct Known Subclasses:
GoogleKM, IndriDocumentKM, IndriKM, YahooKM

public abstract class KnowledgeMiner
extends Searcher

A KnowledgeMiner uses a document retrieval system to search an unstructured knowledge source, e.g. Google for searching the World Wide Web.

It runs as a separate thread, so several queries can be performed in parallel.

This class extends the class Searcher and is abstract.
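The hierarchy described above is a template-method design built on java.lang.Thread. The following is a minimal, self-contained sketch of that structure under stated assumptions: `SearcherSketch`, `KnowledgeMinerSketch`, `StubKM`, `runOnce`, and the `String`-based query and results are hypothetical stand-ins, not Ephyra's actual `Searcher`, `Query`, or `Result` classes. It also assumes that `run()` simply delegates to `doSearch()` and stores the returned results, and it calls the inherited no-argument `start()` directly for brevity.

```java
import java.util.Arrays;

public class KnowledgeMinerHierarchySketch {

    // Hypothetical stand-in for Searcher: a Thread whose run() delegates to doSearch().
    abstract static class SearcherSketch extends Thread {
        protected String query;      // stand-in for the Query object
        protected String[] results;  // stand-in for the Result[] array

        // Subclasses perform the actual retrieval here.
        protected abstract String[] doSearch();

        @Override
        public void run() {
            results = doSearch();
        }
    }

    // Hypothetical stand-in for KnowledgeMiner: adds the result window and the two limits.
    abstract static class KnowledgeMinerSketch extends SearcherSketch {
        protected int firstResult;  // hit position of the first result to fetch
        protected int maxResults;   // number of results this thread fetches

        protected abstract int getMaxResultsTotal();
        protected abstract int getMaxResultsPerQuery();
    }

    // Hypothetical concrete miner that fabricates hit labels instead of calling a search engine.
    static class StubKM extends KnowledgeMinerSketch {
        @Override protected int getMaxResultsTotal() { return 20; }
        @Override protected int getMaxResultsPerQuery() { return 10; }

        @Override
        protected String[] doSearch() {
            String[] hits = new String[maxResults];
            for (int i = 0; i < maxResults; i++) {
                hits[i] = query + "#" + (firstResult + i);
            }
            return hits;
        }
    }

    // Runs one StubKM on its own thread and returns its results once the thread has finished.
    static String[] runOnce(String query, int firstResult, int maxResults) {
        StubKM km = new StubKM();
        km.query = query;
        km.firstResult = firstResult;
        km.maxResults = maxResults;
        km.start();  // inherited Thread.start(); run() executes on the new thread
        try {
            km.join();  // join() also makes the results written by the thread visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the search", e);
        }
        return km.results;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(runOnce("ephyra", 0, 3)));
    }
}
```

Running `main` prints `[ephyra#0, ephyra#1, ephyra#2]`. Concrete miners only supply the search logic and the two limits; threading is inherited from the base classes.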

Version:
2007-05-29
Author:
Nico Schlaefer

Nested Class Summary
 
Nested classes/interfaces inherited from class java.lang.Thread
java.lang.Thread.State, java.lang.Thread.UncaughtExceptionHandler
 
Field Summary
protected  int firstResult
          The hit position of the first result to be fetched.
protected  int maxResults
          The maximum number of results to be fetched.
 
Fields inherited from class info.ephyra.search.searchers.Searcher
query, results
 
Fields inherited from class java.lang.Thread
MAX_PRIORITY, MIN_PRIORITY, NORM_PRIORITY
 
Constructor Summary
KnowledgeMiner()
           
 
Method Summary
abstract  KnowledgeMiner getCopy()
          Returns a new instance of the KnowledgeMiner.
protected abstract  int getMaxResultsPerQuery()
          Returns the maximum number of search results per query.
protected abstract  int getMaxResultsTotal()
          Returns the maximum total number of search results.
protected  Result[] getResults(java.lang.String[] passages, java.lang.String[] docIDs, boolean isHtml)
          Creates Result objects from an array of text passages and document IDs.
protected  Result[] getResults(java.lang.String[] passages, java.lang.String[] docIDs, java.lang.String[] cacheIDs, boolean isHtml)
          Creates Result objects from an array of text passages, document IDs and IDs of cached documents.
 void start(Query query)
          Creates [MAX_RESULTS_TOTAL / MAX_RESULTS_PERQUERY] threads that fetch up to MAX_RESULTS_TOTAL results.
protected  void start(Query query, int firstResult)
          Sets the query, the hit position of the first result, and the number of results to be fetched, then starts the thread.
 
Methods inherited from class info.ephyra.search.searchers.Searcher
doSearch, run
 
Methods inherited from class java.lang.Thread
activeCount, checkAccess, countStackFrames, currentThread, destroy, dumpStack, enumerate, getAllStackTraces, getContextClassLoader, getDefaultUncaughtExceptionHandler, getId, getName, getPriority, getStackTrace, getState, getThreadGroup, getUncaughtExceptionHandler, holdsLock, interrupt, interrupted, isAlive, isDaemon, isInterrupted, join, join, join, resume, setContextClassLoader, setDaemon, setDefaultUncaughtExceptionHandler, setName, setPriority, setUncaughtExceptionHandler, sleep, sleep, start, stop, stop, suspend, toString, yield
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

firstResult

protected int firstResult
The hit position of the first result to be fetched.


maxResults

protected int maxResults
The maximum number of results to be fetched.

Constructor Detail

KnowledgeMiner

public KnowledgeMiner()
Method Detail

getMaxResultsTotal

protected abstract int getMaxResultsTotal()
Returns the maximum total number of search results.

Returns:
maximum total number of search results

getMaxResultsPerQuery

protected abstract int getMaxResultsPerQuery()
Returns the maximum number of search results per query.

Returns:
maximum number of search results per query

getResults

protected Result[] getResults(java.lang.String[] passages,
                              java.lang.String[] docIDs,
                              boolean isHtml)
Creates Result objects from an array of text passages and document IDs.

Parameters:
passages - text passages
docIDs - IDs of the documents the text passages are from
isHtml - flag indicating that the passages are HTML code
Returns:
Result objects

getResults

protected Result[] getResults(java.lang.String[] passages,
                              java.lang.String[] docIDs,
                              java.lang.String[] cacheIDs,
                              boolean isHtml)
Creates Result objects from an array of text passages, document IDs and IDs of cached documents.

Parameters:
passages - text passages
docIDs - IDs of the documents the text passages are from
cacheIDs - IDs of the documents in the search engine cache
isHtml - flag indicating that the passages are HTML code
Returns:
Result objects
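Both getResults overloads turn parallel arrays into one result object per passage. The sketch below illustrates that pairing under explicit assumptions: `ResultSketch` is a hypothetical stand-in for Ephyra's `Result` class (whose real fields and constructors may differ), the three-argument variant is assumed to delegate with no cache IDs, and the handling of `isHtml` (stripping tags with a naive regex) is an illustrative guess, not the documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

public class GetResultsSketch {

    // Hypothetical stand-in for Ephyra's Result class; the real class has its own fields and constructors.
    static final class ResultSketch {
        final String passage;
        final String docID;
        final String cacheID;  // null when no cached copy is known

        ResultSketch(String passage, String docID, String cacheID) {
            this.passage = passage;
            this.docID = docID;
            this.cacheID = cacheID;
        }
    }

    // Variant without cache IDs: delegates to the four-argument version.
    static ResultSketch[] getResults(String[] passages, String[] docIDs, boolean isHtml) {
        return getResults(passages, docIDs, null, isHtml);
    }

    // Pairs the i-th passage with the i-th document ID (and cache ID, if given).
    static ResultSketch[] getResults(String[] passages, String[] docIDs,
                                     String[] cacheIDs, boolean isHtml) {
        if (passages.length != docIDs.length
                || (cacheIDs != null && cacheIDs.length != passages.length)) {
            throw new IllegalArgumentException("parallel arrays must have the same length");
        }
        List<ResultSketch> results = new ArrayList<>();
        for (int i = 0; i < passages.length; i++) {
            // Assumption: HTML passages are reduced to plain text; this naive regex only removes tags.
            String text = isHtml ? passages[i].replaceAll("<[^>]*>", "") : passages[i];
            String cacheID = (cacheIDs == null) ? null : cacheIDs[i];
            results.add(new ResultSketch(text, docIDs[i], cacheID));
        }
        return results.toArray(new ResultSketch[0]);
    }

    public static void main(String[] args) {
        ResultSketch[] rs = getResults(
                new String[] {"<b>Paris</b> is the capital of France."},
                new String[] {"doc-1"},
                new String[] {"cache-1"},
                true);
        System.out.println(rs[0].passage + " | " + rs[0].docID + " | " + rs[0].cacheID);
    }
}
```

Having the shorter overload delegate keeps the pairing logic in one place, so the two variants cannot drift apart.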

start

protected void start(Query query,
                     int firstResult)

Sets the query, the hit position of the first result, and the number of results to be fetched, then starts the thread.

This method should be used instead of the inherited start() method without arguments.

Parameters:
query - Query object
firstResult - hit position of the first result
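The page does not state how the number of results for a thread is derived from the hit position. The following sketch assumes 0-based hit positions and a rule of "one full page per thread, truncated at the total limit"; `maxResultsFor`, `WindowedMiner`, and the limits passed to it are hypothetical. It also shows the order of operations the description implies: all fields are set before the thread is started.

```java
public class ResultWindowSketch {

    // Assumed rule (0-based hit positions): fetch a full page unless that would pass the total limit.
    static int maxResultsFor(int firstResult, int perQuery, int total) {
        return Math.max(0, Math.min(perQuery, total - firstResult));
    }

    // Hypothetical miner showing the order of operations in start(query, firstResult).
    static class WindowedMiner extends Thread {
        private final int perQuery;
        private final int total;
        String query;
        int firstResult;
        int maxResults;

        WindowedMiner(int perQuery, int total) {
            this.perQuery = perQuery;
            this.total = total;
        }

        // Sets the query and the result window first, then starts the thread,
        // so run() never sees a half-configured object.
        void start(String query, int firstResult) {
            this.query = query;
            this.firstResult = firstResult;
            this.maxResults = maxResultsFor(firstResult, perQuery, total);
            start();  // Thread.start()
        }

        @Override
        public void run() {
            // A real miner would fetch hits firstResult .. firstResult + maxResults - 1 here.
        }
    }

    public static void main(String[] args) {
        WindowedMiner km = new WindowedMiner(10, 25);
        km.start("capital of France", 20);
        System.out.println(km.maxResults);  // 5
    }
}
```

Because `Thread.start()` happens-before every action in the started thread, values assigned before the call are guaranteed to be visible inside `run()`.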

getCopy

public abstract KnowledgeMiner getCopy()

Returns a new instance of the KnowledgeMiner. A new instance is created for each query.

It does not necessarily return an exact copy of the current instance.

Returns:
new instance of the KnowledgeMiner
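A fresh instance per query is needed because a java.lang.Thread can be started only once; a second `start()` call throws `IllegalThreadStateException`. The sketch below shows one way a subclass might implement `getCopy()` as a prototype-style factory that carries over configuration but not per-query state, which is why the copy need not be exact. `MinerSketch`, `ConfiguredMiner`, and the `endpoint` setting are hypothetical.

```java
public class GetCopySketch {

    // Hypothetical base class: each query needs a fresh, not-yet-started thread.
    abstract static class MinerSketch extends Thread {
        abstract MinerSketch getCopy();
    }

    static class ConfiguredMiner extends MinerSketch {
        private final String endpoint;  // hypothetical configuration shared by all copies
        String query;                   // per-query state, deliberately not copied

        ConfiguredMiner(String endpoint) {
            this.endpoint = endpoint;
        }

        String getEndpoint() {
            return endpoint;
        }

        // Carries over the configuration only, so the result is not an exact copy.
        @Override
        ConfiguredMiner getCopy() {
            return new ConfiguredMiner(endpoint);
        }

        @Override
        public void run() {
            // A real miner would search for `query` here.
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ConfiguredMiner prototype = new ConfiguredMiner("index-A");
        prototype.query = "first query";
        prototype.start();
        prototype.join();
        // Calling prototype.start() again would throw IllegalThreadStateException,
        // so the next query runs on a copy instead.
        ConfiguredMiner next = prototype.getCopy();
        next.query = "second query";
        next.start();
        next.join();
        System.out.println(next.getEndpoint() + " " + (next != prototype));
    }
}
```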

start

public void start(Query query)

Creates [MAX_RESULTS_TOTAL / MAX_RESULTS_PERQUERY] threads that fetch up to MAX_RESULTS_TOTAL results.

This method should be used instead of the inherited start() method without arguments.

Parameters:
query - Query object
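The fan-out in start(Query) can be sketched as follows. This sketch reads the bracketed expression as a rounded-up (ceiling) division, which is an assumption, and it gives thread i the window starting at hit position i × MAX_RESULTS_PERQUERY. `PageMiner`, `startAll`, `collect`, and the limits 25 and 10 are hypothetical; whether the real method waits for its threads is not stated, so here joining is left to the caller.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class FanOutSketch {

    // Hypothetical miner that records which hit window it was asked to fetch.
    static class PageMiner extends Thread {
        static final int MAX_RESULTS_TOTAL = 25;     // hypothetical limits
        static final int MAX_RESULTS_PERQUERY = 10;

        private final List<String> sink;  // shared, synchronized list of fetched windows
        private String query;
        private int firstResult;
        private int maxResults;

        PageMiner(List<String> sink) {
            this.sink = sink;
        }

        PageMiner getCopy() {
            return new PageMiner(sink);
        }

        // Per-thread start: set the window, then start this thread.
        void start(String query, int firstResult) {
            this.query = query;
            this.firstResult = firstResult;
            this.maxResults = Math.min(MAX_RESULTS_PERQUERY, MAX_RESULTS_TOTAL - firstResult);
            start();
        }

        // Fan-out: ceil(total / perQuery) fresh copies, one page each.
        List<PageMiner> startAll(String query) {
            int numThreads = (MAX_RESULTS_TOTAL + MAX_RESULTS_PERQUERY - 1) / MAX_RESULTS_PERQUERY;
            List<PageMiner> threads = new ArrayList<>();
            for (int i = 0; i < numThreads; i++) {
                PageMiner km = getCopy();
                km.start(query, i * MAX_RESULTS_PERQUERY);
                threads.add(km);
            }
            return threads;
        }

        @Override
        public void run() {
            sink.add(query + "[" + firstResult + ".." + (firstResult + maxResults - 1) + "]");
        }
    }

    // Starts all copies, waits for them, and returns the fetched windows in sorted order.
    static List<String> collect(String query) {
        List<String> sink = Collections.synchronizedList(new ArrayList<>());
        List<PageMiner> threads = new PageMiner(sink).startAll(query);
        try {
            for (PageMiner t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for miners", e);
        }
        List<String> windows = new ArrayList<>(sink);
        Collections.sort(windows);
        return windows;
    }

    public static void main(String[] args) {
        System.out.println(collect("q"));
    }
}
```

With these limits, `main` prints `[q[0..9], q[10..19], q[20..24]]`: three threads, the last one fetching only the five results still allowed by the total limit.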