info.ephyra.answerselection.filters
Class WebDocumentFetcherFilter

java.lang.Object
  extended by info.ephyra.answerselection.filters.Filter
      extended by info.ephyra.answerselection.filters.WebDocumentFetcherFilter

public class WebDocumentFetcherFilter
extends Filter

A filter that fetches the web documents from which the given search engine snippets were extracted.

This class extends the class Filter.

Version:
2007-07-24
Author:
Nico Schlaefer

Field Summary
private static java.lang.String CACHE_DIR
          Cache directory where web documents are stored.
private static boolean CACHING
          Enable caching of web documents.
private  java.util.ArrayList<Result> docs
          Documents fetched by the WebDocumentFetcher threads.
private static java.lang.String FORBIDDEN_DOCS
          Forbidden document types.
private static int MAX_DOCS
          Maximum number of documents to fetch.
private static int MAX_PENDING
          Maximum number of documents fetched in parallel.
private  int pending
          Number of active WebDocumentFetcher threads.
 
Constructor Summary
WebDocumentFetcherFilter()
           
 
Method Summary
 void addDoc(Result doc, boolean cached)
          Used by the WebDocumentFetcher threads to return the documents.
 Result[] apply(Result[] results)
          Fetches the top MAX_DOCS documents containing the given search engine snippets.
 void incPending()
          Increments the number of pending fetchers by 1.
private  void waitForDocs()
          Delays the main thread until all documents have been fetched.
 void waitForPending()
          Delays a thread until there are fewer than MAX_PENDING pending fetchers.
 
Methods inherited from class info.ephyra.answerselection.filters.Filter
apply
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

FORBIDDEN_DOCS

private static final java.lang.String FORBIDDEN_DOCS
Forbidden document types.

See Also:
Constant Field Values

MAX_DOCS

private static final int MAX_DOCS
Maximum number of documents to fetch.

See Also:
Constant Field Values

MAX_PENDING

private static final int MAX_PENDING
Maximum number of documents fetched in parallel.

See Also:
Constant Field Values

CACHING

private static final boolean CACHING
Enable caching of web documents.

See Also:
Constant Field Values

CACHE_DIR

private static final java.lang.String CACHE_DIR
Cache directory where web documents are stored.

See Also:
Constant Field Values

docs

private java.util.ArrayList<Result> docs
Documents fetched by the WebDocumentFetcher threads.


pending

private int pending
Number of active WebDocumentFetcher threads.

Constructor Detail

WebDocumentFetcherFilter

public WebDocumentFetcherFilter()
Method Detail

waitForDocs

private void waitForDocs()
Delays the main thread until all documents have been fetched.


waitForPending

public void waitForPending()
Delays a thread until there are fewer than MAX_PENDING pending fetchers.
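
The implementation is not shown on this page. The following is a minimal sketch of how a throttle with the semantics of waitForPending() and incPending() could be built on Java's intrinsic monitor (wait/notifyAll). The class name PendingGate, the value of MAX_PENDING, and the methods decPending() and getPending() are assumptions made for illustration; they are not part of the documented API.

```java
/** Sketch of a gate that caps the number of concurrently active fetchers. */
class PendingGate {
    /** Assumed limit; the real MAX_PENDING value is not given on this page. */
    private static final int MAX_PENDING = 3;

    /** Number of fetchers currently running. */
    private int pending = 0;

    /** Blocks the caller while MAX_PENDING or more fetchers are active. */
    synchronized void waitForPending() {
        while (pending >= MAX_PENDING) {
            try {
                wait(); // releases the monitor until notifyAll() is called
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return;
            }
        }
    }

    /** Registers one more active fetcher. */
    synchronized void incPending() {
        pending++;
    }

    /** Hypothetical counterpart: a fetcher finished, so wake up waiters. */
    synchronized void decPending() {
        pending--;
        notifyAll();
    }

    /** Hypothetical accessor used only to inspect the sketch. */
    synchronized int getPending() {
        return pending;
    }
}
```

In this sketch a caller would invoke waitForPending() and then incPending() before starting each fetcher thread. The wait sits in a while loop because a woken thread must re-check the condition before proceeding.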


incPending

public void incPending()
Increments the number of pending fetchers by 1.


addDoc

public void addDoc(Result doc,
                   boolean cached)
Used by the WebDocumentFetcher threads to return the documents.

Parameters:
doc - document that contains a snippet
cached - flag indicating that the document was fetched from the search engine cache
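
As an illustration of the callback pattern described here, the sketch below shows worker threads handing results back through addDoc() while another thread blocks in a waitForDocs()-style method. It is a simplified stand-in, not the class's actual code: documents are plain strings instead of Result objects, the class name DocCollector is hypothetical, and treating a null document as a failed fetch is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a collector that worker threads report back to. */
class DocCollector {
    /** Documents returned so far (strings stand in for Result objects). */
    private final List<String> docs = new ArrayList<>();
    /** Number of workers that have not reported back yet. */
    private int pending = 0;
    /** How many documents came from a cache (illustrative bookkeeping). */
    private int cachedCount = 0;

    /** Registers a worker before its thread is started. */
    synchronized void incPending() {
        pending++;
    }

    /** Called by a worker when it is done; null is assumed to mean failure. */
    synchronized void addDoc(String doc, boolean cached) {
        if (doc != null) {
            docs.add(doc);
            if (cached) cachedCount++;
        }
        pending--;
        notifyAll(); // wake the thread blocked in waitForDocs()
    }

    /** Blocks until every registered worker has called addDoc(). */
    synchronized List<String> waitForDocs() {
        while (pending > 0) {
            try {
                wait();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return new ArrayList<>(docs); // defensive copy
    }

    synchronized int getCachedCount() {
        return cachedCount;
    }
}
```

Calling incPending() before starting each worker, rather than inside the worker, avoids a race in which waitForDocs() sees a zero count before any thread has registered.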

apply

public Result[] apply(Result[] results)
Fetches the top MAX_DOCS documents containing the given search engine snippets. The original snippets are dropped.

Overrides:
apply in class Filter
Parameters:
results - array of Result objects containing snippets
Returns:
array of Result objects containing entire documents
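
This page does not state how apply() combines MAX_DOCS with FORBIDDEN_DOCS. One plausible reading is that candidate URLs are taken in rank order, forbidden document types are skipped, and selection stops once MAX_DOCS candidates have been chosen. The sketch below illustrates that reading only. The class name CandidateSelector, both constant values, the interpretation of FORBIDDEN_DOCS as a regular expression, and the duplicate check are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Sketch of choosing which snippet URLs to fetch. */
class CandidateSelector {
    /** Hypothetical value; the real constant is not shown on this page. */
    static final int MAX_DOCS = 2;
    /** Hypothetical pattern for document types that should not be fetched. */
    static final String FORBIDDEN_DOCS = "(?i).*\\.(pdf|ps|doc)$";

    /** Returns up to MAX_DOCS allowed URLs, preserving the input order. */
    static List<String> select(List<String> snippetUrls) {
        Pattern forbidden = Pattern.compile(FORBIDDEN_DOCS);
        List<String> selected = new ArrayList<>();
        for (String url : snippetUrls) {
            if (selected.size() >= MAX_DOCS) {
                break; // enough candidates chosen
            }
            if (forbidden.matcher(url).matches()) {
                continue; // skip forbidden document types
            }
            if (selected.contains(url)) {
                continue; // assumed: fetch each document only once
            }
            selected.add(url);
        }
        return selected;
    }
}
```

Under these assumptions, a ranked list containing one PDF followed by three HTML pages yields the first two HTML pages.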