websphinx
Class DownloadParameters

java.lang.Object
  |
  +--websphinx.DownloadParameters
All Implemented Interfaces:
java.lang.Cloneable, java.io.Serializable

public class DownloadParameters
extends java.lang.Object
implements java.lang.Cloneable, java.io.Serializable

Download parameters. These parameters limit how a Page downloads a Link. A Crawler has a default set of download parameters, but the defaults can be overridden on individual links by calling Link.setDownloadParameters().

DownloadParameters is an immutable class (like String). "Changing" a parameter actually returns a new instance of the class with only the specified parameter changed.
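
The "change" style can be illustrated with a small stand-in class. This is a minimal sketch of one common way to build such an immutable type (copy the receiver, modify the copy, return it); it is not the websphinx source. The class name ParamsSketch and its two fields are hypothetical, with default values taken from this page.

```java
// Hypothetical stand-in for illustration only; not the websphinx implementation.
final class ParamsSketch implements Cloneable {
    private int maxThreads = 4;        // default documented for getMaxThreads()
    private int downloadTimeout = 60;  // seconds, default documented for getDownloadTimeout()

    public int getMaxThreads() { return maxThreads; }
    public int getDownloadTimeout() { return downloadTimeout; }

    // Each "change" method leaves the receiver untouched and returns a modified copy.
    public ParamsSketch changeMaxThreads(int maxThreads) {
        ParamsSketch copy = copy();
        copy.maxThreads = maxThreads;
        return copy;
    }

    public ParamsSketch changeDownloadTimeout(int seconds) {
        ParamsSketch copy = copy();
        copy.downloadTimeout = seconds;
        return copy;
    }

    // Field-by-field copy via Object.clone(); safe here because all fields are primitives.
    private ParamsSketch copy() {
        try {
            return (ParamsSketch) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // unreachable: the class implements Cloneable
        }
    }
}
```

Because every call returns a new object, calls can be chained, for example new ParamsSketch().changeMaxThreads(8).changeDownloadTimeout(30), and the original instance keeps its values.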


Field Summary
static DownloadParameters DEFAULT
           
static DownloadParameters NO_LIMITS
           
 
Constructor Summary
DownloadParameters()
          Make a DownloadParameters object with default settings.
 
Method Summary
 DownloadParameters changeAcceptedMIMETypes(java.lang.String types)
          Change accepted MIME types.
 DownloadParameters changeCrawlTimeout(int timeout)
          Change timeout on entire crawl.
 DownloadParameters changeDownloadTimeout(int timeout)
          Change download timeout value.
 DownloadParameters changeInteractive(boolean f)
          Change interactive flag.
 DownloadParameters changeMaxPageSize(int maxPageSize)
          Change maximum page size.
 DownloadParameters changeMaxThreads(int maxthreads)
          Change maximum threads.
 DownloadParameters changeObeyRobotExclusion(boolean f)
          Change obey-robot-exclusion flag.
 DownloadParameters changeUseCaches(boolean f)
          Change use-caches flag.
 DownloadParameters changeUserAgent(java.lang.String userAgent)
          Change User-agent field used in HTTP requests.
 java.lang.Object clone()
          Clone a DownloadParameters object.
 java.lang.String getAcceptedMIMETypes()
          Get accepted MIME types.
 int getCrawlTimeout()
          Get timeout on entire crawl.
 int getDownloadTimeout()
          Get download timeout value.
 boolean getInteractive()
          Get interactive flag.
 int getMaxPageSize()
          Get maximum page size.
 int getMaxThreads()
          Get maximum threads.
 boolean getObeyRobotExclusion()
          Get obey-robot-exclusion flag.
 boolean getUseCaches()
          Get use-caches flag.
 java.lang.String getUserAgent()
          Get User-agent header used in HTTP requests.
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT

public static final DownloadParameters DEFAULT

NO_LIMITS

public static final DownloadParameters NO_LIMITS

Constructor Detail

DownloadParameters

public DownloadParameters()
Make a DownloadParameters object with default settings.

Method Detail

clone

public java.lang.Object clone()
Clone a DownloadParameters object.

Overrides:
clone in class java.lang.Object

getMaxThreads

public int getMaxThreads()
Get maximum threads.

Returns:
maximum number of background threads used by crawler. Default is 4.

changeMaxThreads

public DownloadParameters changeMaxThreads(int maxthreads)
Change maximum threads.

Parameters:
maxthreads - maximum number of background threads used by crawler
Returns:
new DownloadParameters object with the specified parameter changed.

getMaxPageSize

public int getMaxPageSize()
Get maximum page size. Pages larger than this limit are neither downloaded nor parsed. Default value is 100 (KB). 0 or negative values mean no limit.

Returns:
maximum page size in kilobytes
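
The limit semantics described above can be sketched as a single check. This is an illustrative helper, not part of websphinx: the class and method names are hypothetical, and it assumes 1 KB means 1024 bytes, which this page does not specify.

```java
// Hypothetical helper for illustration; not part of the websphinx API.
final class PageSizeSketch {
    // Returns true when a page of the given length is over the limit.
    // A limit of 0 or less means "no limit", as documented for getMaxPageSize().
    // Assumption: one kilobyte is 1024 bytes.
    static boolean exceedsLimit(long contentLengthBytes, int maxPageSizeKB) {
        if (maxPageSizeKB <= 0) {
            return false;
        }
        return contentLengthBytes > maxPageSizeKB * 1024L;
    }
}
```

With the documented default of 100, a page of exactly 102400 bytes is within the limit under this assumption, and one byte more is over it.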

changeMaxPageSize

public DownloadParameters changeMaxPageSize(int maxPageSize)
Change maximum page size. Pages larger than this limit are treated as leaves in the crawl graph -- neither downloaded nor parsed.

Parameters:
maxPageSize - maximum page size in kilobytes. Use 0 or a negative value for no limit.
Returns:
new DownloadParameters object with the specified parameter changed.

getDownloadTimeout

public int getDownloadTimeout()
Get download timeout value.

Returns:
length of time (in seconds) that crawler will wait for a page to download before aborting it. Default is 60 seconds.

changeDownloadTimeout

public DownloadParameters changeDownloadTimeout(int timeout)
Change download timeout value.

Parameters:
timeout - length of time (in seconds) to wait for a page to download. Use a negative value to turn off timeout.
Returns:
new DownloadParameters object with the specified parameter changed.

getCrawlTimeout

public int getCrawlTimeout()
Get timeout on entire crawl.

Returns:
maximum length of time (in seconds) that crawler will run before aborting. Default is -1 (no limit).

changeCrawlTimeout

public DownloadParameters changeCrawlTimeout(int timeout)
Change timeout on entire crawl.

Parameters:
timeout - maximum length of time (in seconds) that crawler will run. Use a negative value to turn off timeout.
Returns:
new DownloadParameters object with the specified parameter changed.
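
Both timeout parameters share the convention that a negative value turns the timeout off. A minimal sketch of how such a value might be tested against elapsed time follows; the class and method names are hypothetical, and nothing here is claimed about how the crawler itself enforces timeouts.

```java
// Hypothetical helper for illustration; not part of the websphinx API.
final class TimeoutSketch {
    // Returns true once at least timeoutSeconds have elapsed since startMillis.
    // A negative timeout disables the check, following the documented convention.
    // Assumption: a timeout of 0 expires immediately; this page does not cover that case.
    static boolean expired(long startMillis, long nowMillis, int timeoutSeconds) {
        if (timeoutSeconds < 0) {
            return false;
        }
        return nowMillis - startMillis >= timeoutSeconds * 1000L;
    }
}
```

A caller would typically record System.currentTimeMillis() at the start and pass the current time on each check.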

getObeyRobotExclusion

public boolean getObeyRobotExclusion()
Get obey-robot-exclusion flag.

Returns:
true iff the crawler checks robots.txt on the remote Web site before downloading a page. Default is false.

changeObeyRobotExclusion

public DownloadParameters changeObeyRobotExclusion(boolean f)
Change obey-robot-exclusion flag.

Parameters:
f - If true, then the crawler checks robots.txt on the remote Web site before downloading a page.
Returns:
new DownloadParameters object with the specified parameter changed.
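
As a rough illustration of what the flag gates, the sketch below checks a URL path against a list of Disallow prefixes only when the flag is true. It is a deliberate simplification (no User-agent groups, no Allow rules, no fetching or parsing of robots.txt), and the class and method names are hypothetical rather than taken from websphinx.

```java
import java.util.List;

// Hypothetical helper for illustration; not part of the websphinx API.
final class RobotCheckSketch {
    // Returns true when the path may be downloaded.
    // disallowedPrefixes stands for the Disallow values already read from robots.txt.
    static boolean allowed(String path, List<String> disallowedPrefixes, boolean obeyRobotExclusion) {
        if (!obeyRobotExclusion) {
            return true; // flag off: robots.txt is not consulted
        }
        for (String prefix : disallowedPrefixes) {
            // An empty Disallow value excludes nothing, so it is skipped.
            if (!prefix.isEmpty() && path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }
}
```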

getInteractive

public boolean getInteractive()
Get interactive flag.

Returns:
true if a user is available to respond to dialog boxes (for instance, to enter passwords for authentication). Default is true.

changeInteractive

public DownloadParameters changeInteractive(boolean f)
Change interactive flag.

Parameters:
f - true if a user is available to respond to dialog boxes
Returns:
new DownloadParameters object with the specified parameter changed.

getUseCaches

public boolean getUseCaches()
Get use-caches flag.

Returns:
true if cached pages should be used whenever possible

changeUseCaches

public DownloadParameters changeUseCaches(boolean f)
Change use-caches flag.

Parameters:
f - true if cached pages should be used whenever possible
Returns:
new DownloadParameters object with the specified parameter changed.

getAcceptedMIMETypes

public java.lang.String getAcceptedMIMETypes()
Get accepted MIME types.

Returns:
list of MIME types that can be handled by the crawler (which are passed as the Accept header in the HTTP request). Default is null.

changeAcceptedMIMETypes

public DownloadParameters changeAcceptedMIMETypes(java.lang.String types)
Change accepted MIME types.

Parameters:
types - list of MIME types that can be handled by the crawler. Use null if the crawler can handle anything.
Returns:
new DownloadParameters object with the specified parameter changed.

getUserAgent

public java.lang.String getUserAgent()
Get User-agent header used in HTTP requests.

Returns:
user-agent field used in HTTP requests, or null if the Java library's default user-agent is used. Default value is null (but for a Crawler, the default DownloadParameters has the Crawler's name as its default user-agent).

changeUserAgent

public DownloadParameters changeUserAgent(java.lang.String userAgent)
Change User-agent field used in HTTP requests.

Parameters:
userAgent - user-agent field used in HTTP requests. Pass null to use the Java library's default user-agent field.
Returns:
new DownloadParameters object with the specified parameter changed.
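
The accepted MIME types and the user agent both end up as HTTP request headers. The sketch below shows how two such values might be applied to a java.net.URLConnection, leaving the Java library defaults in place when a value is null. The class name, method name, and example values are hypothetical; this is not the websphinx download code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;

// Hypothetical helper for illustration; not part of the websphinx API.
final class HeaderSketch {
    // Opens (but does not connect) a URLConnection and sets the optional headers.
    // A null argument leaves the corresponding header at the Java library's default.
    static URLConnection prepare(String url, String acceptedMIMETypes, String userAgent) {
        try {
            URLConnection conn = URI.create(url).toURL().openConnection();
            if (acceptedMIMETypes != null) {
                conn.setRequestProperty("Accept", acceptedMIMETypes);
            }
            if (userAgent != null) {
                conn.setRequestProperty("User-Agent", userAgent);
            }
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, HeaderSketch.prepare("http://example.com/", "text/html, text/plain", "MyCrawler") returns a connection whose Accept and User-Agent request properties carry those values; no network traffic occurs until the caller connects.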