websphinx
Class LinkTransformer

java.lang.Object
  |
  +--websphinx.HTMLTransformer
        |
        +--websphinx.LinkTransformer
Direct Known Subclasses:
Mirror, RewritableLinkTransformer

public class LinkTransformer
extends HTMLTransformer

Transformer that remaps URLs in links.

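The default LinkTransformer simply converts all links to absolute URLs. Other common effects are easy to achieve: use setBase() to write links relative to a base URL, or map() to replace particular URLs with different hrefs.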

The default LinkTransformer strips out <BASE> elements. Instead, it can output a <BASE> element with a user-specified URL. Use setBase() to set the URL and setEmitBaseElement() to indicate that it should be emitted.


Field Summary
protected  java.net.URL base
           
protected  java.util.Hashtable map
           
 
Constructor Summary
LinkTransformer(HTMLTransformer next)
          Make a LinkTransformer writing to another HTMLTransformer.
LinkTransformer(java.io.OutputStream out)
          Make a LinkTransformer writing to a stream.
LinkTransformer(java.lang.String filename)
          Make a LinkTransformer writing to a file.
LinkTransformer(java.lang.String filename, boolean seekable)
          Make a LinkTransformer that writes pages to a file.
 
Method Summary
 java.net.URL getBase()
          Get the base URL used by the LinkTransformer.
 boolean getEmitBaseElement()
          Test whether the LinkTransformer should emit a <BASE> element pointing to the base URL.
protected  void handleBase(Element elem)
          Handle the BASE element.
protected  void handleElement(Element elem)
          Handle an element written through the transformer.
protected  void handleLink(Link link)
          Handle a Link's transformation.
 boolean isMapped(java.net.URL url)
          Test whether a URL is mapped.
 java.lang.String lookup(java.net.URL base, java.net.URL url)
          Look up the href for a URL, taking any mapping into account.
 void map(java.net.URL url, java.lang.String href)
          Map a URL to an href.
 void map(java.net.URL url, java.net.URL newURL)
          Map a URL to a new URL.
 void setBase(java.net.URL base)
          Set the base URL used by the LinkTransformer.
 void setEmitBaseElement(boolean emitBase)
          Set whether the LinkTransformer should emit a <BASE> element pointing to the base URL.
 void writePage(Page page)
          Write a page through the transformer.
 
Methods inherited from class websphinx.HTMLTransformer
close, emit, emit, finalize, flush, getFilePointer, getOutputStream, getOutputWriter, getRandomAccessFile, seek, setOutput, setRandomAccessFile, transformContents, transformElement, write, write
 
Methods inherited from class java.lang.Object
clone, equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

map

protected java.util.Hashtable map

base

protected java.net.URL base
Constructor Detail

LinkTransformer

public LinkTransformer(java.lang.String filename)
                throws java.io.IOException
Make a LinkTransformer writing to a file.

Parameters:
filename - Filename to write to

LinkTransformer

public LinkTransformer(java.lang.String filename,
                       boolean seekable)
                throws java.io.IOException
Make a LinkTransformer that writes pages to a file.

Parameters:
filename - Name of file to receive HTML output
seekable - True if file should be opened for random access

LinkTransformer

public LinkTransformer(java.io.OutputStream out)
Make a LinkTransformer writing to a stream.

Parameters:
out - stream to write to

LinkTransformer

public LinkTransformer(HTMLTransformer next)
Make a LinkTransformer writing to another HTMLTransformer.

Parameters:
next - next transformer in filter chain
Method Detail

getBase

public java.net.URL getBase()
Get the base URL used by the LinkTransformer. A transformed link's URL is written out relative to this URL. For instance, if the base URL is http://www.yahoo.com/Entertainment/, then a link URL http://www.yahoo.com/News/Current/ would be written out as ../News/Current/.

Returns:
base URL, or null if no base URL is set. Default is null.

setBase

public void setBase(java.net.URL base)
Set the base URL used by the LinkTransformer. A transformed link's URL is written out relative to this URL. For instance, if the base URL is http://www.yahoo.com/Entertainment/, then a link URL http://www.yahoo.com/News/Current/ would be written out as ../News/Current/.

Parameters:
base - base URL, or null if no base URL should be used.
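
The relative form described above can be illustrated with a small standalone sketch. This is not the WebSPHINX implementation: the class RelativeHrefSketch and its methods are hypothetical names, and the sketch assumes that URLs on a different scheme or host are left absolute.

```java
import java.net.URI;

// Hypothetical helper, not part of websphinx: shows how a link URL can be
// written relative to a base URL, as in the getBase()/setBase() example.
final class RelativeHrefSketch {

    /** Splits a directory path such as "/News/Current/" into its segments. */
    static String[] segments(String dir) {
        String trimmed = dir.replaceAll("^/+|/+$", "");
        return trimmed.isEmpty() ? new String[0] : trimmed.split("/");
    }

    /** Returns the part of a path up to and including its last '/'. */
    static String directoryOf(String path) {
        if (path == null || path.isEmpty()) {
            return "/";
        }
        return path.substring(0, path.lastIndexOf('/') + 1);
    }

    /** Returns an href for targetUrl relative to baseUrl. */
    static String relativize(String baseUrl, String targetUrl) {
        URI base = URI.create(baseUrl);
        URI target = URI.create(targetUrl);

        // Assumption: only URLs with the same scheme and authority are relativized.
        boolean sameSite = base.getScheme() != null
                && base.getScheme().equalsIgnoreCase(target.getScheme())
                && base.getAuthority() != null
                && base.getAuthority().equalsIgnoreCase(target.getAuthority());
        if (!sameSite) {
            return target.toString();
        }

        String rawPath = target.getRawPath();
        String targetPath = (rawPath == null || rawPath.isEmpty()) ? "/" : rawPath;
        String targetDir = directoryOf(targetPath);
        String targetFile = targetPath.substring(targetDir.length());

        String[] from = segments(directoryOf(base.getRawPath()));
        String[] to = segments(targetDir);

        // Count the leading directory segments shared by both paths.
        int common = 0;
        while (common < from.length && common < to.length
                && from[common].equals(to[common])) {
            common++;
        }

        StringBuilder href = new StringBuilder();
        for (int i = common; i < from.length; i++) {
            href.append("../");                 // climb out of the base directory
        }
        for (int i = common; i < to.length; i++) {
            href.append(to[i]).append('/');     // descend into the target directory
        }
        href.append(targetFile);
        if (href.length() == 0) {
            href.append("./");                  // target is the base directory itself
        }
        if (target.getRawQuery() != null) {
            href.append('?').append(target.getRawQuery());
        }
        if (target.getRawFragment() != null) {
            href.append('#').append(target.getRawFragment());
        }
        return href.toString();
    }

    public static void main(String[] args) {
        // prints "../News/Current/"
        System.out.println(relativize(
                "http://www.yahoo.com/Entertainment/",
                "http://www.yahoo.com/News/Current/"));
    }
}
```

The sketch takes strings rather than java.net.URL so that it has no checked exceptions; with the real class the base would be supplied through setBase(java.net.URL).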

getEmitBaseElement

public boolean getEmitBaseElement()
Test whether the LinkTransformer should emit a <BASE> element pointing to the base URL.

Returns:
true if a <BASE> element should be emitted with each page.

setEmitBaseElement

public void setEmitBaseElement(boolean emitBase)
Set whether the LinkTransformer should emit a <BASE> element pointing to the base URL.

Parameters:
emitBase - true if a <BASE> element should be emitted with each page.

lookup

public java.lang.String lookup(java.net.URL base,
                               java.net.URL url)
Look up the href for a URL, taking any mapping into account.

Parameters:
base - base URL (or null if an absolute URL is desired)
url - URL of interest
Returns:
relative href for url from base

map

public void map(java.net.URL url,
                java.lang.String href)
Map a URL to an href. For example, Concatenator uses this call to map page URLs to their corresponding anchors in the concatenation.

Parameters:
url - URL of interest
href - href which should be returned by lookup(null, url)

map

public void map(java.net.URL url,
                java.net.URL newURL)
Map a URL to a new URL. For example, Mirror uses this call to map remote URLs to their corresponding local URLs.

Parameters:
url - URL of interest
newURL - URL which should be returned by lookup(null, url)

isMapped

public boolean isMapped(java.net.URL url)
Test whether a URL is mapped.

Parameters:
url - URL of interest
Returns:
true if map() was called to remap url
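
The relationship between map(), isMapped(), and lookup() with a null base can be pictured with a minimal standalone sketch. It is not the library's code: UrlMapSketch is a hypothetical class, the table is assumed to be keyed by the URL's string form, and the example.com URLs are placeholders.

```java
import java.util.Hashtable;

// Hypothetical stand-in for the mapping behavior described above;
// keys and values are plain strings to keep the sketch self-contained.
final class UrlMapSketch {

    private final Hashtable<String, String> map = new Hashtable<>();

    /** Records that url should be written out as href. */
    void map(String url, String href) {
        map.put(url, href);
    }

    /** Returns true if map() was called for url. */
    boolean isMapped(String url) {
        return map.containsKey(url);
    }

    /** Mirrors lookup(null, url): the mapped href if any, else the absolute URL. */
    String lookup(String url) {
        String href = map.get(url);
        return href != null ? href : url;
    }

    public static void main(String[] args) {
        UrlMapSketch links = new UrlMapSketch();
        // In the style of Concatenator: a page URL is mapped to an anchor.
        links.map("http://www.example.com/chapter2.html", "#chapter2");

        System.out.println(links.lookup("http://www.example.com/chapter2.html")); // prints "#chapter2"
        System.out.println(links.lookup("http://www.example.com/other.html"));    // prints the URL unchanged
        System.out.println(links.isMapped("http://www.example.com/other.html"));  // prints "false"
    }
}
```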

writePage

public void writePage(Page page)
               throws java.io.IOException
Write a page through the transformer. If getEmitBaseElement() is true and getBase() is non-null, then the transformer outputs a <BASE> element either inside the page's <HEAD> element (if present) or before the first tag that belongs in <BODY>.

Overrides:
writePage in class HTMLTransformer
Parameters:
page - Page to write

handleElement

protected void handleElement(Element elem)
                      throws java.io.IOException
Handle an element written through the transformer. Remaps attributes that contain URLs.

Overrides:
handleElement in class HTMLTransformer
Parameters:
elem - Element to transform

handleLink

protected void handleLink(Link link)
                   throws java.io.IOException
Handle a Link's transformation. Default implementation replaces the link's URL with the href that lookup() returns for it.

Parameters:
link - Link to transform

handleBase

protected void handleBase(Element elem)
                   throws java.io.IOException
Handle the BASE element. Default implementation removes it if EmitBaseElement is false, or changes its URL to Base if EmitBaseElement is true.

Parameters:
elem - BASE element to transform
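
The two outcomes for an existing BASE element can be sketched on a plain HTML string. This is only an illustration, not how WebSPHINX processes Element objects: BaseElementSketch and rewriteBase are hypothetical names, the regular expression is a simplification, and removing the element when the base is null is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical string-based illustration of the handleBase() rule.
final class BaseElementSketch {

    // Matches a <BASE ...> tag in any letter case, but not <BASEFONT ...>.
    private static final Pattern BASE_TAG =
            Pattern.compile("<base\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    /**
     * Removes every BASE tag when emitBase is false (or baseUrl is null,
     * an assumption), otherwise rewrites it to point at baseUrl.
     */
    static String rewriteBase(String html, String baseUrl, boolean emitBase) {
        String replacement = (emitBase && baseUrl != null)
                ? "<BASE HREF=\"" + baseUrl + "\">"
                : "";
        // quoteReplacement keeps '$' and '\' in the URL from being read as group references.
        return BASE_TAG.matcher(html).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String page = "<HEAD><BASE HREF=\"http://old.example.com/\"><TITLE>t</TITLE></HEAD>";
        System.out.println(rewriteBase(page, null, false));
        System.out.println(rewriteBase(page, "http://new.example.com/", true));
    }
}
```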