java.lang.Object
  info.ephyra.util.HTMLConverter
public class HTMLConverter
The HTMLConverter can be used to convert an HTML document to
plain text.
| Field Summary | |
|---|---|
| private static int | TIMEOUT: Timeout for HTTP connections in milliseconds. |
| Constructor Summary | |
|---|---|
| HTMLConverter() | |
| Method Summary | |
|---|---|
| static java.lang.String | file2text(java.lang.String filename): Reads an HTML document from a file and converts it into plain text. |
| static java.lang.String | html2text(java.lang.String html): Converts an HTML document into plain text. |
| static java.lang.String | htmlsnippet2text(java.lang.String snippet): Converts a snippet with HTML tags and special characters into plain text. |
| static boolean | isUrl(java.lang.String s): Checks if the given string is a URL. |
| static java.lang.String | replaceSpecialCharacters(java.lang.String html): Handles special characters in HTML documents by replacing sequences of the form &...; by the corresponding characters. |
| static java.lang.String | url2text(java.lang.String url): Fetches an HTML document from a URL and converts it into plain text. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
private static final int TIMEOUT
Timeout for HTTP connections in milliseconds.
| Constructor Detail |
|---|
public HTMLConverter()
| Method Detail |
|---|
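public static boolean isUrl(java.lang.String s)
Checks if the given string is a URL.
Parameters: s - a string
Returns: true if and only if the string is a URL

public static java.lang.String replaceSpecialCharacters(java.lang.String html)
Handles special characters in HTML documents by replacing sequences of the form &...; by the corresponding characters.
Parameters: html - HTML document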
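
The replacement of &...; sequences described above can be illustrated with a small, self-contained sketch. This is not the Ephyra implementation: the class name `EntitySketch`, the method `decode`, the handful of named entities in the table, and the choice to map `&nbsp;` to a plain space are all assumptions made for illustration. It needs Java 9 or later (for `Map.of`).

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of entity replacement; not the library's code. */
final class EntitySketch {
    // Illustrative subset of named entities (assumption, far from complete).
    private static final Map<String, String> NAMED = Map.of(
        "amp", "&", "lt", "<", "gt", ">", "quot", "\"", "nbsp", " ");

    // Matches &name;, decimal &#65; and hexadecimal &#x41; references.
    private static final Pattern ENTITY =
        Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);");

    /** Replaces each recognized entity once, in a single left-to-right pass. */
    static String decode(String html) {
        Matcher m = ENTITY.matcher(html);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(html, last, m.start());
            out.append(resolve(m.group(1), m.group()));
            last = m.end();
        }
        out.append(html, last, html.length());
        return out.toString();
    }

    // Returns the character for the entity body, or the original text if unknown.
    private static String resolve(String body, String whole) {
        try {
            if (body.startsWith("#x") || body.startsWith("#X")) {
                return new String(Character.toChars(Integer.parseInt(body.substring(2), 16)));
            }
            if (body.startsWith("#")) {
                return new String(Character.toChars(Integer.parseInt(body.substring(1))));
            }
        } catch (IllegalArgumentException e) {
            return whole; // malformed or out-of-range code point: leave untouched
        }
        return NAMED.getOrDefault(body, whole);
    }

    public static void main(String[] args) {
        // prints: Fish & chips <3 AB
        System.out.println(decode("Fish &amp; chips &lt;3 &#65;&#x42;"));
    }
}
```

Unknown entities are left as they are, and the single pass means that `&amp;lt;` becomes `&lt;` rather than `<`.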
public static java.lang.String htmlsnippet2text(java.lang.String snippet)
Converts a snippet with HTML tags and special characters into plain text.
Parameters: snippet - HTML snippet
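
To give a rough idea of what converting a snippet involves, the sketch below covers only the tag-removal half with a regular expression. It is an assumption about one possible approach, not a description of how htmlsnippet2text works internally; `SnippetSketch` and `stripTags` are hypothetical names, and special characters are deliberately left untouched here.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of tag removal; not the library's code. */
final class SnippetSketch {
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /**
     * Replaces every tag with a space so that adjacent blocks do not fuse,
     * then collapses runs of whitespace and trims the result.
     */
    static String stripTags(String snippet) {
        String withoutTags = TAG.matcher(snippet).replaceAll(" ");
        return WHITESPACE.matcher(withoutTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // prints: Hello world
        System.out.println(stripTags("<p>Hello   <b>world</b></p>"));
    }
}
```

Replacing tags with a space keeps `<p>a</p><p>b</p>` from collapsing into `ab`, at the cost of occasionally leaving a space before punctuation that directly follows an inline tag. A regex like this is only suitable for short snippets; it does not handle comments, scripts, or `>` inside attribute values.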
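public static java.lang.String html2text(java.lang.String html)
Converts an HTML document into plain text.
Parameters: html - HTML document
Returns: plain text, or null if the conversion failed

public static java.lang.String file2text(java.lang.String filename)
Reads an HTML document from a file and converts it into plain text.
Parameters: filename - name of the file containing the HTML document
Returns: plain text, or null if the reading or conversion failed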
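
Because failure is signalled by a null return value rather than by an exception, callers have to check the result before using it. The following sketch mirrors that contract for the file-reading step only; `FileReadSketch` and `readOrNull` are hypothetical names, and UTF-8 is an assumed encoding that the documentation does not specify.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;

/** Hypothetical sketch of a "null on failure" file read; not the library's code. */
final class FileReadSketch {
    /** Returns the file content, or null if the file cannot be read. */
    static String readOrNull(String filename) {
        try {
            byte[] bytes = Files.readAllBytes(Paths.get(filename));
            return new String(bytes, StandardCharsets.UTF_8); // encoding is an assumption
        } catch (IOException | InvalidPathException e) {
            return null; // same convention as the documented return value
        }
    }

    public static void main(String[] args) {
        String content = args.length > 0 ? readOrNull(args[0]) : null;
        // The null check is the caller's responsibility under this contract.
        System.out.println(content == null ? "could not read file" : content);
    }
}
```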
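public static java.lang.String url2text(java.lang.String url) throws java.net.SocketTimeoutException
Fetches an HTML document from a URL and converts it into plain text.
Parameters: url - URL of HTML document
Returns: plain text, or null if the fetching or conversion failed
Throws: java.net.SocketTimeoutException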
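
The combination of a millisecond timeout and a declared java.net.SocketTimeoutException matches how the standard `java.net.URLConnection` API behaves, so a fetch of this kind might look roughly like the sketch below. Whether HTMLConverter actually uses `URLConnection` is not stated on this page; `TimeoutSketch`, `open`, `fetch`, and the value 5000 are assumptions for illustration only. The `fetch` method needs Java 9 or later (for `InputStream.readAllBytes`).

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a fetch with timeouts; not the library's code. */
final class TimeoutSketch {
    // Assumed value; the documented TIMEOUT field's value is not given.
    static final int TIMEOUT_MS = 5000;

    /** Prepares a connection with connect and read timeouts; does not connect yet. */
    static URLConnection open(String url) throws IOException {
        URLConnection conn = URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(TIMEOUT_MS);
        conn.setReadTimeout(TIMEOUT_MS);
        return conn;
    }

    /**
     * Downloads the raw document. If a timeout expires while connecting or
     * reading, java.net.SocketTimeoutException (an IOException) propagates.
     */
    static String fetch(String url) throws IOException {
        URLConnection conn = open(url);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8); // encoding assumed
        }
    }

    public static void main(String[] args) throws IOException {
        URLConnection conn = open("http://example.com/");
        System.out.println("connect timeout (ms): " + conn.getConnectTimeout());
    }
}
```

A caller of url2text therefore has two failure paths to handle: a null result for fetch or conversion problems, and a SocketTimeoutException, which should be caught separately.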