Parsing documents |
|
This thread discusses what kinds of documents Lemur can process and how to make your documents compatible. It also describes other files that you might need to create before you start indexing, such as a list of stop words or acronyms. |
|
Step 1: Preparing your documents |
|
A general overview of the types of documents Lemur can process. |
|
Step 2: Choosing the right parser |
|
Describes the most commonly used parsers that are included with Lemur. Find the one that suits your documents and parses them the way you want. |
|
advanced topic: Customizing a parser |
|
If none of the included parsers does what you want, this section explains how to customize an existing Lemur parser. This generally requires knowing how to use flex, or at least having a general understanding of regular expressions. |
|
Step 3: Connecting other parsing elements |
|
This section describes the other parsing elements included with Lemur, such as stoppers and stemmers. |
|
advanced topic: Understanding the TextHandler |
|
An overview of how all the text handling elements in Lemur work together. |
|
advanced topic: Writing a parser from scratch |
|
If none of the included parsers does what you want and you don't want to modify an existing one, this section describes how to write your own parser so that it fits within the Lemur parsing architecture and can still be used with the other Lemur text handling elements.
|