Lemur User Interfaces

Contents

1. Overview
2. Tips on building an index for interactive applications
3. Installing the CGI
4. Query processing for the CGI
5. Installing the stand-alone GUI

1. Overview
Lemur was originally designed as a research system for batch retrieval. However, we are moving towards supporting more interactive applications. To this end, we provide a CGI script for web-based retrieval and a stand-alone GUI application. Both applications use the Lemur API and require Lemur indexes.
2. Tips on building an index for interactive applications
Lemur provides several index types. All of them are technically compatible with the interactive applications, but not all are well suited to interactive use. Especially for larger text collections, the time the index takes to load can be an issue. For this reason, we recommend using the KeyfileIncIndex (.key) with the CGI and GUI. The KeyfileIncIndex loads quickly because it does not need to load all of the document IDs and the term dictionary. For the same reason, it also requires less memory at runtime.
Another common requirement for interactive use is viewing the full document text from the results list. In Lemur, this requires building a DocumentManager and associating it with your index. Without one, you will still get results, but you will not be able to open the documents for viewing. The BuildDocMgr application, provided with Lemur, builds a DocumentManager and an Index simultaneously.
BuildDocMgr can build several kinds of DocumentManager. For interactive use, we recommend the ElemDocMgr (specify with managerType elem). This DocumentManager can retrieve individual elements from the original document, provided the document was tagged and parsed appropriately. This way, your list of results can show each document's title or headline instead of its ID, which may not be easy to read. This does not change the operation of the programs, only the presentation of the results list. The TrecParser and WebParser included with Lemur can parse and recognize titles enclosed in <TTL> and <TITLE> tags, respectively. The TrecParser also recognizes headlines in <HL>, <HEAD>, or <HEADLINE> tags.
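To make the title-or-ID fallback concrete, here is a minimal stand-alone sketch. It is not Lemur's parser or ElemDocMgr code: the helpers extractElement and displayLabel are hypothetical names, the tag matching is exact and case-sensitive, and the sketch checks all five tags on every document, whereas in Lemur the recognized tags depend on which parser was used.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of Lemur: returns the text between the first
// <tag> ... </tag> pair in doc, or an empty string if the pair is missing.
// Matching is exact and case-sensitive; tag attributes are not handled.
std::string extractElement(const std::string& doc, const std::string& tag) {
    assert(!tag.empty());  // an empty tag name would match "<>"
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    const std::string::size_type start = doc.find(open);
    if (start == std::string::npos) return "";
    const std::string::size_type begin = start + open.size();
    const std::string::size_type end = doc.find(close, begin);
    if (end == std::string::npos) return "";
    return doc.substr(begin, end - begin);
}

// Picks a label for a results list: the first non-empty title or headline
// element, falling back to the document ID when none is present.
std::string displayLabel(const std::string& doc, const std::string& docId) {
    const char* tags[] = {"TTL", "TITLE", "HL", "HEAD", "HEADLINE"};
    for (const char* tag : tags) {
        const std::string text = extractElement(doc, tag);
        if (!text.empty()) return text;
    }
    return docId;
}
```

With this sketch, a document without any of the listed tags is shown by its ID, which mirrors what you see in the results list when no title element is available.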
It is important to build the Index and DocumentManager so that they can later be opened and used from any other directory. To do this, specify full path names for the Index and DocumentManager names in your parameter files. You should also use full path names to point to your data files (so that the DocumentManager can find them later), or keep them in the same directory as the DocumentManager. Data files are listed in a file specified by the dataFiles parameter.
3. Installing the CGI

To install the Lemur CGI web interface, modify the file cgi.cpp to point to the index that you want to use. You can also update the contact information for the webpages there. All the information that needs to be modified is plainly marked at the top of the file. You should not need to touch the other files.
After making these changes, compile the CGI against the Lemur libraries. To do this, run the configure script, specifying the Lemur installation directory, and then run make. For example, you might execute these commands:
> ./configure -with-lemur=/usr/local
> make

Compilation produces a CGI program called lemur.cgi. Copy lemur.cgi, all the .html files, and the .gif files to your webserver. Your webserver must be configured to allow CGI scripts.
4. Query processing for the CGI

Aside from the initial setup changes you've made to cgi.cpp, there is one other thing you might commonly want to change in the Lemur CGI program: query processing. The program currently does no query processing other than recognizing the InQueryRetMethod structured query language.
You might want to omit stopwords or stem the query terms. To do this, you will have to modify the method textdb::search in textdb.cpp (section commented "//process struct query"). After modifying the code, recompile by running make and copy the new lemur.cgi file to your webserver.
For example, to add a stopword list:
// process struct query
int qlen = (*qry).length();
// room for the 4-character "#q1=" prefix, the query text, and the terminating NUL
char* query = new char[qlen+5];
sprintf(query, "#q1=%s\0", (*qry).c_str());
InQueryOpParser opparser;
// add stopword list
Stopper *stopper = new Stopper("/usr1/data/mystoplist");
QueryDocument* d = new QueryDocument();
// chain the handlers: opparser -> stopper -> d
opparser.setTextHandler(stopper);
stopper->setTextHandler(d);
opparser.parseBuffer(query, qlen+5);
// build the structured query from the stopped terms, then score and sort
StructQuery* q = new StructQuery(*d);
QueryRep *qr = model->computeQueryRep(*q);
model->scoreCollection(*qr, results);
results.Sort();
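The qlen+5 arithmetic in the snippet is easy to get wrong when the prefix changes. As a hedged alternative, the wrapped query text can be built with std::string, as sketched below. The helper name wrapQuery is hypothetical and is not part of the Lemur CGI source.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of the Lemur CGI: prepends the "#q1=" prefix
// that the snippet above writes with sprintf. std::string manages the storage,
// so no manual buffer size needs to be computed.
std::string wrapQuery(const std::string& userQuery) {
    const std::string prefix = "#q1=";
    const std::string wrapped = prefix + userQuery;
    // Same size the original allocates: 4 prefix characters + query + NUL.
    assert(wrapped.size() + 1 == userQuery.size() + 5);
    return wrapped;
}
```

The call to parseBuffer is left out on purpose: the snippet passes it a char* buffer and a length, so a std::string would first have to be copied into a mutable buffer, and the exact signature is not stated here.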

And to add a Krovetz stemmer as well:
// process struct query
int qlen = (*qry).length();
// room for the 4-character "#q1=" prefix, the query text, and the terminating NUL
char* query = new char[qlen+5];
sprintf(query, "#q1=%s\0", (*qry).c_str());
InQueryOpParser opparser;
// add stopword list
Stopper *stopper = new Stopper("/usr1/data/mystoplist");
QueryDocument* d = new QueryDocument();
opparser.setTextHandler(stopper);
// add stemmer
string stemdir = "/usr/local/lemur-2.2/kstem_data";
Stemmer* stem = new KStemmer(stemdir);
// chain the handlers: opparser -> stopper -> stem -> d
stopper->setTextHandler(stem);
stem->setTextHandler(d);
opparser.parseBuffer(query, qlen+5);
// build the structured query from the processed terms, then score and sort
StructQuery* q = new StructQuery(*d);
QueryRep *qr = model->computeQueryRep(*q);
model->scoreCollection(*qr, results);
results.Sort();
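Both snippets rely on the same idea: each stage receives a term, optionally drops or rewrites it, and passes it to the next stage set with setTextHandler. The stand-alone sketch below mirrors only that chaining pattern. Every class in it (Handler, StopFilter, ToyStemmer, Collector) is a hypothetical stand-in, not Lemur's TextHandler, Stopper, KStemmer, or QueryDocument, and the toy stemmer merely strips a trailing "s".

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical base class for one stage in a term-processing chain.
class Handler {
public:
    virtual ~Handler() {}
    void setNext(Handler* next) {
        assert(next != this);  // a stage must not be chained to itself
        next_ = next;
    }
    virtual void handleWord(const std::string& word) = 0;
protected:
    // Passes a word to the next stage, if one is attached.
    void forward(const std::string& word) { if (next_) next_->handleWord(word); }
private:
    Handler* next_ = nullptr;
};

// Drops words found in the stopword set and forwards the rest.
class StopFilter : public Handler {
public:
    explicit StopFilter(const std::set<std::string>& stopwords) : stopwords_(stopwords) {}
    void handleWord(const std::string& word) override {
        if (stopwords_.count(word) == 0) forward(word);
    }
private:
    std::set<std::string> stopwords_;
};

// Toy stand-in for a stemmer: strips one trailing "s" from longer words.
class ToyStemmer : public Handler {
public:
    void handleWord(const std::string& word) override {
        if (word.size() > 3 && word.back() == 's') forward(word.substr(0, word.size() - 1));
        else forward(word);
    }
};

// Final stage: collects whatever survives the earlier stages.
class Collector : public Handler {
public:
    void handleWord(const std::string& word) override { words.push_back(word); }
    std::vector<std::string> words;
};

// Splits the query on whitespace and pushes each word into the head of the chain.
std::vector<std::string> processQuery(const std::string& query,
                                      const std::set<std::string>& stopwords) {
    StopFilter stopper(stopwords);
    ToyStemmer stemmer;
    Collector collector;
    stopper.setNext(&stemmer);    // stopper -> stemmer
    stemmer.setNext(&collector);  // stemmer -> collector
    std::istringstream in(query);
    std::string word;
    while (in >> word) stopper.handleWord(word);
    return collector.words;
}
```

The order of the chain matters: stopping happens before stemming here, as in the CGI snippet, so the stopword list is matched against unstemmed terms.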

5. Installing the stand-alone GUI

The simplest approach is probably to download the pre-compiled jar (and the JLemur library). If your operating system associates the .jar extension with Java, you should be able to double-click the jar to run it. Most Windows systems have this association, so it should just work. On Unix systems, more variables can prevent the pre-compiled jar from working, such as Java and compiler versions, so you might have to compile it on your system.
If the .jar extension is not associated with Java, you can start the GUI from the command line. For most versions of Java, type "java -jar lemurgui-2.2.jar". For some versions of Java, like IBM's, type just "java lemurgui-2.2.jar". That starts the Java GUI; however, when you actually run retrieval, it also needs to find the JLemur library. If you get the error "java.lang.UnsatisfiedLinkError: no JLemur in java.library.path", it means that the GUI cannot find JLemur.dll (on Windows) or libJLemur.so (on Unix). On Windows, Java looks for the DLL in the current path and in paths specified in your PATH environment variable. On Unix, it looks for the library in directories in your LD_LIBRARY_PATH environment variable.
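When tracking down an UnsatisfiedLinkError, it can help to check which directory on the search path, if any, actually contains the library. The following is a small diagnostic sketch under stated assumptions: the helper names are hypothetical, a file counts as present if it can be opened for reading, and only the environment variable is searched (not the current directory, which Windows also checks).

```cpp
#include <cassert>
#include <cstdlib>
#include <fstream>
#include <string>
#include <vector>

// Splits a PATH-style list ("dir1:dir2" on Unix, "dir1;dir2" on Windows)
// into its directories, skipping empty entries.
std::vector<std::string> splitPathList(const std::string& list, char separator) {
    std::vector<std::string> dirs;
    std::string::size_type start = 0;
    while (start <= list.size()) {
        std::string::size_type end = list.find(separator, start);
        if (end == std::string::npos) end = list.size();
        if (end > start) dirs.push_back(list.substr(start, end - start));
        start = end + 1;
    }
    return dirs;
}

// Returns the first directory in the list that contains fileName, or an
// empty string if none does. Presence is tested by trying to open the file.
std::string findInPathList(const std::string& list, char separator,
                           const std::string& fileName) {
    assert(!fileName.empty());
    const std::vector<std::string> dirs = splitPathList(list, separator);
    for (std::size_t i = 0; i < dirs.size(); ++i) {
        const std::string candidate = dirs[i] + "/" + fileName;
        std::ifstream probe(candidate.c_str());
        if (probe.good()) return dirs[i];
    }
    return "";
}

// Reads an environment variable, returning "" when it is unset.
std::string envOrEmpty(const char* name) {
    const char* value = std::getenv(name);
    return value ? std::string(value) : std::string();
}
```

On Unix you might call findInPathList(envOrEmpty("LD_LIBRARY_PATH"), ':', "libJLemur.so"), and on Windows findInPathList(envOrEmpty("PATH"), ';', "JLemur.dll"). An empty result suggests the library's directory still needs to be added to the variable.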
If you don't want to type the java command every time, you can try to create a new file association in Windows. Open a file explorer window and choose the "Tools->Folder Options" menu. A "Folder Options" dialog will pop up. Select the "File Types" tab, where you can view and modify all current file associations and add new ones.

The Lemur Project
Last modified: Wed Jun 16 12:49:59 EDT 2004