Installing and Running Lemur (Version 1.1)

Contents

Installation on Unix

Installation on Windows NT

Running Applications

Testing the Toolkit on Sample Data

Using the API

Modifying the Toolkit

1. Installation on Unix
After downloading the Unix Lemur package (lemur-1.1.tar.gz), follow these steps to install it:
Unpack the source

On the command line, type in the following commands to unpack the package. This should create a directory named lemur-1.1.
> gunzip lemur-1.1.tar.gz 
> tar -xvf lemur-1.1.tar
Configure the makefiles

Go to directory lemur-1.1 and run the configuration script configure. This will generate a file named "MakeDefns", which has some customized definitions to be used in makefiles.
You can configure the makefiles in one of two modes, debug or optimize, through the optional argument "--with-comp-mode", which takes a value of either "debug" or "optimize", as shown in the following examples. Debug mode lets you debug the Lemur code and is generally recommended if you plan to program on top of Lemur or to change Lemur in any way. Optimize mode is appropriate if you only want to run Lemur applications, since it generates executables that may run faster than the corresponding debug versions.
To configure a debug version of the makefiles, use the following command:
lemur-1.1>./configure --with-comp-mode=debug
To configure an optimized version of the makefiles, use the following command:
lemur-1.1>./configure --with-comp-mode=optimize
If you do not specify a value for "--with-comp-mode", debug mode is assumed.

Compile Lemur

With lemur-1.1 as the current working directory, type "gmake". This compiles the whole Lemur toolkit and links all the Lemur applications. (We have tested Lemur only with GNU make.)
lemur-1.1> gmake 
Install Lemur library

First, set the environment variable LEMUR_INSTALL_PATH to where you would like to install Lemur. The default directory for installation is the root "/".
Then, with lemur-1.1 as the current working directory, type "gmake install". This installs the Lemur library and include files under the directory specified by the environment variable LEMUR_INSTALL_PATH. The library will be in LEMUR_INSTALL_PATH/lemur/lib/liblemur.a and all the include files will be in LEMUR_INSTALL_PATH/lemur/include.
For example, with C-shell,
lemur> setenv LEMUR_INSTALL_PATH /usr0/mydir-for-lemur
lemur> gmake install
will create /usr0/mydir-for-lemur/lemur/lib/liblemur.a and place the ".hpp" (C++ header) and ".h" (C header) files in /usr0/mydir-for-lemur/lemur/include/. The application executables will all be in /usr0/mydir-for-lemur/lemur/bin.

For users who are only interested in using Lemur as a library and application suite, the original source tree (i.e., the lemur-1.1 directory) can be removed after this step.
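
The installation steps above can be sketched as a single shell function. This is only an illustration: the names install_lemur, run, and DRY_RUN are hypothetical and not part of Lemur, and the script assumes lemur-1.1.tar.gz sits in the current directory. With DRY_RUN=1 the commands are printed rather than executed, which is a convenient way to review them first.

```shell
#!/bin/sh
# Sketch: the Unix installation steps above as one function.
# install_lemur, run, and DRY_RUN are hypothetical names, not part of Lemur.

run() {
    # Print the command; execute it unless DRY_RUN=1.
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

install_lemur() {
    prefix=$1             # value for LEMUR_INSTALL_PATH
    mode=${2:-debug}      # "debug" (the default) or "optimize"
    case $mode in
        debug|optimize) ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    LEMUR_INSTALL_PATH=$prefix
    export LEMUR_INSTALL_PATH
    # Each step runs only if the previous one succeeded.
    # Note: outside dry-run mode, "cd" changes the caller's directory.
    run gunzip lemur-1.1.tar.gz &&
    run tar -xvf lemur-1.1.tar &&
    run cd lemur-1.1 &&
    run ./configure --with-comp-mode="$mode" &&
    run gmake &&
    run gmake install
}

# Preview the commands without running them:
DRY_RUN=1
install_lemur /usr0/mydir-for-lemur optimize
```

To perform the installation for real, set DRY_RUN=0 before calling install_lemur.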

Known Problems with Some C++ Compilers and Operating Systems

In our tests, Lemur compiled and linked successfully with the following compilers on Unix operating systems:

GNU g++ 2.95.3 on Linux and Solaris
GNU g++ 2.96, 3.01, and 3.02 on Linux
2. Installation on Windows NT

Extracting toolkit files

After downloading the toolkit, uncompress it. Even though it is a .tar.gz file, WinZip can do this if you simply double-click on the file. Choose a directory to extract the files into. The extraction process will create the lemur directory and all folder structures for you.

Building the libraries

The lemur directory contains four NMAKE makefiles (.mak files): lemur_utility.mak, lemur_index.mak, lemur_langmod.mak, and lemur_retrieval.mak. They can be built from the command line or with Visual C++ (version 6.0). Each one builds a .lib with the same name in the same directory.

Using Visual C++

In Visual C++, open the .mak file for the library that you want to build. Visual C++ will complain that there is no project file and offer to make one for you. Say yes, and save it (anywhere). It will create a project in which the only file is the makefile. When you build, it might claim to build a .exe, but it will still build the .lib, and regardless of where you saved the project, it should build the .lib in the lemur directory. For more information, look in the VC++ help, indexed under "makefile: porting NMAKE projects to the development environment." If you want to create real projects for these libraries, you can create new static library projects and add the source files to them. Each library contains all the source files from the directory with the corresponding name. For example, lemur_utility builds all files from lemur\utility and lemur_index builds all files from lemur\index.

Using command line nmake

On the command line in the lemur directory (you will need to have nmake installed and in your path), type

nmake /f "lemur_utility.mak"

to build the Release version of lemur_utility.lib, or type

nmake /f "lemur_utility.mak" CFG="lemur_utility - Win32 Debug"

for the Debug version.

Building an application

To build an existing application or to write your own, compile the main source file and link in the library files. In Visual C++, create a new project (an empty Win32 console application) and add the source file for the application. Link in the library files by listing them in the "object/library modules" field under the "link" tab of the project settings. Make sure the lemur path is in the "additional library path" field, also under the "link" tab, in the "input" category.

3. Running Applications

The executables for Lemur applications are generated in the directory app/obj; they are copied to LEMUR_INSTALL_PATH/lemur/bin when you run "gmake install". Every Lemur application expects a parameter file, which provides values for all (or most) of the application's input variables. The parameter file makes it easier to record the parameter settings used when running an application. This is especially convenient for complicated experiments, which usually involve many parameters. (It can be awkward, though, to have an extra file when the application is simple.)
The following is an example parameter file for the BuildBasicIndex application. It builds a basic index in /usr3/web/ (with the prefix index) from a source file named source in the same directory. (On Windows, specify file paths according to the Windows convention, e.g., using backslashes.) Note that the semicolon at the end of each entry is mandatory.
inputFile    = /usr3/web/source;
outputPrefix = /usr3/web/index;
maxDocuments = 600000;
maxMemory    = 0x10000000;
To run an application, pass the parameter file as the only argument. For example, if the parameter file above is named buildparam and is in the directory /usr3/web, then run:
/usr3/web> BuildBasicIndex buildparam
Every application displays a usage message or a list of required input variables if you run it with the "--help" option. For details about how to use each of the applications in the Lemur toolkit, see the Lemur Applications User's Guide.
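
As a rough sketch, the example parameter file can also be written and sanity-checked from a shell script before it is handed to an application. The helper check_params and the temporary working directory are assumptions made for illustration; only the four parameter lines come from the example above. The check covers just the trailing semicolon, not whether the variable names are valid for a given application.

```shell
#!/bin/sh
# Sketch: write the example parameter file and verify that every
# non-blank line ends with the mandatory semicolon.
# check_params is a hypothetical helper, not part of Lemur.

check_params() {
    [ -r "$1" ] || return 1
    # The second grep exits 0 when it finds a non-blank line that does
    # NOT end in ";", so finding one means the file is malformed.
    if grep -v '^[[:space:]]*$' "$1" | grep -qv ';[[:space:]]*$'; then
        return 1
    fi
    return 0
}

workdir=$(mktemp -d)          # stand-in for /usr3/web
paramfile=$workdir/buildparam

cat > "$paramfile" <<'EOF'
inputFile    = /usr3/web/source;
outputPrefix = /usr3/web/index;
maxDocuments = 600000;
maxMemory    = 0x10000000;
EOF

if check_params "$paramfile"; then
    echo "parameter file looks well-formed: $paramfile"
    # BuildBasicIndex "$paramfile"   # uncomment once Lemur is installed
else
    echo "a line is missing its semicolon" >&2
fi
```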

4. Testing the Toolkit on Sample Data

The Lemur Toolkit comes with a sample data directory that includes a small public information retrieval test collection (i.e., the CACM collection available from the Cornell ftp site ftp://ftp.cs.cornell.edu/pub/smart/). This sample data lets you try the toolkit easily and helps you understand its capabilities and how to use them.
The directory has three shell scripts: test_basic_index.sh, test_pos_index.sh, and clean.sh. test_basic_index.sh uses the basic index and demonstrates most of the functionality of Lemur, from formatting a database and building an index to running various kinds of retrieval experiments. test_pos_index.sh does the same thing, but using the position index. (See the API documentation for more information about these two indices.) clean.sh removes any files generated by either of the two testing scripts. test_basic_index.sh and test_pos_index.sh are both self-documented, so reading the scripts will show you what they do.
Your output should not differ much from the output contained in the sample output files. Roundoff error should lead only to minor deviations from these results.
Both test_basic_index.sh and test_pos_index.sh start from a source database file and a query file in a simple SGML format. They build an index of the database and a support file that is needed to make some retrieval algorithms fast, and then run different retrieval experiments with different parameter files. The retrieval results are evaluated with a Perl script (ireval.pl) in the app/src directory. A precision-recall summary file is generated for each experiment.
You can try changing some of the settings in the parameter files to see how they affect retrieval performance.
Windows users might be able to run these scripts under Cygwin. Even if they cannot run the scripts directly, they should still be able to repeat the commands in these shell scripts manually or in some other automated way.

5. Using the Lemur API
To use the Lemur API on Unix, you will need both the Lemur library file and all the header files. The Lemur installation step generally puts the library file in LEMUR_INSTALL_PATH/lemur/lib/liblemur.a and all the header files in LEMUR_INSTALL_PATH/lemur/include/. C header files have the extension .h, while C++ header files have the extension .hpp. You use the Lemur library in exactly the same way as you would use any other C++ library. This means you generally do the following:

In your C++ code, include the relevant Lemur header files.

When compiling your code, use -I<LEMUR_INSTALL_PATH>/lemur/include as an option so that the compiler can find the included files. (Of course, you need to replace <LEMUR_INSTALL_PATH> with the actual directory where you installed Lemur.)

When linking your code, use -L<LEMUR_INSTALL_PATH>/lemur/lib as an option so that the linker can find the Lemur library. You also need to specify -llemur as a linking option to indicate that you want the Lemur library to be linked with your code. Be careful about the order of the libraries you specify: the order reflects the assumed dependency among the libraries.
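
The compile and link options above can be combined into one command line. The sketch below only assembles and prints that command; the file name myapp.cpp, the helper lemur_build_cmd, and the fallback install path are placeholders chosen for illustration, and g++ is assumed as the compiler unless CXX is set.

```shell
#!/bin/sh
# Sketch: assemble the compile-and-link command described above.
# lemur_build_cmd and myapp.cpp are hypothetical placeholders.

lemur_build_cmd() {
    src=$1                        # your C++ source file
    out=${2:-${src%.cpp}}         # executable name; defaults to the source name without .cpp
    prefix=${LEMUR_INSTALL_PATH:-/usr0/mydir-for-lemur}
    # -I: header search path, -L: library search path, -llemur: liblemur.a.
    # The library is listed after the source file that depends on it.
    echo "${CXX:-g++} -I$prefix/lemur/include $src -L$prefix/lemur/lib -llemur -o $out"
}

cmd=$(lemur_build_cmd myapp.cpp)
echo "$cmd"
# eval "$cmd"    # run it for real once myapp.cpp and Lemur are in place
```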

To use the Lemur API on Windows, see Installation on Windows NT.

6. Modifying the Toolkit

The directory structure and makefiles
The toolkit version 1.1 has the following directories:

utility: some common and basic utility classes, including document stream handlers.
index: indexing support
langmod: general language model support
retrieval: retrieval algorithms
app: applications
data: sample data

Code exists in all the directories except "data". Each code directory (i.e., a "module") has four subdirectories: include, src, depend, and obj. "include" holds all the header files, while "src" holds all the implementation files. "depend" stores a dependency file for each source file, and "obj" holds the compiled object code. obj and depend are created automatically by the makefiles. There are two types of modules: library modules and application modules. A library module is compiled and linked as a module library, whereas an application module contains a set of application programs, each with its own main() function, so every application must be linked individually. "utility", "index", "langmod", and "retrieval" are library modules; only "app" is an application module. The makefiles are organized in the following way:

A main Makefile in the toolkit root directory. It descends into each subdirectory and makes it.
MakeDefns and MakeRules in the toolkit root directory. MakeDefns holds the definitions common to the module Makefiles, and MakeRules specifies the common rules for making several types of targets.
MakeMod and MakeApp in the toolkit root directory. MakeMod holds the common definitions for making a library from a module directory. MakeApp holds the common definitions for making all the applications.
A Makefile in the src subdirectory of each module. It includes MakeDefns, MakeRules, and either MakeMod or MakeApp, depending on the type of the module directory.

The dependency of each object file on the source code is managed by generating a dependency file (with the extension ".d") for each source implementation file.

To make (compile/link) the Lemur toolkit, go to directory lemur-1.1, and type in "gmake all" or "gmake".
To make all the module libraries, go to the main directory lemur-1.1, and type in "gmake lib".
To make the executable for one specific application program (e.g., RetEval) after all the libraries have been made by executing "gmake lib", go to app/obj, and type in "gmake RetEval". This avoids linking the other applications, so it is useful when you are only interested in running one specific application.
To make the executables for all the application programs in the directory app after all the libraries have been made by executing "gmake lib", go to app/obj, and type in "gmake all".
To clean up the Lemur toolkit (remove everything but the source), go to directory lemur-1.1, and type in "gmake clean".

Note that before you can run gmake, you need to run "configure": go to "lemur-1.1" and type in "./configure", which probes your system environment and generates the MakeDefns file needed by the makefiles in the subdirectories.

To modify an existing file or add a file to an existing directory:

Make the changes
Go to the Lemur root directory
Type in "gmake all" to update the whole toolkit.

OR

If you are only testing one specific application, which is often the case, do NOT type in "gmake all" after making the changes. Instead, go to the Lemur root directory and type in "gmake lib" to update all the libraries first, then go to "app/obj" (or any other application object directory) and type in "gmake YOUR_APPLICATION", where YOUR_APPLICATION is the name of the application that you want to link (e.g., RetEval). There is also a shell script mkapp in the root directory of Lemur that does all of this for you. That is, after making the changes, just go to the Lemur root directory and type in "./mkapp YOUR_APPLICATION".
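
The two-step cycle described above (update the libraries, then link a single application) can be sketched as a small shell function. This is not the mkapp script shipped with Lemur, whose contents are not shown here; rebuild_app and the MAKE override are hypothetical names used only to illustrate the sequence of commands.

```shell
#!/bin/sh
# Sketch of the "update the libraries, then link one application" cycle.
# This is NOT the mkapp script shipped with Lemur; rebuild_app and the
# MAKE override are hypothetical.

rebuild_app() {
    app=$1                   # e.g. RetEval
    root=${2:-.}             # Lemur root directory (lemur-1.1)
    make_cmd=${MAKE:-gmake}  # GNU make; set MAKE if yours has another name
    # Each step runs in a subshell so the caller's working directory
    # is left unchanged; the second step runs only if the first succeeds.
    ( cd "$root" && "$make_cmd" lib ) &&
    ( cd "$root/app/obj" && "$make_cmd" "$app" )
}

# Usage, from anywhere:
#   rebuild_app RetEval /path/to/lemur-1.1
```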

To add a new (library) module to the toolkit:

Create the module subdirectory in the lemur root directory.
Put all include files in a subdirectory named "include" under the new module directory.
Put all implementation files in a subdirectory named "src" under the new module directory.
Add the module directory name to the main Makefile in the toolkit root directory.
Copy a Makefile from an existing module directory (e.g., index/src/Makefile) to <new-module-dir>/src, and change the first two lines to define this module and its dependent modules.

To add a new application directory to the toolkit:

Create the application subdirectory in the lemur root directory.
Put all include files in a subdirectory named "include" under the new application directory.
Put all implementation files in a subdirectory named "src" under the new application directory.
Add the application directory name to the Makefile in the lemur root directory.
Copy a Makefile from an existing application directory (e.g., app/src/Makefile) to <new-application-dir>/src, and change the line that defines the dependent modules.
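
The mechanical parts of the two procedures above (creating the directories and copying a template Makefile) can be sketched as follows. The helper new_lemur_dir is hypothetical and not part of Lemur. It deliberately stops short of editing any Makefile, since the lines to change depend on the module and are best edited by hand.

```shell
#!/bin/sh
# Sketch: create the skeleton for a new module or application directory.
# new_lemur_dir is a hypothetical helper; editing the copied Makefile and
# the root Makefile remains a manual step.

new_lemur_dir() {
    root=$1        # Lemur root directory
    name=$2        # name of the new module or application directory
    template=$3    # existing directory to copy the Makefile from: "index" or "app"
    [ -f "$root/$template/src/Makefile" ] || {
        echo "no template Makefile in $root/$template/src" >&2
        return 1
    }
    mkdir -p "$root/$name/include" "$root/$name/src" &&
    cp "$root/$template/src/Makefile" "$root/$name/src/Makefile" &&
    echo "created $root/$name; now edit $name/src/Makefile and the root Makefile"
}

# Usage:
#   new_lemur_dir /path/to/lemur-1.1 mymodule index   # new library module
#   new_lemur_dir /path/to/lemur-1.1 myapps   app     # new application directory
```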

The Lemur Project
Last modified: Wed Mar 27 15:19:48 EST 2002