The `Bow' Toolkit

Bow: A Toolkit for Statistical Language Modeling, Text Retrieval, Classification and Clustering

Bow (or libbow) is a library of C code useful for writing statistical text analysis, language modeling and information retrieval programs. The current distribution includes the library, as well as front-ends for document classification (rainbow), document retrieval (arrow) and document clustering (crossbow).

The library and its front-ends were designed and written by Andrew McCallum, with contributions from several graduate and undergraduate students.

About the library

The library provides facilities for:

The library does not:

It is known to compile on most UNIX systems, including Linux, Solaris, SunOS, IRIX and HP-UX. Over a year ago it also compiled on Windows NT (with a GNU build environment); it no longer does, but probably could with small fixes. Patches to the code are most welcome. It is developed on a Linux system.

The code conforms to the GNU coding standards. It is released under the GNU Library General Public License (LGPL).


You are welcome to use the code for research or commercial purposes under the terms of the licence; however, please acknowledge its use with a citation:

Obtaining the Source

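Source code for the library can be downloaded from this directory. Each version is labeled with an eight-digit sequence giving the year, month and day of the release (YYYYMMDD). Because the year comes first and every field has a fixed width, a larger number always means a later date, so the most recent version is the one with the largest version number.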

Unfortunately I do not have time to help rainbow's many users with all their compilation and usage problems. Feel free to send me mail asking for help, but please do not count on my having time to reply. Most appreciated are bug reports accompanied by fixes.

Bow Library Front-Ends

The library source distribution currently includes three executable programs based on the library.
