How to use the SLIF Text Components

Invocation and basic options

The SLIF text components are distributed as a single large JAR file. To run it, you will need a copy of Java. A typical invocation would be

% java -cp slifTextComponents.jar -Xmx500M SlifTextComponent -labels DIR -saveAs FILE -use COMPONENT1,COMPONENT2,.... [OPTIONS]

where -Xmx500M sets the maximum size of the Java heap to 500 MB, and the additional arguments are as follows:

The components available are:

The Minorthird format for stand-off annotation

The format for output is the one used by Minorthird. Specifically, the output (in the default format) is a series of lines in one of these formats:



Other options

-help Gives brief command-line help.
-gui Pops up a window that allows you to interactively fill in the other arguments, monitor the execution of the annotation process, etc.
-showLabels Pops up a window that displays the set of documents being labeled. (This is not recommended for a large document collection, due to memory usage.)
-showResult Pops up a window that displays the result of the annotation. (Again, not recommended for a large document collection.)
-format strings Outputs results as a tab-separated table instead of the Minorthird format. The first column summarizes the type of the span, the file the span was taken from, and the start and end byte positions, in a colon-separated format (e.g., "cellLine:p11029059-fig_4_1:1293:1303"). The remaining column(s) contain the text of the span (e.g., "HeLa cells" for the span above) almost exactly as it appears in the document; the only change is that newlines are replaced with spaces.
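
For downstream processing, the tab-separated output of -format strings can be read with a few lines of code. The following is a minimal sketch in Python, not part of the SLIF distribution. The names SpanRow and parse_strings_row are hypothetical. It assumes that the span type contains no colon and that the last two colon-separated fields of the first column are always the start and end positions, so a file name containing a colon would still be handled.

```python
from typing import NamedTuple


class SpanRow(NamedTuple):
    """One row of '-format strings' output (hypothetical helper type)."""
    span_type: str
    doc_id: str
    start: int
    end: int
    text: str


def parse_strings_row(line: str) -> SpanRow:
    """Split one tab-separated row into its fields.

    Assumption: the first column looks like TYPE:FILE:START:END, as in the
    example "cellLine:p11029059-fig_4_1:1293:1303".
    """
    key, *text_cols = line.rstrip("\n").split("\t")
    # The span type is everything before the first colon.
    span_type, rest = key.split(":", 1)
    # Start and end are the last two fields; whatever is left is the file name.
    doc_id, start, end = rest.rsplit(":", 2)
    # If there are several text columns, keep them joined by tabs.
    return SpanRow(span_type, doc_id, int(start), int(end), "\t".join(text_cols))


row = parse_strings_row("cellLine:p11029059-fig_4_1:1293:1303\tHeLa cells")
print(row.span_type, row.doc_id, row.start, row.end, row.text)
```

To process a whole result file, the same function could be applied to each non-empty line of the file written by -saveAs.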



A number of people have contributed to these tools, including William Cohen, Zhenzhen Kou, Quinten Mercer, Robert Murphy, Richard Wang, and other members of the SLIF team. The initial development of these tools was supported by grant 017396 from the Commonwealth of Pennsylvania Tobacco Settlement Fund. Further development is supported by National Institutes of Health grant R01 GM078622.