Download the source code for version 2.0 of the toolkit. It can be installed on a Unix machine, or on Windows using Cygwin. Unzip and untar the file, then build the toolkit, using the following commands:
gunzip blmt_v1.0.tar.gz
tar -xvf blmt_v1.0.tar
cd blmt
make
1. Compilation of programs
There is a makefile (called "makefile") in the Final directory.
[
Tutorial help for non-computer scientists:
-c compiles a source file into an object file (without linking)
*.o are the object files
-o names the output file; it is used here when linking two or more object files to create the executable (o is for output)
]
The following commands can be used with the makefile:
Global compilation of all the programs in the toolkit:
make [or make all]: removes all *.o files and compiles all programs.
make clean-all: removes all *.o files and all executables [So that you can re-compile them afresh].
make clean: removes only *.o files
Each individual program can also be compiled separately:
make faa2srt: [Compiles the faa2srt.cpp file and creates the faa2srt executable. Note that this does not remove *.o files first. The *.o files need to be removed whenever the source code changes and you want the change to be picked up].
make srt2lcp
make ngrams
make proteinCount
make proteinNGram
make yule
make map2srt
make langmodel
make wcngrams
Usage of the programs (input/output options etc)
./programname -help shows the options available for the program called programname
Example usage (see below): ./ngrams -fsrt bb.faa.srt -flcp bb.faa.lcp -n 5 -printall -sortc
faa2srt: Creates a Suffix Array from a Fasta format Genome file.
./faa2srt -help
Usage:
./faa2srt
-ffaa
-fsrt
-help display this help message
Example usage: ./faa2srt -ffaa human.faa
Note:
For long genomes, adjust the maximum genome length to suit your file: in mylib.h, find SUPERLEN 12000000 and change it to a larger value if needed.
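For readers new to suffix arrays, the sketch below illustrates the data structure that faa2srt produces: the start positions of all suffixes of a sequence, listed in sorted order of the suffixes. It is a minimal illustration only, not the toolkit's implementation; the function name buildSuffixArray is hypothetical, and the simple sort used here is far too slow for real genomes.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper (not part of the toolkit): returns the start positions
// of all suffixes of `text`, ordered so that the suffixes are sorted.
// Naive comparison sort; intended for illustration on short strings only.
std::vector<int> buildSuffixArray(const std::string& text) {
    std::vector<int> sa(text.size());
    std::iota(sa.begin(), sa.end(), 0);  // candidate positions 0, 1, ..., n-1
    std::sort(sa.begin(), sa.end(), [&text](int a, int b) {
        // Compare the suffix starting at a with the suffix starting at b.
        return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
    });
    return sa;
}
```

For the string "banana" the sorted suffixes are a, ana, anana, banana, na, nana, so the suffix array is 5 3 1 0 4 2.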
srt2lcp: Creates the Longest Common Prefix (LCP) and Rank arrays corresponding to a Suffix Array.
./srt2lcp -help
Usage:
./srt2lcp
-fsrt
-flcp
-frnk
-help display this help message
Example: ./srt2lcp -fsrt human.faa.srt
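As an illustration of what the LCP and Rank arrays contain, the sketch below computes both from a sequence and its suffix array using the well-known Kasai method. It assumes the common definitions: rank[p] is the position of the suffix starting at p within the suffix array, and lcp[i] is the length of the longest common prefix of the suffixes at positions i-1 and i of the suffix array (with lcp[0] = 0). Whether srt2lcp uses exactly these conventions or this algorithm is an assumption; the names LcpRank and buildLcpAndRank are hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical result type (not part of the toolkit).
struct LcpRank {
    std::vector<int> lcp;   // lcp[i]: common prefix length of suffixes sa[i-1] and sa[i]
    std::vector<int> rank;  // rank[p]: index of the suffix starting at p within sa
};

// Kasai-style linear-time construction from the text and its suffix array.
LcpRank buildLcpAndRank(const std::string& text, const std::vector<int>& sa) {
    const int n = static_cast<int>(text.size());
    LcpRank out{std::vector<int>(n, 0), std::vector<int>(n, 0)};
    for (int i = 0; i < n; ++i) out.rank[sa[i]] = i;  // invert the suffix array

    int h = 0;  // length of the current common prefix, carried between steps
    for (int p = 0; p < n; ++p) {
        if (out.rank[p] == 0) { h = 0; continue; }  // first suffix has no predecessor
        const int q = sa[out.rank[p] - 1];          // suffix just before p in sorted order
        while (p + h < n && q + h < n && text[p + h] == text[q + h]) ++h;
        out.lcp[out.rank[p]] = h;
        if (h > 0) --h;  // the next suffix shares at least h-1 characters
    }
    return out;
}
```

For "banana" with suffix array 5 3 1 0 4 2, this gives lcp = 0 1 3 0 0 2 and rank = 3 2 5 1 4 0.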
ngrams: Finds the n-grams occurring in a Genome and the number of times each n-gram occurs. It can also list the n-grams in descending order of their number of occurrences, and can print the counts alone (without the n-gram itself) so that the output can be used easily by other programs (for example, for plotting).
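The sketch below shows one way n-gram counts can be read off a suffix array and its LCP array: in sorted order, all suffixes that begin with the same n-gram are adjacent, and a new n-gram starts wherever the LCP value drops below n. This is an illustration under assumptions, not the toolkit's code; the function name countNGrams is hypothetical, n is assumed to be at least 1, and lcp[i] is assumed to compare the suffixes at positions i-1 and i of the suffix array.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> NGramCount;  // (n-gram, number of occurrences)

// Hypothetical helper (not part of the toolkit): counts every n-gram in `text`
// using its suffix array `sa` and LCP array `lcp`, then orders the result by
// descending count (ties keep lexicographic order).
std::vector<NGramCount> countNGrams(const std::string& text,
                                    const std::vector<int>& sa,
                                    const std::vector<int>& lcp, int n) {
    std::vector<NGramCount> counts;
    const int len = static_cast<int>(text.size());
    for (int i = 0; i < len; ++i) {
        if (sa[i] + n > len) continue;  // suffix shorter than n: no n-gram starts here
        if (counts.empty() || lcp[i] < n) {
            counts.push_back(NGramCount(text.substr(sa[i], n), 1));  // new n-gram
        } else {
            ++counts.back().second;  // same n-gram as the previous suffix
        }
    }
    std::stable_sort(counts.begin(), counts.end(),
                     [](const NGramCount& a, const NGramCount& b) {
                         return a.second > b.second;
                     });
    return counts;
}
```

For "banana" with n = 2, suffix array 5 3 1 0 4 2 and LCP array 0 1 3 0 0 2, the result is an (2), na (2), ba (1).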
./ngrams -help
Usage:
./ngrams
-fsrt
-flcp
-fngrams