Parana 2 (version 0.1 Alpha)

Note: We're currently migrating this project to GitHub and working on an easier-to-compile release, which should be available shortly.


This software accompanies the ISMB 2013 publication

"Predicting protein interactions via parsimonious network history inference"

The latest source code is available at the GitHub repository. Note, however, that this (updated) source code has not been pre-compiled for any platform, and the interface has expanded a bit from earlier versions.

Installation

This software uses the CMake build system. It depends on a C++11-compatible compiler (e.g. g++ 4.7) and the following external libraries:

Bio++
Boost
GMP
MPFR
pugixml

From the top level directory, one should execute the following commands (> designates the prompt):

> mkdir build
> cd build
> cmake ..
> make
> mv bin ..

Any errors are likely the result of a missing library. The last line moves the binary directory (containing the parana2 executable) to the same relative path it occupies in the binary release.

Running the program

The current version of the program exposes a number of options that are not fully implemented or which may be removed in the final version. A typical invocation of the program looks something like this:

> ./parana2 pars -u -m [costs] -t [target network] -d [dup. hist] -o [output file]
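When the program is driven from a script, the invocation above can be assembled as an argument list. The following is only a sketch: the function name and the file names in the usage note are hypothetical placeholders, and the value passed to -m is treated as an opaque string, since its exact form is not described here (see ./parana2 pars -h).

```python
def build_command(costs, target_network, dup_history, output_file,
                  executable="./parana2"):
    """Return the argument list for a typical 'pars' invocation.

    Only the flags shown in the invocation above are used. All values
    are passed through unchanged as strings.
    """
    return [
        executable, "pars", "-u",
        "-m", costs,            # parsimony costs (form not documented here)
        "-t", target_network,   # target network (adjacency list)
        "-d", dup_history,      # duplication history (phyloXML)
        "-o", output_file,      # where the edge scores are written
    ]
```

The resulting list can be handed to subprocess.run(cmd, check=True) from the directory containing the executable, for example with cmd = build_command("costs", "net.adj", "tree.xml", "scores.txt"), where all four names are placeholders.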

The target network

The target network should be in the NetworkX adjacency list format. The comment character "#" is respected, and the file should list nodes with no incident edges on lines by themselves.
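As an illustration of this layout, the sketch below produces adjacency-list lines using only the Python standard library. It assumes whitespace-delimited node names (so names must not contain spaces) and an undirected network in which each edge is listed once, on the line of the first endpoint written; the function name and the protein names in the usage note are made up for the example.

```python
def adjlist_lines(nodes, edges):
    """Return adjacency-list lines for an undirected network.

    Each line is a node followed by those neighbours that have not yet
    had a line of their own, so every edge appears exactly once. A node
    with nothing left to list (including a node with no incident
    edges) appears on a line by itself.
    """
    adjacency = {node: [] for node in nodes}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, [])
        if u != v:
            adjacency[v].append(u)

    seen = set()
    lines = []
    for node, neighbours in adjacency.items():
        unseen = [n for n in neighbours if n not in seen]
        lines.append(" ".join([node] + unseen))
        seen.add(node)
    return lines
```

For nodes A, B, C, D with edges A-B and A-C, this yields the four lines "A B C", "B", "C" and "D"; D has no incident edges and so sits on a line by itself. Lines beginning with "#" can be added by hand as comments.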

The duplication history

The duplication history should be in phyloXML format (the NHX format will be supported shortly). Trees currently in the NHX format (specifically those that result from Notung) can be easily converted to phyloXML format using the phyloXML converter. For example, given a file "tree.ntg" (an NHX-format tree that is the result of a gene-species tree reconciliation using Notung), one can obtain the appropriate phyloXML file (assuming forester.jar is in the current directory) by running the command:

java -cp forester.jar org.forester.application.phyloxml_converter -f=nn tree.ntg tree.xml

This will output an appropriately formatted phyloXML file, "tree.xml".

The output file

The output file is a list of posterior scores for all potential edges with scores > 0. The format is very simple; each line is given as follows:

p1  p2  et  s

Where p1 and p2 are the names of two proteins, et is the edge-type (currently, this is just 'b'), and s is the score assigned to this edge.
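A file in this format can be read back with a few lines of Python. This is a minimal sketch that assumes the four fields are separated by whitespace and that the score parses as a floating-point number; the function name and the example protein names are illustrative only.

```python
def parse_scores(lines):
    """Parse 'p1 p2 et s' lines into (p1, p2, edge_type, score) tuples.

    Lines that do not have exactly four whitespace-separated fields
    (e.g. blank lines) are skipped.
    """
    edges = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue
        p1, p2, edge_type, score = fields
        edges.append((p1, p2, edge_type, float(score)))
    return edges
```

Because a file object iterates over its lines, parse_scores(open(path)) works directly on an output file; sorting the result by the last element of each tuple ranks the candidate edges by score.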

Additional arguments

Additional arguments are supported which affect the scores of different ancestral histories (e.g. the branch-length penalty and the ratio of the cost of edge creation to deletion). These parameters are briefly documented in the program's help output (which can be viewed by running ./parana2 pars -h). The following parameters all currently work:


-m [--parsimonyCosts]
-b [--beta] (called gamma in the paper)
-k [--numOpt]
-p [--timePenalty]