
Chapter 12  Command Line Options and Batch Processing

Notung offers a command line interface (CLI) that can perform most operations without launching the graphical user interface (GUI). The CLI supports batch processing, so Notung can be applied to many trees in a large-scale analysis without human intervention. It can also be used to analyze a small number of trees without launching the GUI, for example, by a user executing Notung on a remote computer over the network. The GUI can also be launched from the command line, rather than by clicking on an icon, allowing the user to start the GUI with parameter settings other than the defaults. Finally, when used as an applet, Notung is launched from a web page using CLI syntax.

We follow the following stylistic conventions in this chapter.

12.1  Opening and Using a Command Window/Terminal

Prior to running Notung’s command line interface, you will need to open a command or “terminal” window.

On Windows XP

Opening a command window

Click on the Start button, and select the “Run...” item. A dialog box will pop up. Enter “cmd.exe” into the box, and click “OK.”

Navigating to the Notung directory

In the command window, type the following
      cd <pathname>
   
where <pathname> is the path of the Notung directory. If the folder location has any spaces in it, it must be enclosed in quotes. For example, if the following is the location of the Notung folder:
 
      C:\Documents and Settings\User\Desktop\Notung-2.6
   
Then you should use quotes so that it looks like this in the command window:
 
      cd "C:\Documents and Settings\User\Desktop\Notung-2.6"
   
Hit Enter, and you will now be in the Notung Folder.
NOTE: To find the path of the Notung directory, select the Notung folder in Explorer, and right click on it. This will pop up a menu - select the Properties item. This will pop up a dialog listing the properties of the Notung folder, including its location.

On Windows Vista

Opening a command window in the Notung directory

Select the Notung folder in Explorer, and right click on it. This will pop up a menu - select “Start command window here.”

On Mac OS X

Opening a terminal

The Terminal application is located in the Applications folder in the Utilities subfolder.

Navigating to the Notung directory

In the terminal window, type the following
      cd <pathname>
   
where <pathname> is the path of the Notung directory. If the folder location has any spaces in it, it must be enclosed in quotes. For example, if the following is the location of the Notung folder
      /Users/user/Desktop/New Folder/Notung-2.6
   
Then it should look like this in the terminal window
      cd "/Users/user/Desktop/New Folder/Notung-2.6"
   
Hit Enter, and you will now be in the Notung Folder.
NOTE: To find the path of the Notung directory, select the Notung folder in the Finder, and select “Get Info” from the File menu. This will pop up a dialog listing the properties of the Notung folder, including its location. You could also drag and drop the Notung folder into the Terminal window to paste the folder’s path into the window.

On Linux

Navigating to the Notung directory

In the terminal window, type the following
      cd <pathname>
   
where <pathname> is the path of the Notung directory. If the folder location has any spaces in it, it must be enclosed in quotes. For example, if the following is the location of the Notung folder
 
      /Users/user/Desktop/New Folder/Notung-2.6
   
Then it should look like this in the terminal window
      cd "/Users/user/Desktop/New Folder/Notung-2.6"
   
Hit Enter, and you will now be in the Notung Folder.

12.2  Running Notung from the command line

Notung can carry out its four main tasks (reconcile, rearrange, rooting and resolve) from the command line. In each case, Notung reads in gene and species trees (the input trees) and executes the specified task, resulting in one or more modified trees (the output tree(s)). Each output tree is written to a file. Notung can also generate images in PNG format from the command line. This function can be carried out in conjunction with any of the four main tasks, or independently to generate an image of an existing tree without performing any analysis. The I/O requirements differ somewhat in the latter case; only one tree is required as input and an image rather than a tree file is generated as output. In this section, we discuss executing the four main tasks from the command line, postponing image generation to a later section. In Section 12.3 (Running Notung from a Batch File), automated execution of Notung is described. Commands and options specific to image generation are described in Section 12.4 (Saving PNG Images of Trees). Commands and options specific to reconciliation with non-binary species trees are described in Section 12.5 (Options for Reconciling with Non-Binary Trees).

For the four major tasks, Notung is executed from the command line using the following format:

   java -jar Notung-2.6.jar [input tree(s)] [task] [options]

The four main tasks require both a gene tree and a species tree. These are usually supplied as two separate input files. A single file containing a previously reconciled tree in Notung format is also acceptable, since such files contain both a gene tree and a species tree. If a gene tree file containing a reconciled tree in Notung format and a species tree in a separate file are both given, the latter is used; the species tree in the gene tree file is ignored. The task parameter must be one of --reconcile, --rearrange, --root, or --resolve. A fifth task, --savepng, is discussed in Section 12.4. Options are described below.
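As a minimal sketch of this format, the following shell snippet assembles a reconciliation command for a single gene tree. The two tree file names are hypothetical placeholders; the flags are those described in this chapter. The snippet only prints the command, so it can be reviewed before it is entered in the Notung directory.

```shell
# Hypothetical input files; substitute the paths of your own trees.
GENE_TREE="mygenetree.newick"
SPECIES_TREE="myspeciestree.newick"

# Order follows the format above: [input tree(s)] [task] [options].
NOTUNG_CMD="java -jar Notung-2.6.jar -g $GENE_TREE -s $SPECIES_TREE --reconcile --speciestag prefix --log"

# Print the command for review; paste the printed line into the terminal to run it.
echo "$NOTUNG_CMD"
```

Under the naming rules given in this chapter, the reconciled tree would be expected in <genetree>.reconcile and the log in <genetree>.reconcile.ntglog.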

NOTE:

The following list describes Notung’s command line options. For more details on tree formats, including information on edge weights, species tags and output files, see Appendix A - File Formats.

Output options

Output Gene Tree(s)

If one of the four main functions is given, the output gene tree will be saved to a file called <genetree>.<function>, where <function> is one of the four major tasks: reconcile, rearrange, resolve, or rooting. If the analysis results in more than one optimal history, then the output files are numbered (e.g., <genetree>.rearrange.0, <genetree>.rearrange.1, etc.). By default only one tree is saved. To save more than one tree, use --maxtrees.

PNG Image of Tree (optional)

If the --savepng option is given, an image of the tree is saved in PNG format. For more information on saving PNG images with --savepng, see Section 12.4 - Saving PNG Images of Trees.

Pruned Species Tree (optional)

If the species tree contains species that do not appear in the gene tree, during reconciliation Notung constructs a pruned species tree that only contains those species required to reconcile the gene tree. If the --stpruned option is given, this pruned species tree is saved in the file <genetree>.<function>.species.

Log (optional)

When run on the command line, Notung outputs status information to the terminal window. This information can be saved in the log file <genetree>.<function>.ntglog by using the --log option. For a batch run, a log file is not saved for each tree; rather, a single log file for the entire batch run is saved to the file <batchfile>.<function>.ntglog.

General Tree Statistics (optional)

General tree statistics can be saved in the file <genetree>.<function>.stats by giving the option --treestats. This file includes information on both the gene tree and the pruned species tree. For more information on tree statistics, see Section 3.4 - General Tree Statistics.

Duplication Bounds and Loss Information (optional)

Information on the timing of each duplication and loss is saved in the file <genetree>.<function>.info when the --info option is used. For each duplication, an upper and lower bound (represented as nodes from the species tree) are given. For losses, each node in the species tree is listed with the number of losses associated with that taxon. For more information on duplications and losses, see Chapter 5 - Reconciliation Mode.

Ortholog/Paralog Tables (optional)

Notung can output tables of orthologs and paralogs for all pairs of leaf nodes in the reconciled tree. This table can be generated in several formats: comma-separated values (CSV), tab-delimited values, or an html-formatted table. Use options --homologtablecsv, --homologtabletabs or --homologtablehtml, respectively. For more information on orthologs and paralogs, see Section 5.3 - Inferring Orthologs and Paralogs.

File Input

-g <genetree>
 

Load the file <genetree> as a gene tree. NOTE: The -g is optional.

-s <speciestree>
 

Load the file <speciestree> as a species tree. The -s is required.

-b <batchfile>
 

Load the trees listed in <batchfile>. Requires that the --speciestag option be set. If rearranging, requires the --edgeweights and --threshold options. With this option, -g <genetree> and -s <speciestree> should not be specified. See Section 12.3 - Running Notung from a Batch File for more information.

-absfilenames
 

Files listed in <batchfile> use absolute paths. See Section 12.3 - Running Notung from a Batch File for more information.

-gu <gene tree URL location>
 

Load gene tree from a URL. This option is only used when running Notung as an applet.

-su <species tree URL location>
 

Load species tree from a URL. This option is only used when running Notung as an applet.

Tasks

--reconcile
 

Reconcile a gene tree with a species tree. In batch mode, --speciestag is required. For more information on reconciliation, see Chapter 5 - Reconciliation Mode.

--rearrange
 

Rearrange the gene tree. The option --threshold must be set. In batch mode, --speciestag and --edgeweights are also required. For more information on rearranging gene trees, see Chapter 7 - Rearrange Mode.

--resolve
 

This task, which removes polytomies from a non-binary tree, can only be carried out if the gene tree is non-binary. In batch mode, --speciestag is required. For more information on resolving non-binary nodes in a gene tree, see Chapter 8 - Resolve Mode.

--root
 

Root the gene tree. The top <maxtrees> best scoring rooted trees are saved in files named <genetree>.rooting.#. By default, <maxtrees> is set to 1. In batch mode, --speciestag is required. For more information on rooting gene trees, see Chapter 6 - Rooting Mode.

Duplication and Loss Parameters

--costdup <duplication cost>
 

Sets the cost of gene duplications. The default cost is 1.5.

--costconddup <conditional duplication cost>
 

Sets the cost of conditional gene duplications. These only occur when reconciling a binary gene tree with a non-binary species tree. The default cost is zero. See Chapter 5 - Reconciliation Mode for more information.

--costloss <lost gene cost>
 

Sets the cost of gene losses. If not set, the default cost of 1.0 is used.

Input Data Options

--speciestag [prefix|postfix|nhx]
 

Indicates the format of species tags in the gene tree. If not set, Notung tries to guess the correct format. See Appendix A.4 - Specifying the Species Associated with Each Gene.

--threshold <threshold>|<percentage>%
 

Edges with weight higher than <threshold> are preserved during rearrangement. This can be given as an absolute value or as a percentage of the maximum value, using <percentage>%; e.g., “--threshold 90%” sets the threshold at 90 percent of the highest edge weight in the tree. See Section 3.5 - Parameter Values for more information.

--edgeweights [name|length|nhx]
 

Indicates where in the tree file the edge weights, if any, are specified. If this option is not set, and the gene tree has values in more than one location, Notung will guess the location of edge weights when using --rearrange. See Appendix A.6 - Location of Edge Weight Values for more information.

--bootstraps [name|length|nhx]
 

Same setting as --edgeweights. Kept for backwards compatibility.

--annotationfile <filename>
 

Attach the given annotation file to each input tree.

--imagemap <filename>
 

Used with --savepng. Notung uses the contents of <filename> to create an image map file, which is saved in <outputtreename>.png.html. For more information, see Section 12.4 - Saving PNG Images of Trees.
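
The percentage form of --threshold described above can be checked by hand. The sketch below uses made-up edge weights to compute the cutoff that “--threshold 90%” would imply and to list the weights that exceed it. It reads “higher than” as a strict comparison, which is an assumption about Notung's behavior, and it is not Notung's own code.

```shell
# Hypothetical edge weights (for example, bootstrap values) from one gene tree.
WEIGHTS="80 76 60 72 35"
PERCENT=90

# Cutoff = PERCENT percent of the highest edge weight (here 90% of 80 = 72).
CUTOFF=$(echo "$WEIGHTS" | awk -v p="$PERCENT" '{
  max = $1 + 0
  for (i = 2; i <= NF; i++) if ($i + 0 > max) max = $i + 0
  print max * p / 100
}')
echo "cutoff: $CUTOFF"

# Weights strictly higher than the cutoff; these edges would be preserved.
PRESERVED=$(echo "$WEIGHTS" | awk -v c="$CUTOFF" '{
  out = ""
  for (i = 1; i <= NF; i++)
    if ($i + 0 > c + 0) out = (out == "" ? $i : out " " $i)
  print out
}')
echo "preserved: $PRESERVED"
```

With these weights the cutoff is 72, and only the edges weighted 80 and 76 lie above it; the edge weighted exactly 72 does not.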

Output Options

--treeoutput [newick|notung|nhx]
 

Specify output tree file format. See Appendix A - File Formats for more information.

--nolosses
 

Remove loss nodes from gene trees before they are saved. Useful when outputting trees in Newick or NHX format, which do not recognize loss nodes, or with --savepng to output a tree image without loss nodes.

--maxtrees <maxtrees>
 

Maximum number of optimal trees to output during reconciliation, rearrangement, rooting, and resolving. Default is one.

--outputdir <outputDir>
 

Save output files in the directory, <outputDir>. Default is the current working directory.

--usegenedir
 

Save output trees in the directory in which <genetree> is located.

--log
 

Writes diagnostic output to the file <genetree>.<function>.ntglog, where <function> is one of the four modes. For batch runs, the log file is saved in <batchfile>.<function>.ntglog.

--info
 

Save information on duplications and losses in the file <genetree>.<function>.info.

--treestats
 

Save general statistics for a tree. Saved in <genetree>.<function>.stats. Statistics on the pruned species tree will be included in this file. See Section 3.4 - General Tree Statistics for more information.

--stpruned
 

Save a version of the species tree that contains only the species found in the gene tree. Saved in the file <genetree>.<function>.species.

--rootscores
 

Report a list of ordered root scores to standard output (only used with --root). This option is useful for statistical examination of root scores for the gene tree. These scores can be saved in a file with the --log option.

--silent
 

Suppresses reporting of diagnostic information to the terminal.

--progressbar
 

In batch mode, print a simple progress bar to stderr for each tree analyzed. Useful with --silent.

--savepng
 

Save the tree as a PNG image. Unlike Notung’s other main functions, this function does not require a species tree. For more information about --savepng, see Section 12.4 - Saving PNG Images of Trees.

Ortholog / Paralog Tables

For more information on orthologs and paralogs, see Section 5.3 - Inferring Orthologs and Paralogs.

--homologtablecsv
 

Save a comma separated table of orthologs and paralogs to the file
<genetreename>.<function>.homologs.csv.

--homologtabletabs
 

Save a tab-delimited table of orthologs and paralogs to the file
<genetreename>.<function>.homologs.tabs.

--homologtablehtml
 

Save a table of orthologs and paralogs in HTML format to the file
<genetreename>.<function>.homologs.html. This format can be included in a web page.

Display Options

--show-species-tree
 

GUI only: if an input gene tree is reconciled, open the attached species tree in a separate tab. Useful for displaying Notung format trees in the Notung applet.

--homologgui
 

GUI only: if an input gene tree is reconciled, start Notung in the Reconciliation tab with the Orthologs/Paralogs button selected. Useful for ortholog / paralog analysis in the Notung applet.

Help Message

--help
 

Print information about these options.

12.3  Running Notung from a Batch File

Batch processing allows the user to apply Notung to many trees in a large-scale, automated analysis. The input trees are given in a batch file, which consists of a list of tree file names, one per line. Blank lines and lines which start with # are ignored.

To create a batch file:

A sample batch file is provided with the Notung 2.6 distribution in the sampleTrees/batch directory. This batch file includes all combinations of binary and non-binary gene and species trees. Because not all of Notung’s task modes work for each of these combinations, you will receive one or more warnings and errors when running this batch file. In addition, the batch file lists a gene tree which does not exist, to give an example of the appropriate warning.
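
As an illustration of the batch file format, the sketch below writes a small batch file and then lists the entries that remain once blank lines and lines starting with # are dropped. All tree file names are hypothetical, and placing the species tree on the first line is an assumption about the expected order, not something stated above.

```shell
# Write a small batch file to a temporary location.
# File names are hypothetical; species-tree-first ordering is an assumption.
BATCH_FILE="$(mktemp)"
cat > "$BATCH_FILE" <<'EOF'
# species tree
speciestree.newick

# gene trees
genetree1.newick
genetree2.newick
EOF

# Keep only the lines that name tree files: drop # comments and blank lines.
ENTRIES=$(grep -v -e '^#' -e '^[[:space:]]*$' "$BATCH_FILE")
echo "$ENTRIES"

ENTRY_COUNT=$(echo "$ENTRIES" | wc -l | tr -d ' ')
echo "tree files listed: $ENTRY_COUNT"

rm -f "$BATCH_FILE"
```

A check like this can help confirm that a long batch file lists the intended number of trees before a large run is started.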

To run Notung from a batch file:

Use the -b <batchfile> option.

For example, from the Notung directory, enter the following on the command line:

java -jar Notung-2.6.jar -b sampleTrees/batch/batch.run --reconcile --speciestag prefix

The --reconcile option tells Notung to reconcile all the gene trees listed in batch.run with the species tree listed in batch.run. The --speciestag prefix option tells Notung how species labels are specified in the gene tree files, and is required in batch mode. See Appendix A.4 - Specifying the Species Associated with Each Gene for more information on species labels.

NOTE: All gene trees in the same batch file must use the same species tag format, which is specified using the --speciestag option.

Required Options

In batch mode, the --speciestag option is always required. In addition, when using --rearrange, --edgeweights and --threshold must be used to set the edge weight locations and threshold, respectively.
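
These requirements can be checked before a long batch run is launched. The following is a minimal sketch of such a check; the function name check_batch_options is hypothetical and is not part of Notung, and the check only looks for the presence of each option, not the validity of its value.

```shell
# Returns 0 if the options satisfy the batch-mode requirements, 1 otherwise.
check_batch_options() {
  has_tag=0; has_rearrange=0; has_weights=0; has_threshold=0
  for arg in "$@"; do
    case "$arg" in
      --speciestag)  has_tag=1 ;;
      --rearrange)   has_rearrange=1 ;;
      --edgeweights) has_weights=1 ;;
      --threshold)   has_threshold=1 ;;
    esac
  done
  # --speciestag is always required in batch mode.
  if [ "$has_tag" -eq 0 ]; then
    echo "missing --speciestag" >&2
    return 1
  fi
  # --rearrange additionally needs --edgeweights and --threshold.
  if [ "$has_rearrange" -eq 1 ]; then
    if [ "$has_weights" -eq 0 ] || [ "$has_threshold" -eq 0 ]; then
      echo "--rearrange needs --edgeweights and --threshold" >&2
      return 1
    fi
  fi
  return 0
}

# Example: the options of a batch rearrangement on the sample batch file.
check_batch_options -b sampleTrees/batch/batch.run --rearrange \
  --speciestag prefix --edgeweights name --threshold 90% && echo "options ok"
```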

Batch Output

As Notung reads and processes each gene tree in the batch file, it prints diagnostic information to the terminal. Notung will also print this information to a log file when the --log option is given. Any errors that occur in the processing of a batch file are reported to the terminal as they occur. The total number of errors is reported at the end of the batch run.

To print status information to a file:

Use the --log option from the command line. The information will then be written to the file <batchfile>.<function>.ntglog.

To save trees to a different directory:

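By default, Notung saves each reconciled tree to the directory from which the program was run. To save output files elsewhere, use the --outputdir <outputDir> option, or use --usegenedir to save output trees in the directory in which each gene tree is located.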

Progress Bar

For long runs, it may be convenient to use the options --silent and --progressbar together. This will suppress all output to the terminal with the exception of a simple progress bar to stderr. The option --log can still be used to save the (now suppressed) output to a file.
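
For instance, a quiet batch reconciliation of the sample batch file could be assembled as sketched below. The snippet only prints the command for review; the combination of flags is taken from this chapter, while the variable names are illustrative.

```shell
# Sample batch file shipped with the Notung 2.6 distribution.
BATCH_FILE="sampleTrees/batch/batch.run"

# --silent suppresses terminal output, --progressbar keeps a progress bar on
# stderr, and --log still saves the suppressed output to a log file.
NOTUNG_CMD="java -jar Notung-2.6.jar -b $BATCH_FILE --reconcile --speciestag prefix --silent --progressbar --log"

echo "$NOTUNG_CMD"
```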

12.4  Saving PNG Images of Trees

The option --savepng saves a simple image representation of a tree in PNG format. The option --savepng can be used with one of the four main tasks (--reconcile, --root, --rearrange and --resolve), in which case an image of the final output tree is saved, in addition to the output tree file. This behavior is similar to other output options such as --treestats and --homologtablecsv. Alternatively, --savepng can be used alone to save an image of a tree without performing any other tasks.

Using --savepng alone

When --savepng is used without one of the main four tasks, Notung reads in a tree and generates and saves an image of that tree in PNG format. Unless a batch file is used, only a single tree can be processed at a time (i.e., a gene tree and a species tree cannot both be given). If the input tree is a previously reconciled tree in Notung format, the image will show the appropriate duplications and losses (to save an image without losses, use --nolosses). If the tree has not been reconciled, the tree image will show only the structure of the tree and the names of the leaves of the tree.

When using a batch file, each tree specified in the file is saved as an image. When generating images without performing a major task, the batch file format differs slightly: species trees and gene trees can be listed in any order.

Output File Names

When --savepng is used alone, an image of the input tree is saved in the file <treename>.png. When used with --reconcile, --root, --rearrange or --resolve, an image of the output tree is saved in the file <genetreename>.<function>.png. For analyses with more than one optimal history, an image file is saved for each history. The number of files is limited by the parameter --maxtrees.
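
The naming rule for a single output image can be summarized in a small helper. The function png_name below is a hypothetical illustration, not part of Notung, and the tree names are placeholders; it does not cover how image files are named when several optimal histories are saved.

```shell
# png_name <tree name> [function]
# <function> is reconcile, rearrange, resolve, or rooting; omit it when
# --savepng is used alone.
png_name() {
  if [ -n "${2:-}" ]; then
    echo "$1.$2.png"
  else
    echo "$1.png"
  fi
}

png_name mytree               # --savepng alone: mytree.png
png_name mygenetree rooting   # with --root: mygenetree.rooting.png
```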

Color Annotations

If a tree in Notung format contains color annotations, the leaves in images of that tree will be colored as specified by those annotations. Additionally, an annotation file can be specified with the option --annotationfile. For more information on color annotations, see Chapter 10 - Annotations.

Making an Imagemap

Notung provides the option to produce an html imagemap for a tree image. If an imagemap and image file are both included in a web page, each gene in the image will provide a link to a specified web page. The format of these links is determined by the imagemap specification file given with --imagemapfile <imagemapfilename>, described below. The resulting imagemap is saved in the file <outputtreename>.png.html, where <outputtreename> is either <genetree>.<function> or <treename>.

To include the image and imagemap in a web page, insert the entire contents of the saved imagemap file into the html of the web page. The saved image must be in the same directory as the web page, unless you specify a different location for the image by changing <imagefile> in the line:

<img border=0 src='<imagefile>' ...

Imagemap Specification

The specification file given by --imagemapfile <imagemapfilename> consists of a list of gene/link pairs. Blank lines and lines that start with # are ignored. An example specification file:

# Danio rerio links:
gene: Danio_rerio|(id)
link: http://zfin.org/cgi-bin/ZFIN_jump?record=(id)

# generic imagemap - everything else links to google
gene: (id)
link: http://www.google.com/search?q=(id)
  

Lines starting with ‘gene:’ match genes in the gene tree; lines starting with ‘link:’ specify the format of links for those genes. For each gene in the gene tree, the first gene/link pair that matches will be used. If a gene does not match any of the ‘gene:’ lines, a warning will be printed.

The identifier ‘(id)’ will match any text string, and that text string is used in the link. Any other text present in the ‘gene:’ line must match gene names exactly. In the example above, the gene Danio_rerio|ZDB-GENE-031007-1 would match the first ‘gene:’ line. The identifier (id) would be ZDB-GENE-031007-1, and the link would be
http://zfin.org/cgi-bin/ZFIN_jump?record=ZDB-GENE-031007-1. The gene Homo_sapiens|gene1 would match the second pair, because ‘(id)’ will match any text string. The resulting link would be
http://www.google.com/search?q=Homo_sapiens|gene1.
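
The matching rule can be mimicked in a few lines of shell. The sketch below hard-codes the two gene/link pairs from the example specification and is only meant to illustrate the first-match behavior described above; the function name resolve_link is hypothetical, and this is not how Notung itself is implemented.

```shell
# Mirrors the two gene/link pairs of the example specification, in order.
resolve_link() {
  gene="$1"
  prefix='Danio_rerio|'
  case "$gene" in
    "$prefix"*)
      # First pair: the literal text must match exactly; (id) is the rest.
      id=${gene#"$prefix"}
      echo "http://zfin.org/cgi-bin/ZFIN_jump?record=$id"
      ;;
    *)
      # Second pair: (id) alone matches any gene name.
      echo "http://www.google.com/search?q=$gene"
      ;;
  esac
}

resolve_link 'Danio_rerio|ZDB-GENE-031007-1'
resolve_link 'Homo_sapiens|gene1'
```

The two calls print the same links as the worked example: the ZFIN link for the Danio rerio gene and the Google search link for the human gene.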

An example gene tree and imagemap specification from the Princeton Protein Orthology Database (http://ortholog.princeton.edu/) are included in the Notung distribution.

12.5  Options for Reconciling with Non-Binary Trees

When inferring losses during reconciliation with a non-binary species tree, it is not possible to determine unambiguously the edge in the gene tree to which a loss should be assigned. Notung uses two different methods to deal with this problem. An exact algorithm finds all possible assignments that minimize the total number of losses but has exponential time complexity. A heuristic, which runs in polynomial time, is not guaranteed to find the optimal assignment, but usually does in practice. These issues and algorithms are discussed in detail in Section 4 (Non-Binary Trees).

Only the heuristic is implemented in the GUI. Either method may be used when executing Notung from the command line. The CLI runs the heuristic by default. To use the exact algorithm, include the --exact-losses option when running Notung from the command line with the --reconcile or --root tasks.

The running time of the exact algorithm is exponential in the size of the largest polytomy. Even when --exact-losses is used, Notung does not apply the exact algorithm to polytomies with more than 12 children. Instead, the heuristic is applied to these polytomies. To change the maximum polytomy size for which Notung uses the exact algorithm, use the --polytomy-cutoff <maxPolytomySize> option when including the --exact-losses option in the command line.

NOTE: Changing the polytomy cut-off to a larger value and using the exact algorithm on a species tree with a polytomy with more than 12 children may greatly increase running time.
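
The cutoff rule can be stated as a simple comparison. The sketch below is a hypothetical helper, not Notung code; it reports which method would handle a polytomy of a given size when --exact-losses is set, using 12 as the cutoff unless another value is supplied.

```shell
# loss_method <polytomy size> [cutoff]
# The cutoff defaults to 12, the documented default of --polytomy-cutoff.
loss_method() {
  size="$1"
  cutoff="${2:-12}"
  if [ "$size" -le "$cutoff" ]; then
    echo "exact"
  else
    echo "heuristic"
  fi
}

loss_method 12      # exact
loss_method 13      # heuristic
loss_method 13 15   # exact, as with --polytomy-cutoff 15
```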

Command Line Options for Losses with Non-Binary Species Trees

--exact-losses
 

Computes the minimum number of losses when reconciling a binary gene tree with a non-binary species tree. If this option is not included on the command line, the heuristic is used. NOTE: In Notung 2.5, this option was named --combine-losses.

--polytomy-cutoff <maxPolytomySize>
 

Using this option with --exact-losses changes the polytomy cut-off from its default value of 12. The exact algorithm is used only for losses associated with polytomies of size less than or equal to <maxPolytomySize>. If a polytomy larger than <maxPolytomySize> is encountered, a warning will be printed to the terminal window and/or log file.

--report-heuristic-losses
 

When run with --exact-losses, this option will report both the number of losses obtained with the heuristic and with the exact algorithm. This is useful for determining whether the heuristic is overestimating the number of losses and by how much. NOTE: In Notung 2.5, this option was named --report-explicit-losses.

