
Chapter 3  Getting Started

Notung is a tool for comparing gene and species trees. Notung takes tree files as input and allows users to refine and manipulate them. The modified trees can be saved as output. The following subsections introduce basic input and output in Notung, general tree statistics, the graphical user interface, and the parameter values used in Notung’s tree refinement tasks.

3.1  Gene and Species Trees

To perform its functions, Notung requires a gene tree and a species tree. The species tree must contain all the species from which genes in the gene tree were sampled. The species tree may contain additional species as well - these will be ignored. A correspondence between the leaves of the species and gene trees is determined by comparing the leaf labels in the gene and species trees: each leaf label in the gene tree must include a substring that specifies the species from which the gene was sampled. Trees may be provided in Newick, NHX, or Notung format. See Appendix A - File Formats for further information.
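The leaf correspondence described above can be illustrated with a short Python sketch. It is not Notung's implementation: the function name `species_for_gene`, the example labels, and the "longest match wins" rule are assumptions made for illustration, and the actual labeling conventions are those given in Appendix A.4.

```python
def species_for_gene(gene_label, species_names):
    """Return the species whose name occurs as a substring of gene_label.

    Assumption for this sketch: if several species names match,
    the longest one is used; if none match, an error is raised.
    """
    matches = [name for name in species_names if name in gene_label]
    if not matches:
        raise ValueError(f"no species found in gene label {gene_label!r}")
    return max(matches, key=len)


# Hypothetical leaf labels, chosen only for illustration.
species_leaves = ["human", "mouse", "chicken", "zebrafish"]
gene_leaves = ["HBA1_human", "Hba-a1_mouse", "HBAA_chicken"]

mapping = {gene: species_for_gene(gene, species_leaves) for gene in gene_leaves}

# Species that appear in the species tree but in no gene label are simply unused.
unused = sorted(set(species_leaves) - set(mapping.values()))

print(mapping)
print(unused)
```

In this example every gene leaf maps to a species leaf, and the extra species (`zebrafish`) plays no role, mirroring the statement that additional species in the species tree are ignored.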

Notung can operate on a non-binary gene tree or a non-binary species tree. However, its functions cannot be performed when both the gene tree and corresponding species tree are non-binary. For a complete summary of functions that Notung can perform, see Table 1.1.

NOTE: If you are interested in using Notung to analyze non-binary trees, see Chapter 4 - Non-Binary Trees for a more detailed and theoretical discussion of non-binary trees.

Species Trees

The species tree must be rooted, with leaf nodes labeled with species names. Internal nodes may be given taxonomic labels (e.g., “tetrapoda”), but this is not required. If the internal nodes are not labeled, Notung will assign alphanumeric labels (such as n1, n2, etc.). If the species tree has edge weights or branch lengths, this information will be ignored. For more information on species names, see Appendix A.4 - Specifying the Species Associated with Each Gene.

The tasks that Notung performs are based on the assumption that the user has selected a species tree that is a reliable representation of the true species relationships. Using Notung with an incorrect species tree will give incorrect results. For more information on selecting an appropriate species tree, see Appendix B - Building a Species Tree.

Gene Trees

In order to perform its reconcile, rearrange and resolve functions, Notung requires a rooted gene tree. If the gene tree is not rooted, Notung can be used to root the gene tree. See Chapter 6 - Rooting Mode. The leaf nodes in the gene tree must be labeled with a unique identifier specifying the gene, as well as the species from which the gene was sampled. See Appendix A.4 - Specifying the Species Associated with Each Gene for more information. The internal nodes may be labeled. If the internal nodes are not labeled, Notung will assign alphanumeric labels (e.g. n5, n6, etc.).

In Rearrangement mode, Notung requires that the tree have edge weights. These are used to identify edges that are weakly supported and may be rearranged. These weights may be bootstrap values, posterior probabilities, edge lengths, or any other weighting scheme selected by the user. Several different fields in the Newick and NHX formats may be used to store edge weights. See Appendix A - File Formats for a detailed explanation of these formats and how to indicate to Notung which field is being used for edge weights in a particular input tree.

Unrooted binary gene trees

Many tree reconstruction programs represent an unrooted binary tree as a mostly binary tree, with a single trifurcation at the root. Unless a root is selected for these trees (in Notung or another program), Notung will incorrectly treat them as rooted non-binary trees. If such a tree is actually an unrooted binary tree, failing to root it will affect Notung’s diagnostics. See Chapter 6 - Rooting Mode for more information on rooting gene trees.
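Before opening a tree, it can be useful to check whether a Newick file has a trifurcation at the root. The following Python sketch does this by counting the children of the root. It is a heuristic written for this illustration, not part of Notung; the function names are hypothetical, and it assumes that labels contain no quoted commas or parentheses.

```python
def count_root_children(newick):
    """Count the children of the root by scanning for commas at nesting depth 1."""
    text = newick.strip().rstrip(";")
    if not text.startswith("("):
        return 0  # a single leaf has no children
    depth = 0
    children = 1
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 1:
            children += 1
    return children


def has_root_trifurcation(newick):
    """True if the root has exactly three children, which may indicate an unrooted tree."""
    return count_root_children(newick) == 3


print(has_root_trifurcation("((a,b),c,d);"))    # root has three children
print(has_root_trifurcation("((a,b),(c,d));"))  # root has two children
```

A root with three children does not prove that the tree is unrooted; it only flags trees that should be inspected and, if appropriate, rooted before further analysis.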

3.2  The Graphical User Interface

Notung’s graphical interface facilitates tree visualization and manipulation, enabling the user to inspect duplicated and lost nodes in a tree, view orthologs and paralogs, visualize alternate optimal trees, and color annotate genes for visual differentiation or presentation.

To run Notung:

Using the graphical user interface on Windows or Mac OS X:

Using the graphical user interface on Linux:

In addition, Notung can perform many of its operations from the command line without launching the GUI. See Chapter 12 - Command Line Options and Batch Processing for a description of the command line interface.

When Notung is first launched, the program window will be blank. Figure 3.1a and Figure 3.1b show Notung’s graphical interface once a gene tree and species tree have been opened. Notung’s graphical user interface has the following components:

Tree panel: The tree that is currently selected appears in the tree panel. Trees are rendered with the root at left and leaf nodes at right. Nodes are denoted by small blue squares in the tree. Edge weights and leaf node names appear in the tree by default. Notung fits the whole tree in the tree panel by default. The size of the tree and tree labels can be modified using the Zoom and Fonts menus, respectively. See Chapter 11 - Changing the Appearance of the Tree Panel.

Figure 3.1: Notung’s graphical user interface displaying (a) a gene tree, and (b) a species tree. The tree panel is highlighted in red, the task panel in blue, and the parameters panel in yellow. Only the tree panel and the task panel are applicable to species trees.

Although multiple trees can be open in Notung at once, Notung operates on only one tree at a time. To facilitate working with many trees, Notung marks each open tree with a tab at the top of the tree panel. Clicking on a tab selects the corresponding tree. Tabs are labeled with the file name and special icons to identify them as a gene or species tree - a DNA helix for gene trees, and a cartoon of the evolution of humankind for species trees (see Figure 3.2).

Figure 3.2: Tree tabs for a gene tree (left) and a species tree (right)

Task panel: Operations on the tree are performed in the task panel (highlighted in blue in Figure 3.1). Tabs at the top of the task panel correspond to the various tasks that Notung can perform. Clicking on a tab puts Notung in the corresponding task mode, revealing the buttons that control tasks specific to that mode. If a gene tree is selected, six modes are available: History, Reconciliation, Rooting, Rearrange, Resolve, and Annotations. Only the History and Annotations modes can be used when a species tree is selected.

Parameter values: When a gene tree is selected, a box displaying the Edge Weight Threshold and Costs/Weights for Duplications, Conditional Duplications, and Losses appears in the bottom-right corner of the program window. These values can be changed by clicking the “Edit Values” button directly below them. Note that when a species tree is selected, the program window will not display the parameter values.

3.3  File Menu & Opening and Saving Trees

Notung can read and save tree files in Newick, NHX, and Notung file formats. The NHX and Notung file formats are extensions of Newick; see Appendix A - File Formats for details. Notung can also save the image in the tree panel as a Portable Network Graphics (PNG) file.

To open trees:

  1. Click “File → Open Gene Tree” or “File → Open Species Tree.”
  2. In the Open dialog box, select a tree file and click “Open.”
    NOTE: Notung cannot distinguish gene trees from species trees automatically. If a gene tree is opened as a species tree, or a species tree is opened as a gene tree, reconciliation will produce incorrect results.

To save trees:

  1. Click “File → Save As.”
  2. In the drop-down menu, “Files of Type,” select one of the following formats:
  3. Click “Save.”
    NOTE: The default format for saving trees is the Notung File Format. If you have modified the tree in Notung and wish to reopen this tree in Notung, it may be best to save the tree in Notung format. If you wish to reopen the modified tree in another tree program, Newick format may be a better option.

To view text formatted trees in a dialog box:

  1. Click “File → View Tree in Text Format.”
  2. In the drop-down menu, select one of the following formats:

    To copy this information, click the “Copy to clipboard” button. This text can then be pasted in any text editor.

  3. When finished reviewing this information, close this window to continue using Notung.
    NOTE: Selecting “About Tree Formats” from the drop-down menu will provide a dialog box containing a summary on the different tree formats. See Appendix A - File Formats for more information.

To save the current view of a tree as a PNG file:

To save an image of the whole tree as a PNG file:

To print an image of a tree:

  1. Click “File → Print Current View.”
  2. The print dialog box will appear. Change the settings as necessary and click “Print.”
    NOTE: For most printers the default page layout will be portrait; however, the landscape layout is usually preferred for printing trees from Notung. You may wish to change your printer settings before printing.
  3. A red rectangle will appear in the tree panel. Only the view inside this rectangle will be printed.
  4. To proceed with printing, click “Print.”
  5. If you wish to change the printer’s settings or the size of the tree, click “Cancel.” The red rectangle will disappear and the appearance of the tree can be manipulated.
    NOTE: Printing exactly the desired view of the tree may be difficult, because it may be necessary to change both the printer’s settings (e.g., page layout and margins) and the appearance of the tree so that the desired print area fits within the red rectangle. See Chapter 11.2 - Zoom for more information on zooming in and out of the tree. It may be easier to obtain the desired view by first saving the tree as a PNG image, and then editing and printing that image using another program.

To reload a tree:

To export color annotations to a file:

  1. Click “File → Export Annotations.”
  2. Provide a file name and click “Save.”
    NOTE: Exported annotations can be imported into other trees, or loaded on the command line using the option --annotationfile. For more information about color annotations, see Chapter 10 - Annotations.

To import color annotations from a file:

  1. Click “File → Import Annotations.”
  2. Select the desired annotations file and click “Open.”
    NOTE: Annotations can be imported from previously exported annotations files. Additionally, selecting a Notung format tree which contains annotations will import annotations from that tree. Annotations can also be loaded via the command line using the option --annotationfile. For more information about color annotations, see Chapter 10 - Annotations.

To close trees:

  1. Select the tree to close.
  2. Click “File → Close.”

To quit Notung:

3.4  General Tree Statistics

Notung compiles information on tree characteristics, such as height, number of leaves, and number of nodes. Notung reports this information in the general tree statistics box under the “About This Tree” menu. The properties examined depend on whether the given tree is a gene tree or a species tree, and whether the gene tree has been reconciled. The information that may be displayed is described below.

For all trees

Total nodes:
the total number of nodes.
Internal nodes:
the total number of internal nodes (Total nodes minus Leaf nodes).
Leaf nodes:
the total number of leaves.
Polytomies:
the total number of polytomies in the tree. This number will be zero if the tree is binary.
Size of largest polytomy:
the number of children of the largest polytomy in the tree. This number will be zero if the tree is binary.
Height:
the maximum path length from a leaf node to the root.
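The definitions above can be made concrete with a small Python sketch. It is only an illustration of the quantities, not Notung's code. The nested-list tree representation and the function name `tree_stats` are assumptions, as is the choice to measure height in edges and to treat any node with more than two children as a polytomy.

```python
def tree_stats(tree):
    """Compute basic statistics for a tree given as nested lists.

    A leaf is a string; an internal node is a list of its children.
    """
    stats = {
        "total_nodes": 0,
        "internal_nodes": 0,
        "leaf_nodes": 0,
        "polytomies": 0,
        "largest_polytomy": 0,
    }

    def visit(node):
        """Record one node and return its height (edges to its deepest leaf)."""
        stats["total_nodes"] += 1
        if isinstance(node, str):
            stats["leaf_nodes"] += 1
            return 0
        stats["internal_nodes"] += 1
        if len(node) > 2:  # more than two children: a polytomy
            stats["polytomies"] += 1
            stats["largest_polytomy"] = max(stats["largest_polytomy"], len(node))
        return 1 + max(visit(child) for child in node)

    stats["height"] = visit(tree)
    return stats


# Hypothetical non-binary tree: the root and one of its children are polytomies.
example = [["a", "b"], ["c", "d", "e"], "f"]
result = tree_stats(example)
print(result)
```

For this example tree, there are nine nodes in total (three internal, six leaves), two polytomies of size three, and a height of two. For a binary tree, both polytomy values remain zero.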

Figure 3.3 shows an example of the tree statistics provided for a species tree.

Figure 3.3: General tree statistics for a species tree.

For gene trees, but not for species trees

Edge Weight Range:
the range of edge weights in the gene tree in the form, [minimum edge weight, maximum edge weight].

For reconciled gene trees

Under the heading Reconciliation Information:

Duplications:
the total number of duplications in the reconciled gene tree.
Conditional Duplications:
the number of conditional duplications in the reconciled gene tree. This number will be zero if the associated species tree is binary or there are no conditional duplications. See Chapter 4 - Non-Binary Trees for more information on conditional duplications.
Losses:
the total number of losses in the reconciled gene tree.

Statistics about the topology of the tree (number of leaf nodes, number of internal nodes, etc.) are reported twice: once for the gene tree without losses, and once for the tree with losses.

In addition, the species tree used for reconciliation will be reported, as well as simple statistics for the pruned species tree. Figure 3.4 shows an example of the tree statistics displayed for a reconciled gene tree.

Figure 3.4: General tree statistics for a reconciled gene tree.

To get general statistics for a tree:

NOTE: Information on duplication bounds and losses can also be obtained from the “About This Tree” menu, via Duplication Bounds and Loss Counts. For more information on duplication bounds, see Chapter 12.2 - Duplication Bounds and Loss Information.

3.5  Parameter Values

The parameter values used in Notung - the Edge Weight Threshold, Duplication Cost, Conditional Duplication Cost, and Loss Cost - can be specified by the user. These values influence the results produced by Notung’s tasks.

Notung uses a Duplication/Loss (D/L) Score to score reconciled trees and evaluate alternate hypotheses. The D/L Score is defined as c_L × L + c_D × D + c_C × C, where L is the number of losses, D is the number of duplications, and C is the number of conditional duplications implied by the current reconciliation. The loss cost c_L, duplication cost c_D, and conditional duplication cost c_C reflect the relative importance of losses, duplications, and conditional duplications in scoring the tree. The cost of conditional duplications is only relevant when reconciling a gene tree with a non-binary species tree (see Chapter 4 - Non-Binary Trees). The default values are 1.0 for losses, 1.5 for duplications, and no cost for conditional duplications, but these values can be changed by the user. Notung displays the D/L Score of a reconciled tree, as well as the number of losses, duplications, and conditional duplications, in the bottom-left corner of the program window (see Figure 3.5).
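As a worked example of the D/L Score, the Python sketch below evaluates the weighted sum using the default costs stated above (1.0 per loss, 1.5 per duplication, no cost for conditional duplications). The function name `dl_score` and the event counts are hypothetical; this is an illustration of the arithmetic, not Notung's code.

```python
def dl_score(losses, duplications, conditional_duplications=0,
             loss_cost=1.0, duplication_cost=1.5,
             conditional_duplication_cost=0.0):
    """Weighted sum of losses, duplications, and conditional duplications."""
    return (loss_cost * losses
            + duplication_cost * duplications
            + conditional_duplication_cost * conditional_duplications)


# Hypothetical reconciliation with 3 duplications and 4 losses.
default_score = dl_score(losses=4, duplications=3)

# The same events scored with a higher duplication cost.
custom_score = dl_score(losses=4, duplications=3, duplication_cost=2.0)

print(default_score)  # 1.0 * 4 + 1.5 * 3 = 8.5
print(custom_score)   # 1.0 * 4 + 2.0 * 3 = 10.0
```

Changing the costs changes which of two competing reconciliations scores lower, which is why the parameter values influence the results of Notung's tasks.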

Figure 3.5: If the gene tree has been reconciled, the D/L Score, the number of duplications, conditional duplications and losses, and the species tree used to reconcile it appear at the bottom of the program window.

The Edge Weight Threshold is a parameter used to define the set of strong edges in the gene tree. In Rearrange mode, edges weighted below the Edge Weight Threshold are considered weak and may be rearranged (for more information about rearrangement, see Chapter 7 - Rearrange Mode). Edges with no weight specified are assigned an edge weight of zero, and are considered to be weak. The default threshold is 90% of the highest edge weight in the gene tree file. If no edge weights are found, the threshold is set to one. The user may change this cutoff if a different threshold is desired for the current data set.

NOTE: For some sources of edge weights, such as bootstrap values, setting the threshold to a percentage of the highest edge weight works well. For other sources, such as branch lengths, where a single very large value could cause all other edges in the tree to be weak, it may be better to set the threshold with a fixed, minimum value.
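The default threshold rule and the caveat in the note above can be sketched in Python as follows. The function names and the example weights are hypothetical, and the code only mirrors the rules as described in this section (90% of the highest weight, a threshold of one when no weights exist, and missing weights treated as zero); it is not taken from Notung.

```python
def default_threshold(edge_weights, fraction=0.9):
    """Default threshold: a fraction of the highest weight, or 1.0 if no weights exist."""
    present = [w for w in edge_weights if w is not None]
    if not present:
        return 1.0
    return fraction * max(present)


def weak_edges(edge_weights, threshold):
    """Indices of edges below the threshold; a missing weight counts as zero."""
    return [i for i, w in enumerate(edge_weights)
            if (0.0 if w is None else w) < threshold]


# Bootstrap-like weights: a percentage of the maximum separates strong from weak edges.
bootstrap = [100, 95, 62, None, 88]
bootstrap_weak = weak_edges(bootstrap, default_threshold(bootstrap))

# Branch-length-like weights: one very long branch makes every other edge weak.
lengths = [0.02, 0.05, 3.0, 0.04]
lengths_weak = weak_edges(lengths, default_threshold(lengths))

# A fixed minimum value chosen by the user avoids that effect.
lengths_weak_fixed = weak_edges(lengths, 0.03)

print(bootstrap_weak, lengths_weak, lengths_weak_fixed)
```

With the bootstrap-like weights, only the edges weighted 62 and 88 and the unweighted edge fall below the threshold. With the branch-length-like weights, the single long branch pushes the default threshold so high that all other edges become weak, whereas the fixed threshold marks only the shortest edge as weak.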

To change the parameter values:

  1. Click the “Edit Values” button. A dialog box appears.
  2. Enter the appropriate values in the text fields, and then click “Apply Changes.”
    NOTE: This will change the value settings only for the gene tree that is currently selected. Also, each history state saves the parameter values used at that state; when moving through the history, parameter values may change depending on the state and tree viewed. For more information on history states, see Chapter 9 - History.
