Visualization and Exploration of Large Multiple Sequence Alignments

We wish to set protein sequence alignments free from the usual grid-of-letters visualization. As the number of sequences in an alignment grows, that representation captures less and less of the information in the data. Information is also lost when the amino acids are grouped by one property, or by a particular combination of properties, to create a single color scheme for the letter grid. We prefer to consider individual properties separately and simultaneously.

Consider this display of an alignment. Each column in the alignment is represented as a vertical histogram of amino acid property values, in this case, hydrophobicity. The height of each bar represents the proportion of sequences with a given value. The color scales with the property value: red for hydrophobic, blue for hydrophilic.
Property Distributions by Position
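The per-column histogram described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not VELMA's own code: it assumes the Kyte-Doolittle hydrophobicity scale (the page does not say which scale VELMA uses), treats gap characters as contributing no bar, and uses a simple linear blue-to-red color ramp. The helper names `column_distribution` and `value_to_rgb` are hypothetical.

```python
from collections import Counter

# Kyte-Doolittle hydrophobicity scale. ASSUMPTION: the page does not name
# the scale VELMA uses; any per-residue property table could be swapped in.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
SCALE_MIN = min(KYTE_DOOLITTLE.values())  # -4.5 (R, most hydrophilic)
SCALE_MAX = max(KYTE_DOOLITTLE.values())  # 4.5 (I, most hydrophobic)


def column_distribution(alignment, col, scale=KYTE_DOOLITTLE):
    """Hypothetical helper: map each property value seen in column `col`
    to the fraction of ALL sequences having it (one histogram bar per value).
    Residues sharing a value (e.g. N, D, Q, E at -3.5) share one bar.
    Gaps are skipped, so the fractions sum to less than 1 when gaps occur."""
    n = len(alignment)
    counts = Counter()
    for seq in alignment:
        residue = seq[col].upper()
        if residue in scale:  # gaps ('-', '.') and unknown codes add no bar
            counts[scale[residue]] += 1
    # Sorted by property value so bars stack from hydrophilic to hydrophobic.
    return {value: count / n for value, count in sorted(counts.items())}


def value_to_rgb(value, lo=SCALE_MIN, hi=SCALE_MAX):
    """Hypothetical helper: blend linearly from blue (lo) to red (hi)."""
    t = (value - lo) / (hi - lo)
    return (round(255 * t), 0, round(255 * (1 - t)))


if __name__ == "__main__":
    aln = ["MKIL", "MRIL", "MK-V", "AKIL"]
    print(column_distribution(aln, 1))           # {-4.5: 0.25, -3.9: 0.75}
    print(value_to_rgb(4.5), value_to_rgb(-4.5))  # (255, 0, 0) (0, 0, 255)
```

Normalizing by the total number of sequences, rather than by the non-gap count, keeps bar heights comparable across columns with different gap content; that choice is itself an assumption.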

Compare it to this view, which requires 11 screens (at 1280x1024) to see all of the data from any given column.
Grid of letters display

We also promised multiple simultaneous views. The screen capture below shows a sample session with two separate property distribution displays, a display of the protein's 3D structure, a scattergram plotting two features of each position, and the grid-of-letters view for traditionalists. All of these views are linked: selecting a range of positions in one display updates the selection in all of the others.
Sample VELMA session screen capture
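One common way to link views like these is the observer pattern: every display subscribes to a single shared selection model, a selection made in any display is written to that model, and the model notifies all subscribers. The sketch below assumes that design; the page does not describe how VELMA links its views, and the names `SelectionModel`, `View`, and `user_selects` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Range = Tuple[int, int]  # inclusive (start, end) alignment positions


class SelectionModel:
    """Hypothetical shared selection observed by every linked view."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[Range], None]] = []
        self.selection: Optional[Range] = None

    def subscribe(self, listener: Callable[[Range], None]) -> None:
        self._listeners.append(listener)

    def select(self, start: int, end: int) -> None:
        # Normalize so a right-to-left drag yields the same range.
        new = (min(start, end), max(start, end))
        if new == self.selection:
            return  # unchanged: skip redundant redraws in every view
        self.selection = new
        for listener in self._listeners:
            listener(new)


class View:
    """Hypothetical stand-in for one display (histogram, 3D structure,
    scattergram, or letter grid)."""

    def __init__(self, name: str, model: SelectionModel) -> None:
        self.name = name
        self.model = model
        self.highlighted: Optional[Range] = None
        model.subscribe(self.on_selection_changed)

    def user_selects(self, start: int, end: int) -> None:
        # A selection made in this view is sent to the shared model...
        self.model.select(start, end)

    def on_selection_changed(self, selection: Range) -> None:
        # ...which calls back every view, including this one, to redraw.
        self.highlighted = selection


if __name__ == "__main__":
    model = SelectionModel()
    names = ("hydrophobicity", "structure", "scattergram", "letters")
    views = [View(n, model) for n in names]
    views[2].user_selects(40, 25)  # drag in the scattergram
    print([v.highlighted for v in views])
    # [(25, 40), (25, 40), (25, 40), (25, 40)]
```

Routing every selection through one model, instead of having views notify each other directly, means adding a fifth display requires only one `subscribe` call rather than new connections to every existing view.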

Have we piqued your interest? See the installation page to find out how to get the software up and running on your machine.


VELMA makes use of functions provided by the following open source libraries (no need to download; just giving credit where credit is due):