Data analysis hints

Selecting sequences

Multiple Sequence Alignment

  • A good alignment is essential for obtaining a good tree. In addition to MUSCLE and ClustalW2, try other alignment programs. T-Coffee and MAFFT are available through the EBI webserver. ProbCons is considered one of the most accurate, so try it if its webserver is up. Inspect the resulting alignments and choose the one that seems best for your family as a starting point for manual refinement. No single program works best on all families. You may also find that one program does well with the N-terminal region and a different program does well with the C-terminal region. In this case, split each sequence into its N-terminal and C-terminal portions, align each set of portions separately, and then combine the two alignments in the editing step.

  • In some cases, you may get better results by partitioning the sequences into two subsets, aligning each subset separately, and then aligning the two resulting alignments to each other. Clustal is one program that can align alignments. The initial alignments of the subsets can probably be produced with any program.

  • Check your alignment(s) against what the literature reports about conserved features of your family. If there is a published structure, make use of that as well. Use these features (1) to decide which program gave you the best alignment and (2) to improve your alignment through manual editing (e.g., in GeneDoc). For a serious, publishable analysis of a small number of families, always include manual refinement of the multiple sequence alignment in your data analysis plan.

  • For some projects, MEME will be useful. For others, less so. If you have strong conservation throughout your alignment, you may not need MEME to guide your alignment. If you have weak conservation or big insertions and deletions in your data, MEME can be very helpful.

  • Trimming: Multiple alignments should be trimmed before submitting them to tree reconstruction programs. Most of the trimming should take place after manual refinement. However, if you have sequences with unalignable regions, you may want to do some preliminary trimming before alignment. For example, if some of your sequences have a long string of leading or trailing repeats, it may be better to remove those first.
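
    As a rough illustration of the trimming step, the sketch below drops alignment columns that consist mostly of gaps. The hints above do not prescribe a trimming criterion, so the gap-fraction rule, the 0.5 threshold, the function names (gap_fraction, trim_alignment) and the toy sequences are all assumptions made for this example. Treat it as a starting point to adapt, not as a substitute for inspecting and refining the alignment by hand.

```python
def gap_fraction(column):
    """Fraction of gap characters ('-') in one alignment column."""
    return column.count("-") / len(column)


def trim_alignment(alignment, max_gap_fraction=0.5):
    """Drop columns whose gap fraction exceeds max_gap_fraction.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    The threshold is an illustrative assumption, not a recommended value.
    Returns a new dict with the same names and the trimmed sequences.
    """
    names = list(alignment)
    rows = [alignment[name] for name in names]
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all aligned sequences must have the same length")

    # zip(*rows) walks the alignment column by column.
    kept = [col for col in zip(*rows) if gap_fraction(col) <= max_gap_fraction]

    # Transpose the kept columns back into one string per sequence.
    if kept:
        trimmed_rows = ["".join(chars) for chars in zip(*kept)]
    else:
        trimmed_rows = ["" for _ in rows]
    return dict(zip(names, trimmed_rows))


# Made-up toy alignment: ragged leading and trailing columns, conserved core.
toy = {
    "seqA": "--MKTAYL-",
    "seqB": "--MKSAYLG",
    "seqC": "QRMK-AYL-",
}
trimmed = trim_alignment(toy, max_gap_fraction=0.5)
for name, seq in trimmed.items():
    print(name, seq)
```

    In this toy case the two leading columns and the final column are gaps in two of the three sequences, so they are removed, while the single gap inside the conserved core is kept. A stricter or looser threshold may suit a given family better.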

    Last modified: November 2, 2011. Maintained by Dannie Durand (durand@cs.cmu.edu).