Genome Analysis Program:
   Multiple Sequence Alignment by Parallel Simulated Annealing

1. What is multiple sequence alignment?

Biologists often align DNA and protein sequences in order to determine
how similar they are. DNA is a chain of four kinds of nucleotides,
and a protein is a chain of twenty kinds of amino acids, which is
translated from a chain of nucleotides. Strong similarities between
sequences may result from a common evolutionary origin, and
such sequences may have almost the same function.

Figure 1 shows a typical multiple sequence alignment. Twelve fragments
of enzyme proteins are aligned. Each letter stands for an amino acid:
D is aspartic acid, R is arginine, H is histidine, and P is proline. A
good alignment has the same or similar amino acids in each column. To
improve an alignment, each sequence is shifted or gaps (dash
characters) are inserted into it.

2. Simulated annealing algorithm

In many important practical problems, a solution is an arrangement of
a set of discrete objects according to a given set of constraints.
Such problems are typically known as combinatorial problems. The set
of all solutions is referred to as the solution space and an energy
function is defined for all solutions. To solve a combinatorial problem
is to find a minimum-energy spot in the solution space.

A general strategy for searching the space is the method of `iterative
improvement'. The method requires a set of moves that can be used to
modify a solution. One starts with an initial solution and examines
its moves until a neighboring solution with a lower energy is
discovered. The neighbor becomes the new solution and the process
continues by examining the neighbors of the new solution. The
iteration terminates when it arrives at a spot that has locally
minimum energy, which is not necessarily the global minimum.
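
The method of iterative improvement can be sketched in Python as
follows. This is only a minimal illustration: the function name, the
toy energy function, and the toy neighbor function are all
hypothetical and have nothing to do with sequence alignment. The toy
landscape is chosen so that the search stops in a local minimum.

```python
def iterative_improvement(initial, energy, neighbors):
    """Move to the first lower-energy neighbor until none exists."""
    current = initial
    current_energy = energy(current)
    improved = True
    while improved:
        improved = False
        for candidate in neighbors(current):
            candidate_energy = energy(candidate)
            if candidate_energy < current_energy:
                # Accept the better neighbor and restart the scan from it.
                current, current_energy = candidate, candidate_energy
                improved = True
                break
    return current, current_energy


# Hypothetical toy problem: integers with a bumpy energy landscape.
def toy_energy(x):
    return (x - 7) ** 2 + 10 * (x % 3)


def toy_neighbors(x):
    return [x - 1, x + 1]


local_minimum = iterative_improvement(0, toy_energy, toy_neighbors)
print(local_minimum)      # the search stops at x = 3 with energy 16
print(toy_energy(6))      # x = 6 has energy 1, so x = 3 is only a local minimum
```

Starting from 0, the search gets stuck at x = 3 although x = 6 has a
lower energy. This is the weakness that simulated annealing addresses.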

The simulated annealing algorithm is an extension of the method of
iterative improvement, based on an analogy between a combinatorial
problem and the problem of determining the ground state of a physical
system. To bring a fluid to a highly ordered state such as a single
crystal, a process called `annealing' can be employed. We first melt
the system by heating it to a high temperature, then cool it
slowly, spending a long time at temperatures in the vicinity of the
freezing point. Kirkpatrick et al. suggested that better solutions to
combinatorial problems can be obtained by simulating the annealing
process of physical systems.
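
A generic simulated annealing loop might look like the Python sketch
below. It assumes the commonly used Metropolis acceptance rule (a
worse solution is accepted with probability exp(-dE/T)) and a simple
geometric cooling schedule. The function name and the parameters
t_start, t_end, alpha, and steps_per_t are hypothetical choices made
for illustration, not values taken from this program.

```python
import math
import random


def simulated_annealing(initial, energy, random_neighbor,
                        t_start=50.0, t_end=0.1, alpha=0.95,
                        steps_per_t=20, rng=None):
    """Anneal from t_start down to t_end; return the best solution seen."""
    rng = rng or random.Random(0)
    current, current_e = initial, energy(initial)
    best, best_e = current, current_e
    t = t_start
    while t > t_end:
        for _ in range(steps_per_t):
            candidate = random_neighbor(current, rng)
            candidate_e = energy(candidate)
            delta = candidate_e - current_e
            # Downhill (or equal) moves are always accepted; uphill moves
            # are accepted with probability exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_e = candidate, candidate_e
                if current_e < best_e:
                    best, best_e = current, current_e
        t *= alpha  # geometric cooling (an assumed schedule)
    return best, best_e
```

At a high temperature almost every move is accepted, so the search can
leave local minima. As the temperature falls, the loop behaves more and
more like iterative improvement.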

3. Multiple alignment as a combinatorial problem

There are several ways to formulate multiple sequence alignment as a
combinatorial problem. Kanehisa, a professor at Kyoto University,
developed an ingenious formulation for solving multiple
alignment problems with the simulated annealing algorithm. We adopt
his formulation.

Kanehisa's idea is as follows. First, we make an initial alignment by
adding a number of gaps to both the head and the tail of each sequence.
To modify the alignment, we focus on one sequence in the
alignment and randomly select a gap and an amino acid in that
sequence. Moving the gap to the other side of the selected amino acid
gives the modified alignment.
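
The move can be sketched in Python as follows. This is our own reading
of the description, not Kanehisa's code. All names are hypothetical,
and the padding rule (the same number of head gaps for every sequence,
with tails padded to a common width) is an assumption.

```python
import random

GAP = "-"


def initial_alignment(sequences, pad=2):
    """Pad every sequence with gaps at head and tail to a common width."""
    width = max(len(s) for s in sequences) + 2 * pad
    rows = []
    for s in sequences:
        row = [GAP] * pad + list(s)
        row += [GAP] * (width - len(row))
        rows.append(row)
    return rows


def move_gap(row, gap_index, residue_index):
    """Move one gap to the other side of the selected residue."""
    if row[gap_index] != GAP or row[residue_index] == GAP:
        raise ValueError("need one gap position and one residue position")
    new_row = list(row)
    new_row.pop(gap_index)             # take the gap out ...
    new_row.insert(residue_index, GAP)  # ... and put it back across the residue
    return new_row


def random_move(alignment, rng):
    """Apply one random gap move to one randomly chosen sequence."""
    i = rng.randrange(len(alignment))
    row = alignment[i]
    gaps = [k for k, c in enumerate(row) if c == GAP]
    residues = [k for k, c in enumerate(row) if c != GAP]
    new_alignment = [list(r) for r in alignment]
    new_alignment[i] = move_gap(row, rng.choice(gaps), rng.choice(residues))
    return new_alignment


print("".join(move_gap(list("-ABC"), 0, 2)))  # AB-C
print("".join(move_gap(list("ABC-"), 3, 1)))  # A-BC
```

The move never changes the length of a row or the order of its amino
acids. It only shifts the residues lying between the gap and the
selected amino acid by one column.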

The energy of an alignment is calculated by summing the correlation
values of all pairs of characters located in the same column.
The correlation values come from Dayhoff's odds matrix. If the energy
of the modified alignment is lower than that of the previous one, the
modified alignment is always accepted as the new alignment. If not,
it is accepted only with a probability that depends on the
temperature. The temperature is set according to a cooling schedule.
This annealing operation often yields a good alignment.
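
The column-wise energy can be sketched in Python as follows. The
pair scores used here are placeholders (-1 for identical amino acids,
0 otherwise, 0 for any pair involving a gap) and are NOT values from
Dayhoff's matrix. The sign convention, in which similar pairs lower
the energy, is also an assumption, as are all function names.

```python
from itertools import combinations

GAP = "-"


def toy_pair_energy(a, b):
    """Placeholder score, not Dayhoff's matrix: identical residues give -1."""
    if a == GAP or b == GAP:
        return 0
    return -1 if a == b else 0


def alignment_energy(alignment, score=toy_pair_energy):
    """Sum the pair scores over every pair of characters in every column."""
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            total += score(a, b)
    return total


print(alignment_energy(["DRHP", "DRHP", "D-HP"]))  # -10
print(alignment_energy(["DRHP", "-DRH"]))          # 0
```

With these placeholder scores, the well-aligned example has a lower
energy than the shifted one, so the annealing operation would prefer it.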

4. Scheduleless parallel simulated annealing

Designing a cooling schedule is troublesome because the optimal
cooling schedule depends on the type and the scale of the combinatorial
problem. Without careful temperature reduction, a solution becomes
trapped in a local minimum with relatively high energy. Kimura, a
member of ICOT, developed a method of parallel simulated annealing
that makes it unnecessary to design a cooling schedule.

In Kimura's method, each processing element (PE) maintains one solution
and performs the annealing operation concurrently under a constant
temperature that differs from PE to PE. The solutions obtained by the
PEs are occasionally exchanged between PEs that hold neighboring
temperatures (Figure 3). This exchange of solutions is controlled
probabilistically. Kimura proposed a scheme for the probabilistic
exchange and justified it from the viewpoint of probability theory.
He applied his method to a graph-partitioning problem, one of the
representative combinatorial problems, and showed the method to be
efficient.
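
The overall structure can be sketched in Python as follows. The PEs
are simulated one after another in a single process, so there is no
real parallelism. The exchange rule shown is the one commonly used in
replica-exchange (parallel tempering) methods. It is assumed here for
illustration and is not confirmed to be Kimura's exact scheme. All
names and parameters are hypothetical.

```python
import math
import random


def swap_probability(e_low, t_low, e_high, t_high):
    """Probability of exchanging the solutions held at t_low < t_high."""
    exponent = (1.0 / t_low - 1.0 / t_high) * (e_low - e_high)
    return 1.0 if exponent >= 0 else math.exp(exponent)


def anneal_at_fixed_temperatures(initial, energy, random_neighbor,
                                 temperatures, rounds=200,
                                 steps_per_round=10, seed=0):
    """One solution per constant temperature, with neighbor exchanges."""
    rng = random.Random(seed)
    temps = sorted(temperatures)
    solutions = [initial for _ in temps]
    energies = [energy(initial) for _ in temps]
    best, best_e = initial, energies[0]
    for _ in range(rounds):
        # Each simulated PE anneals at its own constant temperature.
        for k, t in enumerate(temps):
            for _ in range(steps_per_round):
                cand = random_neighbor(solutions[k], rng)
                cand_e = energy(cand)
                delta = cand_e - energies[k]
                if delta <= 0 or rng.random() < math.exp(-delta / t):
                    solutions[k], energies[k] = cand, cand_e
                    if cand_e < best_e:
                        best, best_e = cand, cand_e
        # PEs holding neighboring temperatures occasionally exchange solutions.
        for k in range(len(temps) - 1):
            p = swap_probability(energies[k], temps[k],
                                 energies[k + 1], temps[k + 1])
            if rng.random() < p:
                solutions[k], solutions[k + 1] = solutions[k + 1], solutions[k]
                energies[k], energies[k + 1] = energies[k + 1], energies[k]
    return best, best_e
```

Under this rule a better solution found at a high temperature always
moves to the colder PE, where it is refined. The hot PEs keep exploring,
so no temperature ever has to be lowered.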

5. Demonstration

The demonstration system solves multiple sequence alignment problems by
the parallel simulated annealing method. The multiple alignment problem is
formulated as a combinatorial problem following Kanehisa's idea, and the
annealing operation is carried out by Kimura's method.

In general, optimization by simulated annealing can take hundreds of
hours. The demonstration therefore runs a shortened version of
multiple alignment and shows the gradual improvement of the alignment
of a few small protein sequences.

