## 03-511/711, 15-495/856 Course Notes - August 31, 2010

## Pairwise alignment continued

### Alignment algorithms

The dynamic programs for sequence alignment compute a matrix *a[i,j]*,
which gives the scores of the optimal alignments of all prefixes. These algorithms have four components:

- __Initialization__ of the first row and column of *a[i,j]*.
- A __recurrence relation__ for *a[i,j], i,j ≥ 1*.
- Determination of the __score of the optimal alignment__ from the matrix *a[i,j]* in *O(mn)* time.
- __Trace back__ through the alignment matrix to obtain the optimal alignment in *O(m+n)* time.

The details of each of these steps are what differentiate global,
semiglobal and local alignment.

#### Global alignment with similarity scoring

- *p(x,y)*: similarity of *x* and *y*
- *p(x,"_")*: gap cost
- Score of alignment: *∑ p(s'[i], t'[i]), i = 1..l*

- A simple similarity scoring function that treats all characters equally:
  - *p(i, i) = M* (match)
  - *p(i, j) = m*, *i ≠ j* (mismatch)
  - *p(i, "_") = g* (gap)

- We require that *2g ≤ m < M*. If *2g > m*, replacing a mismatched pair with two gaps always gives a higher score, so optimal alignments will contain no substitutions.
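The scoring scheme above can be illustrated with a minimal Python sketch. The values *M = 1*, *m = -1*, *g = -2* are illustrative assumptions, not values from these notes, and the function names `p` and `alignment_score` are hypothetical. The last two calls use *m = -5*, which violates the constraint on *2g* and *m*, to show two gaps outscoring a mismatch.

```python
GAP = "_"

def p(x, y, M=1, m=-1, g=-2):
    """Simple similarity score: M for a match, m for a mismatch, g for a gap.

    The default values are illustrative assumptions only.
    """
    if x == GAP or y == GAP:
        return g
    return M if x == y else m

def alignment_score(s_aligned, t_aligned, **scores):
    """Sum p over the columns of an alignment given as two gapped strings."""
    if len(s_aligned) != len(t_aligned):
        raise ValueError("aligned sequences must have the same length")
    return sum(p(x, y, **scores) for x, y in zip(s_aligned, t_aligned))

# With 2g <= m, the mismatch column (A over C) beats two gap columns.
print(alignment_score("GAT", "GCT"))          # 1 + (-1) + 1 = 1
print(alignment_score("GA_T", "G_CT"))        # 1 + (-2) + (-2) + 1 = -2

# With m = -5 we have 2g > m, and the two gaps now score higher.
print(alignment_score("GAT", "GCT", m=-5))    # 1 + (-5) + 1 = -3
print(alignment_score("GA_T", "G_CT", m=-5))  # still -2
```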

With this scoring function, all matches are
accorded the same weight, as are all mismatches. Later in the semester we will
consider *substitution matrices* where the scores for matches and
mismatches vary for different characters *i* and *j*.

Under this simple scoring function, the dynamic programming algorithm for
global alignment has the following initialization and recursion steps:

- Initialization:
  - *a[0,0] = 0*
  - *a[i,0] = a[i-1,0] + g*
  - *a[0,j] = a[0,j-1] + g*

- Recurrence relation: *a[i,j]* is the maximum of
  - *a[i,j-1] + g*
  - *a[i-1,j-1] + p(s[i], t[j])*
  - *a[i-1,j] + g*
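The initialization, recurrence, and trace back for global alignment can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: the function name `global_align` and the default scores (*M = 1*, *m = -1*, *g = -2*) are assumptions, and the sequence lengths are called `rows` and `cols` to avoid a clash with the mismatch score *m*. When several optimal alignments exist, the trace back below arbitrarily prefers the diagonal move.

```python
def global_align(s, t, M=1, m=-1, g=-2):
    """Global alignment by dynamic programming.

    Returns (score, gapped s, gapped t). Default scores are illustrative.
    """
    def p(x, y):
        return M if x == y else m

    rows, cols = len(s), len(t)
    # a[i][j] = score of the best alignment of s[:i] with t[:j]
    a = [[0] * (cols + 1) for _ in range(rows + 1)]

    # Initialization: a prefix aligned against nothing costs one gap per character.
    for i in range(1, rows + 1):
        a[i][0] = a[i - 1][0] + g
    for j in range(1, cols + 1):
        a[0][j] = a[0][j - 1] + g

    # Recurrence: gap in s, match/mismatch, or gap in t.
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            a[i][j] = max(a[i][j - 1] + g,
                          a[i - 1][j - 1] + p(s[i - 1], t[j - 1]),
                          a[i - 1][j] + g)

    # Trace back from the bottom-right cell to a[0][0].
    i, j = rows, cols
    s_al, t_al = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and a[i][j] == a[i - 1][j - 1] + p(s[i - 1], t[j - 1]):
            s_al.append(s[i - 1]); t_al.append(t[j - 1])
            i -= 1; j -= 1
        elif i > 0 and a[i][j] == a[i - 1][j] + g:
            s_al.append(s[i - 1]); t_al.append("_")
            i -= 1
        else:
            s_al.append("_"); t_al.append(t[j - 1])
            j -= 1
    return a[rows][cols], "".join(reversed(s_al)), "".join(reversed(t_al))

print(global_align("ACGT", "AGT"))  # (1, 'ACGT', 'A_GT')
```

Filling the matrix visits every cell once, which is the *O(mn)* step; the trace back takes at most one step per character of the two sequences.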

#### Semiglobal Alignment

Semiglobal alignment is global alignment with no end gap penalties. Some
applications include:

- Finding overlaps between fragments for sequence assembly.
- Aligning cDNAs or ESTs with genomic DNA to identify gene structure.

The global dynamic programming algorithm can be modified for semiglobal
alignment as follows:

- Initialization: initialize the first row or the first column of *a[i,j]* to zero, to avoid leading gap penalties.

- Recurrence relation: unchanged from global alignment.

- To avoid trailing gap penalties, the score of the optimal semiglobal alignment is *max_i a[i,n]* or *max_j a[m,j]*.

- To avoid trailing gap penalties, start the trace back at the
cell(s) in the last row (or column) with the maximum score.
Note that when the first row (or column) of the matrix is
initialized to zero, the trace back will end in the first row (or
column), but not necessarily in the cell *a[0,0]*.
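The modifications above can be sketched in Python. This sketch assumes the variant in which all end gaps are free, as in overlap detection: both the first row and the first column are zero, and the score is the maximum over both the last row and the last column. The function name `semiglobal_align` and the default scores are assumptions. Instead of building the gapped strings, it returns the cells where the trace back starts and stops.

```python
def semiglobal_align(s, t, M=1, m=-1, g=-2):
    """Semiglobal alignment with all end gaps free (an assumed variant).

    Returns (score, start_cell, end_cell), where the cells are (i, j)
    indices into the matrix. Default scores are illustrative.
    """
    def p(x, y):
        return M if x == y else m

    rows, cols = len(s), len(t)
    # Row 0 and column 0 stay at zero: leading gaps are not penalized.
    a = [[0] * (cols + 1) for _ in range(rows + 1)]

    # Same recurrence as global alignment.
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            a[i][j] = max(a[i][j - 1] + g,
                          a[i - 1][j - 1] + p(s[i - 1], t[j - 1]),
                          a[i - 1][j] + g)

    # Trailing gaps are free: take the best cell in the last column or last row.
    # Ties are broken arbitrarily by tuple ordering.
    candidates = [(a[i][cols], i, cols) for i in range(rows + 1)]
    candidates += [(a[rows][j], rows, j) for j in range(cols + 1)]
    score, i, j = max(candidates)
    end = (i, j)

    # Trace back until the first row or first column is reached.
    while i > 0 and j > 0:
        if a[i][j] == a[i - 1][j - 1] + p(s[i - 1], t[j - 1]):
            i -= 1; j -= 1
        elif a[i][j] == a[i - 1][j] + g:
            i -= 1
        else:
            j -= 1
    return score, (i, j), end

# The suffix "CGT" of the first fragment overlaps the prefix "CGT" of the second.
print(semiglobal_align("AAACGT", "CGTTT"))  # (3, (3, 0), (6, 3))
```

In this example the trace back stops at cell (3, 0): in the first column, but not in the top-left corner of the matrix.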

Last modified: August 31, 2010.

Maintained by Dannie Durand (durand@cs.cmu.edu) and Annette McLeod.