## Pairwise alignment continued

### Alignment algorithms

The dynamic programs for sequence alignment compute a matrix a[i,j], which gives the scores of the optimal alignments of all prefixes. These algorithms have four components:

• Initialization of the first row and column of a[i,j].
• A recurrence relation for a[i,j], i,j ≥ 1.
• Determination of the score of the optimal alignment from the matrix a[i,j] in O(mn) time.
• Trace back through the alignment matrix to obtain the optimal alignment in O(m+n) time.

The details of each of these steps are what differentiate global, semi-global and local alignment.

#### Global alignment with similarity scoring

• p(x,y): similarity of x and y
• p(x,"_"): gap cost
• Score of an alignment = ∑ p(s'[i], t'[i]), i = 1..l, where s' and t' are the aligned (gapped) sequences and l is the length of the alignment

• A simple similarity scoring function that treats all characters equally:
• p(i, i) = M (match)
• p(i, j) = m, for i ≠ j (mismatch)
• p(i, "_") = g (gap)

• We require that 2g < m < M. If 2g > m, a mismatched pair scores less than the two gaps that could replace it, so optimal alignments would contain no substitutions.

Under this scoring function, all matches are accorded the same weight, as are all mismatches. Later in the semester we will consider substitution matrices where the scores for matches and mismatches vary for different characters i and j.
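
As an illustration of the scoring function above, the following minimal sketch sums p over the columns of an alignment that is already given as two gapped strings. The function names `p` and `alignment_score` and the default values M = 1, m = -1, g = -2 are assumptions chosen for the example, not values from these notes; the defaults are picked only so that the mismatch score lies strictly between 2g and M.

```python
def p(x, y, M=1, m=-1, g=-2):
    """Simple similarity score: M for a match, m for a mismatch, g for a gap.

    The default values are illustrative assumptions.
    """
    if x == "_" or y == "_":
        return g
    return M if x == y else m


def alignment_score(s_aligned, t_aligned, **scores):
    """Sum p over the columns of an alignment given as two gapped strings."""
    if len(s_aligned) != len(t_aligned):
        raise ValueError("aligned strings must have equal length")
    return sum(p(x, y, **scores) for x, y in zip(s_aligned, t_aligned))


# Three matches, one gap and one mismatch: 3*1 + (-2) + (-1)
print(alignment_score("AC_GT", "ACTGA"))

# A single mismatch versus the two gaps that could replace it
print(alignment_score("A", "C"), alignment_score("A_", "_C"))
```

With these assumed values a mismatch costs less than a pair of gaps, so substitutions can still appear in an optimal alignment.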

Under this simple scoring function, the dynamic programming algorithm for global alignment has the following initialization and recursion steps:

• Initialization
• a[0,0] = 0
• a[i,0] = a[i-1,0] + g, for i ≥ 1
• a[0,j] = a[0,j-1] + g, for j ≥ 1

• Recurrence relation: a[i,j] = max { a[i,j-1] + g,  a[i-1,j-1] + p(s[i], t[j]),  a[i-1,j] + g }
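
The initialization, recurrence, and traceback for global alignment can be sketched as follows. This is a minimal illustration rather than a reference implementation: the function name `global_align`, the default scores M = 1, m = -1, g = -2, and the tie-breaking order in the traceback (diagonal first, then up, then left) are all assumptions made for the example. Rows are indexed by s and columns by t.

```python
def global_align(s, t, M=1, m=-1, g=-2):
    """Global alignment with the simple match/mismatch/gap scoring.

    Returns (score, aligned_s, aligned_t). Default scores are assumptions.
    """
    rows, cols = len(s), len(t)
    a = [[0] * (cols + 1) for _ in range(rows + 1)]

    # Initialization: leading gaps are penalized in the first column and row.
    for i in range(1, rows + 1):
        a[i][0] = a[i - 1][0] + g
    for j in range(1, cols + 1):
        a[0][j] = a[0][j - 1] + g

    # Recurrence: best of a gap in s, a match/mismatch, or a gap in t.
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            sub = M if s[i - 1] == t[j - 1] else m
            a[i][j] = max(a[i][j - 1] + g,
                          a[i - 1][j - 1] + sub,
                          a[i - 1][j] + g)

    # Traceback from the bottom-right cell back to a[0][0].
    i, j = rows, cols
    s_al, t_al = [], []
    while i > 0 or j > 0:
        sub = None
        if i > 0 and j > 0:
            sub = M if s[i - 1] == t[j - 1] else m
        if sub is not None and a[i][j] == a[i - 1][j - 1] + sub:
            s_al.append(s[i - 1])
            t_al.append(t[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and a[i][j] == a[i - 1][j] + g:
            s_al.append(s[i - 1])
            t_al.append("_")
            i -= 1
        else:
            s_al.append("_")
            t_al.append(t[j - 1])
            j -= 1

    return a[rows][cols], "".join(reversed(s_al)), "".join(reversed(t_al))


print(global_align("ACGT", "AGT"))
```

The matrix fill visits every cell once, and the traceback takes at most one step per row plus one per column, which matches the O(mn) and O(m+n) bounds stated earlier.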

#### Semiglobal Alignment

Semiglobal alignment is global alignment with no end gap penalties. Some applications include:

• Finding overlaps between fragments for sequence assembly.
• Aligning cDNAs or ESTs with genomic DNA to identify gene structure.

The global dynamic programming algorithm can be modified for semi-global alignment as follows:

• Initialization
• initialize the first row or the first column of a[i,j] to zero, to avoid leading gap penalties.

• Recurrence relation
• Same as global.

• To avoid trailing gap penalties, the score of the optimal semiglobal alignment is max_i a[i,n] or max_j a[m,j], that is, the maximum over the last column or the last row.

• To avoid trailing gap penalties, start the traceback at the cell(s) in the last row (or column) with the maximum score. Note that when the first row (or column) of the matrix is initialized to zero, the traceback will end in the first row (or column), but not necessarily in the corner cell a[0,0].
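
The modifications above can be sketched in code. The example below is a minimal illustration under stated assumptions: it frees end gaps on both sequences (first row and first column set to zero, maximum taken over both the last row and the last column), which suits overlap detection; an application that frees only one side would zero only the row or only the column. The function name `semiglobal_align` and the default scores M = 1, m = -1, g = -2 are hypothetical choices for the example.

```python
def semiglobal_align(s, t, M=1, m=-1, g=-2):
    """Semiglobal alignment with no penalty for leading or trailing gaps.

    Returns (score, aligned_s, aligned_t) for the scored region only;
    the unpenalized overhanging ends are not included in the output.
    """
    rows, cols = len(s), len(t)

    # Initialization: first row and first column stay at zero.
    a = [[0] * (cols + 1) for _ in range(rows + 1)]

    # Recurrence: same as global alignment.
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            sub = M if s[i - 1] == t[j - 1] else m
            a[i][j] = max(a[i][j - 1] + g,
                          a[i - 1][j - 1] + sub,
                          a[i - 1][j] + g)

    # Score: best cell in the last column or the last row.
    candidates = [(a[i][cols], i, cols) for i in range(rows + 1)]
    candidates += [(a[rows][j], rows, j) for j in range(cols + 1)]
    score, i, j = max(candidates)

    # Traceback: stop on reaching the first row or the first column.
    s_al, t_al = [], []
    while i > 0 and j > 0:
        sub = M if s[i - 1] == t[j - 1] else m
        if a[i][j] == a[i - 1][j - 1] + sub:
            s_al.append(s[i - 1])
            t_al.append(t[j - 1])
            i -= 1
            j -= 1
        elif a[i][j] == a[i - 1][j] + g:
            s_al.append(s[i - 1])
            t_al.append("_")
            i -= 1
        else:
            s_al.append("_")
            t_al.append(t[j - 1])
            j -= 1

    return score, "".join(reversed(s_al)), "".join(reversed(t_al))


# The suffix "CGT" of the first fragment overlaps the prefix of the second.
print(semiglobal_align("AAACGT", "CGTTTT"))
```

In the usage example the traceback ends in the first column at row 3 rather than in the corner cell, because the three leading characters of the first fragment are skipped without penalty.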