- Looking for errors; compare output of two sequencing runs of the same DNA fragment
- Comparing closely related gene and protein sequences
- Comparing cDNA with genomic DNA
- Protein structure prediction
- Single nucleotide polymorphisms (SNPs)

- *∑' = ∑ ∪ {"_"}*
- Given *s[1..m]* and *t[1..n]*, *α(s',t')* is an alignment if
  - *s', t'* in *(∑')\**
  - *|s'| = |t'| = l ≥ max{m,n}*
  - *s* is the subsequence obtained by removing "_" from *s'* (ditto for *t* and *t'*)
  - There is no value of *i* for which *s'[i] = t'[i] = "_"*.
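The conditions above can be checked mechanically. The following is a minimal sketch, assuming sequences are Python strings over an alphabet that does not itself contain "_"; the function name `is_alignment` and the constant `GAP` are illustrative names, not part of the notes.

```python
GAP = "_"  # the gap symbol added to the alphabet


def is_alignment(s, t, s_aligned, t_aligned):
    """Return True if (s_aligned, t_aligned) is an alignment of s and t."""
    # Both rows must have the same length l.
    if len(s_aligned) != len(t_aligned):
        return False
    # Removing the gaps must give back the original sequences.
    if s_aligned.replace(GAP, "") != s or t_aligned.replace(GAP, "") != t:
        return False
    # No column may hold a gap in both rows.
    return all(not (a == GAP and b == GAP)
               for a, b in zip(s_aligned, t_aligned))


print(is_alignment("ACGT", "AGT", "ACGT", "A_GT"))    # valid alignment
print(is_alignment("ACGT", "AGT", "ACGT_", "A_GT_"))  # last column is gap/gap
```

The condition *l ≥ max{m,n}* needs no separate test here: if removing gaps from a row of length *l* recovers the original sequence, the row is at least as long as that sequence.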

- *D[s,t] = ∑ d(s'[i], t'[i]), i = 1..l*
- *d(x,x) = 0*
- *d(x,y) ≥ 0*
- *d(x,"_") ≥ 0*
- *d(x,z) ≤ d(x,y) + d(y,z)*
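As a small illustration of the sum over alignment positions, here is a sketch that scores a given alignment column by column. The cost function `unit_cost` (0 for identical symbols, 1 otherwise, gaps included) is an assumed example of a function *d* meeting the conditions above; both function names are hypothetical.

```python
GAP = "_"


def unit_cost(x, y):
    """Assumed example cost: 0 for a match, 1 for a mismatch or a gap."""
    return 0 if x == y else 1


def alignment_score(s_aligned, t_aligned, d=unit_cost):
    """Sum d over the l columns of an alignment (s', t')."""
    if len(s_aligned) != len(t_aligned):
        raise ValueError("both rows of an alignment must have the same length")
    return sum(d(a, b) for a, b in zip(s_aligned, t_aligned))


# Columns: A/A = 0, C/_ = 1, G/G = 0, T/T = 0
print(alignment_score("ACGT", "A_GT"))  # 1
```

Because each column is scored on its own, the total does not depend on neighbouring columns.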

- If *d(x,y) = 1* for *x ≠ y* and *d(x,"_") = 1*, then *D[s,t]* is the minimum number of operations required to transform *s* into *t*, where the operations are substitution, insertion and deletion. This is called the "edit distance".
- If *d(x,y) ≥ 1* for *x ≠ y* and *d(x,"_") ≥ 1*, then it is called the "*weighted* edit distance".
- *D[s,t]* is a metric. It satisfies the triangle inequality.
- *D[s,t]* is the sum of the distances for positions in the alignment. This implies that we assume positional independence.

- Initialization
  - *D[0,0] = 0*
  - *D[0,j] = D[0,j-1] + d("_", t[j])*
  - *D[i,0] = D[i-1,0] + d(s[i], "_")*

- Recurrence

  *D[i,j] = min { D[i-1,j] + d(s[i], "_"), D[i-1,j-1] + d(s[i], t[j]), D[i,j-1] + d("_", t[j]) }*
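The initialization and recurrence can be sketched as a table fill. This is a minimal illustration, not a reference implementation: it assumes Python strings, 0-based indexing (so *s[i]* in the notes is `s[i - 1]` in the code), an empty-prefix score of 0, and the unit cost function shown. The names `fill_matrix` and `unit_cost` are hypothetical.

```python
GAP = "_"


def unit_cost(x, y):
    """Assumed example cost: 0 for a match, 1 for a mismatch or a gap."""
    return 0 if x == y else 1


def fill_matrix(s, t, d=unit_cost):
    """Return the (m+1) x (n+1) table D of prefix alignment scores."""
    m, n = len(s), len(t)
    D = [[0] * (n + 1) for _ in range(m + 1)]  # D[0][0] = 0

    # Initialization: align each prefix against the empty sequence.
    for i in range(1, m + 1):
        D[i][0] = D[i - 1][0] + d(s[i - 1], GAP)
    for j in range(1, n + 1):
        D[0][j] = D[0][j - 1] + d(GAP, t[j - 1])

    # Recurrence: best of the three ways to end the alignment.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = min(
                D[i - 1][j] + d(s[i - 1], GAP),           # s[i] against a gap
                D[i - 1][j - 1] + d(s[i - 1], t[j - 1]),  # s[i] against t[j]
                D[i][j - 1] + d(GAP, t[j - 1]),           # gap against t[j]
            )
    return D


D = fill_matrix("kitten", "sitting")
print(D[-1][-1])  # 3
```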

- Compute score of all pairs of prefixes in *O(m • n)* time. *D[m,n]* gives the score of the optimal alignment.
- Trace back through the alignment matrix in *O(m+n)* time to obtain the optimal alignment.
- There may be more than one optimal alignment.
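The two phases, table fill and traceback, might look as follows. This is a sketch under stated assumptions: Python strings, integer-valued costs (so the equality tests in the traceback are exact), and the unit cost function shown. The function name `align` is hypothetical. When several optimal alignments exist, the fixed order of the checks (diagonal, then up, then left) returns only one of them.

```python
GAP = "_"


def unit_cost(x, y):
    """Assumed example cost: 0 for a match, 1 for a mismatch or a gap."""
    return 0 if x == y else 1


def align(s, t, d=unit_cost):
    """Return (score, s', t') for one optimal alignment of s and t."""
    m, n = len(s), len(t)

    # Phase 1: fill the table of prefix scores, O(m * n).
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = D[i - 1][0] + d(s[i - 1], GAP)
    for j in range(1, n + 1):
        D[0][j] = D[0][j - 1] + d(GAP, t[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = min(D[i - 1][j] + d(s[i - 1], GAP),
                          D[i - 1][j - 1] + d(s[i - 1], t[j - 1]),
                          D[i][j - 1] + d(GAP, t[j - 1]))

    # Phase 2: walk back from D[m][n] to D[0][0], O(m + n).
    s_al, t_al = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + d(s[i - 1], t[j - 1]):
            s_al.append(s[i - 1]); t_al.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + d(s[i - 1], GAP):
            s_al.append(s[i - 1]); t_al.append(GAP)
            i -= 1
        else:
            s_al.append(GAP); t_al.append(t[j - 1])
            j -= 1

    # The columns were collected from the end, so reverse them.
    return D[m][n], "".join(reversed(s_al)), "".join(reversed(t_al))


score, s_row, t_row = align("ACGT", "AGT")
print(score)
print(s_row)
print(t_row)
```

Each traceback step decreases *i*, *j*, or both, so the walk takes at most *m + n* steps.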

Last modified: August 26th, 2010

Maintained by Dannie Durand (durand@cs.cmu.edu) and Annette McLeod.