03-511/711, 15-495/856 Course Notes - Nov. 18th, 2010

Amino Acid Substitution Matrices

Scoring overview

Scoring in pairwise alignment

For comparison, consider the Jukes-Cantor model, which is a Markov model of point mutations in nucleic acid sequences.

However, the Jukes-Cantor model does not take the biophysical differences between residues into account.
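
As a rough illustration of that limitation, the sketch below builds a one-step Jukes-Cantor transition matrix in which every substitution has the same probability, whichever two nucleotides are involved. The function name and the value alpha = 0.01 are illustrative assumptions, not taken from these notes.

```python
NUCLEOTIDES = "ACGT"

def jukes_cantor_matrix(alpha):
    """One-step transition probabilities under Jukes-Cantor.

    alpha is the (assumed) probability of changing to one specific
    other nucleotide in a single step; all substitutions are equally likely.
    """
    return {
        x: {y: (1.0 - 3.0 * alpha if x == y else alpha) for y in NUCLEOTIDES}
        for x in NUCLEOTIDES
    }

# Hypothetical parameter value, chosen only for illustration.
P = jukes_cantor_matrix(0.01)
```

Every off-diagonal entry is identical, so the model cannot express that some replacements are more acceptable than others.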

Amino Acid Substitution Matrices


Goal: Amino acid similarity matrices that take into account

Markov models of sequence evolution require

Two commonly used families of amino acid substitution matrices: PAM and BLOSUM

Each family is parameterized by evolutionary distance. Both use the following approach:
  1. Start from "trusted" MSAs (ungapped)
  2. Count substitutions, correcting for sample bias in the choice of sequences
  3. Estimate substitution frequencies
      PAM - evolutionary model
      BLOSUM - directly from data
  4. Construct a log-odds scoring matrix
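
Steps 2 and 3 can be sketched in a few lines for the "directly from data" case. This is a simplified illustration only: it counts unordered residue pairs in each column of a toy ungapped alignment and normalizes the counts to frequencies, and it deliberately omits the sample-bias correction mentioned in step 2. The function names and the toy alignment are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_counts(msa):
    """Count unordered residue pairs within each column of an ungapped MSA."""
    counts = Counter()
    for column in zip(*msa):               # iterate over alignment columns
        for a, b in combinations(column, 2):
            counts[tuple(sorted((a, b)))] += 1
    return counts

def pair_frequencies(counts):
    """Normalize pair counts so that the frequencies sum to 1."""
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}

# Toy alignment over a two-letter alphabet (assumption, not real data).
toy_msa = ["ACA", "ACC", "AAC"]
counts = pair_counts(toy_msa)
freqs = pair_frequencies(counts)
```

The resulting pair frequencies play the role of the observed substitution frequencies that feed the log-odds construction in step 4.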

PAM matrices

PAM2 matrix:

$P^{2}[j,k] = \sum_{l} P^{1}[j,l] \, P^{1}[l,k] = (P^{1})^{2}[j,k]$

PAMn matrix:

$P^{n}[j,k] = (P^{1})^{n}[j,k]$
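
The exponent here denotes a matrix power: the one-step matrix is multiplied by itself n times, rather than each entry being raised to the n-th power. A minimal sketch, assuming a hypothetical 3-letter alphabet and made-up one-step probabilities (real PAM1 matrices are 20 x 20):

```python
def mat_mul(a, b):
    """Matrix product: (a b)[j][k] = sum over l of a[j][l] * b[l][k]."""
    n = len(a)
    return [
        [sum(a[j][l] * b[l][k] for l in range(n)) for k in range(n)]
        for j in range(n)
    ]

def mat_pow(p1, n):
    """n-step transition matrix obtained by repeated multiplication."""
    size = len(p1)
    # Start from the identity matrix (zero steps of change).
    result = [[1.0 if j == k else 0.0 for k in range(size)] for j in range(size)]
    for _ in range(n):
        result = mat_mul(result, p1)
    return result

# Toy one-step matrix; each row sums to 1 (values are assumptions).
P1 = [
    [0.98, 0.01, 0.01],
    [0.02, 0.97, 0.01],
    [0.01, 0.02, 0.97],
]
P2 = mat_pow(P1, 2)
P250 = mat_pow(P1, 250)
```

Each row of the resulting matrices still sums to 1, since a product of transition matrices is again a transition matrix.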

  • Obtain log odds scoring matrix

    Let $q^{n}(j,k) = p_j P^{n}[j,k]$ be the probability that, at a given position, we see amino acid j aligned with amino acid k;
    i.e., that amino acid j is replaced by amino acid k after n PAMs of mutational change. Then the PAM n scoring matrix is

    $S[j,k] = \lambda \log \frac{q^{n}(j,k)}{p_j p_k} = \lambda \log \frac{P^{n}[j,k]}{p_k}$

    where λ is a constant. Typically λ = 10 and the entries of S[] are rounded to the nearest integer.
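
The log-odds step can be sketched as follows. This is an illustration under stated assumptions: the logarithm is taken base 10, `p` holds background amino acid frequencies, and both `p` and `Pn` are made-up values for a 3-letter alphabet rather than real PAM data. The function name is hypothetical.

```python
import math

def log_odds_matrix(p, pn, lam=10.0):
    """Return S[j][k] = round(lam * log10(pn[j][k] / p[k])).

    p   : background frequencies p_k (assumed to sum to 1)
    pn  : n-step transition matrix P^n, rows summing to 1
    lam : scaling constant lambda (10 is the typical choice)
    """
    size = len(p)
    return [
        [round(lam * math.log10(pn[j][k] / p[k])) for k in range(size)]
        for j in range(size)
    ]

# Hypothetical toy inputs (not real PAM values).
p = [0.5, 0.3, 0.2]
Pn = [
    [0.90, 0.060, 0.040],
    [0.10, 0.850, 0.050],
    [0.10, 0.075, 0.825],
]
S = log_odds_matrix(p, Pn)
```

Positive entries mark pairs seen more often than expected by chance; negative entries mark pairs seen less often.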

  • Are PAM matrices symmetric?

    Last modified: November 19, 2010.
    Maintained by Dannie Durand (durand@cs.cmu.edu) and Annette McLeod.