Lengths of Antigen Binding Sites across PDB Data
with Andrew Walsh and Roni Rosenfeld
Proteins are sequences of symbols, called residues, over the alphabet of the
twenty amino acids. Protein sequences fold in space, producing complex
three-dimensional structures in which some subsequences are buried in the
interior of the structure while other subsequences (words) are exposed on the
exterior. Antigens are proteins that are targeted by an organism's immune
system; for each antigen, a recognizer protein, called an antibody, must be
constructed. An antibody must be equipped with a hard-coded sequence of residues
that binds properly to some part of the antigen, allowing the immune system to
dispose of the antigen. An antibody does not generally need to recognize (bind
to) the entire sequence of symbols in the antigen, but only a subset of its
words, which must lie on the exterior of the antigen protein.
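To make the "sequence of symbols with exposed words" picture concrete, here is a minimal Python sketch. It assumes a per-residue boolean exposure mask is already available (for instance from a solvent-accessibility calculation, which is not shown). The function name `exposed_words`, the toy sequence, and the mask are invented for illustration and are not taken from the study.

```python
from itertools import groupby

# The twenty standard amino acids, in one-letter codes.
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def exposed_words(sequence: str, exposed: list[bool]) -> list[str]:
    """Return the maximal contiguous runs (words) of residues flagged as exposed.

    `exposed` is an assumed, precomputed per-residue flag; how it is
    obtained is outside the scope of this sketch.
    """
    if len(sequence) != len(exposed):
        raise ValueError("sequence and exposure mask differ in length")
    if not set(sequence) <= AMINO_ACIDS:
        raise ValueError("sequence contains non-standard residue symbols")
    words = []
    # groupby splits the (residue, flag) pairs into runs sharing one flag value
    for flag, run in groupby(zip(sequence, exposed), key=lambda pair: pair[1]):
        if flag:
            words.append("".join(residue for residue, _ in run))
    return words


# Toy 12-residue chain with an invented exposure mask
print(exposed_words("MKTAYIAKQRQI",
                    [False, True, True, True, False, False,
                     True, True, False, True, True, True]))
# prints ['KTA', 'AK', 'RQI']
```

Each returned string is one candidate word an antibody could bind; buried residues simply break the sequence into separate words.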
Binding sites between antigens and antibodies are of crucial importance to the
function of the human immune system. The characteristics of an antigen-antibody
binding site have traditionally been examined in the context of a single
antigen-antibody complex. The Protein Data Bank (PDB) website, a worldwide
repository for structural biology data, makes available the specifications of
thousands of antigen-antibody complexes. This makes possible a corpus-wide study
of all available antibody-antigen binding sites, applying computational methods
to PDB's large and growing collection.
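PDB entries can be downloaded in the fixed-column PDB coordinate format. As an illustration of how per-residue coordinates might be extracted from such an entry, here is a minimal Python sketch. `parse_atoms` is a hypothetical helper, not part of any PDB tool, and its simplifications (skipping HETATM records, reading only the first model, keeping only blank or 'A' alternate locations) are assumptions made for this example rather than the cleaning steps used in the study.

```python
from collections import defaultdict


def parse_atoms(pdb_text):
    """Group ATOM coordinates by (chain ID, residue number, insertion code).

    Relies on the fixed-column layout of PDB-format coordinate records.
    Simplifications (assumptions, not part of the format): HETATM records
    such as ligands and water are skipped, only the first model is read,
    and only blank or 'A' alternate locations are kept.
    """
    residues = defaultdict(list)
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break                          # stop after the first model
        if not line.startswith("ATOM"):
            continue                       # skips HETATM, headers, etc.
        if line[16] not in (" ", "A"):     # column 17: alternate location
            continue
        chain = line[21]                   # column 22: chain identifier
        res_seq = int(line[22:26])         # columns 23-26: residue number
        icode = line[26].strip()           # column 27: insertion code
        x = float(line[30:38])             # columns 31-38: x (angstroms)
        y = float(line[38:46])             # columns 39-46: y
        z = float(line[46:54])             # columns 47-54: z
        residues[(chain, res_seq, icode)].append((x, y, z))
    return dict(residues)
```

Given the text of a downloaded entry, `parse_atoms(text)` returns per-residue atom coordinates keyed by chain. Deciding which chains are the antibody and which the antigen still requires entry metadata, which this sketch does not attempt.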
Here we characterize the antigen-antibody binding site across all complexes
available from PDB. Specifically, we analyze the number of contiguous residues
that participate in the binding site on the antigen side. We examine the
lengths of these contiguous stretches (words) and their distribution across the
PDB corpus. We begin by culling antibody-antigen complexes from the PDB using
queries and post-processing, then detecting and eliminating various kinds of
noise present in the database. We approximate the antigen residues that
participate in the binding site by the antigen residues that are spatially close
to residues in the antibody, i.e. within some distance threshold ε. We discuss
bounds on the optimal value of ε, obtained both from physical knowledge about
the relevant bonds and from empirical data from PDB.
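The distance-threshold approximation and the word-length statistics described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes each antigen is given as a dict from residue number to atom coordinates, that antibody atoms are a flat list of coordinates, and that consecutive residue numbers are sequence neighbours (insertion codes and gaps in numbering can violate this). The function names and the toy coordinates are hypothetical, and the threshold value is left to the caller.

```python
import math
from collections import Counter


def binding_residues(antigen, antibody_atoms, threshold):
    """Antigen residue numbers with at least one atom within `threshold`
    (same units as the coordinates, e.g. angstroms) of any antibody atom.

    antigen: dict mapping residue number -> list of (x, y, z) atom coordinates
    antibody_atoms: list of (x, y, z) coordinates of all antibody atoms
    """
    close = set()
    for res_num, atoms in antigen.items():
        # Brute-force pairwise distance check; adequate for a sketch
        if any(math.dist(a, b) <= threshold for a in atoms for b in antibody_atoms):
            close.add(res_num)
    return sorted(close)


def word_lengths(res_nums):
    """Lengths of maximal runs of consecutive numbers in a sorted list."""
    lengths, run, prev = [], 0, None
    for n in res_nums:
        if prev is not None and n == prev + 1:
            run += 1                  # extends the current word
        else:
            if run:
                lengths.append(run)   # close the previous word
            run = 1                   # start a new word
        prev = n
    if run:
        lengths.append(run)
    return lengths


def length_distribution(complexes, threshold):
    """Pool word lengths over (antigen, antibody_atoms) pairs into a histogram."""
    counts = Counter()
    for antigen, antibody_atoms in complexes:
        counts.update(word_lengths(binding_residues(antigen, antibody_atoms, threshold)))
    return counts


# Toy complex: four one-atom antigen residues and a single antibody atom
antigen = {10: [(0.0, 0.0, 0.0)], 11: [(1.0, 0.0, 0.0)],
           12: [(20.0, 0.0, 0.0)], 13: [(2.0, 0.0, 0.0)]}
antibody = [(0.0, 3.0, 0.0)]
print(length_distribution([(antigen, antibody)], threshold=4.0))
```

In the toy example, residues 10, 11 and 13 fall within 4.0 units of the antibody atom, giving one word of length 2 and one of length 1. Re-running `length_distribution` over a range of thresholds is one simple way to see how sensitive the length distribution is to that choice. For large complexes, a spatial index such as SciPy's `cKDTree` could replace the brute-force distance loop.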
Our results indicate that the words on the antigen chain that are recognized by
the corresponding antibody are generally quite short. This suggests that the
antigen "signature" is generally not local to a single subsequence of the linear
sequence of residues: antigen structure seems to play a large role in
Helen M. Berman, John Westbrook, Zukang Feng, Gary Gilliland, T. N. Bhat, Helge
Weissig, Ilya N. Shindyalov, and Philip E. Bourne. The Protein Data Bank. Nucl.
Acids Res., 28(1):235–242, 2000.
M. Michael Gromiha and S. Selvaraj. Inter-residue Interactions in Protein Folding
and Stability. Prog. Biophys. Mol. Biol., 86:235–277, 2004.