ABSTRACT
In American Journal of Human Genetics, 55(4): 777-787, 1994.
Human genetic maps have made quantum leaps in the last few years due to the characterization of over 2,000 CA dinucleotide repeat loci: these PCR-based markers offer extraordinarily high polymorphism information content (PIC) (Botstein et al. 1980), and their density is expected to reach one marker every few centimorgans within the next year. These new genetic maps open new avenues for disease gene research, including large-scale genotyping for both simple and complex disease loci. However, the allele patterns of many dinucleotide repeat loci can be complex and difficult to interpret, and genotyping errors are a recognized problem. Furthermore, the possibility of genotyping individuals at hundreds or thousands of polymorphic loci requires improvements in data handling and analysis. Automating genotyping and the analysis of computer-derived haplotypes would remove many of the barriers preventing optimal use of dense and informative dinucleotide genetic maps. Towards this end, we have automated the allele identification, genotyping, phase determinations, and inheritance consistency checks for four CA repeats within the 2.5 million base pair, 10 cM X-linked dystrophin gene, using fluorescein-labeled multiplexed PCR products analyzed on automated sequencers. The described algorithms can: deconvolute and resolve closely spaced alleles, despite interfering stutter "noise"; set phase in females; propagate the phase through the family; and identify recombination events. We show the implementation of these algorithms for the completely automated interpretation of allele data and risk assessment for five Duchenne/Becker muscular dystrophy families. The described approach can be scaled up to perform genome-based analyses with hundreds or thousands of CA repeat loci using multiple fluorophores on automated sequencers.
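The phase-setting and recombination steps listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It relies only on X-linked inheritance: a daughter carries her father's single X haplotype intact, so the other allele at each locus must be maternal, and a son's X is a copy of one maternal haplotype unless a crossover occurred. The function names, the locus labels `CA1` to `CA4`, and the allele sizes are all hypothetical, and allele calls are assumed to be already resolved and error-free.

```python
from typing import Dict, List, Tuple

Haplotype = Dict[str, int]             # locus label -> allele size (hypothetical, in bp)
Genotype = Dict[str, Tuple[int, int]]  # locus label -> unordered allele pair


def phase_daughter(father: Haplotype, daughter: Genotype) -> Tuple[Haplotype, Haplotype]:
    """Split a daughter's unphased genotype into (paternal, maternal) haplotypes.

    Raises ValueError if the father's allele is absent at any locus, which
    serves as a simple inheritance consistency check.
    """
    paternal: Haplotype = {}
    maternal: Haplotype = {}
    for locus, (a, b) in daughter.items():
        f = father[locus]
        if f == a:
            paternal[locus], maternal[locus] = a, b
        elif f == b:
            paternal[locus], maternal[locus] = b, a
        else:
            raise ValueError(f"inheritance inconsistency at {locus}")
    return paternal, maternal


def recombination_intervals(hap_a: Haplotype, hap_b: Haplotype, son: Haplotype,
                            loci_order: List[str]) -> List[Tuple[str, str]]:
    """Return pairs of consecutive informative loci between which the son's X
    switches from one phased maternal haplotype to the other."""
    switches: List[Tuple[str, str]] = []
    prev_source = None
    prev_locus = None
    for locus in loci_order:
        a, b, s = hap_a[locus], hap_b[locus], son[locus]
        if s not in (a, b):
            raise ValueError(f"inheritance inconsistency at {locus}")
        if a == b:
            continue  # mother is homozygous here, so the locus is uninformative
        source = "A" if s == a else "B"
        if prev_source is not None and source != prev_source:
            switches.append((prev_locus, locus))
        prev_source, prev_locus = source, locus
    return switches


# Hypothetical three-generation example: grandfather -> mother -> son.
order = ["CA1", "CA2", "CA3", "CA4"]
grandfather = {"CA1": 210, "CA2": 172, "CA3": 231, "CA4": 133}
mother = {"CA1": (210, 214), "CA2": (168, 172), "CA3": (231, 231), "CA4": (131, 133)}
son = {"CA1": 210, "CA2": 172, "CA3": 231, "CA4": 131}

grandpaternal, grandmaternal = phase_daughter(grandfather, mother)
print(recombination_intervals(grandpaternal, grandmaternal, son, order))
```

In this made-up pedigree the son matches the grandpaternal haplotype at `CA1` and `CA2` and the grandmaternal haplotype at `CA4`, so a single crossover is placed between `CA2` and `CA4`. Because the mother is homozygous at `CA3`, the interval cannot be narrowed further.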