The HapMotif Package: Programs for Finding and Using Conserved Segments in Haploid Genetic Data

Version 0.0.0.2

[Figure: sample haplotype motif profile]

This page links to a preliminary version of the HapMotif package, which contains code for locating conserved sequences of genetic polymorphisms in aligned haploid sequence data and for applying them to problems in the design and application of association studies. The definitions and methods for motif detection are described in the paper "Haplotype Motifs: An Algorithmic Approach to Locating Evolutionarily Conserved Patterns in Haploid Sequences" by Russell Schwartz, which appeared in the Proceedings of the 2003 IEEE Computer Society Bioinformatics Conference. Applications to downstream problems, including missing site prediction, informative SNP selection, and case-control association testing, are described in a forthcoming paper. For more information, see the following documentation:

README.1st: general description of the package contents and installation instructions

README.hapmotif: usage of the motif detection program

README.htsnp: usage of a program that uses motifs for htSNP selection

README.predict: usage of two programs for inferring missing sites using haplotype motifs

README.case-control: usage of a simple motif-based case-control association testing program

README.files: documentation on file formats used in the code

Changes from version 0.0.0.1:
- the statistical model has changed to estimate motif frequencies more accurately
- correction for sequencing errors and recent mutations is now an option in the missing data inference programs
- the htSNP methods are much faster and use less memory

The source code is in C++. Due to limited computational and labor resources, it has been tested only on one Linux computer (RedHat 7.3, gcc 2.96) and one Mac OS X computer (10.2, Apple Computer Inc. GCC 11161). The binaries for both platforms run in command-line mode only. I am unaware of any reason the code would not work with any ANSI-compliant C++ compiler, but some tweaking of the code and Makefiles will likely be required. The source code and precompiled Linux and Mac OS X binaries are available here:

source code in tarred gzip format

Mac OS X binaries
hapmotif
predictg
predictb
htsnp
case-control

Linux binaries
hapmotif
predictg
predictb
htsnp
case-control

Questions, comments, and bug reports may be sent to the author at russells@andrew.cmu.edu. Please note, however, that development of this code is a research project aimed at creating theoretical methods for computational genomics, not at producing production-quality code. The code is being released to allow others to review, experiment with, and improve upon these methods. It is not suitable for mission-critical work and should not be used as if it were. The code and all associated materials are provided as is, with no warranty of any kind, explicit or implicit, and no explicit or implicit promise of support.

The HapMotif code and any associated files are released under the terms of the MIT License, reproduced below. The author would, however, appreciate hearing about any interesting applications or results obtained with this tool and requests that the code be appropriately cited in any publications that make use of it.



Copyright (c) 2003 Russell Schwartz, Department of Biological Sciences, Carnegie Mellon University.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.