One of the fundamental problems in computational biology is detecting genetic variants associated with output traits such as disease status, height, or gene expression. This remains challenging because, in practice, we lack the statistical power to detect such variants reliably: we usually have a small number of samples and a large number of genetic variants.
In this work, we present a novel method that uses prior biological knowledge to boost the statistical power of detecting genetic variants associated with traits. Specifically, we use biological knowledge about groups of correlated genetic variants (e.g., genetic variants in linkage disequilibrium) and groups of correlated traits (e.g., co-expressed genes). Given this grouping information, we assume that a group of correlated traits may be affected by a common set of genetic variants, or that a group of genetic variants may affect a common set of traits. Under these assumptions, we incorporate the biological knowledge into a sparse regression model using L1/L2 penalties. We illustrate our approach with examples and show how prior biological knowledge helps increase the power to detect associations between genetic variants and traits.
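As a rough illustration of what an L1/L2 penalty does, the sketch below computes the penalty for a coefficient vector split into groups and applies the corresponding group soft-thresholding step. It assumes non-overlapping groups of variants and a single trait, which may be simpler than the model presented in the talk; the names `l1_l2_penalty`, `group_soft_threshold`, `B`, and `groups` are hypothetical and chosen only for this example.

```python
import numpy as np

def l1_l2_penalty(B, groups):
    """L1/L2 penalty: sum over groups of the Euclidean (L2) norm of each group."""
    return sum(np.linalg.norm(B[idx]) for idx in groups)

def group_soft_threshold(B, groups, t):
    """Proximal step for t * L1/L2 penalty, assuming non-overlapping groups.

    A group whose L2 norm is at most t is set to zero as a whole;
    otherwise the whole group is shrunk toward zero by a common factor.
    """
    out = np.zeros_like(B, dtype=float)
    for idx in groups:
        norm = np.linalg.norm(B[idx])
        if norm > t:
            out[idx] = (1.0 - t / norm) * B[idx]
    return out

# Toy coefficients for four variants in two assumed groups
# (for instance, two linkage-disequilibrium blocks).
B = np.array([3.0, 4.0, 0.1, -0.1])
groups = [np.array([0, 1]), np.array([2, 3])]

penalty = l1_l2_penalty(B, groups)            # 5 + sqrt(0.02)
shrunk = group_soft_threshold(B, groups, 1.0)  # second group is zeroed out
```

In this toy case the first group keeps both coefficients (scaled by 0.8) while the second group is removed entirely, which shows how the penalty selects or discards variants group by group rather than one at a time.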
This is joint work with Eric Xing.
Presented in Partial Fulfillment of the CSD Speaking Skills Requirement.