Figure: Exponential growth in the number of influenza sequences

As the figure above illustrates, the amount of publicly available biological sequence data has exploded in recent years. In Project GATTACA, we seek to tap into this historic opportunity by developing algorithms, visualization tools, and predictive models designed for very large collections of biological sequence data (DNA, RNA, protein). We currently focus on RNA viruses and other fast-evolving pathogens because the data are plentiful, the need is great, and the potential impact on public health is significant.

Project Personnel

Roni Rosenfeld, Professor, School of Computer Science (LTI, MLD, CSD), Carnegie Mellon

Andy Walsh, Postdoctoral Fellow, Language Technologies Institute, School of Computer Science, Carnegie Mellon

Chuang Wu, PhD student, Program in Computational Biology

Sponsors

This work is generously sponsored by the following grants: