As the above figure illustrates, the amount of publicly available biological sequence data has exploded in recent years. In Project GATTACA, we seek to take advantage of this historic opportunity by developing algorithms, visualization tools, and predictive models designed for very large amounts of biological sequence data (DNA, RNA, protein). We currently focus on RNA viruses and other fast-evolving pathogens because the data are plentiful, the need is great, and the potential impact on public health is significant.
Roni Rosenfeld, Professor, School of Computer Science (LTI, MLD, CSD), Carnegie Mellon
Andy Walsh, Postdoctoral Fellow, Language Technologies Institute, School of Computer Science, Carnegie Mellon
This work is generously sponsored by the following grants:
- NSF Center for Biological Language Modeling