Office: Gates-Hillman Complex 8002 Phone +1-412-268-7678 Fax +1-412-268-6298
Assistant: Christina Melucci, GHC 8004, +1-412-268-1593
Mailing Address: Carnegie Mellon University, MLD, GHC 8002, 5000 Forbes Ave., Pgh., PA 15213-3890 USA
Short CV Long CV
Welcome! My interests:
- Forecasting Epidemics: The long-term vision of our Delphi research group is to
make epidemiological forecasting as universally accepted and useful as weather
forecasting is today. As was the case with weather forecasting, this will likely
take a long time. In the shorter term, we select high-value epidemiological
forecasting targets (currently Influenza and Dengue); create baseline forecasting
methods for them; establish metrics for measuring and tracking forecasting
accuracy; estimate the limits of forecastability for each target; and identify new
sources of data that could be helpful to the forecasting goal.
Challenges: We have participated, and done very well, in all epidemiological
forecasting challenges organized by the U.S. government to date: Influenza
2013-2014 (CDC); Chikungunya 2015 (DARPA); Dengue 2009-2014 (White House OSTP);
Influenza 2014-2015 (CDC, winner); Influenza 2015-2016 (CDC, winner); Influenza 2016-2017 (CDC, winner).
  - Try our operational, geographically detailed, real-time flu nowcasting.
  - Try our operational, weekly updated flu forecasting.
We are part of the multi-university MIDAS research group.
2016: CDC has just named us “Most Accurate
Forecaster” for 2015-2016.
October 2017: We did it again! Our two systems took the top two spots out of 28 submissions
in the 2016-2017 flu challenge.
- Information and Communication Technologies for Development (ICT4D),
and specifically Spoken Language Technologies for Development (SLT4D), which
is the term we coined for our own subfield of ICT4D: finding ways to use
spoken language technologies (like automatic speech recognition, speech
synthesis, and human-machine dialog systems) to aid socio-economic
development around the world.
Our current project, Polly, uses telephone-based viral
entertainment to reach low-literate people in Pakistan and India, familiarizing
them with speech interfaces and then introducing them to development-related
services. First deployed in Lahore in
May 2012, Polly reached over 165,000 users all over Pakistan and fielded over
2.5 million phone calls in 8 months. In
2013 we launched Polly in Bangalore, India, and it ended up spreading virally
to West Bengal, New Delhi and other areas of India. In March 2015 we deployed Polly in Guinea
for person-to-person spreading of approved public health messages about Ebola
in many languages, in collaboration with the US embassy in Conakry. In 2016, in collaboration with Information
Technology University (Lahore), we launched two new services in Pakistan: Baang, a
voice-based Reddit, and Sawaal, a voice-based quiz game.
A previous project, HealthLine,
investigated the use of a telephone-based automated dialog system for access to
healthcare information by low-literate community health workers in Pakistan.
- Machine Learning for Social Good
(ML4SG). I continuously seek problems in
non-profits and government organizations, domestically and abroad, which
can benefit from machine learning solutions, and match them with suitable
teams of students and supervising faculty.
If your organization could use free machine learning or data
science expertise to help improve its societal impact, please contact
me. The best cases are those where the
potential for societal impact is evident, the questions are well defined,
and significant relevant data is available. Otherwise, I can work with you to get
your problem ready for our students.
This initiative benefits from a generous gift from Uptake.
- Data Numeracy for
All. I believe that universal data
numeracy is as important in the 21st century as universal
literacy was in the 20th.
We need to increase the understanding of (and comfort with) data in
all segments of society. I am
interested in devising effective ways of doing that.
Students (department, topic): Logan Brooks (CSD, Epi-forecasting), Amanda Coston (MLD and Heinz, ML4SG), Aaron Rumack (MLD, Epi-forecasting), Nuoyu Li
(MLD, Epi-forecasting), Jiaxian (Chris) Sheng (CS, Epi-forecasting).
Past students: David Farrow (CompBio,
viral evolution + Epi-forecasting), Ali
ICT4D), Chuang Wu (CompBio,
viral genotype-phenotype mapping), Jahanzeb Sherwani (CSD,
Yong Lu (CSD, CompBio), Dan Bohus (CSD,
dialog systems), Stefanie Tomko
(LTI, speech communication), Jerry
(Xiaojin) Zhu (LTI, MLD, semi-supervised learning),
Chase (RI, speech recognition).
Past Post-docs: Andy Walsh (computational
virology), Xiaojin Wang
(machine learning), Stan F. Chen (language modeling), Pierre DuPont (language modeling).
My favorite quotes.