Kathryn (Katie) Rivard Mazaitis

Principal Research Programmer

This site was frozen in August 2023 with my departure from CMU. Find me on LinkedIn for future adventures!

Until recently I managed engineers, designed software systems, wrote code, managed servers, and did UIs for the Delphi Research Group in the Machine Learning department of the School of Computer Science and the Statistics department of the Dietrich College of Humanities and Social Sciences at CMU. You can find (a possibly dated version of) my complete CV online. Prior to joining Delphi in March 2020, I worked with Tom Mitchell, William Cohen, or Anthony Tomasic, depending on how far back you go. I also completed most of the coursework for an MHCI degree. My recent history is as follows:

August 2022 - April 2023 :: FlaSH :: With Ananya Joshi. FlaSH is an anomaly detection system for public health data streams, addressing some of the unique challenges and making use of the unique advantages of public health data. Resulted in Computationally Assisted Quality Control for Public Health Data Streams, IJCAI 2023.
April 2021 - August 2022 :: Epidata v4 :: With Roni Rosenfeld, Joe Greene, george haff, and Brian Clark. A systematic redesign of our database schema for storing epidemiologically-relevant, obsessively-versioned, geographically-detailed time series data at billion scale. Epidata on GitHub. Resulted in Introducing Epidata v4, Delphi Blog 2022.
October 2020 - March 2021 :: Delphi Google.org Fellowship :: With Sumit Agrawal, Sarah Colquhoun, David Farrow, Jed Grabman, Kate Harwood, Raphael Hyde, Daniel LaLiberte, Phil McGuinness, Mike O'Brien, Chris Scott, Ben Smith, Ben Weaver, and Spencer Whitman. Thirteen valiant Google employees joined Delphi for six months of furious effort to show us how to plan, build, and maintain production software systems, bring the group's web presence into the modern era, and close the feedback loop between Delphi and its users. A stunningly transformative experience I will never forget.
April 2020 - July 2022 :: The COVID-19 Trends and Impact Survey (CTIS) :: With Ryan Tibshirani, Logan Brooks, Alex Reinhart, Nat DeFries, and Wichada La Motte-Kerr. A daily survey of Meta (then Facebook) users in the USA, initially about participants' own and household COVID symptoms. Along with its global sister survey run by the University of Maryland, it grew into the largest public health survey ever conducted, with over 29 million total responses. Aggregate data is available publicly through COVIDcast. Microdata is available to researchers who sign a DUA. More about the survey; survey data dashboard. Resulted in The US COVID-19 Trends and Impact Survey: Continuous Real-Time Measurement of COVID-19 Symptoms, Risks, Protective Behaviors, Testing, and Vaccination, PNAS 2021.
April 2020 - August 2020 :: COVIDcast :: With Roni Rosenfeld, Ryan Tibshirani, Alex Reinhart, Brian Clark, and dozens of grad student and faculty volunteers. COVIDcast is a data system in support of the COVID-19 response in the United States. It provides access to real-time, geographically-detailed indicators of COVID activity from a wide variety of sources and covering nearly every rung of the disease severity pyramid. Data is available via a free API, web visualizations, and customizable dashboards to data scientists, epidemiologists, public health officials, researchers, and the general public. Too many links for inline:
Resulted in An Open Repository of Real-Time COVID-19 Indicators, PNAS 2021.
March 2020 - April 2020 :: Crowdcast/Epicast :: With Roni Rosenfeld, Ryan Tibshirani, and Chris Shen. Epicast is a wisdom-of-crowds tool that collects predictions from individuals on the course of a seasonal epidemic like flu. The predictions can then be used in forecasting. We adapted Epicast for COVID in the early days of the pandemic's arrival in the USA, incorporating reference data from the European CDC and Korean CDC to guide users' predictions. Epicast archival deployment; Epicast on GitHub.
January 2019 - March 2020 :: Theo :: With Tom Mitchell. Theo is a combined knowledge base and inference system with first-class facts (i.e. you can have facts about facts, and facts about facts about facts, etc.). In the past, a Java implementation of Theo underlay NELL, the Never-Ending Language Learner. For LIA, we developed a Python implementation of Theo with a Django-mediated database backend. PyTheo on Bitbucket.
June 2018 - March 2020 :: LIA :: With Tom Mitchell, Forough Arabshahi, Igor Labutov, and a variety of undergrad and master's students. The Learning by Instruction Agent is a digital assistant that can learn to combine simple tasks into more complex ones via natural language instructions.
September 2017 - June 2018 :: RollMe :: With William Cohen and Vidhisha Balachandran. RollMe is an organization and planning scheme for machine learning pipelines and research groups. Common research tasks are organized in a network, providing support for documentation, porting proven ideas to related datasets or problems, and collecting best practices. The document hub works in concert with the CodaLab cloud computing system, which helps track experiments and prevent results from getting lost. RollMe document hub, Task browser, CodaLab node.
April 2016 - June 2018 :: TensorLog :: With William Cohen. TensorLog is a differentiable deductive database that answers queries by expressing clauses of a logical theory as factor graphs. TensorLog on GitHub.
June 2016 - 2017+ :: GNAT :: With William Cohen, Lidong Bing, Bhuwan Dhingra, Eli Whitney, Lam Wing Chang, and Joseph Gibli. GNAT is A Grounded NELL-like AKBC Toolkit, and incorporates several knowledge base building and completion projects in an attempt to identify repeated tasks and develop tools suited to them. GNAT website; GitHub repository of prototype tools.
Jan 2013 - 2016+ :: ProPPR :: With William Cohen and William Wang. Programming with Personalized PageRank uses graph-walking algorithms to make inferences over logic program proof graphs. ProPPR on GitHub. Resulted in:

I pay attention to how my code is organized, and work to minimize code duplication and other hacks wherever possible. I transform deadline-driven code into reusable workhorses, and prototypes into solid software that can process data quickly at scale. I'm diagram-driven, and generally produce UI sketches as well as domain, class, and sequence diagrams as part of my normal development process. This habit becomes invaluable when it comes to generating documentation and passing projects on. I have some basic background in user-oriented design, and furthered my study of the topic through coursework in the MHCI program at CMU. Thanks are very much due to my work with John Zimmerman's teams, which grounded our projects in real user habits, needs, and goals, and convinced me that good design provides an excellent avenue through which academia can become accessible to the public.

For "August 2023" values of "Present"
Kathryn (Katie) Mazaitis :: krivard@andrew :: GHC 8112