Location: Zoom (this class will not be recorded to preserve privacy of participants and their opinions)

Instructors: Ashique KhudaBukhsh, Mark Kamlet, Tom Mitchell

Course Research Engineer: Rupak Sarkar

Course Description: Online social media provides a rich source of detailed data reflecting the evolution of political sentiments over time and in response to various news events. This seminar course studies a diverse set of recent papers at the intersection of machine learning, natural language processing, and political science, with the aim of posing research questions concerning US politics and devising ML and NLP frameworks for answering them. A key component of the course is a semester-long research project with a view toward a peer-reviewed publication. To this end, the course provides a large text data set relevant to US politics. Each student will formulate, explore, and address a focused research question through the lens of this data. By the end of the course, apart from acquiring hands-on experience in realizing the synergy between large-scale data, creative research questions, and effective NLP solutions, we all hope to have an improved understanding of why we are in what we are in.

Prerequisites: The course requires familiarity with machine learning at a level sufficient to complete a substantial project. Any of the following courses can serve as a prerequisite: 10-601, 10-701, 10-605, 10-805, 11-685, 11-785, 95-845. Interested students who have not taken any of these courses should contact the instructor (Ashique KhudaBukhsh, akhudabu@cs.cmu.edu) to check whether they have the required background.

Course project and resources: In this course, students are encouraged to explore ambitious, open-ended projects. The course will provide some useful data sets and a list of potential project ideas. Students will work in small teams (1-2 members) on course projects. The course research engineer will provide useful scripts and code to efficiently process data.

Grading: Weekly reading summaries (10%), class participation (5%), class presentation (15%), weekly/bi-weekly progress discussions (10%), midterm project evaluation (20%), final project evaluation (40%).

Tentative syllabus: From lecture 7 onward, each lecture is organized around a main paper (and a few supplementary readings), with a student presenting the key ideas in the first half of the lecture, followed by class discussion. Students are required to submit a short reading summary (no more than a single page) outlining the key ideas of the papers scheduled for the week before class starts on Monday. We are keeping 6 lectures open-ended to leave room for (1) new events, (2) students presenting their results for feedback from a broader audience, and (3) any exciting new paper suggested by the students or the instructors.

1 Course overview
2 Our first paper on the data set; data set characterization
Chapter 6 from Dan Jurafsky and James H. Martin's book.
3 Labor Day
No class.
4 Potential research questions and project ideas (Ashique)
Word Embeddings Quantify 100 Years of Gender and Ethnic Stereotypes; Garg, Schiebinger, Jurafsky, Zou; PNAS, 2018. Paper.

Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings; Manzini, Chong, Tsvetkov, Black; NAACL, 2019. Paper.
5 Potential research questions and project ideas (Tom)
6 Potential research questions and project ideas (Mark)
Reading summary due for the week.

A Computational Model of the Citizen as Motivated Reasoner: Modeling the Dynamics of the 2000 Presidential Election; Kim, Taber, Lodge; Political Behavior, 2010. Paper.
7 Predicting political outcomes
Reading summary due for the week.

Student presentations start from this lecture.

From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series; O'Connor, Balasubramanyan, Routledge, Smith; ICWSM 2010. Paper.
Presenter: Christian Deverall.

Does the @realDonaldTrump Really Matter to Financial Markets?; Benton, Philips; American Journal of Political Science, 2020. Paper.
Presenter: Clay Yoo.
8 Predicting political outcomes
Project proposal due by 11 am.

Nowcasting the Stance of Social Media Users in a Sudden Vote: The Case of the Greek Referendum; Tsakalidis, Aletras, Cristea, Liakata; CIKM 2018. Paper.
Presenter: Rupak Sarkar.
9 Political discourse and moral foundation
Reading summary due for the week.

Proposal feedback.

Classification of Moral Foundations in Microblog Political Discourse; Johnson, Goldwasser; ACL, 2018. Paper.
Presenter: Jeremiah Milbauer.

Modeling of Political Discourse Framing on Twitter; Johnson, Goldwasser; ICWSM 2017. Paper.
Presenter: Tushar Kanakagiri.
10 Moral sentiment
Text-based Inference of Moral Sentiment Change; Xie, Ferreira Pinto Jr., Hirst, Xu; EMNLP, 2019. Paper.
Presenter: Anirban Chowdhury.
Guest: Renato Ferreira Pinto, Jr. (Google).
11 Censorship and moderation
You Can't Stay Here: The Efficacy of Reddit's 2015 Ban Examined Through Hate Speech; Chandrasekharan, Pavalanathan, Srinivasan, Glynn, Eisenstein, Gilbert; CSCW 2017. Paper.
Presenter: Jose Eduardo Oros Chavarria.

Perceptions of Censorship and Moderation Bias in Political Debate Forums; Shen, Yoder, Jo, Rosé; ICWSM 2018. Paper.
Presenter: Qinlan Chen.
12 Polarization
The Polarization of Contemporary American Politics; Hare, Poole; Polity, 2014. Paper.
Presenter: Mark Kamlet.
13 Polarization
Analyzing Polarization in Social Media: Method and Application to Tweets on 21 Mass Shootings; Demszky, Garg, Voigt, Zou, Shapiro, Gentzkow, Jurafsky; NAACL 2019. Paper.
Presenter: Helen Zheng.
Guest: Dorottya Demszky (Stanford University).
14 Controversy
Something's Brewing! Early Prediction of Controversy-causing Posts from Discussion Features; Hessel, Lee; NAACL 2019. Paper.
Presenter: Brian Yan.
Guest: Jack Hessel (AI2).

Events and Controversies: Influences of a Shocking News Event on Information Seeking; Koutra, Bennett, Horvitz; WWW 2015. Paper.
Presenter: Dian Yu.
15 Opinion aggregation using language models
Mining Insights from Large-scale Corpora Using Fine-tuned Language Models; Palakodety, KhudaBukhsh, Carbonell; ECAI 2020. Paper.
Presenter: Hyeonsu Kang.

How Can We Know What Language Models Know?; Jiang, Xu, Araki, Neubig; TACL 2020. Paper.
Presenter: Yuchen Li.
16 Hate speech, counter speech
Trumping Hate on Twitter? Online Hate in the 2016 US Election and its Aftermath; Siegel, Nikitin, Barberá, Sterling, Pullen, Bonneau, Nagler, Tucker; Quarterly Journal of Political Science, forthcoming. Paper.
Presenter: Ashique KhudaBukhsh.

Hate Speech Detection is Not as Easy as You May Think; Arango, Pérez, Poblete; SIGIR 2019. Paper.
Presenter: Qinlan Chen.
17 Spotlight talks
Midterm project evaluation
18 Spotlight talks
Midterm project evaluation
19 Spotlight talks
Midterm project evaluation
Reading summaries due for the week.

Framing and Agenda-Setting in Russian News: a Computational Analysis of Intricate Political Strategies; Field, Kliger, Wintner, Pan, Jurafsky, Tsvetkov; EMNLP 2018. Paper.
Presenter: Lingwei Chen.
20 Counter speech
Revised proposal submission (due 11 am).
Thou Shalt Not Hate: Countering Online Hate Speech; Mathew, Saha, Tharad, Rajgaria, Singhania, Maity, Goyal, Mukherjee; ICWSM 2019. Paper.
Presenter: Gaurav Deshpande.

Voice for the Voiceless: Active Sampling to Detect Comments Supporting the Rohingyas; Palakodety, KhudaBukhsh, Carbonell; AAAI 2020. Paper.
Presenter: Ramon Alfanso Villa Cox.
21 Counter speech
Racism is a Virus: Anti-Asian Hate and Counterhate in Social Media during the COVID-19 Crisis; Ziems, He, Soni, Kumar; arXiv, 2020. Paper.
Presenter: Kunal Khadilkar.

Hope Speech Detection: A Computational Analysis of the Voice of Peace; Palakodety, KhudaBukhsh, Carbonell; ECAI 2020. Paper.
Presenter: Ashique KhudaBukhsh.
22 Fake news and misinformation
A Survey of Fake News: Fundamental Theories, Detection Methods, and Opportunities; Zhou, Zafarani; arXiv, 2018. Paper.

Capturing the Style of Fake News; Przybyla; AAAI 2020. Paper.
Presenter: Anurag Katakkar.

Political Knowledge and Misinformation in the Era of Social Media: Evidence from the 2015 U.K. Election; Munger, Egan, Nagler, Ronen, Tucker; British Journal of Political Science, forthcoming. Paper.
Presenter: Daniel Connolly.
23 Politics and news media
Strategic Candidate Entry and Congressional Elections in the Era of Fox News; Arceneaux, Dunaway, Johnson, Vander Wielen; American Journal of Political Science, 2020. Paper.
Presenter: Mauro Moretto.

Partisanship, Propaganda, and Disinformation: Online Media and the 2016 U.S. Presidential Election; Faris, Roberts, Etling, Bourassa, Zuckerman, Benkler; SSRN 2017. Paper.
Presenter: Antonio Carlos Theophilo Costa Junior.
24 Global issues
Modeling between-population variation in COVID-19 dynamics in Hubei, Lombardy, and New York City; Wilder, Charpignon, Killian, Ou, Mate, Jabbari, Perrault, Desai, Tambe, Majumder; PNAS 2020. Paper.
Presenter: Bryan Wilder.
25 News framing
Multi-Label and Multilingual News Framing Analysis; Akyürek, Guo, Elanwar, Ishwar, Betke, Wijaya; ACL 2020. Paper.
Presenter: Derry T. Wijaya.
29 Final project presentations

Useful links

A parallel CMU course on Voting.
An excellent book (in progress) by Dan Jurafsky and James H. Martin to learn fundamental concepts of NLP.