Course Research Engineer: Rupak Sarkar
Course Description: Online social media provides a rich source of detailed data reflecting the evolution of political sentiments over time and in response to various news events. This seminar course studies a diverse set of recent papers at the intersection of machine learning, natural language processing and political science, with the aim of posing research questions concerning US politics and devising ML and NLP frameworks for answering them. A key component of the course is a semester-long research project with a view toward a peer-reviewed publication. To this end, the course provides a large text data set relevant to US politics. Each student will formulate, explore and address a focused research question through the lens of this data. By the end of the course, apart from acquiring hands-on experience in realizing the synergy between large-scale data, creative research questions and effective NLP solutions, we all hope to have an improved understanding of why we are in what we are in.
Prerequisites: The course requires familiarity with machine learning at a level sufficient to complete a substantial project. Any of the following courses can serve as a prerequisite: 10-601, 10-701, 10-605, 10-805, 11-685, 11-785, 95-845. Interested students who have not taken any of these courses should contact the instructor (Ashique KhudaBukhsh, akhudabu@cs.cmu.edu) to check if they have the required background.
Course project and resources: In this course, students are encouraged to explore ambitious, open-ended projects. The course will provide some useful data sets and a list of potential project ideas. Students will work in small teams (1-2 members) on course projects. The course research engineer will provide useful scripts and code to efficiently process data.
Grading: Weekly reading summaries (10%), class participation (5%), class presentation (15%), weekly/bi-weekly progress discussions (10%), midterm project evaluation (20%), final project evaluation (40%).
Tentative syllabus: Each lecture (from lecture 7 onward) is organized around a main paper (and a few supplementary readings), with a student presenting the key ideas in the first half of the lecture, followed by class discussion. Students are required to submit a short reading summary (no more than a single page) outlining the key ideas of the papers scheduled for the week before class starts on Monday. We are keeping 6 lectures open-ended to leave room for (1) new events, (2) students presenting their results for feedback from a broader audience, and (3) any exciting new paper suggested by the students or the instructors.
Lecture | Topic | Papers |
---|---|---|
1 | Course overview | |
2 | Our first paper on the data set; data set characterization | Chapter 6 from Dan Jurafsky and James H. Martin's book. |
3 | Labor Day | No class. |
4 | Potential research questions and project ideas (Ashique) | Word Embeddings Quantify 100 Years of Gender and Ethnic Stereotypes; Garg, Schiebinger, Jurafsky, Zou; PNAS, 2018. Paper. Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings; Manzini, Chong, Tsvetkov, Black; NAACL, 2019. Paper. |
5 | Potential research questions and project ideas (Tom) | |
6 | Potential research questions and project ideas (Mark) | Reading summary due for the week. A Computational Model of the Citizen as Motivated Reasoner: Modeling the Dynamics of the 2000 Presidential Election; Kim, Taber, Lodge; Political Behavior, 2010. Paper. |
7 | Predicting political outcomes | Reading summary due for the week. Student presentations start from this lecture. From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series; O'Connor, Balasubramanyan, Routledge, Smith; ICWSM 2010. Paper. Presenter: Christian Deverall. Does the @realDonaldTrump Really Matter to Financial Markets?; Benton, Philips; American Journal of Political Science, 2020. Paper. Presenter: Clay Yoo. |
8 | Predicting political outcomes | Project proposal due by 11 am. Nowcasting the Stance of Social Media Users in a Sudden Vote: The Case of the Greek Referendum; Tsakalidis, Aletras, Cristea, Liakata; CIKM 2018. Paper. Presenter: Rupak Sarkar. |
9 | Political discourse and moral foundation | Reading summary due for the week. Proposal feedback. Classification of Moral Foundations in Microblog Political Discourse; Johnson, Goldwasser; ACL, 2018. Paper. Presenter: Jeremiah Milbauer. Modeling of Political Discourse Framing on Twitter; Johnson, Goldwasser; ICWSM 2017. Paper. Presenter: Tushar Kanakagiri. |
10 | Moral sentiment | Text-based Inference of Moral Sentiment Change; Xie, Ferreira Pinto Jr., Hirst, Xu; EMNLP, 2019. Paper. Presenter: Anirban Chowdhury. Guest: Renato Ferreira Pinto, Jr. (Google). |
11 | Censorship and moderation | You Can't Stay Here: The Efficacy of Reddit's 2015 Ban Examined Through Hate Speech; Chandrasekharan, Pavalanathan, Srinivasan, Glynn, Eisenstein, Gilbert; CSCW 2017. Paper. Presenter: Jose Eduardo Oros Chavarria. Perceptions of Censorship and Moderation Bias in Political Debate Forums; Shen, Yoder, Jo, Rosé; ICWSM 2018. Paper. Presenter: Qinlan Chen. |
12 | Polarization | The Polarization of Contemporary American Politics; Hare, Poole; Polity, 2014. Paper. Presenter: Mark Kamlet. |
13 | Polarization | Analyzing Polarization in Social Media: Method and Application to Tweets on 21 Mass Shootings; Demszky, Garg, Voigt, Zou, Shapiro, Gentzkow, Jurafsky; NAACL 2019. Paper. Presenter: Helen Zheng. Guest: Dorottya Demszky (Stanford University). |
14 | Controversy | Something's Brewing! Early Prediction of Controversy-causing Posts from Discussion Features; Hessel, Lee; NAACL 2019. Paper. Presenter: Brian Yan. Guest: Jack Hessel (AI2). Events and Controversies: Influences of a Shocking News Event on Information Seeking; Koutra, Bennett, Horvitz; WWW 2015. Paper. Presenter: Dian Yu. |
15 | Opinion aggregation using language models | Mining Insights from Large-scale Corpora Using Fine-tuned Language Models; Palakodety, KhudaBukhsh, Carbonell; ECAI 2020. Paper. Presenter: Hyeonsu Kang. How Can We Know What Language Models Know; Jiang, Xu, Araki, Neubig; TACL 2020. Paper. Presenter: Yuchen Li. |
16 | Hate speech, counter speech | Trumping Hate on Twitter? Online Hate in the 2016 US Election and its Aftermath; Siegel, Nikitin, Barberá, Sterling, Pullen, Bonneau, Nagler, Tucker; Quarterly Journal of Political Science, forthcoming. Paper. Presenter: Ashique KhudaBukhsh. Hate Speech Detection is Not as Easy as You May Think; Arango, Pérez, Poblete; SIGIR 2019. Paper. Presenter: Qinlan Chen. |
17 | Spotlight talks | Midterm project evaluation |
18 | Spotlight talks | Midterm project evaluation |
19 | Spotlight talks | Midterm project evaluation. Reading summaries due for the week. Framing and Agenda-Setting in Russian News: a Computational Analysis of Intricate Political Strategies; Field, Kliger, Wintner, Pan, Jurafsky, Tsvetkov; EMNLP 2018. Paper. Presenter: Lingwei Chen. |
20 | Counter speech | Revised proposal submission (due 11 am). Thou Shalt Not Hate: Countering Online Hate Speech; Mathew, Saha, Tharad, Rajgaria, Singhania, Maity, Goyal, Mukherjee; ICWSM 2019. Paper. Presenter: Gaurav Deshpande. Voice for the Voiceless: Active Sampling to Detect Comments Supporting the Rohingyas; Palakodety, KhudaBukhsh, Carbonell; AAAI 2020. Paper. Presenter: Ramon Alfanso Villa Cox. |
21 | Counter speech | Racism is a Virus: Anti-Asian Hate and Counterhate in Social Media during the COVID-19 Crisis; Ziems, He, Soni, Kumar; arXiv, 2020. Paper. Presenter: Kunal Khadilkar. Hope Speech Detection: A Computational Analysis of the Voice of Peace; Palakodety, KhudaBukhsh, Carbonell; ECAI 2020. Paper. Presenter: Ashique KhudaBukhsh. |
22 | Fake news and misinformation | A Survey of Fake News: Fundamental Theories, Detection Methods, and Opportunities; Zhou, Zafarani; arXiv, 2018. Paper. Capturing the Style of Fake News; Przybyla; AAAI 2020. Paper. Presenter: Anurag Katakkar. Political Knowledge and Misinformation in the Era of Social Media: Evidence from the 2015 U.K. Election; Munger, Egan, Nagler, Ronen, Tucker; British Journal of Political Science, forthcoming. Paper. Presenter: Daniel Connolly. |
23 | Politics and news media | Strategic Candidate Entry and Congressional Elections in the Era of Fox News; Arceneaux, Dunaway, Johnson, Vander Wielen; American Journal of Political Science, 2020. Paper. Presenter: Mauro Moretto. Partisanship, Propaganda, and Disinformation: Online Media and the 2016 U.S. Presidential Election; Faris, Roberts, Etling, Bourassa, Zuckerman, Benkler; SSRN 2017. Paper. Presenter: Antonio Carlos Theophilo Costa Junior. |
24 | Global issues | Modeling between-population variation in COVID-19 dynamics in Hubei, Lombardy, and New York City; Wilder, Charpignon, Killian, Ou, Mate, Jabbari, Perrault, Desai, Tambe, Majumder; PNAS 2020. Paper. Presenter: Bryan Wilder. |
25 | News framing | Multi-Label and Multilingual News Framing Analysis; Akyürek, Guo, Elanwar, Ishwar, Betke, Wijaya; ACL 2020. Paper. Presenter: Derry T. Wijaya. |
26 | TBD | |
27 | TBD | |
28 | TBD | |
29 | Final project presentations | |