Location: Zoom (this class will not be recorded to preserve privacy of participants and their opinions)

Time: 3 - 4.15 pm (Monday and Wednesday)

Instructors: Ashique KhudaBukhsh (RIT) and Mark Kamlet (CMU)





Course Research Engineer: Sujan Dutta

Course Description: Social media provides a rich source of detailed data reflecting the evolution of political sentiments over time and in response to various news events. This seminar course studies a diverse set of recent papers drawn from machine learning, large language models, and the broader natural language processing literature, as well as a select set of papers from the political science literature.

Students will have access to a large social media text data set relevant to US politics. The data set extends back to 2014 and is kept updated to the present. A key component of the course is a semester-long research project in which students will pose interdisciplinary research questions concerning US politics that can be explored using this data set and the methodologies studied in the class.

This course is being delivered through Zoom and is being offered concurrently at Carnegie Mellon and the Rochester Institute of Technology. The course meets Monday and Wednesday (3 - 4.15 pm).

Students should have an interest in political science and have sufficient knowledge of machine learning and natural language technologies to be able to read and understand the papers mentioned above. In past offerings of this class (in Fall 2020 and Fall 2022), most students have been in Ph.D. programs or data analytics master's programs. Many of the research papers that students began in the class were published in journals or peer-reviewed conferences (AAAI 2021; ACM Web Sci 2022; SocInfo 2022; AAAI 2023; EMNLP 2023; and IJCAI 2023). Students are encouraged to contact either of the co-instructors to discuss whether the course is a good fit for them.

Prerequisites: The course requires familiarity with machine learning at a level sufficient to complete a substantial project. Any of the following courses can serve as a prerequisite: 10-601, 10-701, 10-605, 10-805, 11-685, 11-785, 95-845. Interested students who have not taken one of these courses should contact the instructor (Ashique KhudaBukhsh, akhudabu@cs.cmu.edu) to check if they have the required background.

Course project and resources: In this course, students are encouraged to explore ambitious, open-ended projects. The course will provide some useful data sets and a list of potential project ideas. Students will work in small teams (2-3 members) on course projects. The course research engineer will provide useful scripts and code to efficiently process the data.

Grading: Weekly reading summaries (10%), class participation (10%), class presentation (15%), weekly/bi-weekly progress discussions (10%), midterm project evaluation (15%), final project evaluation (40%).
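
As a rough illustration of how the weights above combine, here is a minimal sketch. It assumes each component is scored on a 0-100 scale and that the final grade is a plain weighted sum; the syllabus states neither, and the names `WEIGHTS`, `weighted_grade`, and `example_scores` are hypothetical.

```python
# Component weights taken from the grading breakdown (fractions of the final grade).
WEIGHTS = {
    "reading_summaries": 0.10,
    "class_participation": 0.10,
    "class_presentation": 0.15,
    "progress_discussions": 0.10,
    "midterm_project": 0.15,
    "final_project": 0.40,
}


def weighted_grade(scores):
    """Combine per-component scores (assumed 0-100 each) into one 0-100 total."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Made-up scores, for illustration only.
example_scores = {
    "reading_summaries": 90,
    "class_participation": 100,
    "class_presentation": 85,
    "progress_discussions": 80,
    "midterm_project": 75,
    "final_project": 88,
}

total = weighted_grade(example_scores)
print(round(total, 2))
```

Because the final project carries 40% of the weight, a 10-point change in that score moves the total by 4 points, whereas the same change in the reading summaries moves it by 1 point.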

Reading summaries: Weekly reading summaries are due every Monday (or Wednesday, if Monday is a holiday) before the first class of the week begins. The summary will cover all the papers we are scheduled to read for the week. The primary goal of the summary is to make sure we have read the papers beforehand and are ready to discuss the finer points during the class. The summary can be fairly informal; there is no need to regurgitate the whole paper. Mentioning a few interesting lines of thought that came to you while reading these papers is what we are looking for. Please email the reading summary to akhudabu@cs.cmu.edu. Summaries are not required for the optional readings.

Tentative syllabus: From lecture 8 onward, each lecture is organized around a main paper (and a few supplementary readings), with a student presenting the key ideas in the first half of the lecture, followed by class discussion. Students are required to submit a short reading summary (not more than a single page) outlining the key ideas of the papers scheduled for the week before class starts on Monday. Lectures far in the future may be reshuffled if we need to accommodate the schedules of invited guests and speakers.



Lecture | Topic | Papers
Aug 26 | Course overview; data set characterization.
Aug 28 | Our first paper on the data set | Guest: Rupak Sarkar (University of Maryland, College Park)

Chapter 6 from Dan Jurafsky and James H. Martin's book.
Sep 2 | Labor Day | No class.
Sep 4 | Why does the Internet behave the way it does: a brief history of content moderation. | Reading summary due for the week (before class starts).

We Don't Speak the Same Language: Interpreting Polarization through Machine Translation; KhudaBukhsh*, Sarkar*, Kamlet, Mitchell; AAAI 2021. Paper.

Word Embeddings Quantify 100 Years of Gender and Ethnic Stereotypes; Garg, Schiebinger, Jurafsky, Zou; PNAS, 2018. Paper.

Sep 9 | The political evolution in the US (Mark) | Reading summary due for the week (before class starts).

The Polarization of Contemporary American Politics; Hare, Poole; Polity, 2014. Paper.

A Computational Model of the Citizen as Motivated Reasoner: Modeling the Dynamics of the 2000 Presidential Election; Kim, Taber, Lodge; Political Behavior, 2010. Paper.
Sep 11 | Potential research questions and project ideas | Text-based Inference of Moral Sentiment Change; Xie, Ferreira Pinto Jr., Hirst, Xu; EMNLP, 2019. Paper.

(Optional reading) Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings; Manzini, Chong, Tsvetkov, Black; NAACL, 2019. Paper.
Sep 16 | The political evolution in the US continued (Mark) | Reading summary due for the week (before class starts).

"i have a feeling trump will win..................": Forecasting Winners and Losers from User Predictions on Twitter; Swamy, Ritter, de Marneffe; EMNLP, 2017. Paper.
Sep 18 | Potential research questions and project ideas | Fringe News Networks: Dynamics of US News Viewership following the 2020 Presidential Election; KhudaBukhsh*, Sarkar*, Kamlet, Mitchell; ACM Web Science 2022. Paper.

Project Proposal is due (Monday, September 23, 11.59 PM AoE).

Sep 23 | Predicting political outcomes | Reading summary due for the week (before class starts).

From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series; O'Connor, Balasubramanyan, Routledge, Smith; ICWSM 2010. Paper.

(Optional reading) Does the @realDonaldTrump Really Matter to Financial Markets?; Benton, Philips; American Journal of Political Science, 2020. Paper.
Sep 25 | Large language models and politics | Mining Insights from Large-scale Corpora Using Fine-tuned Language Models; Palakodety, KhudaBukhsh, Carbonell; ECAI 2020. Paper.

From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models; Feng, Park, Liu, Tsvetkov; ACL 2023. Paper.
Sep 30 | Political users | Reading summary due for the week (before class starts).

Student presentations start from this lecture.
Classification without (Proper) Representation: Political Heterogeneity in Social Media and Its Implications for Classification and Behavioral Analysis; Alkiek, Zhang, Jurgens; ACL 2022. Paper. Presenter: Soumyajit
Oct 2 | Censorship and moderation | Censorship and Deletion Practices in Chinese Social Media; Bamman, O'Connor, Smith; First Monday, 2012. Paper. Presenter: Raman

You Can't Stay Here: The Efficacy of Reddit's 2015 Ban Examined Through Hate Speech; Chandrasekharan, Pavalanathan, Srinivasan, Glynn, Eisenstein, Gilbert; CSCW 2017. Paper. Presenter: Sam
Oct 7 | Content moderation and politics | Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection; Sap, Swayamdipta, Vianna, Zhou, Choi, Smith; NAACL 2022. Paper. Presenter: Rupa

Vicarious Offense and Noise Audit of Offensive Speech Classifiers: Unifying Human and Machine Disagreement on What is Offensive; Weerasooriya, Dutta, Ranasinghe, Zampieri, Homan, KhudaBukhsh; EMNLP 2023. Presenter: Sujan
Oct 9 | Jailbreaking LLMs and a deeper look at LLM biases | Universal and Transferable Adversarial Attacks on Aligned Language Models; Zou, Wang, Carlini, Nasr, Kolter, Fredrikson; arXiv. Paper. Presenter: Arka

Down the Toxicity Rabbit Hole: A Novel Framework to Bias Audit Large Language Models with Key Emphasis on Racism, Antisemitism, and Misogyny; Dutta, Khorramrouz, Dutta, KhudaBukhsh; IJCAI 2024. Paper. Presenter: Ashique
Oct 14 | No class. Fall break.
Oct 16 | No class. Fall break.
Oct 21 | Spotlight talks (5-7 minutes per group) | Reading summary due for the week (before class starts).

Midterm project evaluation
Oct 23 | Polarization | Political Polarization in Online News Consumption; Garimella, Smith, Weiss, West; ICWSM 2021. Presenter: Miftahul
Oct 28 | Polarization | Reading summary due for the week (before class starts).

Aligning Multidimensional Worldviews and Discovering Ideological Differences; Milbauer, Mathew, Evans; EMNLP 2021. Presenter: Mallikarjuna

Analyzing Polarization in Social Media: Method and Application to Tweets on 21 Mass Shootings; Demszky, Garg, Voigt, Zou, Shapiro, Gentzkow, Jurafsky; NAACL 2019. Paper. Presenter: Pravallika
Oct 30 | Controversy | Something's Brewing! Early Prediction of Controversy-causing Posts from Discussion Features; Hessel, Lee; NAACL 2019. Paper. Presenter: Khushi

Events and Controversies: Influences of a Shocking News Event on Information Seeking; Koutra, Bennett, Horvitz; WWW 2015. Paper. Presenter: Kelsey
Nov 4 | Hate speech, counter speech | Reading summary due for the week (before class starts).

Thou Shalt Not Hate: Countering Online Hate Speech; Mathew, Saha, Tharad, Rajgaria, Singhania, Maity, Goyal, Mukherjee; ICWSM 2019. Paper. Presenter: Jonathan


Voice for the Voiceless: Active Sampling to Detect Comments Supporting the Rohingyas; Palakodety, KhudaBukhsh, Carbonell; AAAI 2020. Paper. Presenter: Ashutosh

(Optional reading) Hate Speech Detection is Not as Easy as You May Think; Arango, Pérez, Poblete; SIGIR 2019. Paper.
Nov 6 | Dog Whistles | Guest Presenter: Rijul Magu.
Nov 11 | Fake news and misinformation | Reading summary due for the week (before class starts).

Capturing the Style of Fake News; Przybyla; AAAI 2020. Paper. Presenter: Sooraj

Political Knowledge and Misinformation in the Era of Social Media: Evidence from the 2015 U.K. Election; Munger, Egan, Nagler, Ronen, Tucker; British Journal of Political Science, forthcoming. Paper.
Nov 13 | Policing | A Murder and Protests, the Capitol Riot, and the Chauvin Trial: Estimating Disparate News Media Stance; Dutta, Li, Nagin, KhudaBukhsh; IJCAI 2022. Paper. Presenter: Sujan

(Optional reading) Language from police body camera footage shows racial disparities in officer respect; Voigt, Camp, Prabhakaran, Hamilton, Hetey, Griffiths, Jurgens, Jurafsky, Eberhardt; PNAS, 2017. Paper.
Nov 18 | Politics and news media | Reading summary due for the week (before class starts).

Strategic Candidate Entry and Congressional Elections in the Era of Fox News; Arceneaux, Dunaway, Johnson, Vander Wielen; American Journal of Political Science, 2020. Paper. Presenter: Christopher


Partisanship, Propaganda, and Disinformation: Online Media and the 2016 U.S. Presidential Election; Faris, Roberts, Etling, Bourassa, Zuckerman, Benkler; SSRN 2017. Paper.
Nov 20 | Immigration and climate change | Reading summary due for the week (before class starts).

Computational analysis of 140 years of US political speeches reveals more positive but increasingly polarized framing of immigration; Card, Chang, Becker, Mendelsohn, Voigt, Boustan, Abramitzky, Jurafsky; PNAS 2022. Paper. Presenter: Yogesh
Dec 2 | Final project presentations
Dec 4 | Final project presentations

Useful links

A parallel CMU course on voting.
An excellent course on natural language processing.
An excellent course on advanced natural language processing.
An excellent book (in progress) by Dan Jurafsky and James H. Martin to learn fundamental concepts of NLP.