Special Topic: Human-Centered NLP

Canvas:

https://canvas.cmu.edu/courses/32856

Lecture recording:

on Canvas, Zoom Cloud Recording

Semester:

2023 Spring (05-499/899)

Instructors:

Sherry Tongshuang Wu

Time:

Monday / Wednesday 9:30-10:50am

Location:

WEH 4625

“HCI people design useful things that NLP people cannot build; NLP people make things that nobody uses.” (Yang et al., 2019) This course aims to help students develop the mindsets and skills necessary to build useful NLP systems, by exploring the intersection between HCI and NLP. The course will discuss the strengths and weaknesses of the status quo NLP techniques in interactive scenarios, as well as ways to integrate humans into designing, developing, and evaluating NLP resources, models, and systems. Importantly, it will highlight topics shared between HCI and NLP (data curation, model interpretability, etc.) and reflect on how the two communities approach similar topics differently.

The primary goal of the course is to offer an overview of HCI+NLP, and to help students access and understand both HCI and NLP research papers and methods. The course will be half lecture and half seminar style: every 1-2 weeks, students will sign up to lead the discussion of assigned papers.

Coursework includes lectures, paper readings, class presentations, and group projects; there are no exams.

Schedule and Readings

This schedule is tentative and subject to changes.

Wed, Jan 18

What is Human-Centered NLP? + Course Logistics (Lecture)

Definition of HCNLP, connection to relevant fields, and logistics.

Slides

Mon, Jan 23

Natural Language Understanding Tasks and Applications (Lecture)

Basics of text data processing: tokenization, text classification, token classification, token relation detection, etc.

Slides

Optional Training Classifiers with Natural Language Explanations by Braden Hancock et al. in ACL 2018

Optional Towards natural language-based visualization authoring by Yun Wang et al. in VIS 2022

Optional Augmenting Scientific Papers with Just-in-Time, Position-Sensitive Definitions of Terms and Symbols by Andrew Head et al. in CHI 2021

Wed, Jan 25

Natural Language Generation + Word Embeddings (Lecture)

Basics of text generation, language modeling, and an overview of static word embeddings.

Slides

Optional Scim: Intelligent Skimming Support for Scientific Papers by Raymond Fok et al. in ArXiv 2022

Optional TLDR: Extreme Summarization of Scientific Documents by Isabel Cachola et al. in EMNLP 2020

Optional Explainpaper.com, 2022

Mon, Jan 30

State-of-the-art modeling (Lecture)

Basics of language modeling, pretraining, and tooling (HuggingFace), etc.

Slides

Optional On the Opportunities and Risks of Foundation Models (Introduction) by Rishi Bommasani et al. in ArXiv 2022

Optional Exploring Transfer Learning with T5: the Text-To-Text Transfer Transformer by Adam Roberts, Colin Raffel in 2020

Wed, Feb 01

Data Collection (Lecture)

Annotation task design, annotator population

Slides

Deadline Reading 0: Sign up for paper presentation

Optional Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks by Paul Röttger et al. in NAACL 2022

Optional Jury Learning: Integrating Dissenting Voices into Machine Learning Models by Mitchell L. Gordon et al. in CHI 2022

Mon, Feb 06

Data Curation (Lecture)

Data artifacts & fixes, dataset difficulties, and data updates

Slides

Optional Annotation Artifacts in Natural Language Inference Data by Suchin Gururangan et al. in NAACL 2018

Optional Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics by Swabha Swayamdipta et al. in EMNLP 2020

Optional Understanding and Visualizing Data Iteration in Machine Learning by Fred Hohman et al. in CHI 2020

Optional On the Limitations of Dataset Balancing: The Lost Battle Against Spurious Correlations by Roy Schwartz, Gabriel Stanovsky in NAACL 2022

Wed, Feb 08

The Importance of Data (Reading) Discussion Slides 1 Slides 2

Required Dynabench: Rethinking Benchmarking in NLP by Douwe Kiela et al. in NAACL 2021

Required Everyone wants to do the model work, not the data work: Data Cascades in High-Stakes AI by Nithya Sambasivan et al. in CHI 2021

Optional Changing the World by Changing the Data by Anna Rogers in ACL 2021

Mon, Feb 13

Model Eval 1 - Standard Metrics & Pitfalls (Lecture)

Standard metrics & limitations, quality of good benchmarks

Slides

Deadline Group project: Form group + short project description

Optional Beyond Accuracy: Behavioral Testing of NLP models with CheckList by Marco Tulio Ribeiro et al. in ACL 2020

Optional Ditch the Gold Standard: Re-evaluating Conversational Question Answering by Huihan Li et al. in ACL 2022

Wed, Feb 15

Guest Lecture (Elizabeth Clark): Model Eval 2 - Best practices of Human Evaluation (Lecture)

Some important variables in human evaluation (e.g., statistical test, between vs. within subject studies, etc.)

Slides

Optional How to Do Human Evaluation: Best Practices for User Studies in NLP, 2021

Mon, Feb 20

Human-in-the-loop Evaluations (Reading) Discussion Slides 1 Slides 2

Required All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text by Elizabeth Clark et al. in ACL 2021

Required Evaluating Human-Language Model Interaction by Mina Lee et al. in 2022

Wed, Feb 22

Guest Lecture (Maarten Sap): Responsible NLP (Lecture)

Bias definition, quantification and mitigation

Slides

Optional Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings by Tolga Bolukbasi et al. in NeurIPS 2016

Mon, Feb 27

Feedback to NLP models (Lecture)

Different forms of human feedback & different modeling approaches to incorporate human feedback, Reinforcement Learning from Human Feedback (RLHF)

Slides

Optional Putting humans in the natural language processing loop: A survey. by Zijie Wang et al. in HCINLP 2021

Wed, Mar 01

ChatGPT / InstructGPT (Reading)

Learning from human feedback (around InstructGPT)

Discussion Slides 1 Slides 2

Deadline Assignment 1: Evaluate existing HuggingFace models with CheckList

Required Training language models to follow instructions with human feedback by Long Ouyang et al. in ArXiv 2022

Required Power to the People: The Role of Humans in Interactive Machine Learning by Saleema Amershi et al. in AI Magazine 2014

Optional ChatGPT: Optimizing Language Models for Dialogue by OpenAI in 2022

Mon, Mar 06

No Class - Spring Break Slides

Wed, Mar 08

No Class - Spring Break Slides

Mon, Mar 13

Human-Model Interaction, Prompting (Lecture)

Desiderata for human-model interaction, more on prompting

Slides

Optional CoAuthor: Designing a Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities by Mina Lee et al. in CHI 2022

Optional Is the Most Accurate AI the Best Teammate? Optimizing AI for Teamwork by Gagan Bansal et al. in AAAI 2021

Wed, Mar 15

Guest Lecture (Katy Gero): Human Controllability, Assisted Writing (Lecture)

AI + writers, how humans use model-generated text.

Slides

Deadline Sign up for the milestone presentation

Optional In2Writing: Intelligent and Interactive Writing Assistants by In2Writing Team in 2022

Optional Metaphoria: An Algorithmic Companion for Metaphor Creation by Katy Ilonka Gero et al. in CHI 2019

Optional Sparks: Inspiration for Science Writing using Language Models by Katy Ilonka Gero et al. in DIS 2022

Mon, Mar 20

Project Presentation & Peer Feedback - 1 (Presentation) Slides

Wed, Mar 22

Project Presentation & Peer Feedback - 2 (Presentation) Slides

Mon, Mar 27

Build the system 1: Design thinking (Lecture)

User-Centered Design, interview studies, etc.

Slides

Optional Sketching NLP: A Case Study of Exploring the Right Things To Design with Language Intelligence by Qian Yang et al. in CHI 2019

Wed, Mar 29

Prototyping with NLP Models (Reading) Discussion Slides 1 Slides 2

Deadline Assignment 1 Peer Grading

Required Planning for Natural Language Failures with the AI Playbook by Matthew K. Hong et al. in CHI 2021

Required Social Simulacra: Creating Populated Prototypes for Social Computing Systems by Joon Sung Park et al. in UIST 2022

Mon, Apr 03

Build the System 2: Interaction Design + Usability Testing (Lecture)

Mixed-initiative systems, interface evaluation

Slides

Optional Principles of Mixed-Initiative User Interfaces by Eric Horvitz in CHI 1999

Optional Human Effort and Machine Learnability in Computer Aided Translation by Spence Green et al. in EMNLP 2014

Optional Predictive Translation Memory: A Mixed-Initiative System for Human Language Translation by Spence Green et al. in UIST 2014

Wed, Apr 05

NLP models in the Wild (Reading) Discussion Slides 1 Slides 2

Deadline Assignment 2: Prompting+Crowdsourcing strategies

Required Improving Iterative Text Revision by Learning Where to Edit from Other Revision Tasks by Zae Myung Kim et al. in EMNLP 2022

Required Interacting with Opinionated Language Models Changes Users’ Views by Maurice Jakesch et al. in 2022

Mon, Apr 10

Interpretability 1: Explanation Methods (Lecture)

Different explanation generation methods e.g., LIME, NL explanation, etc.

Slides

Optional Explanation in Artificial Intelligence: Insights from the Social Sciences by Tim Miller in arXiv 2018

Optional 'Why Should I Trust You?': Explaining the Predictions of Any Classifier by Marco Tulio Ribeiro et al. in KDD 2016

Optional Tutorial: Interpreting Predictions of NLP Models by Eric Wallace et al. in EMNLP 2020

Wed, Apr 12

Interpretability 2: Explanation Evaluation (Lecture)

The automated and human-centered evaluation of explanations

Slides

Optional Teach Me to Explain: A Review of Datasets for Explainable Natural Language Processing by Sarah Wiegreffe et al. in NeurIPS 2021

Optional Human-centered Evaluations of Explanations by Jordan Boyd-Graber et al. in NAACL 2022

Optional Does the Whole Exceed its Parts? The Effect of AI Explanations on Complementary Team Performance by Gagan Bansal et al. in CHI 2021

Mon, Apr 17

The Need of Interpretation (Reading) Discussion Slides 1 Slides 2

Required Beyond Expertise and Roles: A Framework to Characterize the Stakeholders of Interpretable Machine Learning and their Needs by Harini Suresh et al. in CHI 2021

Required Rethinking Explainability as a Dialogue: A Practitioner's Perspective by Himabindu Lakkaraju et al. in arXiv 2022

Wed, Apr 19

Model Visualization (Lecture)

Different ways to visualize data, model decisions, and interpretations.

Slides

Deadline Assignment 2 Peer Grading

Optional Interfaces for Explaining Transformer Language Models by Jay Alammar in CHI 2022

Optional Errudite: Scalable, Reproducible, and Testable Error Analysis by Tongshuang Wu et al. in ACL 2019

Mon, Apr 24

Final project presentation - 1 (Presentation) Slides

Wed, Apr 26

Final project presentation - 2 (Presentation) Slides

Fri, May 05

Final project report (No Class) Slides

Deadline Final project report submission

Additional course information available on Canvas.

Syllabus

Course Goals

The learning goals of the course are as follows:

To introduce basic concepts of NLP and HCI, to the extent that students can read research papers in the relevant field.
To introduce a variety of emerging topics related to human-centered NLP.
To introduce practical tools for building and analyzing NLP models, and for building interactive systems.

Note that this new course is mostly designed to be a graduate-level, semi-seminar-style course for students interested in HCI+NLP research. This means:

The course is largely project-oriented, and does not involve exams.
A large proportion of your grade will depend on research paper reading and digestion.
This is a special topic course (not well-established) and may only be offered once, so please expect some glitches in the schedule or assignments :)
I cannot guarantee it will count as a part of a technical requirement.

Prerequisites

There is no explicit prerequisite. However, students are expected to (1) be proficient in Python (for completing assignments), and (2) know basic ML concepts, to the extent that you understand concepts like train/dev/test sets, model fitting, features, supervised learning, etc. (We will not cover these in this course!)

If you are familiar with NLP and relevant programming libraries (e.g. HuggingFace), you might find certain parts of the course introducing NLP concepts significantly easier (or, unnecessary :D).

Course Materials and Communications

Slides and readings will be on this page.
Assignments and discussions will be on Canvas when their time comes. The course webpage will have links that direct you to them.
If you have questions related to course materials or logistics, please post on the Canvas Discussion page so that either the instructor or other students can help answer them (I will check Discussion regularly).
If you have special requests, please email the instructor at sherryw@cs.cmu.edu.
This is an in-person class by default, but if you cannot come, there’s a Zoom option on Canvas.

Major Research Work

Grade

The tentative breakdown for grading is below. As a reminder, here is the university policy on academic integrity. See major course work

36% Homework Assignments
- 30% two assignments
- 6% two peer gradings
40% Final Project
- 5% Form group + short project description
- 10% Midterm presentation
- 15% Final presentation
- 10% Project report
14% Paper presentation
10% Participation and Discussions (Through discussions on Canvas)
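
As a rough illustration of how the weights above combine, here is a minimal sketch that computes a final percentage from per-component scores. The component names, the `final_grade` helper, and the assumption that every component is scored on a 0-100 scale are hypothetical and only for illustration; the actual grade computation on Canvas may differ.

```python
# Weights copied from the tentative grading breakdown (percent of final grade).
WEIGHTS = {
    "assignments": 30,
    "peer_grading": 6,
    "project_proposal": 5,
    "midterm_presentation": 10,
    "final_presentation": 15,
    "project_report": 10,
    "paper_presentation": 14,
    "participation": 10,
}

# Sanity check: the breakdown should cover the whole grade.
assert sum(WEIGHTS.values()) == 100


def final_grade(scores):
    """Combine per-component scores (assumed 0-100 each) into a final percentage."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Example: 90 on everything except full participation credit.
example = {name: 90 for name in WEIGHTS}
example["participation"] = 100
print(final_grade(example))  # 91.0
```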

Late Day Policy: Attending class in person and submitting course deliverables on time is critical. However, we realize that things happen, and that you might sometimes not be able to turn in your assignments on time. To accommodate this, you will each receive 4 free late days. Beyond those days, you receive a 5% penalty for each day late. You are welcome to budget late days as you like for the two assignments, two grading deadlines, and the final report.

Project presentations, paper presentations, and paper discussions cannot use late days as they are time sensitive. The final project report cannot use more than two late days as it affects grade submission.
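
To make the late day arithmetic concrete, here is a minimal sketch under stated assumptions: free late days are consumed in submission order, the 5% penalty is applied per deliverable for each day not covered by the free budget, and the two-day limit on the final report is read as a cap on how late it can be. The function and deliverable names are hypothetical; the policy text above is the authority.

```python
FREE_LATE_DAYS = 4        # per student, shared across eligible deliverables
PENALTY_PER_DAY = 5       # percentage points per late day beyond the free budget
MAX_REPORT_LATE_DAYS = 2  # assumed reading: the report can be at most two days late


def apply_late_days(submissions):
    """Return {deliverable: penalty in percentage points}.

    `submissions` is a list of (deliverable, days_late) pairs in submission order.
    Assumption: free late days are used up in that order.
    """
    remaining = FREE_LATE_DAYS
    penalties = {}
    for name, days_late in submissions:
        if name == "final_report" and days_late > MAX_REPORT_LATE_DAYS:
            raise ValueError("final report cannot use more than two late days")
        covered = min(days_late, remaining)  # days absorbed by the free budget
        remaining -= covered
        penalties[name] = (days_late - covered) * PENALTY_PER_DAY
    return penalties


print(apply_late_days([("assignment_1", 3), ("assignment_2", 2), ("final_report", 1)]))
# {'assignment_1': 0, 'assignment_2': 5, 'final_report': 5}
```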

Assignments

Model Evaluation (link to open-sourced repo). We will evaluate existing HuggingFace models with a model testing framework called CheckList. Hopefully the assignment will help you get familiar with the basic programming environment setup for playing with NLP models locally, and with the concept of model evaluation.
Prompting via Crowdsourcing Strategies. We will apply decades of research findings on crowdsourcing instructions to LLM prompting. We will select one crowdsourcing paper, replicate its idea by writing prompts that instruct different LLM modules as if they were crowdworkers, and see whether the crowdsourcing task design strategies transfer to LLM prompting. Hopefully the assignment will help you get familiar with the concept of LLM prompting, the OpenAI interface, and some crowdsourcing techniques.
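
For a feel of what behavioral testing in the first assignment looks like, here is a minimal sketch of an invariance test in the spirit of CheckList: swapping a person's name in a template should not change the prediction. It deliberately does not use the CheckList library's API; `toy_sentiment` is a hypothetical keyword-based stand-in for a real HuggingFace model, and `invariance_test` is an illustrative helper, not part of the starter code.

```python
def toy_sentiment(text):
    """Hypothetical stand-in for a real model: keyword-based sentiment."""
    negative = {"terrible", "awful", "bad"}
    words = text.lower().replace(".", "").split()
    return "negative" if any(w in negative for w in words) else "positive"


def invariance_test(model, template, fillers):
    """Return the fillers whose prediction differs from the first filler's."""
    predictions = [model(template.format(name=f)) for f in fillers]
    return [f for f, p in zip(fillers, predictions) if p != predictions[0]]


# The prediction should stay the same no matter whose name is used.
failures = invariance_test(
    toy_sentiment,
    "{name} said the food was terrible.",
    ["Alice", "Bob", "Chen"],
)
print(failures)  # []
```

An empty list means the model passed this test; any names returned are cases worth inspecting.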

Each assignment will have 100 points, and will be peer-graded:

A base grade up to 80. I will try my best to make this easy to get, as long as you follow the instructions. If you find the assignment difficult, there will also be some ways to earn partial credits.
A peer-graded score, up to 20. Three students will rate your assignment on a pre-set Likert scale (e.g., 1 being poorly constructed and 5 being very impressive), and your score depends on the relative ranking of your averaged rating. You will get {20, 15, 10, 5} points if your averaged rating is ranked in the top {25%, 50%, 75%, 100%}, respectively.
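
The rank-based mapping above can be sketched as follows. This is only an illustration: the `peer_points` helper is hypothetical, and the handling of ties (tied students share the better rank) is an assumption not stated in the policy.

```python
from statistics import mean


def peer_points(ratings):
    """Map each student's Likert ratings to peer-graded points (20/15/10/5).

    `ratings` maps a student to the list of ratings they received.
    Assumption: tied averages share the better rank.
    """
    averages = {student: mean(r) for student, r in ratings.items()}
    n = len(averages)
    points = {}
    for student, avg in averages.items():
        # Rank 1 is the best; count how many averages are strictly higher.
        rank = 1 + sum(other > avg for other in averages.values())
        # Integer comparisons avoid floating-point issues with rank / n.
        if rank * 4 <= n:
            points[student] = 20   # top 25%
        elif rank * 4 <= 2 * n:
            points[student] = 15   # top 50%
        elif rank * 4 <= 3 * n:
            points[student] = 10   # top 75%
        else:
            points[student] = 5    # everyone else
    return points


example = {"A": [5, 5, 4], "B": [4, 4, 4], "C": [3, 4, 3], "D": [2, 3, 2]}
print(peer_points(example))  # {'A': 20, 'B': 15, 'C': 10, 'D': 5}
```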

We will provide more instructions, grading details, and the starter code repo through GitHub Classroom. More details will be posted on Canvas once the assignments are released.

Assignment Peer Grading

You will be assigned to grade other students’ assignments. This is a way for you to see how others think about a given task. You will get full grades for peer gradings unless:

You do not complete them or submit them late.
Your ratings deviate drastically from other people’s ratings without justified reasoning (e.g. everyone gave 5 but you gave 1 because you wanted a higher ranking).

This is how we will see annotator disagreement in real time :)

Presentations and Discussions

A large proportion of your grade will depend on reading and digesting papers from top HCI or NLP venues. The grading will split into two parts:

Paper presentation. The reading lectures will be led by students. 2-3 students will sign up to lead the same session, prepare the slide deck, and discuss multiple aspects of the paper. To achieve deep paper digestion, you can take inspiration from the role-playing model of Jacobson and Raffel. No need to pick explicit roles, just cover relevant discussion points.
Earn participation scores through discussions. Before each reading lecture, we will open corresponding discussion threads on Canvas. Students not leading the session are expected to participate by submitting comments on the required readings on Canvas. This is how you earn participation scores! Good comments typically exhibit one or more of the following:

Critiques of arguments made in the papers
Analysis of implications or future directions for work discussed in lecture or readings
Clarification of some point or detail presented in the class
Insightful questions about the readings or answers to other people’s questions
Links to web resources or examples that pertain to a lecture or reading

Final Project

The most substantial portion of your coursework is a team-based project (2-4 people). You will self-propose a project broadly relevant to HCI+NLP, with four milestones (they will be posted on Canvas when the time comes):

Form research group + topic selection. You will fill in a short Google Form that documents your group members and a general description of your project. This will act as a forcing function for you to start thinking about the project. In the form, you will mostly address these questions: 1 (what are you trying to do), 2 (how is it done today), 3 (what’s new), 4 (who cares), 5 (your proposed method), and 6 (metrics of success). If you are looking for project partners, please post to Canvas!
Midterm presentation + peer feedback. Shortly after the Spring Break, each group will do a 7-8 minute in-class presentation on the project progress, so the instructor and other students can provide feedback.
Final presentation. Each group will do a 7-8 minute in-class presentation on the final project result. This will be similar to the midterm presentation.
Final report. Each group will also submit a 4-8 page final report (not counting references) written in the form of a conference paper submission. The paper might include content that is typical of papers that appear at ACL or CHI.

Special Thanks!

Thanks to Diyi Yang, who also created a Human-Centered NLP course at Stanford around the same time. She taught me the importance of interchanging lectures with reading assignments, helped me revise the course syllabus, and pointed me to awesome resources for various topics.
Thanks to Haiyi Zhu for inspiring me on the course assignment design!
Thanks to Jay Alammar and the HuggingFace tutorials, whose interactive visualizations of Transformers, interpretability, etc. helped ground the modeling part of the course.
Lectures on many topics were built upon amazing conference tutorials. Thanks to the organizers for their thorough reflection on specific topics, and for allowing me to borrow the materials. Most notably: EMNLP 2020: Interpreting Predictions of NLP Models; EMNLP 2021: Crowdsourcing Beyond Annotation: Case Studies in Benchmark Data Collection; NAACL 2022: Human-centered Evaluations of Explanations.
Thanks to the guest speakers for giving expert coverage of several topics. At some point, I’m gonna do an all-guest-speaker course: I can very easily think of someone more knowledgeable than me on every topic!
I also borrowed specific materials here and there from various other courses and online blog posts. I tried my best to credit everyone in slides. Thank you!

Other Information

Respect for Diversity

It is our intent that students from all diverse backgrounds and perspectives be well served by this course, that students’ learning needs be addressed both in and out of class, and that the diversity that students bring to this class be viewed as a resource, strength and benefit. It is our intent to present materials and activities that are respectful of diversity: gender, sexuality, disability, age, socioeconomic status, ethnicity, race, and culture. Your suggestions are encouraged and appreciated. Please let us know ways to improve the effectiveness of the course for you personally or for other students or student groups. In addition, if any of our class meetings conflict with your religious events, please let us know so that we can make arrangements for you.

Accommodations for Students with Disabilities

If you have a disability and are registered with the Office of Disability Resources, we encourage you to use their online system to notify us of your accommodations and discuss your needs with us as early in the semester as possible. We will work with you to ensure that accommodations are provided as appropriate. If you suspect that you may have a disability and would benefit from accommodations but are not yet registered with the Office of Disability Resources, we encourage you to contact them at access@andrew.cmu.edu.

Health and Well-being

If you are experiencing COVID-like symptoms or have a recent COVID exposure, do not attend class if we are meeting in-person. Please email the instructors for accommodations.

If you or anyone you know experiences any academic stress, difficult life events, or feelings like anxiety or depression, we strongly encourage you to seek support. Counseling and Psychological Services (CaPS) is here to help; call 412-268-2922 and visit their website at www.cmu.edu/counseling/. Consider reaching out to a friend, faculty or family member you trust for help getting connected to the support that can help. If you or someone you know is feeling suicidal or in danger of self-harm, call someone immediately, day or night:

CaPS: 412-268-2922
Re:solve Crisis Network: 888-796-8226

If the situation is life threatening, call the police. On campus call CMU Police: 412-268-2323. Off campus: 911.

If you have questions about this, please let the instructors know. Thank you, and have a great semester.
