Special Topic: Human-Centered NLP

Canvas:

https://canvas.cmu.edu/courses/32856

Lecture recording:

on Canvas, Zoom Cloud Recording

Semester:

2023 Spring (05-499/899)

Instructors:

Sherry Tongshuang Wu

Time:

Monday / Wednesday 9:30-10:50am

Location:

WEH 4625

“HCI people design useful things that NLP people cannot build; NLP people make things that nobody uses.” (Yang et al., 2019) This course aims to help students develop the mindsets and skills necessary to build useful NLP systems by exploring the intersection of HCI and NLP. The course will discuss the strengths and weaknesses of current NLP techniques in interactive scenarios, as well as ways to integrate humans into designing, developing, and evaluating NLP resources, models, and systems. Importantly, it will highlight topics shared between HCI and NLP (data curation, model interpretability, etc.) and reflect on how the two communities approach similar topics differently.

The primary goal of the course is to offer an overview of HCI+NLP, and to help students get access to, and understand, both HCI and NLP research papers and methods. The course will be half lecture and half seminar style – every 1-2 weeks, students will sign up to lead the discussion of assigned papers.

Coursework includes lectures, paper readings, class presentations, and group projects; there will be no exams.

Schedule and Readings

This schedule is tentative and subject to changes.

Kick-off Session
Wed, Jan 18
What is Human-Centered NLP? + Course Logistics (Lecture)
Definition of HCNLP, connection to relevant fields, and logistics.
Slides
NLP Crash Course: Tasks and Some Applications
Mon, Jan 23
Natural Language Understanding Tasks and Applications (Lecture)
Basics of text data processing: tokenization, text classification, token classification, token relation detection, etc.
Slides
Optional Training Classifiers with Natural Language Explanations by Braden Hancock et al. in ACL 2018
Optional Towards natural language-based visualization authoring by Yun Wang et al. in VIS 2022
Wed, Jan 25
Natural Language Generation + Word Embeddings (Lecture)
Basics of text generation, language modeling, and an overview on static word embedding.
Slides
Optional Scim: Intelligent Skimming Support for Scientific Papers by Raymond Fok et al. in arXiv 2022
Optional TLDR: Extreme Summarization of Scientific Documents by Isabel Cachola et al. in EMNLP 2020
Optional Explainpaper.com, 2022
Mon, Jan 30
State-of-the-art modeling (Lecture)
Basics of language modeling, pretraining, and tooling (HuggingFace), etc.
Slides
Optional On the Opportunities and Risks of Foundation Models (Introduction) by Rishi Bommasani et al. in arXiv 2022
Reflect Humans in Model Development: Data + Evaluation
Wed, Feb 01
Data Collection (Lecture)
Annotation task design, annotator population
Slides
Optional Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks by Paul Röttger et al. in NAACL 2022
Optional Jury Learning: Integrating Dissenting Voices into Machine Learning Models by Mitchell L. Gordon et al. in CHI 2022
Mon, Feb 06
Data Curation (Lecture)
Data artifacts & fixes, dataset difficulties, and data updates
Slides
Optional Annotation Artifacts in Natural Language Inference Data by Suchin Gururangan et al. in NAACL 2018
Optional Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics by Swabha Swayamdipta et al. in EMNLP 2020
Optional Understanding and Visualizing Data Iteration in Machine Learning by Fred Hohman et al. in CHI 2020
Optional On the Limitations of Dataset Balancing: The Lost Battle Against Spurious Correlations by Roy Schwartz, Gabriel Stanovsky in NAACL 2022
Wed, Feb 08
The Importance of Data (Reading) Discussion Slides 1 Slides 2
Required Dynabench: Rethinking Benchmarking in NLP by Douwe Kiela et al. in NAACL 2021
Optional Changing the World by Changing the Data by Anna Rogers in ACL 2021
Mon, Feb 13
Model Eval 1 - Standard Metrics & Pitfalls (Lecture)
Standard metrics & their limitations, qualities of good benchmarks
Slides
Optional Beyond Accuracy: Behavioral Testing of NLP models with CheckList by Marco Tulio Ribeiro et al. in ACL 2020
Wed, Feb 15
Guest Lecture (Elizabeth Clark): Model Eval 2 - Best practices of Human Evaluation (Lecture)
Some important variables in human evaluation (e.g., statistical test, between vs. within subject studies, etc.)
Slides
Mon, Feb 20
Human-in-the-loop Evaluations (Reading) Discussion Slides 1 Slides 2
Required Evaluating Human-Language Model Interaction by Mina Lee et al. in 2022
Wed, Feb 22
Guest Lecture (Maarten Sap): Responsible NLP (Lecture)
Bias definition, quantification and mitigation
Slides
Mon, Feb 27
Feedback to NLP models (Lecture)
Different forms of human feedback & different modeling approaches to incorporate human feedback, Reinforcement Learning from Human Feedback (RLHF)
Slides
Optional Putting Humans in the Natural Language Processing Loop: A Survey by Zijie Wang et al. in HCINLP 2021
Wed, Mar 01
ChatGPT / InstructGPT (Reading)
Learning from human feedback (around InstructGPT)
Discussion Slides 1 Slides 2
Required Training language models to follow instructions with human feedback by Long Ouyang et al. in arXiv 2022
Mon, Mar 06
No Class - Spring Break Slides
Wed, Mar 08
No Class - Spring Break Slides
Deployed models: Design and Test Model-infused Systems with Humans
Mon, Mar 13
Human-Model Interaction, Prompting (Lecture)
Desiderata for human-model interaction, more on prompting
Slides
Optional Is the Most Accurate AI the Best Teammate? Optimizing AI for Teamwork by Gagan Bansal et al. in AAAI 2021
Wed, Mar 15
Guest Lecture (Katy Gero): Human Controllability, Assisted Writing (Lecture)
AI + writers; how humans use model-generated text.
Slides
Optional Metaphoria: An Algorithmic Companion for Metaphor Creation by Katy Ilonka Gero et al. in CHI 2019
Optional Sparks: Inspiration for Science Writing using Language Models by Katy Ilonka Gero et al. in DIS 2022
Mon, Mar 20
Project Presentation & Peer Feedback - 1 (Presentation) Slides
Wed, Mar 22
Project Presentation & Peer Feedback - 2 (Presentation) Slides
Mon, Mar 27
Build the system 1: Design thinking (Lecture)
User-Centered Design, interview studies, etc.
Slides
Wed, Mar 29
Prototyping with NLP Models (Reading) Discussion Slides 1 Slides 2
Deadline Assignment 1 Peer Grading
Required Planning for Natural Language Failures with the AI Playbook by Matthew K. Hong et al. in CHI 2021
Mon, Apr 03
Build the System 2: Interaction Design + Usability Testing (Lecture)
Mixed-initiative systems, interface evaluation
Slides
Optional Principles of Mixed-Initiative User Interfaces by Eric Horvitz in CHI 1999
Optional Human Effort and Machine Learnability in Computer Aided Translation by Spence Green et al. in EMNLP 2014
Wed, Apr 05
NLP models in the Wild (Reading) Discussion Slides 1 Slides 2
Deadline Assignment 2: Prompting+Crowdsourcing strategies
Required Interacting with Opinionated Language Models Changes Users’ Views by Maurice Jakesch et al. in 2022
More on Model Understanding and Interpretability
Mon, Apr 10
Interpretability 1: Explanation Methods (Lecture)
Different explanation generation methods e.g., LIME, NL explanation, etc.
Slides
Optional 'Why Should I Trust You?': Explaining the Predictions of Any Classifier by Marco Tulio Ribeiro et al. in KDD 2016
Optional Tutorial: Interpreting Predictions of NLP Models by Eric Wallace et al. in EMNLP 2020
Wed, Apr 12
Interpretability 2: Explanation Evaluation (Lecture)
The automated and human-centered evaluation of explanations
Slides
Optional Human-centered Evaluations of Explanations by Jordan Boyd-Graber et al. in NAACL 2022
Mon, Apr 17
Wed, Apr 19
Model Visualization (Lecture)
Different ways to visualize data, model decisions, and interpretations.
Slides
Deadline Assignment 2 Peer Grading
Optional Interfaces for Explaining Transformer Language Models by Jay Alammar in CHI 2022
Optional Errudite: Scalable, Reproducible, and Testable Error Analysis by Tongshuang Wu et al. in ACL 2019
Mon, Apr 24
Final project presentation - 1 (Presentation) Slides
Wed, Apr 26
Final project presentation - 2 (Presentation) Slides
Fri, May 05
Final project report (No Class) Slides
Deadline Final project report submission

Additional course information available on Canvas.

Syllabus

Course Goals

The learning goals of the course are as follows:

Note that this new course is designed primarily as a graduate-level, semi-seminar-style course for students interested in HCI+NLP research. This means:

Prerequisites

There are no explicit prerequisites; however, students are expected to (1) be proficient in Python (for completing assignments), and (2) know basic ML concepts — to the extent that you understand concepts like train/dev/test sets, model fitting, features, supervised learning, etc. (We will not cover these in this course!)
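As a quick self-check on these prerequisite concepts, here is a minimal sketch of a train/dev/test split in plain Python (the data and split ratios are purely illustrative, not from any assignment):

```python
import random

def train_dev_test_split(examples, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle a dataset and carve it into train/dev/test portions."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_test = int(n * test_frac)
    n_dev = int(n * dev_frac)
    test = shuffled[:n_test]
    dev = shuffled[n_test:n_test + n_dev]
    train = shuffled[n_test + n_dev:]
    return train, dev, test

data = list(range(100))  # stand-in for 100 labeled examples
train, dev, test = train_dev_test_split(data)
print(len(train), len(dev), len(test))  # 80 10 10
```

The held-out dev set is for tuning; the test set is touched only once, for the final evaluation — if this distinction is unfamiliar, an intro ML refresher is recommended before taking the course.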

If you are familiar with NLP and relevant programming libraries (e.g. HuggingFace), you might find certain parts of the course introducing NLP concepts significantly easier (or, unnecessary :D).

Course Materials and Communications

Major Research Work

Grade

The tentative grading breakdown is below. As a reminder, here is the university policy on academic integrity. See Major Research Work.

Late Day Policy: Attending class in person and submitting course deliverables on time is critical. However, we realize that things happen, and that you might sometimes not be able to turn in your assignments. To accommodate this, you will each receive 4 free late days. Beyond those days, you receive a 5% penalty for each day late. You are welcome to budget late days as you like for the two assignments, two grading deadlines, and the final report.

Project presentations, paper presentations, and paper discussions cannot use late days, as they are time sensitive; the final project report cannot use more than two late days, as it affects grade submission.

Assignments

  1. Model Evaluation (link to open-sourced repo). We will evaluate existing Huggingface models with a model-testing framework called CheckList. Hopefully this assignment will help you get familiar with the basic programming environment setup for working with NLP models locally, and with the concept of model evaluation.

  2. Prompting via Crowdsourcing Strategies. We will apply decades of research findings on crowdsourcing instruction design to LLM prompting. You will select one paper in crowdsourcing, replicate its idea by writing prompts that instruct different LLMs as if they were crowdworkers, and see whether the crowdsourcing task-design strategies transfer to LLM prompting. Hopefully this assignment will help you get familiar with the concept of LLM prompting, the OpenAI interface, and some crowdsourcing techniques.
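To give a flavor of assignment 1, here is a conceptual sketch of CheckList-style behavioral testing in plain Python. Note this is *not* the actual CheckList API, and the toy keyword-matching classifier is a hypothetical stand-in for a real Huggingface model:

```python
# Conceptual sketch of a CheckList-style Minimum Functionality Test (MFT):
# generate templated inputs, run the model on each, and collect failures.

def toy_sentiment_model(text):
    """Hypothetical stand-in for a real sentiment model."""
    return "positive" if ("good" in text or "great" in text) else "negative"

def minimum_functionality_test(model, template, fillers, expected):
    """MFT: on simple templated inputs, the model must predict `expected`."""
    return [template.format(w) for w in fillers
            if model(template.format(w)) != expected]

fails = minimum_functionality_test(
    toy_sentiment_model,
    template="This is a {} movie.",
    fillers=["good", "great", "wonderful"],
    expected="positive",
)
print(fails)  # the toy model misses "wonderful": ['This is a wonderful movie.']
```

The real CheckList library adds richer templating, perturbation-based invariance tests, and aggregate failure-rate reporting; the assignment starter code will walk through its actual interface.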

Each assignment will have 100 points, and will be peer-graded:

We will provide more instructions, grading details, and the starter code repo through GitHub Classroom. More details will be posted on Canvas once the assignments are released.
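To give a flavor of assignment 2, here is a minimal sketch of composing a few-shot prompt the way a well-designed crowdsourcing task is written: a clear instruction, worked examples with gold labels, then the actual item. The helper name and example texts are hypothetical; the actual assignment will use the OpenAI interface:

```python
# Sketch: apply crowdsourcing task-design advice (clear instructions,
# worked examples, explicit output format) when building an LLM prompt.

def build_prompt(instruction, examples, item):
    """Compose a prompt like a crowdsourcing task brief."""
    parts = [instruction, ""]
    for text, label in examples:
        parts.append(f"Text: {text}\nLabel: {label}")
    parts.append(f"Text: {item}\nLabel:")
    return "\n".join(parts)

prompt = build_prompt(
    instruction="Label the sentiment of each text as positive or negative.",
    examples=[("I loved this movie.", "positive"),
              ("The plot made no sense.", "negative")],
    item="The acting was superb.",
)
print(prompt)
```

The resulting string would then be sent to an LLM as if briefing a crowdworker; the assignment asks whether strategies from the crowdsourcing literature (e.g., decomposition, gold examples) measurably improve the model's outputs.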

Assignment Peer Grading

You will be assigned to grade other students’ assignments. This is a way for you to see how others approach a given task. You will get full grades for peer grading unless:

This is how we will see annotator disagreement in real time :)

Presentations and Discussions

A large proportion of your grade will depend on reading and digesting papers from top HCI and NLP venues. The grading is split into two parts:

  1. Paper presentation. The reading lectures will be led by students. 2-3 students will sign up to lead the same session, prepare the slide deck, and discuss multiple aspects of the paper. To digest the papers deeply, you can take inspiration from the role-playing model of Jacobson and Raffel. No need to pick explicit roles; just cover relevant discussion points.

  2. Earn participation scores through discussions. Before each reading lecture, we will open corresponding discussion threads on Canvas. Students not leading the session are expected to participate by submitting comments on the required readings on Canvas. This is how you earn participation scores! Good comments typically exhibit one or more of the following:

Final Project

The most substantial portion of your coursework is a team-based project (2-4 people). You will self-propose a project broadly relevant to HCI+NLP, with four milestones (they will be posted on Canvas when the time comes):

  1. Form research group + topic selection. You will fill in a short Google Form that documents your group members and a general description of your project. This will act as a forcing function for you to start thinking about the project. In the form, you will mostly address these questions: (1) what are you trying to do, (2) how is it done today, (3) what’s new, (4) who cares, (5) what is your proposed method, and (6) what are your metrics of success. If you are looking for project partners, please post to Canvas!

  2. Midterm presentation + peer feedback. Shortly after Spring Break, each group will give a 7-8 minute in-class presentation on their project progress, so that the instructor and other students can provide feedback.

  3. Final presentation. Each group will give a 7-8 minute in-class presentation on the final project results. This will be similar to the midterm presentation.

  4. Final report. Each group will also submit a 4-8 page final report (not counting references), written in the form of a conference paper submission, with content typical of papers that appear at ACL or CHI.

Special Thanks!

  1. Thanks to Diyi Yang, who also created a Human-Centered NLP course at Stanford around the same time. She taught me the importance of interleaving lectures with reading assignments, helped me revise the course syllabus, and pointed me to awesome resources for various topics.
  2. Thanks to Haiyi Zhu for inspiring me on the course assignment design!
  3. Thanks to Jay Alammar and the Huggingface tutorials, whose interactive visualizations of Transformers, interpretability, etc. helped ground the modeling part of the course.
  4. Lectures on many topics were built upon amazing conference tutorials. Thanks to the organizers for their thorough reflection on specific topics, and for allowing me to borrow the materials. Most notably: EMNLP 2020: Interpreting Predictions of NLP Models; EMNLP 2021: Crowdsourcing Beyond Annotation: Case Studies in Benchmark Data Collection; NAACL 2022: Human-centered Evaluations of Explanations.
  5. Thanks to the guest speakers for giving expert coverage of several topics. At some point, I’m gonna do an all-guest-speaker course – I can very easily think of someone more knowledgeable than me on every topic!
  6. I also borrowed specific materials here and there from various other courses and online blog posts. I tried my best to credit everyone in slides. Thank you!

Other Information

Respect for Diversity

It is our intent that students from all diverse backgrounds and perspectives be well served by this course, that students’ learning needs be addressed both in and out of class, and that the diversity that students bring to this class be viewed as a resource, strength and benefit. It is our intent to present materials and activities that are respectful of diversity: gender, sexuality, disability, age, socioeconomic status, ethnicity, race, and culture. Your suggestions are encouraged and appreciated. Please let us know ways to improve the effectiveness of the course for you personally or for other students or student groups. In addition, if any of our class meetings conflict with your religious events, please let us know so that we can make arrangements for you.

Accommodations for Students with Disabilities

If you have a disability and are registered with the Office of Disability Resources, we encourage you to use their online system to notify us of your accommodations and discuss your needs with us as early in the semester as possible. We will work with you to ensure that accommodations are provided as appropriate. If you suspect that you may have a disability and would benefit from accommodations but are not yet registered with the Office of Disability Resources, we encourage you to contact them at access@andrew.cmu.edu.

Health and Well-being

If you are experiencing COVID-like symptoms or have had a recent COVID exposure, do not attend class if we are meeting in person. Please email the instructors for accommodations.

If you or anyone you know experiences any academic stress, difficult life events, or feelings like anxiety or depression, we strongly encourage you to seek support. Counseling and Psychological Services (CaPS) is here to help; call 412-268-2922 and visit their website at www.cmu.edu/counseling/. Consider reaching out to a friend, faculty or family member you trust for help getting connected to the support that can help. If you or someone you know is feeling suicidal or in danger of self-harm, call someone immediately, day or night:

If the situation is life threatening, call the police. On campus call CMU Police: 412-268-2323. Off campus: 911.

If you have questions about this, please let the instructors know. Thank you, and have a great semester.