Special Topic: Human-Centered NLP

Canvas:

https://canvas.cmu.edu/courses/49594

Semester:

2025 Fall (05-499/899-D)

Instructors:

Sherry Tongshuang Wu (Office hour: Mondays 2-3pm, NSH 3525)

Time:

Monday / Wednesday 12:30-1:50pm

Location:

TEP 3808

“HCI people design useful things that NLP people cannot build; NLP people make things that nobody uses.” (Yang et al., 2019) This course aims to help students develop the mindsets and skills necessary to build useful NLP systems by exploring the intersection between HCI and NLP. The course will discuss the strengths and weaknesses of status quo NLP techniques in interactive scenarios, with a focus on LLMs and their applications, which have inspired a profound transformation in the field of human-AI interaction. We will also discuss ways to integrate humans into designing, developing, and evaluating NLP resources, models, and systems. Importantly, the course will highlight topics shared between HCI and NLP (agents, model trust, task delegation, data curation, etc.) and reflect on how the two communities approach similar topics differently. The primary goal of the course is to offer an overview of HCI+NLP and to help students access and understand both HCI and NLP research papers and methods. The course will be half lecture and half seminar: every 1-2 weeks, students will sign up to lead the discussion of assigned papers. Coursework includes lectures, paper readings, class presentations, and group projects; there will be no exams.

Schedule and Readings

This schedule is tentative and subject to change.

Kick-off
Mon, Aug 25
What is HCNLP? + Course Logistics (Lecture)
Definition of HCNLP, connections to adjacent fields, and course overview.
Slides
NLP Crash Course
Wed, Aug 27
NLU, NLG, and Word Embeddings (Lecture)
Pre-LLM NLP tasks & pipelines; mapping abstract tasks to real applications; quick tour of tooling.
Slides
Required Hugging Face Course: Classical NLP by Hugging Face in 2023
Mon, Sep 01
No Class (Labor Day) Slides
Deadline A0: AWS Account ID Collection for Credit Distribution
Wed, Sep 03
LLMs and Their Applications (Lecture)
Definition of LLMs (architectures, pre- and post-training); existing models; multimodality.
Slides
Deadline Reading 0: Sign up for paper presentation
Required The Illustrated Transformer by Jay Alammar in 2018
Optional Training language models to follow instructions with human feedback by Long Ouyang et al. in arXiv 2022
Optional Direct Preference Optimization: Your Language Model is Secretly a Reward Model by Rafael Rafailov et al. in NeurIPS 2023
Mon, Sep 08
Guest Lecture: LLM Agents (Zora Wang) (Lecture)
Agent definitions, world models, memory, frameworks; HCI vs. AI perspectives.
Slides
Deadline Online discussion for Agentic Systems
Required Language Agents: Foundations, Prospects, and Risks (EMNLP 2024 Tutorial) by Yu Su et al. in EMNLP 2024 (Tutorial)
Optional Beyond Browsing: API-Based Web Agents by Yueqi Song et al. in arXiv 2025
Optional Social Simulacra: Creating Populated Prototypes for Social Computing Systems by Joon Sung Park et al. in UIST 2022
Optional AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents by Maksym Andriushchenko et al. in ICLR 2025
Wed, Sep 10
Agentic Systems (discussion) (Reading)
Debate: capabilities & beneficiaries of agents; autonomy vs. responsibility; 'intelligent collaborators' vs. workflow wrappers.
Slides
Required Challenges in Human-Agent Communication by Gagan Bansal et al. in 2024
Required What Are Tools Anyway? A Survey from the Language Model Perspective by Zhiruo Wang et al. in arXiv 2024
Mon, Sep 15
LLMs and Their Applications (cont'd) (Lecture)
Multimodality; discussion of when to use which models.
Slides
Reflect Humans in Model Development
Wed, Sep 17
Desiderata to Bake Into Models (I) (Lecture)
Instruction following; rubric-based & LLM-based eval; safety & privacy.
Slides
Optional Checklists Are Better Than Reward Models For Aligning Language Models by Vijay Viswanathan et al. in arXiv 2025
Mon, Sep 22
Desiderata to Bake Into Models (II) (Lecture)
ToM & persona; collaboration capability; task generalizability; tradeoffs between desiderata.
Slides
Optional LLM Evaluators Recognize and Favor Their Own Generations by Arjun Panickssery et al. in NeurIPS 2024
Required CollabLLM: From Passive Responders to Active Collaborators by Shirley Wu et al. in ICLR 2025
Required Human-Centered Evaluation of Language Technologies (Tutorial) by EMNLP 2024 Tutorial in EMNLP 2024
Wed, Sep 24
Data-in-the-Wild: Collection, Curation, and Cleaning (Lecture)
What counts as 'good data'; annotator populations; curation & augmentation; data quality metrics; documentation & sharing.
Slides
Deadline Group project: Form group + short project description
Optional WildChat: 1M ChatGPT Interaction Logs in the Wild by Wenting Zhao et al. in ICLR 2024
Optional Position: Measure Dataset Diversity, Don't Just Claim It by Dora Zhao et al. in arXiv 2024
Optional Data Feminism for AI by Lauren Klein, Catherine D’Ignazio in FAccT 2024
Mon, Sep 29
How human data drive LLMs (Lecture)
What usage data to collect/analyze; what to learn from it; training objectives; ties to system design.
Slides
Deadline Online discussion for Data, Desiderata, and Evaluation
Required Collective Constitutional AI: Aligning a Language Model with Public Input by Saffron Huang et al. in FAccT 2024
Required STELA: a community-centered approach to norm elicitation for AI alignment by Stevie Bergman et al. in Scientific Reports 2024
Optional A Taxonomy for Human-LLM Interaction Modes by Jie Gao et al. in CHI EA 2024
Optional Show, Don't Tell: Aligning LMs with Demonstrated Feedback by Omar Shaikh et al. in arXiv 2024
Wed, Oct 01
Data, Desiderata, and Evaluation (discussion) (Reading)
Debate on ethical & useful data collection; what to prioritize in model capabilities; whether we should focus on dataset curation vs. learning from messy usage.
Slides
Required Identifying the risks of LM agents with an LM-emulated sandbox by Yangjun Ruan et al. in arXiv 2023
Design and Test Model-infused Systems with Humans
Mon, Oct 06
Human–LLM Interaction and Prompting (Lecture)
Prompting basics & pitfalls; requirements-oriented prompting; synthesizers; modeling overconfidence; user mental models.
Slides
Required The Prompt Report: A Systematic Survey of Prompting Techniques by Sander Schulhoff et al. in arXiv 2024
Wed, Oct 08
Desiderata of Human–AI Collaboration (Lecture)
Important components for collaborative design across UI, infra, and interaction process; Trust & calibration; collaborative evaluation; user modeling.
Slides
Deadline Signup for the milestone presentation
Optional Formalizing Trust in Artificial Intelligence by Alon Jacovi et al. in FAccT 2021
Optional Guidelines for Human-AI Interaction by Amershi et al. in 2019
Required Task Completion Agents are Not Ideal Collaborators by Shannon Zejiang Shen et al. in arXiv 2025
Mon, Oct 13
No Class (Fall Break) (No-class) Slides
Wed, Oct 15
No Class (Fall Break) (No-class) Slides
Mon, Oct 20
Midterm Project Presentations — Session 1 (Presentation) Slides
Wed, Oct 22
Midterm Project Presentations — Session 2 (Presentation) Slides
Mon, Oct 27
Design Space for Human–AI Systems (Lecture)
Autonomy, control, interaction paradigms, trust/understanding; real-system examples; generative UI.
Slides
Optional Stakeholder-centric participation in LLMs for health systems by Zhiyuan Wang et al. in Nature 2025
Optional Rehearsal: Simulating conflict to teach conflict resolution by Omar Shaikh et al. in arXiv 2023
Wed, Oct 29
Guest Lecture: Theory of Mind (Chelsea Wang) (Lecture) Slides
Mon, Nov 03
Guest Lecture: Case Studies — Coding Agents (Valerie Chen) (Lecture) Slides
Deadline Assignment 1: Building an Agent
Deadline Online discussion for Agentic Systems
Wed, Nov 05
How Should HCI Contribute to Model Development? (discussion) (Reading)
Connection between UX and model architecture; the impact of participatory design; 'human-in-the-loop' meanings; evaluation metrics aligned with user needs.
Slides
Required Participation in the Age of Foundation Models by Harini Suresh et al. in FAccT 2024
Required Just Put a Human in the Loop? Investigating LLM-Assisted Annotation for Subjective Tasks by Hope Schroeder et al. in NAACL Findings 2025
Optional Power to the People? Opportunities and Challenges for Participatory AI by Abeba Birhane et al. in EAAMO 2022
Mon, Nov 10
Evaluation of Human–AI Interaction (Lecture)
Dimensions for evaluating human-AI interaction with respect to both models and systems.
Slides
Required SPHERE: An Evaluation Card for Human-AI Systems by Qianou Ma et al. in ACL Findings 2025
Required Evaluating Human–Language Model Interaction by Mina Lee et al. in TMLR 2023
Optional The RealHumanEval: Evaluating LLMs' Abilities to Support Programmers by Hussein Mozannar et al. in arXiv 2024
Optional Not Just Novelty: A Longitudinal Study on Utility and Customization of AI Workflows by Tao Long, Katy Ilonka Gero, Lydia B. Chilton in arXiv 2024
Wed, Nov 12
Social Implications (Lecture)
High-stakes use cases: personalized education, companionship and support, LLMs for scientific discovery, and workforce transformation.
Slides
Deadline Online discussion for Beneficial Use Cases
Optional GPTs are GPTs: Labor market impact potential of LLMs by Tyna Eloundou et al. in Science 2024
Optional Generative AI at Work by Erik Brynjolfsson, Danielle Li, Lindsey Raymond in NBER 2023
Optional Does Writing with Language Models Reduce Content Diversity? by Vishakh Padmakumar, He He in ICLR 2024
Mon, Nov 17
Beneficial and Non-Beneficial Use Cases (discussion) (Reading)
Deciding 'beneficial' deployments; particular dimensions, e.g. companions & well-being; whether models democratize access to information or increase inequality, etc.
Slides
Required Impact of generative AI on socioeconomic inequalities by Valerio Capraro in PNAS Nexus 2024
Required Art or Artifice? LLMs and the False Promise of Creativity by Tuhin Chakrabarty et al. in CHI 2024
Optional Clinical safety & hallucination fidelity framework for LLMs by Elham Asgari in Digital Medicine 2025
Optional Emotional risks of AI companions demand attention in Nature Machine Intelligence 2025
Wed, Nov 19
Safety, Bias, Ethics (and/or Social Intelligence) (Lecture)
Cultural bias; stereotype datasets; values encoded in ML research; risks of LMs; anthropomorphism.
Slides
Optional The values encoded in machine learning research by Abeba Birhane et al. in FAccT 2022
Optional Challenges and Strategies in Cross-Cultural NLP by Daniel Hershcovich et al. in ACL 2022
Optional AnthroScore: A Computational Linguistic Measure of Anthropomorphism by Myra Cheng et al. in EACL 2024
Optional Taxonomy of risks posed by language models by Laura Weidinger et al. in FAccT 2022
Optional Unintended impacts of LLM alignment on global representation by Michael J. Ryan, William Held, Diyi Yang in arXiv 2024
Mon, Nov 24
Recap (Lecture) Slides
Deadline Signup for final presentation
Wed, Nov 26
No Class (Thanksgiving Break) (No-class) Slides
Mon, Dec 01
Final Project Presentations — Session 1 (Presentation) Slides
Deadline Assignment 2: Human-Centered Evaluation of Your Collaborative Agent
Wed, Dec 03
Final Project Presentations — Session 2 (Presentation) Slides
Fri, Dec 12
Final project report (No Class) Slides
Deadline Final project report submission

Additional course information available on Canvas.

Syllabus

Course Goals

The learning goals of the course are as follows:

Note that this new course is designed primarily as a graduate-level, semi-seminar-style course for students interested in HCI+NLP research. This means:

Prerequisites

There is no explicit prerequisite. However, students are expected to (1) be proficient in Python (for completing assignments). You should not take the course if you find programming or debugging extremely difficult, because you will have to master several programming languages, concepts, and libraries in very short order. That said, the assignments that require these will come with useful resources for brushing up on the topics. Students are also expected to (2) know basic ML concepts, to the extent that you understand terms like train/dev/test sets, model fitting, features, and supervised learning. (We will not cover these in this course!)

If you are familiar with NLP and relevant programming libraries (e.g. HuggingFace, Smolagents), you might find certain parts of the course introducing NLP concepts significantly easier (or, unnecessary :D).

Course Materials and Communications

Major Research Work

Grading

Assignments and their due dates will be posted to Canvas. Each day late will result in a 10% deduction (up to a maximum of 50% off). Students caught cheating or plagiarizing will receive no credit for the assignment. As a reminder, here is the university policy on academic integrity.
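
As a rough illustration of the late policy above, the deduction can be sketched in Python. This is only a sketch under stated assumptions, not the official grading script: the function names are hypothetical, lateness is assumed to be counted in whole days, and the deduction is assumed to apply to the score you earned (the syllabus does not specify either detail).

```python
def late_penalty_percent(days_late: int) -> int:
    """Percent deducted: 10% per day late, capped at 50%.

    Assumption: lateness is counted in whole days.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    return min(10 * days_late, 50)


def apply_late_penalty(score: float, days_late: int) -> float:
    """Score after the late deduction.

    Assumption: the deduction is a percentage of the earned score.
    """
    return score * (100 - late_penalty_percent(days_late)) / 100


if __name__ == "__main__":
    # Print the deduction for submissions 0 through 7 days late.
    for days in range(8):
        print(days, late_penalty_percent(days), apply_late_penalty(90, days))
```

Under these assumptions, the deduction stops growing after five days late, so a submission seven days late loses the same 50% as one five days late.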

Your final grade in this course will be based on:

Attendance

Lectures will be held in person twice a week. A good portion of the learning in any class comes from intelligent discussion. If you don’t attend class, you cannot participate, and your performance in the class will reflect that. Rather than taking attendance, we will give pop quizzes and collect artifacts generated from in-class activities at the end of class.

This course accepts the following excused absences: medical and family emergencies, academic conference travel, religious events, and a small set of approved collegiate activities. If in doubt, contact me to find a solution. Note that interviews, family vacations, weddings, sleeping through alarms, etc. are not excused. Your lowest two participation grades will be dropped, allowing you to miss up to two classes without impacting your grade.

Assignments

There will be two major assignments: one creating an LLM agent based on certain human-AI interaction principles, and another evaluating this agent. We will provide a Colab Notebook template to walk through the required steps. More details will be posted on Canvas once the assignments are released.

Presentations and Discussions

Four class sessions are designated as Discussion Sessions. These are student-led, seminar-style sessions where we critically debate important issues in Human-Centered NLP.

  1. Paper presentation. The reading lectures will be led by students. Each student will sign up for one session, and within each session 2-3 students can lead the discussion of the same paper to help ground the conversation. As presenters, you will (1) give a concise presentation of the paper (so everyone has context), (2) connect the paper to the broader discussion questions provided, and (3) seed discussion with reflection prompts, not necessarily argue one side. To digest the paper deeply, you can take inspiration from the role-playing model of Jacobson and Raffel. No need to pick explicit roles; just cover the relevant discussion points.

  2. Earn participation scores through discussions. Before each reading lecture, we will open corresponding discussion threads on Canvas. Students not leading the session are expected to participate by submitting comments on the required readings on Canvas. This is how you earn discussion scores! Good comments typically exhibit one or more of the following:

Final Project

The most substantial portion of your coursework is a team-based project (2-4 people). You will self-propose a project broadly relevant to HCI+NLP, with four milestones (they will be posted on Canvas when the time comes):

  1. Form research group + topic selection. You will fill in a short Google Form that documents your group members and gives a general description of your project. This will act as a forcing function for you to start thinking about the project. In the form, you will mostly address these questions: 1 (what are you trying to do), 2 (how is it done today), 3 (what’s new), 4 (who cares), 5 (your proposed method), and 6 (metrics of success). If you are looking for project partners, please post to Canvas!

  2. Midterm presentation + peer feedback. Shortly after the Fall Break, each group will do a 7-8 minute in-class presentation on the project progress, so the instructor and other students can provide feedback.

  3. Final presentation. Each group will do a 7-8 minute in-class presentation on the final project result. This will be similar to the midterm presentation.

  4. Final report. Each group will also submit a 4-8 page final report (not counting references) written in the form of a conference paper submission. The paper might include content that is typical of papers that appear at ACL or CHI.

Other Information

Respect for Diversity

It is our intent that students from all diverse backgrounds and perspectives be well served by this course, that students’ learning needs be addressed both in and out of class, and that the diversity that students bring to this class be viewed as a resource, strength and benefit. It is our intent to present materials and activities that are respectful of diversity: gender, sexuality, disability, age, socioeconomic status, ethnicity, race, and culture. Your suggestions are encouraged and appreciated. Please let us know ways to improve the effectiveness of the course for you personally or for other students or student groups. In addition, if any of our class meetings conflict with your religious events, please let us know so that we can make arrangements for you.

Accommodations for Students with Disabilities

If you have a disability and are registered with the Office of Disability Resources, we encourage you to use their online system to notify us of your accommodations and discuss your needs with us as early in the semester as possible. We will work with you to ensure that accommodations are provided as appropriate. If you suspect that you may have a disability and would benefit from accommodations but are not yet registered with the Office of Disability Resources, we encourage you to contact them at access@andrew.cmu.edu.

Health and Well-being

If you are experiencing COVID-like symptoms or have a recent COVID exposure, do not attend class if we are meeting in person. Please email the instructors for accommodations.

If you or anyone you know experiences any academic stress, difficult life events, or feelings like anxiety or depression, we strongly encourage you to seek support. Counseling and Psychological Services (CaPS) is here to help; call 412-268-2922 and visit their website at www.cmu.edu/counseling/. Consider reaching out to a friend, faculty or family member you trust for help getting connected to the support that can help. If you or someone you know is feeling suicidal or in danger of self-harm, call someone immediately, day or night:

If the situation is life threatening, call the police. On campus call CMU Police: 412-268-2323. Off campus: 911.

If you have questions about this, please let the instructors know. Thank you, and have a great semester.