Special Topic: Human-Centered NLP

Canvas:

https://canvas.cmu.edu/courses/49594

Semester:

2025 Fall (05-499/899-D)

Instructors:

Sherry Tongshuang Wu (Office hours: Mondays 2-3pm, NSH 3525)

Time:

Monday / Wednesday 12:30-1:50pm

Location:

TEP 3808

“HCI people design useful things that NLP people cannot build; NLP people make things that nobody uses.” (Yang et al., 2019) This course aims to help students develop the mindsets and skills necessary to build useful NLP systems by exploring the intersection between HCI and NLP. The course will discuss the strengths and weaknesses of status-quo NLP techniques in interactive scenarios – with a focus on LLMs and their applications, which have inspired a profound transformation in the field of human-AI interaction. We will also discuss ways to integrate humans into designing, developing, and evaluating NLP resources, models, and systems. Importantly, the course will highlight topics shared between HCI and NLP (agents, model trust, task delegation, data curation, etc.) and reflect on how the two communities approach similar topics differently. The primary goal of the course is to offer an overview of HCI+NLP, and to help students get access to, and understand, both HCI and NLP research papers and methods. The course will be half lecture and half seminar style – every 1-2 weeks, students will sign up to lead the discussion of assigned papers. Coursework includes lectures, paper readings, class presentations, and group projects; there are no exams.

Schedule and Readings

This schedule is tentative and subject to changes.

Mon, Aug 25

What is HCNLP? + Course Logistics (Lecture)

Definition of HCNLP, connections to adjacent fields, and course overview.

Slides

Wed, Aug 27

NLU, NLG, and Word Embeddings (Lecture)

Pre-LLM NLP tasks & pipelines; mapping abstract tasks to real applications; quick tour of tooling.

Slides

Required Hugging Face Course: Classical NLP by Hugging Face in 2023

Mon, Sep 01

No Class (Labor Day) Slides

Deadline A0: AWS Account ID Collection for Credit Distribution

Wed, Sep 03

LLMs and Their Applications (Lecture)

Definition of LLMs (architectures, pre- and post-training); existing models; multimodality.

Slides

Deadline Reading 0: Sign up for paper presentation

Required The Illustrated Transformer by Jay Alammar in 2018

Optional Training language models to follow instructions with human feedback by Long Ouyang et al. in arXiv 2022

Optional Direct Preference Optimization: Your Language Model is Secretly a Reward Model by Rafael Rafailov et al. in NeurIPS 2023

Mon, Sep 08

Guest Lecture: LLM Agents (Zora Wang) (Lecture)

Agent definitions, world models, memory, frameworks; HCI vs. AI perspectives.

Slides

Deadline Online discussion for Agentic Systems

Required Language Agents: Foundations, Prospects, and Risks (EMNLP 2024 Tutorial) by Yu Su et al. in EMNLP 2024 (Tutorial)

Optional Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents by Boyu Gou et al. in ICLR 2025

Optional Beyond Browsing: API-Based Web Agents by Yueqi Song et al. in arXiv 2025

Optional An Evaluation of Situational Autonomy for Human-AI Collaboration in a Shared Workspace Setting by Vildan Salikutluk et al. in CHI 2024

Optional Social Simulacra: Creating Populated Prototypes for Social Computing Systems by Joon Sung Park et al. in UIST 2022

Optional AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents by Maksym Andriushchenko et al. in ICLR 2025

Wed, Sep 10

Agentic Systems (discussion) (Reading)

Debate: capabilities & beneficiaries of agents; autonomy vs. responsibility; 'intelligent collaborators' vs. workflow wrappers.

Slides

Required Challenges in Human-Agent Communication by Gagan Bansal et al. in 2024

Required What Are Tools Anyway? A Survey from the Language Model Perspective by Zhiruo Wang et al. in arXiv 2024

Required Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration by Yijia Shao et al. in arXiv 2024

Optional TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks by Frank Xu in 2024

Mon, Sep 15

LLMs and Their Applications (cont'd) (Lecture)

Multimodality; discussion of when to use which models.

Slides

Wed, Sep 17

Desiderata to Bake Into Models (I) (Lecture)

Instruction following; rubric-based & LLM-based eval; safety & privacy.

Slides

Optional INFOBENCH: Evaluating Instruction Following Ability in Large Language Models by Yiwei Qin et al. in ACL 2024

Optional Checklists Are Better Than Reward Models For Aligning Language Models by Vijay Viswanathan et al. in arXiv 2025

Optional SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents by Xuhui Zhou et al. in ICLR 2024

Optional Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization by Zhanhui Zhou et al. in ACL 2024

Mon, Sep 22

Desiderata to Bake Into Models (II) (Lecture)

ToM & persona; collaboration capability; task generalizability; tradeoffs between desiderata.

Slides

Optional LLM Evaluators Recognize and Favor Their Own Generations by Arjun Panickssery et al. in NeurIPS 2024

Required CollabLLM: From Passive Responders to Active Collaborators by Shirley Wu et al. in ICLR 2025

Required Human-Centered Evaluation of Language Technologies (Tutorial) in EMNLP 2024

Optional General Scales Unlock AI Evaluation with Explanatory and Predictive Power by Lexin Zhou et al. in 2024

Optional Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization by Yu-Min Tseng et al. in 2024

Wed, Sep 24

Data-in-the-Wild: Collection, Curation, and Cleaning (Lecture)

What counts as 'good data'; annotator populations; curation & augmentation; data quality metrics; documentation & sharing.

Slides

Deadline Group project: Form group + short project description

Optional The State of Data Curation at NeurIPS: An Assessment of Dataset Development Practices in the Datasets and Benchmarks Track by Eshta Bhardwaj et al. in NeurIPS 2024

Optional WildChat: 1M ChatGPT Interaction Logs in the Wild by Wenting Zhao et al. in ICLR 2024

Optional Whose language counts as high quality? Measuring language ideologies in text data selection by Suchin Gururangan et al. in arXiv 2022

Optional Position: Measure Dataset Diversity, Don't Just Claim It by Dora Zhao et al. in arXiv 2024

Optional AboutMe: Using self-descriptions in webpages to document effects of English pretraining data filters by Li Lucy et al. in ACL 2024

Optional Data Feminism for AI by Lauren Klein, Catherine D’Ignazio in FAccT 2024

Mon, Sep 29

How human data drive LLMs (Lecture)

What usage data to collect/analyze; what to learn from it; training objectives; ties to system design.

Slides

Deadline Online discussion for Data, Desiderata, and Evaluation

Required Collective Constitutional AI: Aligning a Language Model with Public Input by Saffron Huang et al. in FAccT 2024

Required STELA: a community-centered approach to norm elicitation for AI alignment by Stevie Bergman et al. in Scientific Reports 2024

Optional A Taxonomy for Human-LLM Interaction Modes by Jie Gao et al. in CHI EA 2024

Optional ConstitutionMaker: Interactively Critiquing LLMs by Converting Feedback into Principles by Savvas Petridis et al. in arXiv 2023

Optional Show, Don't Tell: Aligning LMs with Demonstrated Feedback by Omar Shaikh et al. in arXiv 2024

Wed, Oct 01

Data, Desiderata, and Evaluation (discussion) (Reading)

Debate on ethical & useful data collection; what to prioritize in model capabilities; whether we should focus on dataset curation vs. learning from messy usage.

Slides

Required Data Authenticity, Consent, and Provenance for AI Are All Broken: What Will It Take to Fix Them? by Shayne Longpre et al. in 2024

Required Collective Consent: Who Needs to Consent to Data Representing Multiple People? by Emma Walquist et al. in CSCW 2025

Required Identifying the risks of LM agents with an LM-emulated sandbox by Yangjun Ruan et al. in arXiv 2023

Optional The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models by Hannah Rose Kirk et al. in arXiv 2024

Mon, Oct 06

Human–LLM Interaction and Prompting (Lecture)

Prompting basics & pitfalls; requirements-oriented prompting; synthesizers; modeling overconfidence; user mental models.

Slides

Required The Prompt Report: A Systematic Survey of Prompting Techniques by Sander Schulhoff et al. in arXiv 2024

Required DSPy: Compiling Declarative LM Calls into Self-Improving Pipelines by Omar Khattab et al. in 2024

Wed, Oct 08

Desiderata of Human–AI Collaboration (Lecture)

Important components for collaborative design across UI, infra, and interaction process; trust & calibration; collaborative evaluation; user modeling.

Slides

Deadline Signup for the milestone presentation

Optional Formalizing Trust in Artificial Intelligence by Alon Jacovi et al. in FAccT 2021

Required Interaction, Process, Infrastructure: A Unified Architecture for Human-Agent Collaboration by Yun Wang, Yan Lu in arXiv 2025

Optional Guidelines for Human-AI Interaction by Amershi et al. in 2019

Required Task Completion Agents are Not Ideal Collaborators by Shannon Zejiang Shen et al. in arXiv 2025

Mon, Oct 13

No Class (Fall Break) (No-class) Slides

Wed, Oct 15

No Class (Fall Break) (No-class) Slides

Mon, Oct 20

Midterm Project Presentations — Session 1 (Presentation) Slides

Wed, Oct 22

Midterm Project Presentations — Session 2 (Presentation) Slides

Mon, Oct 27

Design Space for Human–AI Systems (Lecture)

Autonomy, control, interaction paradigms, trust/understanding; real-system examples; generative UI.

Slides

Optional Stakeholder-centric participation in LLMs for health systems by Zhiyuan Wang et al. in Nature 2025

Optional Rehearsal: Simulating conflict to teach conflict resolution by Omar Shaikh et al. in arXiv 2023

Wed, Oct 29

Guest Lecture: Theory of Mind (Chelsea Wang) (Lecture) Slides

Mon, Nov 03

Guest Lecture: Case Studies — Coding Agents (Valerie Chen) (Lecture) Slides

Deadline Assignment 1: Building an Agent

Deadline Online discussion for Agentic Systems

Wed, Nov 05

How Should HCI Contribute to Model Development? (discussion) (Reading)

Connection between UX and model architecture; the impact of participatory design; 'human-in-the-loop' meanings; evaluation metrics aligned with user needs.

Slides

Required Participation in the Age of Foundation Models by Harini Suresh et al. in FAccT 2024

Required Just Put a Human in the Loop? Investigating LLM-Assisted Annotation for Subjective Tasks by Hope Schroeder et al. in NAACL Findings 2025

Required Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences by Shreya Shankar et al. in UIST 2024

Optional Power to the People? Opportunities and Challenges for Participatory AI by Abeba Birhane et al. in EAAMO 2022

Optional From Prompt Engineering to Prompt Science with Humans in the Loop by Chirag Shah in CACM 2025

Mon, Nov 10

Evaluation of Human–AI Interaction (Lecture)

Dimensions for evaluating human-AI interaction, with respect to both models and systems.

Slides

Required SPHERE: An Evaluation Card for Human-AI Systems by Qianou Ma et al. in ACL Findings 2025

Required Evaluating Human–Language Model Interaction by Mina Lee et al. in TMLR 2023

Required Using LLMs to simulate multiple humans and replicate human-subject studies by Gati Aher et al. in ICML 2023

Optional The RealHumanEval: Evaluating LLMs' Abilities to Support Programmers by Hussein Mozannar et al. in arXiv 2024

Optional Not Just Novelty: A Longitudinal Study on Utility and Customization of AI Workflows by Tao Long, Katy Ilonka Gero, Lydia B. Chilton in arXiv 2024

Wed, Nov 12

Social Implications (Lecture)

High-stakes use cases: personalized education, companionship and support, LLMs for scientific discovery, and workforce transformation.

Slides

Deadline Online discussion for Beneficial Use Cases

Optional GPTs are GPTs: Labor market impact potential of LLMs by Tyna Eloundou et al. in Science 2024

Optional Generative AI at Work by Erik Brynjolfsson, Danielle Li, Lindsey Raymond in NBER 2023

Optional Does Writing with Language Models Reduce Content Diversity? by Vishakh Padmakumar, He He in ICLR 2024

Mon, Nov 17

Beneficial and Non-Beneficial Use Cases (discussion) (Reading)

Deciding what counts as a 'beneficial' deployment; particular dimensions, e.g. companions & well-being; whether models democratize access to information or increase inequality, etc.

Slides

Required Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce by Yijia Shao et al. in arXiv 2025

Required Impact of generative AI on socioeconomic inequalities by Valerio Capraro in PNAS Nexus 2024

Required Art or Artifice? LLMs and the False Promise of Creativity by Tuhin Chakrabarty et al. in CHI 2024

Optional Clinical safety & hallucination fidelity framework for LLMs by Elham Asgari in Digital Medicine 2025

Optional Emotional risks of AI companions demand attention in Nature Machine Intelligence 2025

Wed, Nov 19

Safety, Bias, Ethics (and/or Social Intelligence) (Lecture)

Cultural bias; stereotype datasets; values encoded in ML research; risks of LMs; anthropomorphism.

Slides

Optional The values encoded in machine learning research by Abeba Birhane et al. in FAccT 2022

Optional Challenges and Strategies in Cross-Cultural NLP by Daniel Hershcovich et al. in ACL 2022

Optional AnthroScore: A Computational Linguistic Measure of Anthropomorphism by Myra Cheng et al. in EACL 2024

Optional Taxonomy of risks posed by language models by Laura Weidinger et al. in FAccT 2022

Optional Unintended impacts of LLM alignment on global representation by Michael J. Ryan, William Held, Diyi Yang in arXiv 2024

Mon, Nov 24

Recap (Lecture) Slides

Deadline Signup for final presentation

Wed, Nov 26

No Class (Thanksgiving Break) (No-class) Slides

Mon, Dec 01

Final Project Presentations — Session 1 (Presentation) Slides

Deadline Assignment 2: Human-Centered Evaluation of Your Collaborative Agent

Wed, Dec 03

Final Project Presentations — Session 2 (Presentation) Slides

Fri, Dec 12

Final project report (No Class) Slides

Deadline Final project report submission

Additional course information available on Canvas.

Syllabus

Course Goals

The learning goals of the course are as follows:

To introduce basic concepts of NLP and HCI, to the extent that students can read research papers in the relevant field.
To introduce a variety of emerging topics related to human-centered NLP.
To introduce practical tools for building and analyzing model-infused applications.

Note that this new course is mostly designed to be a graduate-level, semi-seminar-style course for students interested in HCI+NLP research. This means:

The course is largely project-oriented, and does not involve exams.
A large proportion of your grade will depend on research paper reading and digestion.
This is a special topic course (not well-established) and may only be offered once, so please expect some glitches in the schedule or assignments :)
I cannot guarantee it will count as a part of a technical requirement.

Prerequisites

There is no explicit prerequisite. However, students are expected to (1) be proficient in Python (for completing assignments). You should not take the course if you find programming or debugging extremely difficult, because you will have to master several programming languages/concepts/libraries in very short order. That said, the assignments that require these will come with useful resources for brushing up on the topics. Students are also expected to (2) know basic ML concepts, to the extent that you understand terms like train/dev/test set, model fitting, feature, and supervised learning. (We will not cover these in this course!)

If you are familiar with NLP and relevant programming libraries (e.g. HuggingFace, Smolagents), you might find certain parts of the course introducing NLP concepts significantly easier (or, unnecessary :D).

Course Materials and Communications

Slides will be on this page; you will need to log in with your Andrew ID to access them.
Assignments and discussions will be on Canvas when their time comes. All assignments must be turned in using Canvas.
If you have questions related to course materials or logistics, please post them on Slack (See the link on Canvas).
If you have special requests, please DM the instructor on Slack or email the instructor at sherryw@cs.cmu.edu.

Major Research Work

Grading

Assignments and their due dates will be posted on Canvas. Each day late will result in a 10% deduction (up to a maximum of 50% off). Students caught cheating or plagiarizing will receive no credit for the assignment. As a reminder, here is the university policy on academic integrity.
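As a rough illustration of the late policy above, here is a minimal Python sketch. The function name `apply_late_penalty` is hypothetical, and two details are assumptions rather than stated policy: the deduction is taken as a fraction of the earned score, and any started day counts as a full day late. Check Canvas or ask the instructor for the authoritative rule.

```python
import math


def apply_late_penalty(score: float, days_late: float,
                       per_day: float = 0.10, cap: float = 0.50) -> float:
    """Return the score after the late deduction.

    Assumptions (not stated in the syllabus): the deduction is a fraction of
    the earned score, and any started day counts as a full day late.
    """
    if days_late <= 0:
        return score  # on time: no deduction
    whole_days = math.ceil(days_late)            # round partial days up
    deduction = min(per_day * whole_days, cap)   # 10% per day, capped at 50%
    return score * (1.0 - deduction)
```

Under these assumptions, a submission two days late keeps 80% of its score, and anything five or more days late keeps 50%.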

Your final grade in this course will be based on:

30% Homework Assignments
40% Final Project
- 5% Form group + short project description
- 10% Midterm presentation
- 15% Final presentation
- 10% Project report
15% Leading paper discussions
10% Paper discussions on Canvas
5% In-class attendance
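
To make the breakdown above concrete, the sketch below combines component scores into a final percentage. The component names and the `final_grade` helper are hypothetical, and the sketch assumes each component is scored on a 0-100 scale; only the weights come from the list above.

```python
# Weights from the grading breakdown; the four project items sum to 40%.
WEIGHTS = {
    "homework": 0.30,
    "project_proposal": 0.05,       # form group + short project description
    "midterm_presentation": 0.10,
    "final_presentation": 0.15,
    "project_report": 0.10,
    "leading_discussion": 0.15,
    "canvas_discussion": 0.10,
    "attendance": 0.05,
}


def final_grade(scores: dict[str, float]) -> float:
    """Weighted sum of component scores, each assumed to be on a 0-100 scale."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
```

For example, a student who scores 100 on every component except a 0 on attendance would end up at roughly 95 under this scheme.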

Attendance

Lectures will be held in-person twice a week. A good portion of the learning in any class comes from intelligent discussion. If you don’t attend class, you cannot participate, and your performance in the class will reflect that. Rather than taking attendance, we will give pop quizzes and collect artifacts generated from in-class activities at the end of class.

The excused absences this course accepts are medical and family emergencies, academic conference travel, religious events, and a small set of approved collegiate activities. If in doubt, contact me to find a solution. Note that interviews, family vacations, weddings, sleeping through alarms, etc. are not excused. Your lowest two participation grades will be dropped, allowing you to miss up to two classes without impacting your grade.
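
The drop-two rule can be sketched as follows. This is only an illustration: the `participation_score` name is hypothetical, and averaging the remaining grades is an assumption, since the syllabus does not say how the kept grades are combined.

```python
def participation_score(grades: list[float], dropped: int = 2) -> float:
    """Average the participation grades after dropping the lowest `dropped`.

    Assumption: the remaining grades are averaged with equal weight.
    """
    if len(grades) <= dropped:
        raise ValueError("need more grades than the number dropped")
    kept = sorted(grades)[dropped:]  # discard the lowest grades
    return sum(kept) / len(kept)
```

With this rule, two missed sessions (scored 0) among otherwise full-credit sessions leave the participation score unchanged.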

Assignments

There will be two major assignments: one creating an LLM agent based on certain human-AI interaction principles, and another evaluating this agent. We will provide a Colab Notebook template to walk through the required steps. More details will be posted on Canvas once the assignments are released.

Presentations and Discussions

Four class sessions are designated as Discussion Sessions. These are student-led, seminar-style sessions where we critically debate important issues in Human-Centered NLP.

Paper presentation. The reading lectures will be led by students. Each student will sign up for one session, and within each session 2-3 students can jointly lead the same paper to help ground the discussion. As presenters, you will (1) give a concise presentation of the paper (so everyone has context), (2) connect the paper to the broader discussion questions provided, and (3) seed discussion with reflection prompts rather than necessarily arguing one side. To achieve deep paper digestion, you can take inspiration from the role-playing model of Jacobson and Raffel. There is no need to pick explicit roles; just cover the relevant discussion points.
Earn participation scores through discussions. Before each reading lecture, we will open the corresponding discussion threads on Canvas. Students not leading the session are expected to participate by submitting comments on the required readings on Canvas. This is how you earn discussion scores! Good comments typically exhibit one or more of the following:

Critiques of arguments made in the papers
Analysis of implications or future directions for work discussed in lecture or readings
Clarification of some point or detail presented in the class
Insightful questions about the readings or answers to other people’s questions
Links to web resources or examples that pertain to a lecture or reading

Final Project

The most substantial portion of your coursework is a team-based project (2-4 people). You will self-propose a project broadly relevant to HCI+NLP, with four milestones (they will be posted on Canvas when the time comes):

Form research group + topic selection. You will fill in a short Google Form that documents your group members and a general description of your project. This will act as a forcing function for you to start thinking about the project. In the form, you will mostly address these questions: 1 (what are you trying to do), 2 (how is it done today), 3 (what’s new), 4 (who cares), 5 (your proposed method), and 6 (metrics of success). If you are looking for project partners, please post to Canvas!
Midterm presentation + peer feedback. Shortly after the Fall Break, each group will give a 7-8 minute in-class presentation on the project progress, so the instructor and other students can provide feedback.
Final presentation. Each group will do a 7-8 minute in-class presentation on the final project result. This will be similar to the midterm presentation.
Final report. Each group will also submit a 4-8 page final report (not counting references) written in the form of a conference paper submission. The paper might include content that is typical of papers that appear at ACL or CHI.

Other Information

Respect for Diversity

It is our intent that students from all diverse backgrounds and perspectives be well served by this course, that students’ learning needs be addressed both in and out of class, and that the diversity that students bring to this class be viewed as a resource, strength and benefit. It is our intent to present materials and activities that are respectful of diversity: gender, sexuality, disability, age, socioeconomic status, ethnicity, race, and culture. Your suggestions are encouraged and appreciated. Please let us know ways to improve the effectiveness of the course for you personally or for other students or student groups. In addition, if any of our class meetings conflict with your religious events, please let us know so that we can make arrangements for you.

Accommodations for Students with Disabilities

If you have a disability and are registered with the Office of Disability Resources, we encourage you to use their online system to notify us of your accommodations and discuss your needs with us as early in the semester as possible. We will work with you to ensure that accommodations are provided as appropriate. If you suspect that you may have a disability and would benefit from accommodations but are not yet registered with the Office of Disability Resources, we encourage you to contact them at access@andrew.cmu.edu.

Health and Well-being

If you are experiencing COVID-like symptoms or have had a recent COVID exposure, do not attend class if we are meeting in-person. Please email the instructors for accommodations.

If you or anyone you know experiences any academic stress, difficult life events, or feelings like anxiety or depression, we strongly encourage you to seek support. Counseling and Psychological Services (CaPS) is here to help; call 412-268-2922 and visit their website at www.cmu.edu/counseling/. Consider reaching out to a friend, faculty or family member you trust for help getting connected to the support that can help. If you or someone you know is feeling suicidal or in danger of self-harm, call someone immediately, day or night:

CaPS: 412-268-2922
Re:solve Crisis Network: 888-796-8226

If the situation is life threatening, call the police. On campus call CMU Police: 412-268-2323. Off campus: 911.

If you have questions about this, please let the instructors know. Thank you, and have a great semester.
