Build practical AI systems by mapping general-purpose AI to the right specific use cases.
Recent Publications
2025
Improving Automated Feedback Systems for Tutor Training in Low-Resource Scenarios through Data Augmentation
Chentianye Xu,
Jionghao Lin,
Tongshuang Wu,
Vincent Aleven,
Kenneth R. Koedinger
ArXiv 2025
SPHERE: An Evaluation Card for Human-AI Systems
Qianou Ma*,
Dora Zhao*,
Xinran Zhao,
Chenglei Si,
Chenyang Yang,
Ryan Louie,
Ehud Reiter,
Diyi Yang+,
Tongshuang Wu+
ArXiv 2025
General Scales Unlock AI Evaluation with Explanatory and Predictive Power
Lexin Zhou,
Lorenzo Pacchiardi,
Fernando Martínez-Plumed,
Katherine M. Collins,
Yael Moros-Daval,
Seraphina Zhang,
Qinlin Zhao,
Yitian Huang,
Luning Sun,
Jonathan E. Prunty,
Zongqian Li,
Pablo Sánchez-García,
Kexin Jiang Chen,
Pablo A. M. Casares,
Jiyun Zu,
John Burden,
Behzad Mehrbakhsh,
David Stillwell,
Manuel Cebrian,
Jindong Wang,
Peter Henderson,
Sherry Tongshuang Wu,
Patrick C. Kyllonen,
Lucy Cheke,
Xing Xie,
José Hernández-Orallo
ArXiv 2025
What Should We Engineer in Prompts? Training Humans in Requirement-Driven LLM Use
2024
Selenite: Scaffolding Online Sensemaking with Comprehensive Overviews Elicited from Large Language Models
Michael Xieyang Liu,
Tongshuang Wu,
Tianying Chen,
Franklin Mingzhe Li,
Aniket Kittur,
Brad A. Myers
CHI 2024
What Is Wrong with My Model? Identifying Systematic Problems with Semantic Data Slicing
Chenyang Yang,
Yining Hong,
Grace A. Lewis,
Tongshuang Wu,
Christian Kästner
ASE 2024
HiMemFormer: Hierarchical Memory-Aware Transformer for Multi-Agent Action Anticipation
Zirui Wang,
Xinran Zhao,
Simon Stepputtis,
Woojun Kim,
Tongshuang Wu,
Katia Sycara,
Yaqi Xie
Video-Language Models Workshop @ NeurIPS 2024
2023
Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation
Patrick Fernandes,
Aman Madaan,
Emmy Liu,
António Farinhas,
Pedro Henrique Martins,
Amanda Bertsch,
José G. C. de Souza,
Shuyan Zhou,
Tongshuang Wu,
Graham Neubig,
André F. T. Martins
TACL 2023
Large Language Models Enable Few-Shot Clustering
Vijay Viswanathan,
Kiril Gashteovski,
Carolin Lawrence,
Tongshuang Wu,
Graham Neubig
TACL 2023
BiasX: "Thinking Slow" in Toxic Content Moderation with Explanations of Implied Social Biases
Yiming Zhang,
Sravani Nanduri,
Liwei Jiang,
Tongshuang Wu,
Maarten Sap
EMNLP 2023
Beyond Testers' Biases: Guiding Model Testing with Knowledge Bases using LLMs