RARE: Retrieval-Aware Robustness Evaluation for Retrieval-Augmented Generation Systems
Yixiao Zeng,
Tianyu Cao,
Danqing Wang,
Xinran Zhao,
Zimeng Qiu,
Morteza Ziyadi,
Tongshuang Wu,
Lei Li
arXiv 2025
From Prompts to Reflection: Designing Reflective Play for GenAI Literacy
Qianou Ma,
Megan Chai,
Yike Tan,
Jihun Choi,
Jini Kim,
Erik Harpstead,
Geoff Kauffman,
Tongshuang Wu
arXiv 2025
Not Everyone Wins with LLMs: Behavioral Patterns and Pedagogical Implications in AI-assisted Data Analysis
Qianou Ma,
Kenneth Koedinger,
Tongshuang Wu
arXiv 2025
DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research
Rulin Shao,
Akari Asai,
Shannon Zejiang Shen,
Hamish Ivison,
Varsha Kishore,
Jingming Zhuo,
Xinran Zhao,
Molly Park,
Samuel G Finlayson,
David Sontag,
Tyler Murray,
Sewon Min,
Pradeep Dasigi,
Luca Soldaini,
Faeze Brahman,
Wen-tau Yih,
Tongshuang Wu,
Luke Zettlemoyer,
Yoon Kim,
Hannaneh Hajishirzi,
Pang Wei Koh
arXiv 2025
Completion ≠ Collaboration: Scaling Collaborative Effort with Agents
Best Paper
Shannon Zejiang Shen,
Valerie Chen,
Ken Gu,
Alexis Ross,
Zixian Ma,
Jillian Ross,
Alex Gu,
Chenglei Si,
Wayne Chi,
Andi Peng,
Jocelyn J Shen,
Ameet Talwalkar,
David Sontag,
Tongshuang Wu
NeurIPS Responsible Foundation Model Workshop 2025
General Scales Unlock AI Evaluation with Explanatory and Predictive Power
Lexin Zhou,
Lorenzo Pacchiardi,
Fernando Martínez-Plumed,
Katherine M. Collins,
Yael Moros-Daval,
Seraphina Zhang,
Qinlin Zhao,
Yitian Huang,
Luning Sun,
Jonathan E. Prunty,
Zongqian Li,
Pablo Sánchez-García,
Kexin Jiang Chen,
Pablo A. M. Casares,
Jiyun Zu,
John Burden,
Behzad Mehrbakhsh,
David Stillwell,
Manuel Cebrian,
Jindong Wang,
Peter Henderson,
Sherry Tongshuang Wu,
Patrick C. Kyllonen,
Lucy Cheke,
Xing Xie,
José Hernández-Orallo
Nature 2025
What Should We Engineer in Prompts? Training Humans in Requirement-Driven LLM Use
Selenite: Scaffolding Online Sensemaking with Comprehensive Overviews Elicited from Large Language Models
Michael Xieyang Liu,
Tongshuang Wu,
Tianying Chen,
Franklin Mingzhe Li,
Aniket Kittur,
Brad A. Myers
CHI 2024
What Is Wrong with My Model? Identifying Systematic Problems with Semantic Data Slicing
Chenyang Yang,
Yining Hong,
Grace A. Lewis,
Tongshuang Wu,
Christian Kästner
ASE 2024
HiMemFormer: Hierarchical Memory-Aware Transformer for Multi-Agent Action Anticipation
Zirui Wang,
Xinran Zhao,
Simon Stepputtis,
Woojun Kim,
Tongshuang Wu,
Katia Sycara,
Yaqi Xie
Video-Language Models Workshop @ NeurIPS 2024
2023
Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation
Patrick Fernandes,
Aman Madaan,
Emmy Liu,
António Farinhas,
Pedro Henrique Martins,
Amanda Bertsch,
José G. C. de Souza,
Shuyan Zhou,
Tongshuang Wu,
Graham Neubig,
André F. T. Martins
TACL 2023
Large Language Models Enable Few-Shot Clustering
Vijay Viswanathan,
Kiril Gashteovski,
Carolin Lawrence,
Tongshuang Wu,
Graham Neubig
TACL 2023
BiasX: "Thinking Slow" in Toxic Content Moderation with Explanations of Implied Social Biases
Yiming Zhang,
Sravani Nanduri,
Liwei Jiang,
Tongshuang Wu,
Maarten Sap
EMNLP 2023
Beyond Testers' Biases: Guiding Model Testing with Knowledge Bases using LLMs