Dallas Card

Email: dalc@umich.edu
Office: North Quad 3421
GitHub, Twitter, Bluesky, Blog
Google Scholar, ORCiD

I am an assistant professor in the School of Information at the University of Michigan. Before that, I was a postdoctoral researcher in the Stanford NLP Group and the Stanford Data Science Institute. I received my Ph.D. from the Machine Learning Department at Carnegie Mellon University, where I was advised by Noah Smith.

My research centers on making machine learning more reliable and responsible, and on using machine learning and natural language processing to learn about society from text.

Updates

May 2024: I will be attending NAACL 2024 in Mexico City, where I am one of the organizers of the NLP+CSS workshop.
April 2024: Our paper, An Archival Perspective on Pretraining Data, has now been published (open access) in Patterns!
March 2024: I will be speaking at the Emory Department of Quantitative Theory & Methods speaker series on March 25th.
February 2024: Our PNAS paper on immigration was covered by today's Washington Post (paywalled).
December 2023: I will be attending NeurIPS in New Orleans to present our poster, An Archival Perspective on Pretraining Data, at the SoLaR workshop on December 16th.
October 2023: Our new paper on media storms has been accepted to Findings of EMNLP.
October 2023: I will be attending the Workshop on Operationalizing the Measure Function of the NIST AI RMF in Washington, October 16-17th.
July 2023: I will be speaking about evaluation challenges at the MIDAS workshop on Generative AI for Research, July 25-26th.
July 2023: I will be attending the CASMI workshop on Sociotechnical Approaches to Measurement and Validation for Safety in AI, July 18-19th.
July 2023: I will be attending ACL 2023 in Toronto, July 9-14th, where I will be presenting a paper on Semantic Change Detection.
June 2023: I will be attending FAccT 2023 in Chicago, June 12-15th.

Current Ph.D. Students

Ben Litterer (co-advised with David Jurgens)
Lavinia Dunagan
Meera Desai (co-advised with Abigail Jacobs)

Selected Publications

An Archival Perspective on Pretraining Data
Meera A. Desai, Irene V. Pasquetto, Abigail Z. Jacobs, Dallas Card
Patterns, March 2024
[bib]

When it Rains, it Pours: Modeling Media Storms and the News Ecosystem
Benjamin Litterer, David Jurgens, Dallas Card
Findings of Empirical Methods in Natural Language Processing (EMNLP), 2023
[code] [bib]

Substitution-based Semantic Change Detection using Contextual Embeddings
Dallas Card
Association for Computational Linguistics (ACL), 2023
[code] [bib]

Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection
Suchin Gururangan, Dallas Card, Sarah K. Dreier, Emily K. Gade, Leroy Z. Wang, Zeyu Wang, Luke Zettlemoyer, Noah A. Smith
Empirical Methods in Natural Language Processing (EMNLP), 2022
[code] [bib]

Computational analysis of 140 years of US political speeches reveals more positive but increasingly polarized framing of immigration
Dallas Card, Serina Chang, Chris Becker, Julia Mendelsohn, Rob Voigt, Leah Boustan, Ran Abramitzky, Dan Jurafsky
Proceedings of the National Academy of Sciences 119(31), 2022
[data and code] [bib] Media coverage: New York Times, Washington Post

The Values Encoded in Machine Learning Research
Abeba Birhane, Pratyusha Kalluri, Dallas Card, William Agnew, Ravit Dotan, and Michelle Bao
ACM Conference on Fairness, Accountability, and Transparency (FAccT), 2022
[data and code] [bib]
[Distinguished Paper Award]

Modular Domain Adaptation
Junshen Chen, Dallas Card, Dan Jurafsky
Findings of the Association for Computational Linguistics (ACL), 2022
[blog post] [code] [bib]

Problems with Cosine as a Measure of Embedding Similarity for High Frequency Words
Kaitlyn Zhou, Kawin Ethayarajh, Dallas Card, Dan Jurafsky
Association for Computational Linguistics (ACL), 2022
[bib]

On the Opportunities and Risks of Foundation Models
Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, et al.
arXiv:2108.07258, 2021
[bib]

Expected Validation Performance and Estimation of a Random Variable's Maximum
Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Schwartz, Noah A. Smith
Findings of Empirical Methods in Natural Language Processing (EMNLP), 2021
[bib]

Causal Effects of Linguistic Properties
Reid Pryzant, Dallas Card, Dan Jurafsky, Victor Veitch, and Dhanya Sridhar
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
[bib]

With Little Power Comes Great Responsibility
Dallas Card, Peter Henderson, Urvashi Khandelwal, Robin Jia, Kyle Mahowald, and Dan Jurafsky
Empirical Methods in Natural Language Processing (EMNLP), 2020
[code] [bib]

Detecting Stance in Media on Global Warming
Yiwei Luo, Dallas Card, and Dan Jurafsky
Findings of Empirical Methods in Natural Language Processing (EMNLP), 2020
[code] [bib]

Explain like I am a Scientist: The Linguistic Barriers of Entry to r/science
Tal August, Dallas Card, Gary Hsieh, Noah A. Smith, and Katharina Reinecke
Human Factors in Computing Systems (CHI), 2020
[bib]

On Consequentialism and Fairness
Dallas Card and Noah A. Smith
Frontiers in Artificial Intelligence, 2020
[bib]

Show Your Work: Improved Reporting of Experimental Results
Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Schwartz, and Noah A. Smith
Empirical Methods in Natural Language Processing (EMNLP), 2019
[code] [bib] Media coverage: WIRED

Variational Pretraining for Semi-supervised Text Classification
Suchin Gururangan, Tam Dang, Dallas Card, and Noah A. Smith
Association for Computational Linguistics (ACL), 2019
[code] [bib]

The Risk of Racial Bias in Hate Speech Detection
Maarten Sap, Dallas Card, Saadia Gabriel, Yejin Choi, and Noah A. Smith
Association for Computational Linguistics (ACL), 2019
[bib] Media coverage: Vox

Deep Weighted Averaging Classifiers
Dallas Card, Michael Zhang, and Noah A. Smith
ACM Conference on Fairness, Accountability, and Transparency (FAccT), 2019
[code] [blog post] [bib]

Neural Models for Documents with Metadata
Dallas Card, Chenhao Tan, and Noah A. Smith
Association for Computational Linguistics (ACL), 2018
[code] [tutorial] [bib]

Friendships, Rivalries, and Trysts: Characterizing Relations between Ideas in Texts
Chenhao Tan, Dallas Card, and Noah A. Smith
Association for Computational Linguistics (ACL), 2017
[blog post] [bib]

Analyzing Framing through the Casts of Characters in the News
Dallas Card, Justin H. Gross, Amber E. Boydstun, and Noah A. Smith
Empirical Methods in Natural Language Processing (EMNLP), 2016
[bib]

The Media Frames Corpus: Annotations of Frames Across Issues
Dallas Card, Amber E. Boydstun, Justin H. Gross, Philip Resnik, and Noah A. Smith
Association for Computational Linguistics (ACL), 2015
[bib]

Recent Professional Service

FAccT steering committee member (2023-2024)
Co-organizer of the NLP+CSS workshop, to be held at NAACL 2024 in Mexico City.
Co-organizer of the 2024 Midwest Speech and Language Days, to be held at the University of Michigan on April 15-16th.
Area Chair for ACL (2023), ACL Rolling Review (2024, 2023), FAccT (2024, 2023), NAACL (2021)
Reviewer for ACL Rolling Review (2022, 2021), ACL (2022, 2021), EMNLP (2022, 2021), NAACL (2022, 2021), TACL (2023, 2022, 2021), EMNLP ethics review (2023, 2022, 2021), FAccT (2022), AAAI (2022, 2021), AIES (2023), International Journal of Communication (2024), The Web Conference (2023), Philosophy and Technology (2021), PeerJ (2021)

About me

I'm originally from Winnipeg, but I have also lived in Toronto, Waterloo, Halifax, Sydney, Kampala, Pittsburgh, Seattle, Palo Alto, and now Ann Arbor.

I am an occasional guest on The Reality Check podcast. You can hear me in episodes #466 (biased algorithms), #382 (deep learning), #362 (Simpson's paradox), and #227 (fMRI and vegetative states).

[short bio for talks]