The written text that we interact with on an everyday basis -- news articles, emails, social media, books -- is the product of a profoundly social phenomenon with people at its core. People intersect with text in multiple, complex ways: they are the authors of nearly all text we see, they are the audience for whom that text is written, and they are the subjects of that content itself.
In this thesis, I advocate for a model of text analysis centered on people in each of these interacting roles of author, audience and content. While much current work in NLP approaches each of these aspects individually, I argue that developing computational models that capture the complexity of their interaction will yield deeper, socio-culturally relevant descriptions of these actors, and that these deeper representations will open the door to socially aware language technologies that have a more useful understanding of the world.
I explore this perspective by designing, implementing and evaluating computational models of three kinds: (a) unsupervised models of "personas" (abstract categories defined over people), through which we can capture patterns of identity and behavior in descriptions of people as content in text; (b) models of persona variation, through which we can measure how those descriptions vary according to qualities of the author (such as political affiliation); and (c) models of persona self-presentation, through which we can capture how individuals choose to present themselves as authors on social media as a function of their audience. Each of these research fronts captures one dimension of how people interact with each other (as mediated through text) and charts out further work along its axis.
Noah Smith (Chair)
Jacob Eisenstein (Georgia Institute of Technology)
Ted Underwood (University of Illinois)
staceyy [atsymbol] cs ~replace-with-a-dot~ cmu ~replace-with-a-dot~ edu