Language Technologies Institute Colloquium
- Doherty Hall
- DAVID R. MORTENSEN
- Systems Scientist
- Language Technologies Institute
- Carnegie Mellon University
Hmong Elaborate Expressions: Constructional and Distributional-Semantic Perspectives
Fluent Hmong speech and writing are full of elaborate expressions, idioms like tuav riam tuav phom ‘wield knife wield gun; wield weapons’ and poob ntsej poob muag ‘lose ear lose eye; lose face.’ This talk argues that these expressions are interesting in two different ways. First, they are simultaneously based upon a general pattern of coordination and on specific coordinate compounds (words like ntsej-muag ‘ear-eye; face’). Second, the words that can occur in these coordinate compounds are predictable and follow a single, general pattern. They sometimes seem to be composed of synonyms (as in quaj-nyiav ‘cry-cry; cry’), sometimes of antonyms (as in hnub-hmo ‘day-night; day and night; all the time’), and sometimes of representative members of some class (as in xyoob-ntoo ‘bamboo-tree; woody plants’). However, this talk hypothesizes that they are always distributionally similar. That is, they are words that occur in similar sets of contexts.
The hypothesis is tested against a 13-million-word corpus of Hmong newsgroup text. I show that a classifier based on the cosine similarity between the second and fourth words (in Word2vec embeddings) predicts whether a four-gram is an elaborate expression better than a strong rule-based baseline does. This finding has implications for other kinds of coordination (whether in Hmong or in other languages).
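For readers unfamiliar with the technique, the kind of test described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the speaker's implementation: the three-dimensional vectors are invented toy values standing in for Word2vec embeddings trained on the corpus, the threshold of 0.9 is arbitrary, the requirement that the first and third words match is an assumption drawn from the examples above, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy vectors invented for illustration only. In practice these would be
# looked up in Word2vec embeddings trained on the Hmong corpus.
TOY_EMBEDDINGS = {
    "riam": [0.9, 0.1, 0.2],  # 'knife'
    "phom": [0.8, 0.2, 0.3],  # 'gun'
    "hmo": [0.1, 0.2, 0.9],   # 'night'
}


def is_elaborate_expression(four_gram, embeddings, threshold=0.9):
    """Guess whether a four-word sequence is an elaborate expression.

    Assumptions (not from the talk): the first and third words must be
    identical, the second and fourth must differ, and the threshold is
    an arbitrary placeholder rather than a tuned value.
    """
    w1, w2, w3, w4 = four_gram
    if w1 != w3 or w2 == w4:
        return False
    if w2 not in embeddings or w4 not in embeddings:
        return False
    # Core idea from the abstract: compare the second and fourth words.
    return cosine_similarity(embeddings[w2], embeddings[w4]) >= threshold


if __name__ == "__main__":
    attested = ("tuav", "riam", "tuav", "phom")  # example from the abstract
    contrived = ("tuav", "riam", "tuav", "hmo")  # made-up non-example
    print(is_elaborate_expression(attested, TOY_EMBEDDINGS))
    print(is_elaborate_expression(contrived, TOY_EMBEDDINGS))
```

With these toy values the attested expression clears the threshold and the contrived one does not, but that outcome reflects how the vectors were chosen, not any property of the real corpus.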
David R. Mortensen's origins are unknown. He is currently a Systems Scientist at LTI. Prior to coming to Carnegie Mellon University, he was an Assistant Professor in the Linguistics Department at the University of Pittsburgh. He earned an MA and PhD in Linguistics at the University of California, Berkeley. He has diverse research interests. His work is multilingual and features a special interest in low-resource languages, especially languages of South and Southeast Asia. He specializes in computational phonology, morphology, and data resource development. He is currently working on research projects involving morphological disambiguation, historical linguistics, and distributed representations of linguistic units.
More on the Hmong people and culture.
4:00 pm: Light snacks, LTI 5th Floor Atrium