Children acquiring languages with noun classes (grammatical gender) have access to ample statistical information about how nouns are distributed across those classes. In this talk I look at children acquiring Tsez, a Nakh-Dagestanian language with four noun classes. I compare the statistical generalizations drawn from a corpus of child-directed speech with the results of a classification experiment, and find that children's behavior does not match what the statistical patterns in the input would predict. I then present several computational models of noun classification that introduce uncertainty into the optimal Bayesian classifier, and discuss how they can account for the gap between children's behavior and that classifier. These results suggest that children may be classifying optimally with respect to a distribution that does not match the surface distribution of these statistical features. This highlights differences between the linguistic input and the intake actually used for language acquisition, and informs hypotheses about the source of those differences.
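As a rough illustration of what "introducing uncertainty into the optimal Bayesian classifier" can mean, here is a minimal sketch. It assumes a naive Bayes classifier over binary noun cues (for example, a semantic or phonological feature), with a hypothetical `noise` parameter under which each cue is misperceived with some probability. The class priors, cue probabilities, and function names are invented for illustration and are not taken from the Tsez corpus or from the talk's actual models.

```python
from math import prod

# Hypothetical toy parameters (NOT from the Tsez corpus): a prior over four
# noun classes and, per class, the probability that each binary cue is present.
PRIOR = {1: 0.25, 2: 0.25, 3: 0.30, 4: 0.20}
CUE_PROBS = {
    1: [0.9, 0.1, 0.3],
    2: [0.1, 0.9, 0.3],
    3: [0.2, 0.2, 0.7],
    4: [0.2, 0.2, 0.2],
}


def likelihood(observed, probs, noise=0.0):
    """P(observed cues | class), assuming cues are independent given the class.

    Assumed noise model: each cue is flipped with probability `noise`, so
    P(cue perceived as present) = p * (1 - noise) + (1 - p) * noise.
    """
    terms = []
    for seen, p in zip(observed, probs):
        p_seen = p * (1 - noise) + (1 - p) * noise
        terms.append(p_seen if seen else 1 - p_seen)
    return prod(terms)


def posterior(observed, noise=0.0):
    """Bayes' rule: P(class | cues) is proportional to P(class) * P(cues | class)."""
    scores = {c: PRIOR[c] * likelihood(observed, CUE_PROBS[c], noise) for c in PRIOR}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


def classify(observed, noise=0.0):
    """Optimal (maximum a posteriori) choice: the class with the highest posterior."""
    post = posterior(observed, noise)
    return max(post, key=post.get)


if __name__ == "__main__":
    noun = [True, False, True]  # hypothetical cue pattern for one noun
    print(classify(noun), posterior(noun))
    print(classify(noun, noise=0.3), posterior(noun, noise=0.3))
```

With these toy numbers, the noun is assigned to class 1 in both cases. However, the noisy learner's posterior for class 1 drops from about 0.60 to about 0.39, so its choices become less sharply tied to the surface cue statistics. Flip noise is only one simple way to model uncertainty. The models presented in the talk may introduce it differently.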
Annie Gagliardi earned her PhD in linguistics from the University of Maryland in 2012. Her work focuses on characterizing the relative contributions of the learner and the linguistic environment in language acquisition. She is particularly interested in combining data from multiple sources, including cross-linguistic fieldwork, psycholinguistic experiments, and computational modeling, to tease apart these contributions.
Host: Chris Dyer
jlentz [atsymbol] cs.cmu.edu