Abstract

Data-driven model building is an important task in scientific discovery, and discovery programs that automate it are seeing real success. Most efforts have targeted fields of natural science in which the hypothesis spaces are specialized and the domains have considerable formal structure. Less work has been directed toward qualitative areas of social science, in which model building also arises. This paper reports the first automation of a modelling task from linguistic anthropology: the analysis of natural-language kinship terminologies in terms of simpler semantic components. Our approach uses three generic simplicity criteria to find all the simplest models that are consistent with the kinship data. We have reproduced results from the linguistics literature and, in some cases, have found simpler models. The task has strong generic elements: extracts of the code are applied to other data sets to illustrate this potential.
