There is wide debate about the degree to which the properties of human cognition shape how languages are structured and how they change over time, and this controversy extends to the lexicon. We show that the decline of lexical items can be partially accounted for by biases that have been demonstrated in the cognitive science literature. A quantitative study of 19th-century English, French, and German shows that semantic, distributional, and phonological factors affect the perpetuation of words in the ways predicted by psycholinguistic studies. However, not all proposed biases are equally robust. In a study of multiword expressions (MWEs) in Hmong, Lahu, and Chinese, we model word order within MWEs as a classification task and show that, contrary to a widely assumed bias in language learning, this order can be driven by phonological hierarchies. We further provide evidence that these hierarchies, contrary to another posited bias, can be phonetically arbitrary. Finally, through a sequence labeling task, we explore the possibility that these word-order patterns may be learned without information from the unnatural phonological hierarchies. On that interpretation, the hierarchies could be relics of earlier, natural hierarchies, which leaves the question of learning bias open.
David Mortensen is a computational linguist interested in phonology, morphology, language change, linguistic typology, and human-in-the-loop computation. He is currently a Systems Scientist (a non-tenure-track research faculty member at the Assistant Professor level) in the Language Technologies Institute, part of Carnegie Mellon University’s School of Computer Science. Before coming to CMU, he was an Assistant Professor in the Department of Linguistics at the University of Pittsburgh.