On the alignment of production variability in humans and text generators

Abstract

Any unique language production context affords speakers multiple plausible communicative intents, and any intent can be produced in multiple plausible ways—given the same story prompt, for example, different humans may tell very different stories. Using multiple-reference datasets, we characterise the extent to which human production varies lexically, syntactically, and semantically across four production (text generation) tasks. We then inspect the space of plausible alternative productions as shaped by a text generator’s predicted probability distribution and decoding algorithm. For each test input, we probe the system’s production variability and measure its alignment with the variability exhibited by humans. We analyse language models and decoding strategies and find that (i) models overestimate variability on open-ended tasks and underestimate it on more constrained tasks, and (ii) decoding algorithms have relatively little influence on a model’s alignment to human variability. Our study draws connections between variability and uncertainty in language production and suggests ways to exploit this link, and our methods, in future work.
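The abstract's specific variability metrics are not given here, but the core idea—quantifying how much a set of alternative productions for the same input differ from one another—can be sketched with a simple lexical measure. The sketch below (toy strings and the choice of Jaccard distance over tokens are illustrative assumptions, not the paper's method) computes mean pairwise distance over a set of productions, so higher values indicate greater production variability:

```python
from itertools import combinations

def jaccard_distance(a: str, b: str) -> float:
    """Lexical distance between two productions: 1 minus token-set overlap."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return 1.0 - len(ta & tb) / len(ta | tb)

def production_variability(productions: list[str]) -> float:
    """Mean pairwise lexical distance across alternative productions for one input."""
    pairs = list(combinations(productions, 2))
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)

# Toy example: several human continuations of the same story prompt ...
humans = [
    "the knight rode into the dark forest",
    "a dragon appeared above the castle walls",
    "the knight fled the burning castle",
]
# ... versus several sampled model continuations for the same prompt
model = [
    "the knight rode into the forest",
    "the knight rode into the dark woods",
    "the knight walked into the forest",
]

print(production_variability(humans))  # higher: diverse productions
print(production_variability(model))   # lower: near-duplicate productions
```

Comparing such per-input variability scores between human references and model samples is one way to operationalise the alignment the abstract describes; the paper's actual measures also cover syntactic and semantic variation.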

Date
May 3, 2023 10:30 AM — 12:00 PM
Location
OAT S13

Bio

Mario Giulianelli is a PhD student at the Institute for Logic, Language and Computation of the University of Amsterdam. His research focuses on human communication strategies, studied using computational models of language understanding and generation.