Information Theory in Linguistics: Methods and Applications (COLING 2022)

COLING 2022: October 12-17, 2022

Course Description

Since Shannon proposed his mathematical theory of communication in the middle of the 20th century, information theory has been an important way of viewing and investigating problems at the interfaces between linguistics, cognitive science, and computation. With the upsurge in applying machine learning approaches to linguistic questions, information-theoretic methods are becoming an ever more important tool in the linguist’s toolbox. This cutting-edge tutorial, which draws on the work of many different researchers, emphasizes interdisciplinary connections between the fields of linguistics and natural language processing. We begin by reviewing the mathematical basis of information theory. We then show how it can be fruitfully applied to several areas of linguistics, including semantics, typology, morphology, phonotactics, and the interface between cognitive science and linguistics. Finally, we discuss recent research, spanning fields from psycholinguistics to machine learning, that has made progress in the analysis of natural language using these techniques. Throughout the tutorial, we will provide hands-on exercises that allow you to put theory into practice in linguistic applications.


Module 1: Introduction and Background (Slides, Colab Notebook)
Module 2: Case Studies in Complexity (Slides, Colab Notebook)
Module 3: Case Studies in Correlation (Slides)
Module 4: Case Studies in Communication (Slides)
Conclusion: Concluding Remarks (Slides)


Information Theory Background

Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. 2006. Wiley-Interscience, USA.

Statistics Background

Peter J. Bickel and Kjell A. Doksum. Mathematical Statistics. 2001. Prentice Hall, USA.

Recent Papers (by topic)

Entropy Estimation

Sebastian Nowozin. Estimating Discrete Entropy Part 1.
Sebastian Nowozin. Estimating Discrete Entropy Part 2.
Sebastian Nowozin. Estimating Discrete Entropy Part 3.
Samuel Zahl. Jackknifing An Index of Diversity.
David H. Wolpert and David R. Wolf. Estimating functions of probability distributions from a finite set of samples.
Marcus Hutter. Distribution of Mutual Information.
Ilya Nemenman, F. Shafee, and William Bialek. Entropy and Inference, Revisited.
Liam Paninski. Estimation of Entropy and Mutual Information.
Evan Archer, Il Memming Park, and Jonathan W. Pillow. Bayesian Entropy Estimation for Countable Discrete Distributions.
Aryaman Arora, Clara Isabel Meister, and Ryan Cotterell. Estimating the Entropy of Linguistic Distributions.

Arbitrariness of the Sign

Tiago Pimentel, Arya D. McCarthy, Damián Blasi, Brian Roark, and Ryan Cotterell. Meaning to Form: Measuring Systematicity as Information.
Tiago Pimentel, Brian Roark, Søren Wichmann, Ryan Cotterell, and Damián Blasi. Finding Concept-specific Biases in Form–Meaning Associations.

Morphology

Adina Williams, Tiago Pimentel, Hagen Blix, Arya D. McCarthy, Eleanor Chodroff, and Ryan Cotterell. Predicting Declension Class from Form and Meaning.
Adina Williams, Damián Blasi, Lawrence Wolf-Sonkin, Hanna Wallach, and Ryan Cotterell. Quantifying the Semantic Core of Gender Systems.
Arya D. McCarthy, Adina Williams, Shijia Liu, David Yarowsky, and Ryan Cotterell. Measuring the Similarity of Grammatical Gender Systems by Comparing Partitions.
Shijie Wu, Ryan Cotterell, and Timothy O'Donnell. Morphological Irregularity Correlates with Frequency.
Ryan Cotterell, Christo Kirov, Mans Hulden, and Jason Eisner. On the Complexity and Typology of Inflectional Morphological Systems.

Human Language Processing

Adam Goodkind and Klinton Bicknell. Predictive power of word surprisal for reading times is a linear function of language model quality.
Christoph Aurnhammer and Stefan L. Frank. Evaluating information-theoretic measures of word prediction in naturalistic sentence reading.
Jason Wei, Clara Meister, and Ryan Cotterell. A Cognitive Regularizer for Language Modeling.
Tatsuki Kuribayashi, Yohei Oseki, Takumi Ito, Ryo Yoshida, Masayuki Asahara, and Kentaro Inui. Lower Perplexity is Not Always Human-Like.
Danny Merkx and Stefan L. Frank. Human Sentence Processing: Recurrence or Attention?
Richard Futrell, Edward Gibson, and Roger P. Levy. Lossy-context surprisal: An information-theoretic model of memory effects in sentence processing.

Lexicon

G. K. Zipf. The Psycho-biology of Language.
Steven T. Piantadosi, Harry Tily, and Edward Gibson. Word lengths are optimized for efficient communication.
Kyle Mahowald, Evelina Fedorenko, Steven T. Piantadosi, and Edward Gibson. Info/information theory: speakers choose shorter words in predictive contexts.
Christian Bentz, Dimitrios Alikaniotis, Michael Cysouw, and Ramon Ferrer-i-Cancho. The Entropy of Words—Learnability and Expressivity across More than 1000 Languages.
Tiago Pimentel, Irene Nikkarinen, Kyle Mahowald, Ryan Cotterell, and Damián Blasi. How (Non-)Optimal is the Lexicon?
Tiago Pimentel, Ryan Cotterell, and Brian Roark. Disambiguatory Signals are Stronger in Word-initial Positions.
Tiago Pimentel, Rowan Hall Maudslay, Damián Blasi, and Ryan Cotterell. Speakers Fill Lexical Semantic Gaps with Context.

Language Generation

Clara Meister, Tim Vieira, and Ryan Cotterell. If Beam Search is the Answer, What was the Question?
Clara Meister and Ryan Cotterell. Language Model Evaluation Beyond Perplexity.

Parsing

Michael D. Resnik. Mathematics as a Science of Patterns.
Richard Futrell, Peng Qian, Edward Gibson, Evelina Fedorenko, and Idan Blank. Syntactic dependencies correspond to word pairs with high mutual information.

Color Systems

Noga Zaslavsky, Charles Kemp, Terry Regier, and Naftali Tishby. Efficient compression in color naming and its evolution.
Edward Gibson, Richard Futrell, Julian Jara-Ettinger, Kyle Mahowald, Leon Bergen, Sivalogeswaran Ratnasingam, Mitchell Gibson, Steven T. Piantadosi, and Bevil R. Conway. Color naming across languages reflects color use.
Rahma Chaabouni, Eugene Kharitonov, Emmanuel Dupoux, and Marco Baroni. Communicating artificial neural networks develop efficient color-naming systems.

Interpretability of Neural Networks

Tiago Pimentel, Josef Valvoda, Rowan Hall Maudslay, Ran Zmigrod, Adina Williams, and Ryan Cotterell. Information-Theoretic Probing for Linguistic Structure.
Elena Voita and Ivan Titov. Information-Theoretic Probing with Minimum Description Length.