Information Theory in Linguistics: Methods and Applications (COLING 2022)
COLING 2022: October 12-17, 2022
Course Description
Since Shannon first proposed his mathematical theory of communication in the mid-20th century, information theory has been an important lens for viewing and investigating problems at the interfaces of linguistics, cognitive science, and computation. With the surge in applying machine learning approaches to linguistic questions, information-theoretic methods are becoming an increasingly important tool in the linguist's toolbox. This cutting-edge tutorial, which draws on the work of many different researchers, emphasizes interdisciplinary connections between linguistics and natural language processing. We begin by reviewing the mathematical basis of information theory. We then show how it can be fruitfully applied to a range of linguistic questions, from semantics, typology, morphology, and phonotactics to the interface between cognitive science and linguistics. Next, we discuss recent research, spanning fields from psycholinguistics to machine learning, that has made progress in the analysis of natural language using these techniques. Throughout the tutorial, we will provide hands-on exercises that allow you to put theory into practice in linguistic applications.
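As a warm-up for the hands-on exercises, the following minimal Python sketch (not the tutorial's own notebook code) computes three basic quantities from the mathematical background: the surprisal of an outcome, −log₂ p(x); the entropy of a distribution, H(X) = −Σ p(x) log₂ p(x), i.e., its expected surprisal; and the mutual information I(X; Y) = Σ p(x, y) log₂ [p(x, y) / (p(x) p(y))]. The function names and toy distributions are hypothetical, chosen so the results can be checked by hand; all quantities are in bits.

```python
import math
from collections import Counter


def surprisal(p: float) -> float:
    """Surprisal (information content) of an outcome with probability p, in bits."""
    return -math.log2(p)


def entropy(dist: dict) -> float:
    """Shannon entropy H(X) in bits: the expected surprisal under `dist`."""
    return sum(p * surprisal(p) for p in dist.values() if p > 0)


def mutual_information(joint: dict) -> float:
    """I(X; Y) in bits, given a joint distribution {(x, y): p(x, y)}."""
    # Marginalize the joint distribution to obtain p(x) and p(y).
    px, py = Counter(), Counter()
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(
        p * math.log2(p / (px[x] * py[y]))
        for (x, y), p in joint.items()
        if p > 0
    )


# Hypothetical toy distributions, chosen so the answers are exact.
fair_coin = {"H": 0.5, "T": 0.5}
skewed = {"the": 0.5, "a": 0.25, "cat": 0.125, "dog": 0.125}
copied = {("a", "a"): 0.5, ("b", "b"): 0.5}               # Y is a copy of X
independent = {(x, y): 0.25 for x in "ab" for y in "ab"}  # X and Y unrelated

print(entropy(fair_coin))               # 1.0
print(entropy(skewed))                  # 1.75
print(surprisal(skewed["cat"]))         # 3.0
print(mutual_information(copied))       # 1.0
print(mutual_information(independent))  # 0.0
```

In the skewed distribution, the most frequent word carries 1 bit of surprisal and each of the two rarest words carries 3 bits, so the entropy is 0.5·1 + 0.25·2 + 2·(0.125·3) = 1.75 bits, less than the 2 bits of a uniform distribution over four words.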
Syllabus
Module | Topic | Slides | Notebook |
---|---|---|---|
Module 1 | Introduction and Background | Slides | Colab Notebook |
Module 2 | Case Studies in Complexity | Slides | Colab Notebook |
Module 3 | Case Studies in Correlation | Slides | |
Module 4 | Case Studies in Communication | Slides | |
Conclusion | Concluding Remarks | Slides | |
Literature
Information Theory Background
Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. 2006. Wiley-Interscience, USA.
Statistics Background
Peter J. Bickel and Kjell A. Doksum. Mathematical Statistics. 2001. Prentice Hall, USA.
Recent Papers (by topic)
Topic | Title | Authors | Bib |
---|---|---|---|
Entropy Estimation | Estimating Discrete Entropy Part 1 | Sebastian Nowozin | |
Entropy Estimation | Estimating Discrete Entropy Part 2 | Sebastian Nowozin | |
Entropy Estimation | Estimating Discrete Entropy Part 3 | Sebastian Nowozin | |
Entropy Estimation | Jackknifing An Index of Diversity | Samuel Zahl | |
Entropy Estimation | Estimating functions of probability distributions from a finite set of samples | David H. Wolpert and David R. Wolf | |
Entropy Estimation | Distribution of Mutual Information | Marcus Hutter | |
Entropy Estimation | Entropy and Inference, Revisited | Ilya Nemenman, F. Shafee, and William Bialek | |
Entropy Estimation | Estimation of Entropy and Mutual Information | Liam Paninski | |
Entropy Estimation | Bayesian Entropy Estimation for Countable Discrete Distributions | Evan Archer, Il Memming Park, and Jonathan W. Pillow | |
Entropy Estimation | Estimating the Entropy of Linguistic Distributions | Aryaman Arora, Clara Isabel Meister, and Ryan Cotterell | |
Arbitrariness of the Sign | Meaning to Form: Measuring Systematicity as Information | Tiago Pimentel, Arya D. McCarthy, Damián Blasi, Brian Roark, and Ryan Cotterell | |
Arbitrariness of the Sign | Finding Concept-specific Biases in Form–Meaning Associations | Tiago Pimentel, Brian Roark, Søren Wichmann, Ryan Cotterell, and Damián Blasi | |
Morphology | Predicting Declension Class from Form and Meaning | Adina Williams, Tiago Pimentel, Hagen Blix, Arya D. McCarthy, Eleanor Chodroff, and Ryan Cotterell | |
Morphology | Quantifying the Semantic Core of Gender Systems | Adina Williams, Damián Blasi, Lawrence Wolf-Sonkin, Hanna Wallach, and Ryan Cotterell | |
Morphology | Measuring the Similarity of Grammatical Gender Systems by Comparing Partitions | Arya D. McCarthy, Adina Williams, Shijia Liu, David Yarowsky, and Ryan Cotterell | |
Morphology | Morphological Irregularity Correlates with Frequency | Shijie Wu, Ryan Cotterell, and Timothy O'Donnell | |
Morphology | On the Complexity and Typology of Inflectional Morphological Systems | Ryan Cotterell, Christo Kirov, Mans Hulden, and Jason Eisner | |
Human Language Processing | Predictive power of word surprisal for reading times is a linear function of language model quality | Adam Goodkind and Klinton Bicknell | |
Human Language Processing | Evaluating information-theoretic measures of word prediction in naturalistic sentence reading | Christoph Aurnhammer and Stefan L. Frank | |
Human Language Processing | A Cognitive Regularizer for Language Modeling | Jason Wei, Clara Meister, and Ryan Cotterell | |
Human Language Processing | Lower Perplexity is Not Always Human-Like | Tatsuki Kuribayashi, Yohei Oseki, Takumi Ito, Ryo Yoshida, Masayuki Asahara, and Kentaro Inui | |
Human Language Processing | Human Sentence Processing: Recurrence or Attention? | Danny Merkx and Stefan L. Frank | |
Human Language Processing | Lossy-context surprisal: An information-theoretic model of memory effects in sentence processing | Richard Futrell, Edward Gibson, and Roger P. Levy | |
Lexicon | The Psycho-biology of Language | G. K. Zipf | |
Lexicon | Word lengths are optimized for efficient communication | Steven T. Piantadosi, Harry Tily, and Edward Gibson | |
Lexicon | Info/information theory: speakers choose shorter words in predictive contexts | Kyle Mahowald, Evelina Fedorenko, Steven T. Piantadosi, and Edward Gibson | |
Lexicon | The Entropy of Words—Learnability and Expressivity across More than 1000 Languages | Christian Bentz, Dimitrios Alikaniotis, Michael Cysouw, and Ramon Ferrer-i-Cancho | |
Lexicon | How (Non-)Optimal is the Lexicon? | Tiago Pimentel, Irene Nikkarinen, Kyle Mahowald, Ryan Cotterell, and Damián Blasi | |
Lexicon | Disambiguatory Signals are Stronger in Word-initial Positions | Tiago Pimentel, Ryan Cotterell, and Brian Roark | |
Lexicon | Speakers Fill Lexical Semantic Gaps with Context | Tiago Pimentel, Rowan Hall Maudslay, Damián Blasi, and Ryan Cotterell | |
Language Generation | If Beam Search is the Answer, What was the Question? | Clara Meister, Tim Vieira, and Ryan Cotterell | |
Language Generation | Language Model Evaluation Beyond Perplexity | Clara Meister and Ryan Cotterell | |
Parsing | Mathematics as a Science of Patterns | Michael D. Resnik | |
Parsing | Syntactic dependencies correspond to word pairs with high mutual information | Richard Futrell, Peng Qian, Edward Gibson, Evelina Fedorenko, and Idan Blank | |
Color Systems | Efficient compression in color naming and its evolution | Noga Zaslavsky, Charles Kemp, Terry Regier, and Naftali Tishby | |
Color Systems | Color naming across languages reflects color use | Edward Gibson, Richard Futrell, Julian Jara-Ettinger, Kyle Mahowald, Leon Bergen, Sivalogeswaran Ratnasingam, Mitchell Gibson, Steven T. Piantadosi, and Bevil R. Conway | |
Color Systems | Communicating artificial neural networks develop efficient color-naming systems | Rahma Chaabouni, Eugene Kharitonov, Emmanuel Dupoux, and Marco Baroni | |
Interpretability of Neural Networks | Information-Theoretic Probing for Linguistic Structure | Tiago Pimentel, Josef Valvoda, Rowan Hall Maudslay, Ran Zmigrod, Adina Williams, and Ryan Cotterell | |
Interpretability of Neural Networks | Information-Theoretic Probing with Minimum Description Length | Elena Voita and Ivan Titov | |
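The Entropy Estimation readings above address a common problem: the naive plug-in (maximum-likelihood) entropy estimate is biased downward when computed from a finite sample. The following minimal Python sketch, which is not taken from any of the listed papers, compares the plug-in estimate with two classical corrections. The first is the Miller-Madow correction, which adds (K − 1)/(2N) nats for K observed categories and N samples. The second is the jackknife, the resampling technique Zahl applied to a diversity index. The function names and the toy corpus are hypothetical, and all estimates are reported in bits.

```python
import math
from collections import Counter


def plugin_entropy(counts: list) -> float:
    """Plug-in (maximum-likelihood) entropy estimate in bits from category counts."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


def miller_madow_entropy(counts: list) -> float:
    """Plug-in estimate plus the Miller-Madow correction (K - 1) / (2N) nats."""
    n = sum(counts)
    k = sum(1 for c in counts if c > 0)  # number of observed categories
    # Divide by ln 2 to convert the nat-valued correction into bits.
    return plugin_entropy(counts) + (k - 1) / (2 * n * math.log(2))


def jackknife_entropy(counts: list) -> float:
    """Jackknife estimate: N * H - (N - 1) * (mean leave-one-out plug-in H)."""
    counts = list(counts)
    n = sum(counts)
    if n < 2:
        raise ValueError("the jackknife needs at least two samples")
    loo_mean = 0.0
    for i, c in enumerate(counts):
        if c == 0:
            continue
        # Deleting any one of the c samples in category i gives the same
        # leave-one-out estimate, so weight it by c / n instead of looping N times.
        reduced = counts.copy()
        reduced[i] -= 1
        loo_mean += (c / n) * plugin_entropy(reduced)
    return n * plugin_entropy(counts) - (n - 1) * loo_mean


# Hypothetical toy sample: word tokens from a tiny "corpus".
tokens = "the cat saw the dog and the cat ran".split()
counts = list(Counter(tokens).values())
for name, estimator in [("plug-in", plugin_entropy),
                        ("Miller-Madow", miller_madow_entropy),
                        ("jackknife", jackknife_entropy)]:
    print(f"{name:>12}: {estimator(counts):.3f} bits")
```

Grouping the leave-one-out deletions by category keeps the jackknife's cost proportional to the number of distinct types rather than the number of tokens. The Bayesian estimators in the list, such as those of Nemenman et al. and Archer et al., take a different approach and are beyond this sketch.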