Information Theory in Linguistics: Methods and Applications
ESSLLI 2021: Week 2 (August 2-6)
Course Description
Since Shannon first proposed his mathematical theory of communication in the middle of the 20th century, information theory has been an important lens for viewing and investigating problems at the interfaces between linguistics, cognitive science, and computation. With the upsurge in applying machine learning approaches to linguistic questions, information-theoretic methods are becoming an ever more important tool in the linguist’s toolbox. The course emphasizes interdisciplinary connections between linguistics and natural language processing. We plan to do this by first establishing a firm mathematical basis and then showing how it can be fruitfully applied across a range of linguistic applications, from semantics, typology, morphology, and phonotactics to the interface between cognitive science and linguistics.
Syllabus
| Lecture | Topic | Slides | Notebook |
|---|---|---|---|
| Lecture 1 | Introduction and Overview | Slides | |
| Lecture 2 | Estimating Information-Theoretic Quantities | Slides | iPython Notebook |
| Lecture 3 | Case Studies in Complexity | Slides | |
| Lecture 4 | Case Studies in Correlation | Slides | |
| Lecture 5 | Case Studies in Communication | Slides | |
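
As a taste of Lecture 2's topic, the sketch below (our own minimal illustration, not code from the course notebook) computes the naive plug-in estimate of Shannon entropy by substituting empirical symbol frequencies for the true distribution. A well-known shortcoming of this estimator, central to the Entropy Estimation readings below, is its downward bias in small samples.

```python
import numpy as np
from collections import Counter

def plugin_entropy(samples):
    """Naive maximum-likelihood ("plug-in") estimate of Shannon entropy, in bits.

    Substitutes the empirical frequencies p_hat for the true distribution p
    in H(p) = -sum_x p(x) log2 p(x).
    """
    counts = np.array(list(Counter(samples).values()), dtype=float)
    p_hat = counts / counts.sum()
    return float(-np.sum(p_hat * np.log2(p_hat)))

# Entropy of the letter distribution in a toy "corpus"
print(plugin_entropy(list("abracadabra")))  # ~2.04 bits
```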
Literature
Information Theory Background
Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. 2006. Wiley-Interscience, USA.
Statistics Background
Peter J. Bickel and Kjell A. Doksum. Mathematical Statistics. 2001. Prentice Hall, USA.
Recent Papers (by topic)
| Topic | Title | Authors | Bib |
|---|---|---|---|
| Entropy Estimation | Estimating Discrete Entropy Part 1 | Sebastian Nowozin | |
| | Estimating Discrete Entropy Part 2 | Sebastian Nowozin | |
| | Estimating Discrete Entropy Part 3 | Sebastian Nowozin | |
| | Jackknifing An Index of Diversity | Samuel Zahl | |
| | Estimating functions of probability distributions from a finite set of samples | David H. Wolpert and David R. Wolf | |
| | Distribution of Mutual Information | Marcus Hutter | |
| | Entropy and Inference, Revisited | Ilya Nemenman and F. Shafee and William Bialek | |
| | Estimation of Entropy and Mutual Information | Liam Paninski | |
| | Bayesian Entropy Estimation for Countable Discrete Distributions | Evan Archer and Il Memming Park and Jonathan W. Pillow | |
| Arbitrariness of the Sign | Meaning to Form: Measuring Systematicity as Information | Tiago Pimentel and Arya D. McCarthy and Damián Blasi and Brian Roark and Ryan Cotterell | |
| | Finding Concept-specific Biases in Form–Meaning Associations | Tiago Pimentel and Brian Roark and Søren Wichmann and Ryan Cotterell and Damián Blasi | |
| Morphology | Predicting Declension Class from Form and Meaning | Adina Williams and Tiago Pimentel and Hagen Blix and Arya D. McCarthy and Eleanor Chodroff and Ryan Cotterell | |
| | Quantifying the Semantic Core of Gender Systems | Adina Williams and Damián Blasi and Lawrence Wolf-Sonkin and Hanna Wallach and Ryan Cotterell | |
| | Measuring the Similarity of Grammatical Gender Systems by Comparing Partitions | Arya D. McCarthy and Adina Williams and Shijia Liu and David Yarowsky and Ryan Cotterell | |
| | Morphological Irregularity Correlates with Frequency | Shijie Wu and Ryan Cotterell and Timothy O'Donnell | |
| | On the Complexity and Typology of Inflectional Morphological Systems | Ryan Cotterell and Christo Kirov and Mans Hulden and Jason Eisner | |
| Human Language Processing | Predictive power of word surprisal for reading times is a linear function of language model quality | Adam Goodkind and Klinton Bicknell | |
| | Evaluating information-theoretic measures of word prediction in naturalistic sentence reading | Christoph Aurnhammer and Stefan L. Frank | |
| | A Cognitive Regularizer for Language Modeling | Jason Wei and Clara Meister and Ryan Cotterell | |
| | Lower Perplexity is Not Always Human-Like | Tatsuki Kuribayashi and Yohei Oseki and Takumi Ito and Ryo Yoshida and Masayuki Asahara and Kentaro Inui | |
| | Human Sentence Processing: Recurrence or Attention? | Danny Merkx and Stefan L. Frank | |
| | Lossy-context surprisal: An information-theoretic model of memory effects in sentence processing | Richard Futrell and Edward Gibson and Roger P. Levy | |
| Lexicon | The Psycho-biology of Language | G. K. Zipf | |
| | Word lengths are optimized for efficient communication | Steven T. Piantadosi and Harry Tily and Edward Gibson | |
| | Info/information theory: speakers choose shorter words in predictable contexts | Kyle Mahowald and Evelina Fedorenko and Steven T. Piantadosi and Edward Gibson | |
| | The Entropy of Words—Learnability and Expressivity across More than 1000 Languages | Christian Bentz and Dimitrios Alikaniotis and Michael Cysouw and Ramon Ferrer-i-Cancho | |
| | How (Non-)Optimal is the Lexicon? | Tiago Pimentel and Irene Nikkarinen and Kyle Mahowald and Ryan Cotterell and Damián Blasi | |
| | Disambiguatory Signals are Stronger in Word-initial Positions | Tiago Pimentel and Ryan Cotterell and Brian Roark | |
| | Speakers Fill Lexical Semantic Gaps with Context | Tiago Pimentel and Rowan Hall Maudslay and Damián Blasi and Ryan Cotterell | |
| Language Generation | If Beam Search is the Answer, What was the Question? | Clara Meister and Tim Vieira and Ryan Cotterell | |
| | Language Model Evaluation Beyond Perplexity | Clara Meister and Ryan Cotterell | |
| Parsing | Mathematics as a Science of Patterns | Michael D. Resnik | |
| | Syntactic dependencies correspond to word pairs with high mutual information | Richard Futrell and Peng Qian and Edward Gibson and Evelina Fedorenko and Idan Blank | |
| Color Systems | Efficient compression in color naming and its evolution | Noga Zaslavsky and Charles Kemp and Terry Regier and Naftali Tishby | |
| | Color naming across languages reflects color use | Edward Gibson and Richard Futrell and Julian Jara-Ettinger and Kyle Mahowald and Leon Bergen and Sivalogeswaran Ratnasingam and Mitchell Gibson and Steven T. Piantadosi and Bevil R. Conway | |
| | Communicating artificial neural networks develop efficient color-naming systems | Rahma Chaabouni and Eugene Kharitonov and Emmanuel Dupoux and Marco Baroni | |
| Interpretability of Neural Networks | Information-Theoretic Probing for Linguistic Structure | Tiago Pimentel and Josef Valvoda and Rowan Hall Maudslay and Ran Zmigrod and Adina Williams and Ryan Cotterell | |
| | Information-Theoretic Probing with Minimum Description Length | Elena Voita and Ivan Titov | |
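
The Entropy Estimation readings above revolve around correcting the downward bias of the naive plug-in estimator. As a minimal illustration of two classical corrections (our own sketch of the standard formulas, not code from any listed paper): the Miller–Madow estimator adds (K − 1)/(2N) nats for K observed types and N samples, and the jackknife, as in Zahl's paper, extrapolates from leave-one-out re-estimates.

```python
import numpy as np
from collections import Counter

def plugin_entropy(samples):
    """Maximum-likelihood ("plug-in") entropy estimate, in bits."""
    counts = np.array(list(Counter(samples).values()), dtype=float)
    p_hat = counts / counts.sum()
    return float(-np.sum(p_hat * np.log2(p_hat)))

def miller_madow_entropy(samples):
    """Plug-in estimate plus the Miller-Madow correction of (K - 1)/(2N) nats
    (converted to bits), for K observed types and N samples."""
    k, n = len(set(samples)), len(samples)
    return plugin_entropy(samples) + (k - 1) / (2 * n * np.log(2))

def jackknife_entropy(samples):
    """Jackknife bias correction (cf. Zahl, 1977): combine the full-sample
    estimate with the mean of all leave-one-out estimates."""
    n = len(samples)
    h_loo = np.mean([plugin_entropy(samples[:i] + samples[i + 1:])
                     for i in range(n)])
    return n * plugin_entropy(samples) - (n - 1) * h_loo

data = list("abracadabra")
for estimator in (plugin_entropy, miller_madow_entropy, jackknife_entropy):
    print(estimator.__name__, round(estimator(data), 3))
```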
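
Several of the Human Language Processing readings above relate a word's surprisal, −log p(word | context), to human reading times. The sketch below (a toy bigram model with add-one smoothing; our own construction, not the setup of any listed paper) shows how per-word surprisal is computed:

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Add-one-smoothed bigram model over a list of tokenized sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sent in corpus:
        toks = ["<s>"] + sent  # sentence-initial context symbol
        vocab.update(toks)
        contexts.update(toks[:-1])
        bigrams.update(zip(toks, toks[1:]))
    v = len(vocab)

    def prob(prev, word):
        # Laplace smoothing keeps unseen bigrams from having zero probability
        return (bigrams[(prev, word)] + 1) / (contexts[prev] + v)

    return prob

def surprisals(prob, sent):
    """Per-word surprisal in bits: -log2 p(word | previous word)."""
    toks = ["<s>"] + sent
    return [(w, -math.log2(prob(p, w))) for p, w in zip(toks, toks[1:])]

corpus = [["the", "dog", "barks"], ["the", "cat", "meows"], ["the", "dog", "sleeps"]]
model = train_bigram(corpus)
for word, s in surprisals(model, ["the", "dog", "meows"]):
    print(f"{word}: {s:.2f} bits")  # unexpected continuations get higher surprisal
```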