Neural Networks and Computational Complexity
ETH Zürich: Fall 2024
Course Description
This Bachelor’s seminar delves into the fascinating world of modern large language models (LLMs), which have revolutionized natural language processing. As these models continue to evolve and impact various domains, we will explore their potential, limitations, and underlying mechanisms through a theoretical lens. Throughout the seminar, we will address the following key questions: what are the real capabilities of large language models? What are their inherent limitations? How do these models function at a fundamental level? Under what circumstances are they likely to fail? Can we develop a comprehensive “science of LLMs” to address these inquiries? We will leverage formal language theory to provide a rigorous framework for understanding the representational capacity of neural language models.
Time: Friday 14-16h
Location: CHN D 44
Additional Material
IMPORTANT!
When you send an e-mail, please ALWAYS put “Bachelor’s Seminar” in the subject line!
Course Schedule (Work in Progress)
Week | Date | Topic | Presenter | Reading |
---|---|---|---|---|
1 | 20.09.24 | Intro | ||
2 | 27.09.24 | Language Models & FLT | Only Lecture | |
3 | 4.10.24 | RNNs and FSAs | Mary, Jakob, Pierre | Svete et al. (2024), Svete et al. (2024) |
4 | 11.10.24 | Counter Machines and the LSTM | Tom, Simon, Julius | Weiss et al. (2017) |
5 | 18.10.24 | RNNs and Turing Machines | Sasha, Ben, Torban | Nowak et al. (2023), Siegelmann and Sontag (1992) |
6 | 25.10.24 | The Transformer | Sarah, Alexander, Leon | Vaswani et al. (2017), Bahdanau et al. (2014) |
7 | 1.11.24 | The Transformer is Turing Complete | | Perez et al. (2017) |
8 | 8.11.24 | EMNLP paper | | Pasti et al. (2024) |
9 | 15.11.24 | No Lecture (EMNLP) | ||
10 | 22.11.24 | The Transformer is Turing (In)Complete | Mischa, Renne, Jesse | Hahn (2020) |
11 | 29.11.24 | The Transformer with Chain of Thought | Simon, Christian | Merrill and Sabharwal (2024) |
12 | 6.12.24 | Circuit Complexity of The Transformer (I) | Aurelian, Erdem | Strobl et al. (2024) (Survey) |
13 | 13.12.24 | Circuit Complexity of The Transformer (II) | Raphael | Hao et al. (2022) |
14 | 20.12.24 | What can Attention Learn? | Meri | Yau et al. (2024) |