Large Language Models, Spring 2023

ETH Zürich: Course catalog

Course Description

Large language models have become one of the most commonly deployed NLP inventions. In the past half-decade, their integration into core natural language processing tools has dramatically increased the performance of such tools, and they have entered the public discourse surrounding artificial intelligence. In this course, we offer a self-contained introduction to language modeling and its applications. We start with the probabilistic foundations of language models, i.e., covering what constitutes a language model from a formal, theoretical perspective. We then discuss how to construct and curate training corpora, and introduce many of the neural-network architectures often used to instantiate language models at scale. The course covers aspects of systems programming, discussion of privacy and harms, as well as applications of language models in NLP and beyond.


3. 1. 2023   Class website is online!
20. 2. 2023   Update on the previous announcement from January 30th: the Large Language Models course can count towards the core elective courses for the Data Science master’s program, rather than the core courses. Indeed, the course is now listed as a core elective course for the Data Science master’s program, so no additional action is required upon registering for the course through MyStudies.
20. 2. 2023   First draft of the notes for the first part of the course is online!
24. 2. 2023   The iPad class notes have been posted. The same link will contain updated notes for the first part of the course throughout the semester.
9. 3. 2023   The first part of the first assignment has been released!
25. 4. 2023   First draft of the notes for the second part of the course is online!
25. 5. 2023   The second assignment has been released together with the LaTeX source code!

Syllabus and Schedule

On the Use of Class Time


There are two lecture slots for LLM each week: the first one on Tuesdays 14-16 in CAB G 61 and the second one on Fridays 10-11 in CAB G 61.

Both lectures will be given in person and live broadcast on Zoom; the password is available on the course Moodle page.

Lectures will be recorded—links to the Zoom recordings will be posted on the course Moodle page.

Discussion Sections

Discussion sections (tutorials) will take place Thursdays 16-18 in NO C 60 and on Zoom (same link as the lectures).


| Date | Time | Module | Topic | Lecturer | Summary | Material | Reading |
|---|---|---|---|---|---|---|---|
| 21. 2. 2023 | 1 hour | Introduction and Overview | | Ryan/Mrinmaya/Ce/Florian | | Introductory Slides | |
| 21. 2. 2023 | 1 hour | Probabilistic Foundations | Basic Measure Theory | Ryan | | | Du, Li, et al. A Measure-Theoretic Characterization of Tight Language Models. arXiv, 2022. |
| 24. 2. 2023 | 1 hour | | Defining a Language Model | Ryan | | | |
| 28. 2. 2023 | 2 hours | | Tight Language Models | Ryan | | | Du, Li, et al. A Measure-Theoretic Characterization of Tight Language Models. arXiv, 2022.; Chen, Yining, et al. Recurrent Neural Networks as Weighted Language Recognizers. arXiv, 2017. |
| 3. 3. 2023 | 1 hour | Modeling Foundations | The Language Modeling Task | Ryan | | | |
| 7. 3. 2023 | 2 hours | | Finite-State Language Models | Ryan | | | Bengio, Yoshua, et al. A neural probabilistic language model. J. Mach. Learn. Res., 2003. |
| 10. 3. 2023 | 1 hour | | Pushdown Language Models | Ryan | | | |
| 14. 3. 2023 | 2 hours | Neural Network Modeling | Recurrent Neural Language Models | Ryan | | | |
| 17. 3. 2023 | 1 hour | | Variants of RNNLMs | Ryan | | | |
| 21. 3. 2023 | 2 hours | | Representational Capacity of RNNLMs | Ryan | | | Siegelmann H. T. and Sontag E. D. On the computational power of neural nets. Computational learning theory. 1992. |
| 24. 3. 2023 | 1 hour | | Transformer-based Language Models | Ryan | | | |
| 28. 3. 2023 | 2 hours | | Efficient Attention | Ryan | | | |
| 31. 3. 2023 | 1 hour | | Representational Capacity of Transformer-based Language Models | Ryan | | | |
| 4. 4. 2023 | 2 hours | Modeling Potpourri | Tokenization | Ryan | | | |
| Easter Break | | | | | | | |
| 18. 4. 2023 | 2 hours | Modeling Potpourri | Generating Text from a Language Model | Ryan | | | |
| 21. 4. 2023 | 1 hour | Training, Fine Tuning and Inference | Transfer Learning | Mrinmaya | | Slides | |
| 25. 4. 2023 | 2 hours | | Parameter efficient finetuning | Mrinmaya | | Slides | |
| 28. 4. 2023 | 1 hour | | Prompting and zero-shot inference | Mrinmaya | | Slides | |
| 2. 5. 2023 | 2 hours | Parallelism and Scaling up | Scaling up | Ce | | Slides | |
| 5. 5. 2023 | 1 hour | | Parallelism | Ce | | Slides | |
| 9. 5. 2023 | 2 hours | Applications and the Benefits of Scale | Multimodality | Mrinmaya | | Slides | |
| 12. 5. 2023 | 1 hour | | Additional Topics | Mrinmaya | | Slides | |
| 16. 5. 2023 | 2 hours | Analysis | Analysis and Probing | Tiago/Ryan | | Slides | |
| 19. 5. 2023 | 1 hour | | Cognitive Modeling | Ethan/Alex/Ryan | | | |
| 23. 5. 2023 | 2 hours | Security and Misuse | Security and Misuse | Florian | | Slides | |
| 26. 5. 2023 | 1 hour | | Harms and Ethical Concerns | Florian | | Slides | |
| 30. 5. 2023 | 2 hours | | Memorization and Privacy | Florian | | | |
| 2. 6. 2023 | 1 hour | | The data lifecycle | Florian | | | |


Live Chat

In addition to class time, there will also be a RocketChat-based live chat hosted on ETH’s servers. Students are free to ask questions of the teaching staff and of others in public or private (direct message). There are specific channels for each of the two assignments as well as for reporting errata in the course notes and slides. All data from the chat will be deleted from ETH servers at the course’s conclusion.

Important: Keep the following points in mind when using the course live chat:

  1. RocketChat will be the main communications hub for the course. You are responsible for receiving all messages broadcast in the RocketChat.
  2. Your username should be firstname.lastname. This is required because only enrolled students may participate in the chat, and we will remove users whom we cannot validate.
  3. Tag your questions as described in the document on How to use Rycolab Course RocketChat channels. The document also contains other general remarks about the use of RocketChat.
  4. Search for answers in the appropriate channels before posting a new question.
  5. Ask questions on public channels as much as possible.
  6. Reply to posts in threads.
  7. The chat supports LaTeX for easier discussion of technical material. See How to use LaTeX in RocketChat.
  8. We highly recommend you download the desktop app here.

This is the link to the main channel. To make the moderation of the chat more easily manageable, we have created a number of other channels on RocketChat. The full list is:

If you feel like you would benefit from any other channel, feel free to suggest it to the teaching team!

Course Notes

We will prepare the course lecture notes as we go! The individual chapters will be published in the course syllabus and updated throughout the semester. Please report all errata to the teaching staff; we created an errata channel in RocketChat.

Links to the course notes:

Other useful literature:


Marks for the course will be determined by the following formula:

  • 50% Final Exam
  • 50% Assignments
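
As a worked illustration of the weighting above, assume (this is not stated on this page) that the exam mark $E$ and the combined assignment mark $A$ are reported on the same scale and that no rounding is applied. The symbols $E$, $A$, and $M$ and the example values are hypothetical.

```latex
% Illustrative only: E, A, M and the numbers below are assumptions.
\[
  M = 0.5\,E + 0.5\,A,
  \qquad \text{e.g. } E = 5.0,\; A = 5.5
  \;\Longrightarrow\; M = 0.5 \cdot 5.0 + 0.5 \cdot 5.5 = 5.25 .
\]
```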

On the Final Exam

The final exam is comprehensive and should be assumed to cover all the material in the slides and class notes.

On the Class Assignments

There will be two larger assignments in the course.

We require the solutions to be properly typeset. We recommend using LaTeX (with Overleaf), but markdown files with MathJax for the mathematical expressions are also fine.
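
For readers new to LaTeX, a minimal sketch of a typeset solution might look as follows. This is not an official course template: the section title and the displayed statement (the usual tightness condition for a language model $p$ over an alphabet $\Sigma$) are chosen purely for illustration, and the only packages assumed are the standard `amsmath` and `amssymb`.

```latex
% Minimal illustrative skeleton, not an official course template.
\documentclass{article}
\usepackage{amsmath,amssymb}

\begin{document}

\section*{Problem 1}  % hypothetical problem heading

% Example statement used only to show displayed math:
A language model $p$ over an alphabet $\Sigma$ is tight if
\begin{equation}
  \sum_{y \in \Sigma^*} p(y) = 1 .
\end{equation}

\end{document}
```

The file compiles as-is on Overleaf or with a local `pdflatex` installation.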

The first assignment will be of a more theoretical nature and will be released shortly after the start of the semester.

Assignment instruction sheets:

Assignment Deadlines

The first assignment will be due on Tuesday, August 15th at 23:59.

Large Language Models Lecturers


Ce Zhang

Assistant Professor in Computer Science

ETH Zürich


Florian Tramèr

Assistant Professor in Computer Science

ETH Zürich


Mrinmaya Sachan

Assistant Professor in Computer Science

ETH Zürich


Ryan Cotterell

Assistant Professor of Computer Science

ETH Zürich

Large Language Models Guest Lecturers

Large Language Models Teaching Assistants


Luca Malagutti

Master’s Student

ETH Zürich


Tianyu Liu

PhD Student/Web Meister

ETH Zürich