Large Language Models, Spring 2025
ETH Zürich: Course catalog
Course Description
Large language models have become one of the most widely deployed NLP technologies. In the past half-decade, their integration into core natural language processing tools has dramatically improved the performance of those tools, and they have entered the public discourse surrounding artificial intelligence. In this course, we start with the probabilistic foundations of language models, covering what constitutes a language model from a formal, theoretical perspective. We then discuss how to construct and curate training corpora, and we introduce many of the neural-network architectures commonly used to instantiate language models at scale. The course also covers privacy and harms, as well as applications of language models in NLP and beyond.
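As a preview of the formal perspective (following the course notes and the Du et al. reading listed in the syllabus below): a language model over an alphabet $\Sigma$ is a probability distribution $p$ over finite strings, and it is called *tight* if no probability mass leaks to infinite sequences, i.e., if

$$\sum_{\boldsymbol{y} \in \Sigma^*} p(\boldsymbol{y}) = 1.$$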
Prerequisites: While there are no formal prerequisites for taking the course, we expect you to be comfortable with probability theory, linear algebra, computational complexity, and machine learning.
Syllabus and Schedule
On the Use of Class Time
Lectures
There are two lecture slots for LLM each week:
In-person and Zoom
Both lectures will be given in person and live broadcast on Zoom; the password is available on the course Moodle page.
Recordings: Lectures will be recorded—links to the Zoom recordings will be posted on the course Moodle page.
Tutorials
Tutorials will take place Thursdays 16-18 in NO C 60 and on Zoom.
Syllabus
Date | Time | Module | Topic | Lecturer | Summary | Material | Reading |
---|---|---|---|---|---|---|---|
18. 2. 2025 | 1 hour | Introduction and Overview | | Ryan/Mrinmaya/Florian | | Introductory Slides | Course Notes, § 1 |
18. 2. 2025 | 1 hour | Probabilistic Foundations | Basic Measure Theory | Ryan | | | Course Notes, §§ 2.1 and 2.2; Du et al., A Measure-Theoretic Characterization of Tight Language Models |
21. 2. 2025 | 1 hour | | Defining a Language Model | Ryan | | | Course Notes, §§ 2.3 and 2.4; Du et al., A Measure-Theoretic Characterization of Tight Language Models |
25. 2. 2025 | 2 hours | | Tight Language Models | Ryan | | | Course Notes, § 2.5; Du et al., A Measure-Theoretic Characterization of Tight Language Models; Chen et al., Recurrent Neural Networks as Weighted Language Recognizers |
28. 2. 2025 | 1 hour | Modeling Foundations | The Language Modeling Task | Ryan | | | Course Notes, § 3 |
4. 3. 2025 | 2 hours | | Finite-State Language Models | Ryan | | | Course Notes, § 4.1; Bengio et al., A Neural Probabilistic Language Model; Sun et al., Revisiting Simple Neural Probabilistic Language Models |
7. 3. 2025 | 1 hour | Neural Network Modeling | Recurrent Neural Language Models | Ryan | | | Course Notes, §§ 5.1.1–5.1.4 |
11. 3. 2025 | 1 hour | | Representational Capacity of RNN LMs | Ryan | | | Course Notes, § 5.1.6; Svete et al., Recurrent Neural Language Models as Probabilistic Finite-state Automata; Nowak et al., On the Representational Capacity of Recurrent Neural Language Models; Siegelmann and Sontag, On the Computational Power of Neural Nets |
11. 3. 2025 | 1 hour | | Transformer-based Language Models | Ryan | | | Course Notes, § 5.2; Radford et al., Language Models are Unsupervised Multitask Learners; Vaswani et al., Attention Is All You Need; The Illustrated Transformer; The Illustrated GPT-2; Transformer decoder (Wikipedia) |
14. 3. 2025 | 1 hour | | Transformer-based Language Models | Ryan | | | |
18. 3. 2025 | 1 hour | | Representational Capacity of Transformer-based Language Models | Ryan | | | Course Notes, § 5.3 |
18. 3. 2025 | 1 hour | Modeling Potpourri | Tokenization | Ryan | | | |
18. 3. 2025 | 1 hour | | Generating Text from a Language Model | Ryan | | | |
21. 3. 2025 | 1 hour | | Generating Text from a Language Model | Ryan | | | |
25. 3. 2025 | 2 hours | Training, Fine-Tuning and Inference | Transfer Learning | Mrinmaya | | Slides | |
28. 3. 2025 | 1 hour | Training, Fine-Tuning and Inference | Parameter-Efficient Fine-Tuning | Mrinmaya | | Slides | |
1. 4. 2025 | 2 hours | | In-Context Learning, Prompting, Zero-Shot, Instruction Tuning | Mrinmaya | | Slides | |
4. 4. 2025 | 1 hour | Applications and the Benefits of Scale | In-Context Learning, Prompting, Zero-Shot, Instruction Tuning | Mrinmaya | | Slides | |
8. 4. 2025 | 2 hours | | Multimodality | Mrinmaya | | Slides | |
11. 4. 2025 | 1 hour | | Retrieval-Augmented Language Models | Mrinmaya | | Slides | |
15. 4. 2025 | 2 hours | | Reinforcement Learning for Reasoning and Inference-Time Compute | Mrinmaya | | | |
Easter Break | | | | | | | |
29. 4. 2025 | 2 hours | Applications and the Benefits of Scale | Instruction Tuning and RLHF | Mrinmaya | | Slides | |
2. 5. 2025 | 1 hour | Security | Harms & Ethics | Florian | | Slides | Bai et al., Constitutional AI: Harmlessness from AI Feedback |
6. 5. 2025 | 2 hours | | Security & Adversarial Examples | Florian | | Slides | Carlini et al., Are Aligned Neural Networks Adversarially Aligned?; Zou et al., Universal and Transferable Adversarial Attacks on Aligned Language Models |
9. 5. 2025 | 1 hour | | Prompt Injections | Florian | | Slides | Greshake et al., Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection |
13. 5. 2025 | 2 hours | | Data Poisoning, Backdoors and Model Stealing | Florian | | Slides | Carlini et al., Poisoning Web-Scale Training Datasets is Practical; Wallace et al., Imitation Attacks and Defenses for Black-box Machine Translation Systems |
16. 5. 2025 | 1 hour | | Privacy in ML | Florian | | Slides | Carlini et al., Is Private Learning Possible with Instance Encoding?; Fowl et al., Robbing the Fed: Directly Obtaining Private Data in Federated Learning with Modified Models |
20. 5. 2025 | 2 hours | | Memorization and Differential Privacy | Florian | | Slides | Nasr et al., Scalable Extraction of Training Data from (Production) Language Models; Abadi et al., Deep Learning with Differential Privacy |
23. 5. 2025 | 1 hour | | Data Lifecycle | Florian | | Slides | Gebru et al., Datasheets for Datasets |
27. 5. 2025 | 2 hours | | Explainability, Interpretability, AI Safety | Florian | | Slides | Meng et al., Locating and Editing Factual Associations in GPT; Li et al., Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task |
30. 5. 2025 | 1 hour | | Guest Lecture: TBD | TBD, Florian | | | |
Tutorial Schedule
Week | Date | Topic | Teaching Assistant | Material |
---|---|---|---|---|
1 | 20. 2. 2025 | Course Logistics (1 hour) | Anej | Introduction Slides |
2 | 27. 2. 2025 | Fundamentals of Natural Language Processing and Language Modeling, Measure Theory, Generation | Irene | Exercises, Exercises with solutions, iPad Notes |
3 | 6. 3. 2025 | Classical Language Models: $n$-grams and Context-free Grammars | Vicky | Exercises, Exercises with solutions |
4 | 13. 3. 2025 | RNN Language Models | Kári | Exercises, Exercises with solutions |
5 | 20. 3. 2025 | Transformer Language Models | Eren | Exercises, Exercises with solutions, Jupyter Notebook |
6 | 27. 3. 2025 | Tokenization and Generation | Manuel | Exercises, Exercises with solutions, Slides |
7 | 3. 4. 2025 | Assignment 1 Q&A | Anej, Irene, Vicky, Manuel, Eren | |
8 | 10. 4. 2025 | Common Pre-trained Language Models, Parameter-Efficient Fine-Tuning | Dmitrii | Google Colab Notebook, Transformer Architecture Drawing |
9 | 24. 4. 2025 | Retrieval-Augmented Generation | Maxim | Google Colab Notebook, Slides |
10 | 1. 5. 2025 | No tutorials: Labour Day | | Exercises, Exercises with solutions |
11 | 8. 5. 2025 | Prompting, Chain-of-Thought Reasoning | | |
12 | 15. 5. 2025 | Decoding and Watermarking | Matej | Exercises, Exercises with solutions |
13 | 22. 5. 2025 | Assignment 2 Q&A, Assignment 3 Q&A | Maxim, Dmitrii, Kiril, Kári, Matej | |
14 | 29. 5. 2025 | No tutorials: Ascension Day | | |
Organization
Moodle as a Communications and Questions-answering Platform
We will use the course Moodle page for course communications and as a place where you can ask questions of the teaching staff. There are several forums you can use to ask specific questions, and we encourage you to take advantage of them. We aim to respond quickly.
Course Notes
We prepared an extensive set of course notes for the course last semester, and we will keep improving them as this semester progresses. Please report any errata you find in the course notes to the teaching staff in the Errata Google document linked on the course Moodle page.
Links to the course notes:
Other useful literature:
- Ryan’s iPad notes
- ESSLLI 2023 Tutorial on the Expressivity of Neural Networks
- ESSLLI 2024 Tutorial on the Expressivity of Transformers
- Introduction to Natural Language Processing (Eisenstein)
- Deep Learning (Goodfellow, Bengio and Courville)
- AFLT Course Notes
Grading
Marks for the course will be determined by the following formula:
- 50% Final Exam
- 50% Assignments
Exam
The final exam is comprehensive and should be assumed to cover all the material in the slides and class notes. The date is determined by the ETH examinations office centrally and will be announced towards the end of the semester.
Remote exams: ETH offers a centralized system for taking exams remotely if you are an exchange student and, under specific circumstances, for ETH students as well. To find out more and to arrange a remote exam, please follow the instructions on remote examinations here.
Exam review: After the grades have been announced, you will be able to sign up for the exam review session, which we will offer at some point in the first three weeks of the semester. During the session, you will have the opportunity to review your exam and assignments and to understand how they were graded. You will also be able to take notes about the exam and the solutions, but no copies or photos may be taken. To sign up, we will publish a Google form after the grading conference. Note that we offer only one review session, so individual (or remote) sessions are not possible. See also here for more information about exam reviews in general.
Assignments
There will be three larger assignments in the course. Assignments are individual work, and you are expected to submit your own solutions—solutions that you wrote up yourself and did not copy from any of your peers. Each assignment might, however, follow a different policy on collaboration when it comes to discussing the problems with your peers—please refer to the specific assignment instructions for details.
We require the solutions to be properly typeset. We recommend using LaTeX (with Overleaf; see the submission template below), but Markdown files with something like MathJax for the mathematical expressions are also fine.
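For concreteness, here is a minimal sketch of what a typeset LaTeX submission could look like. It is purely illustrative; the official Overleaf template linked below already provides all of this (and the required declaration of originality) and is the recommended starting point:

```latex
% Minimal illustrative sketch of a typeset submission.
% The official Overleaf template (linked below) is the recommended starting point.
\documentclass{article}
\usepackage{amsmath,amssymb}

\begin{document}

\section*{Problem 1}
Let $p$ be a language model over an alphabet $\Sigma$.
We say that $p$ is \emph{tight} if
\begin{equation*}
  \sum_{\boldsymbol{y} \in \Sigma^*} p(\boldsymbol{y}) = 1 .
\end{equation*}

\end{document}
```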
Important: The Overleaf template includes a declaration of originality, which you should copy into your submission, so make sure you check out the submission template even if you don’t use it for your submission!
The first assignment will be of a more theoretical nature and will be released shortly after the start of the semester. Assignments 2 and 3 will be of a more practical nature and will be released in the second half of the semester.
Each of the three assignments contributes one third of the final assignment grade (that is, the assignment grade will be the average of the three individual assignment grades; see the individual assignment instructions for the grading scales).
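Spelled out as a formula (writing $E$ for the exam grade and $A_1, A_2, A_3$ for the three individual assignment grades; the notation is ours, not official):

$$\text{final grade} = \frac{1}{2}\, E + \frac{1}{2} \cdot \frac{A_1 + A_2 + A_3}{3}.$$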
Assignment instructions:
- Assignment 1 Instructions
- Assignment 1 Submission Template.
While not strictly necessary, we highly advise you to use this template when preparing your submission. It also includes a large number of LaTeX macros that can make your writing faster and easier to read.
Important: Even if you don’t use this template, you should copy the Declaration of originality from the front page into your own submission!
- Assignment 2 Instructions (last year)
- Assignment 2 will be released at the end of March and will likely be due on May 15th.
- Assignment 3 Instructions (last year)
Assignment Deadlines
You will submit your assignments via Moodle.
- Assignment 1 is due on Wednesday, April 30th at 23:59.
- The preliminary deadline for Assignment 2 is Thursday, May 15th at 23:59.
- The details for Assignment 3 will be released later.
Please be proactive with your time management and start early. We will not accept requests for deadline extensions, whether individual or group requests, barring exceptional circumstances that affect more than just the last two weeks before the deadline (e.g., prolonged illness, a family emergency, or severe mistakes in the assignment setup). Late submissions will not be graded.