Large Language Models, Spring 2026

Course Description

Large language models have become one of the most commonly deployed NLP inventions. In the past half-decade, their integration into core natural language processing tools has dramatically increased the performance of such tools, and they have entered the public discourse surrounding artificial intelligence. In this course, we start with the probabilistic foundations of language models, i.e., covering what constitutes a language model from a formal, theoretical perspective. We then discuss how to construct and curate training corpora, and introduce many of the neural-network architectures often used to instantiate language models at scale. The course discusses privacy and harms, as well as applications of language models in NLP and beyond.

Pre-requisites: While there are no formal pre-requisites for taking the course, we count on you being comfortable with probability theory, linear algebra, computational complexity, and machine learning.

Syllabus and Schedule

On the Use of Class Time

Lectures

There are two lecture slots for LLMs each week (3 hours total):

Tuesdays 14:15-16:00 in HG E 3
Fridays 10:15-11:00 in CAB G 61

In-person and Zoom

Lectures will be given in person and live broadcast on Zoom; the password is available on the course Moodle page.

Recordings: Lectures will be recorded—links to the Zoom recordings will be posted on the course Moodle page.

Tutorials

Tutorials will take place Thursdays 16-18 in NO C 60 and on Zoom.

Syllabus

Date	Time	Module	Topic	Lecturer	Summary	Material	Reading
17. 2. 2026	1 hour		Introduction and Overview	Ryan	The lecturers will contextualize large language models in NLP and computer science more broadly. Thereby, we will also motivate why the topic necessitates a separate course. We will also go over the course schedule and logistics.	Introductory Slides	Course Notes, § 1
17. 2. 2026	1 hour	Modeling Foundations	Defining a Language Model	Ryan	Language modeling is about placing probability on infinite sets of strings. We introduce the notion of tightness and give a formal definition of a language model.		Course Notes, §§ 2–3, Du et al. A Measure-Theoretic Characterization of Tight Language Models
20. 2. 2026	1 hour	Modeling Foundations	The Language Modeling Task	Ryan	In this lecture, we introduce the language modeling task, which we define to be any attempt to learn a language model from finite data. We will discuss various objectives that one might wish to optimize to induce a language model from data. We also discuss various regularization techniques and their use in combatting overfitting.		Course Notes, § 3
24. 2. 2026	2 hours	Classical Language Models	Finite-State Language Models	Anej	Finite-state language models have a storied history in NLP. They are a natural generalization of n-gram models, which were the standard in the field from the 1980s till the late 2010s. In terms of theory, we introduce probabilistic finite-state automata as a generalization of finite-state automata from classic theory of computation. Additionally, we give a simple, closed-form characterization of tightness. We also show how Bengio et al. (2003), the first successful neural language model, is naturally viewed as a probabilistic finite-state automaton.		Course Notes, § 4.1 Bengio, Yoshua, et al. A neural probabilistic language model, Sun, Simeng, et al. Revisiting Simple Neural Probabilistic Language Models.
27. 2. 2026	1 hour	Classical Language Models	Recurrent Neural Language Models	Anej	Finite-state language models, by construction, can only look at a finite amount of context. Recurrent neural networks are a formalism that overcomes this limitation. In this lecture, we give a formal definition of a recurrent neural language model (RNNLM). We give examples of tight and non-tight RNN LMs as well as characterize the vanishing gradient problem.		Course Notes, §§ 5.1.1–5.1.4
3. 3. 2026	2 hours	Neural Network Modeling	Representational Capacity of RNN LMs	Alexandra	In this lecture, we explore the representational capacity of RNN LMs. We show that, if the activation function is a hard thresholding operation, then RNN LMs have the same expressive capacity as a finite-state LM. However, we show that RNN LMs can implicitly represent finite-state LMs that are much larger. Additionally, if the activation function is a saturated sigmoid or a ReLu and we assume infinite precision arithmetic, we show how an RNN can emulate a Turing machine.		Course Notes, § 5.1.6, Svete et al., Recurrent Neural Language Models as Probabilistic Finite-state Automata., Nowak et al., On the Representational Capacity of Recurrent Neural Language Models., Siegelmann H. T. and Sontag E. D. On the computational power of neural nets.
6. 3. 2026	1 hour		No lecture
10. 3. 2026	2 hours		Transformer-based Language Models	Tianyu	Introduced in 2017 by Vaswani et al., Transformers have quickly become the most popular architecture for neural language modeling. They are the basis for recent large language models, e.g., GPT-3 and PaLM. This lecture gives the definition of a Transformer and overviews details, e.g., residual connections, layer normalization, and position embeddings.		Course Notes, § 5.2, Radford et al., Language Models are Unsupervised Multitask Learners, Vaswani et al., Attention Is All You Need, The Illustrated Transformer, The Illustrated GPT-2, Transformer decoder (Wikipedia)
13. 3. 2026	1 hour		Representational Capacity of Transformer-based Language Models	Irene	Inspired by the Turing completeness of RNNs, we study the representational capacity of Transformers. Although the connection to automata is not as straight-forward as with RNNs, we discuss how to think about Transformers as formal models and show that, assuming an unbounded number of layers and infinite precision, Transformers are Turing complete.		Course Notes, § 5.3
17. 3. 2026	2 hours	Modeling Potpourri	Tokenization	Manuel	Throughout the class, we have assumed access to the alphabet Σ. This lecture discusses how we should choose Σ. We discuss various facts about natural language that influence Σ, e.g., morphology and syntax. Then, we introduce the byte-pair encoding algorithm, an automatic procedure for inducing Σ, and give a analyze of its correctness and runtime.
20. 3. 2026	1 hour	Modeling Potpourri	Generating Text from a Language Model	Robin	A popular use case for language modeling is the generation of text. This lecture overviews various strategies for deterministically and stochastically generating text. We discuss beam search, ancestral sampling, as well as various sampling adaptors, e.g., top-k, nucleus, and locally typical sampling.	Slides
24. 3. 2026	2 hours	Transfer Learning and Fine-tuning	Transfer Learning	Mrinmaya		Slides
27. 3. 2026	1 hour		Parameter Efficient Finetuning	Mrinmaya		Slides
31. 3. 2026	2 hours		Parameter Efficient Finetuning	Mrinmaya
		Easter Break
14. 4. 2026	2 hours	Prompting and In-context Learning	In-context Learning, Prompting, Zero-shot, Instruction Tuning	Mrinmaya		Slides
17. 4. 2026	1 hour	Prompting and In-context Learning	Multimodality	Mrinmaya		Slides
21. 4. 2026	2 hours	Retrieval and Reasoning	Retrieval Augmented Language Models	Mrinmaya		Slides
24. 4. 2026	1 hour	Retrieval and Reasoning	Reinforcement Learning for Reasoning and Inference-time Compute	Mrinmaya
28. 4. 2026	2 hours	Alignment	Instruction Tuning and RLHF	Mrinmaya		Slides
1. 5. 2026	1 hour	TBD	TBD	Mrinmaya
5. 5. 2026	2 hours	Evaluation	Evaluations and Benchmarks	Vilem & Mubashara
8. 5. 2026	1 hour	Security	Security, Adversarial Examples, and Watermarks	Avital			Carlini et al. Are aligned neural networks adversarially aligned?, Zou et al. Universal and Transferable Adversarial Attacks on Aligned Language Models
12. 5. 2026	2 hours		Security, Adversarial Examples, and Watermarks	Avital
15. 5. 2026	1 hour		Prompt Injections	Avital			Greshake et al. Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
19. 5. 2026	2 hours		Data Poisoning, Backdoors and Model Stealing	Avital			Carlini et al. Poisoning Web-Scale Training Datasets is Practical, Wallace et al. Imitation Attacks and Defenses for Black-box Machine Translation Systems, Carlini et al. Is Private Learning Possible with Instance Encoding?, Fowl et al. Robbing the Fed: Directly Obtaining Private Data in Federated Learning with Modified Models
22. 5. 2026	1 hour	Privacy	Privacy, Memorization, Differential Privacy	Florian			Nasr et al. Scalable Extraction of Training Data from (Production) Language Models, Abadi et al. Deep Learning with Differential Privacy
26. 5. 2026	2 hours		Privacy, Memorization, Differential Privacy, Membership Inference Attacks	Florian			Carlini et al. Membership Inference Attacks From First Principles, Duan et al. Do Membership Inference Attacks Work on Large Language Models?
29. 5. 2026	1 hour		TBD	Florian

Tutorial Schedule

Week	Date	Topic	Teaching Assistant	Material
1	19. 2. 2026	Course Logistics	Anej	Introduction Slides
2	26. 2. 2026	Fundamentals of Natural Language Processing and Language Modeling	Tu	Exercises, Exercises with solutions, iPad Notes
3	5. 3. 2026	Classical Language Models: $n$-grams	Livia	Exercises, Exercises with solutions
4	12. 3. 2026	RNN Language Models	Irene	Exercises, Exercises with solutions, Kári's notes
5	19. 3. 2026	Transformer Language Models	Shawn	Exercises, Exercises with solutions, Jupyter Notebook
6	26. 3. 2026	Tokenization and Generation	Blanka	Exercises, Exercises with solutions, Slides
7	2. 4. 2026	Assignment 1 Q&A	Irene, Tu, Blanka, Livia
8	16. 4. 2026	Common Pre-trained Language Models, Parameter-efficient Fine-tuning	William	Google Colab Notebook, Transformer Architecture Drawing
9	23. 4. 2026	Retrieval-augmented Generation	Jan	Google Colab Notebook, Slides
10	30. 4. 2026	Prompting, Chain-of-Thought Reasoning	Ema	Exercises, Exercises with solutions
11	7. 5. 2026	Assignment 2 Q&A	Ema, Jan, Javier, William
12	14. 5. 2026	No tutorial (Ascension Day)
13	21. 5. 2026	Decoding, Watermarking	Javier	Exercises, Exercises with solutions
14	28. 5. 2026	Assignment 3 Q&A	Shawn, Javier

Organization

Moodle as a Communications and Questions-answering Platform

We will use the course Moodle page for course communications and as a place where you can ask questions to the teaching staff. There are several forums you can use to ask specific questions and we encourage you to take advantage of that. We aim to response quickly.

Course Notes

We prepared an extensive set of course notes for the course. We will be improving them as we go this semester as well. Please report all errata you find in the course notes to the teaching staff in the Errata Google document linked on the course Moodle page.

Links to the course notes:

Other useful literature:

Grading

Marks for the course will be determined by the following formula:

50% Final Exam
50% Assignments

Exam

The final exam is comprehensive and should be assumed to cover all the material in the slides and class notes. The date is determined by the ETH examinations office centrally and will be announced towards the end of the semester.

Remote exams: ETH offers a centralized system for taking exams remotely if you are an exchange student or under specific circumstances for ETH students as well. To find out more and arrange a remote exam, please follow the instructions on remote examinations here.

Exam review: After the grades have been announced, you will be able to sign up to the exam review session, which we will offer sometime in the first three weeks of the semester. During the session, you will have the opportunity to review your exam and assignments and understand how they were graded. You will also be able to take notes about the exam and solution, but no copies or photos can be taken. To sign up, we will publish a Google form after the grading conference. Note that we offer only one review session, so individual (or remote) sessions are not possible. See also here for more information about exam reviews in general.

Assignments

There will be three larger assignments in the course. Assignments are individual work, and you are expected to submit your own solutions—solutions that you wrote up yourself and did not copy from any of your peers. Each assignment might, however, follow a different policy on collaboration when it comes to discussing the problems with your peers—please refer to the specific assignment instructions for details.

We require the solutions to be properly typeset. We recommend using LaTeX (with Overleaf), but markdown files with something like MathJax for the mathematical expressions are also fine.

The first assignment will be of more theoretical nature and will be released shortly after the start of the semester. Assignments 2 and 3 will be of more practical nature and will be released in the second half of the semester.

Each of the three assignments contribute 1/3 to the final assignment grade (that is, the assignment grade will be the average of the three individual assignment grades; see the individual assignment instructions for the grading scales).

Assignment instructions:

Assignment 1 Instructions: Will be released on February 27th, 2026.
Assignment 2 Instructions: Will be released on between mid-April and mid-May 2026.
Assignment 3 Instructions: TBD

Assignment Deadlines

You will submit your assignments via Moodle.

Assignment 1 is due on April 30, 2026, at 23:59.
Assignment 2 is due on TBD.
Assignment 3 is due on June 5, 2026, at 23:59.

Please be proactive with your time management and start early. Barring exceptional circumstances that do not only affect the last two weeks before the deadline (e.g., prolonged illness, family emergency, or severe mistakes in the assignment setup), we will not accept requests for deadline extensions—neither individual nor group requests. Late submissions will not be graded.

Large Language Models, Spring 2026

Course Description

Syllabus and Schedule

On the Use of Class Time

Lectures

In-person and Zoom

Tutorials

Syllabus

Tutorial Schedule

Organization

Moodle as a Communications and Questions-answering Platform

Course Notes

Grading

Exam

Assignments

Assignment Deadlines

Large Language Models Lecturers

Assistant Professor of Computer Science

Assistant Professor of Computer Science

Assistant Professor of Computer Science

Large Language Models Teaching Assistants

PhD Student

Master’s Student

Master’s Student

Master’s Student

PhD Student

Master’s Student

Master’s Student

PhD Student

PhD Student

Master’s Student