Large Language Models, Spring 2024

Course Description

Large language models have become one of the most commonly deployed NLP inventions. In the past half-decade, their integration into core natural language processing tools has dramatically increased the performance of such tools, and they have entered the public discourse surrounding artificial intelligence. In this course, we offer a self-contained introduction to language modeling and its applications. We start with the probabilistic foundations of language models, i.e., covering what constitutes a language model from a formal, theoretical perspective. We then discuss how to construct and curate training corpora, and introduce many of the neural-network architectures often used to instantiate language models at scale. The course covers aspects of systems programming, discussion of privacy and harms, as well as applications of language models in NLP and beyond.

News

27. 12. 2023 Class website is online!
2. 3. 2024 Assignment 1 Submission Template released.

Syllabus and Schedule

On the Use of Class Time

Lectures

There are two lecture slots for LLM each week: the first one on Tuesdays 14-16 in HG E 3 and the second one on Fridays 10-11 in CAB G 61.

Both lectures will be given in person and live broadcast on Zoom; the password is available on the course Moodle page.

Lectures will be recorded—links to the Zoom recordings will be posted on the course Moodle page.

Discussion Sections

Discussion sections (tutorials) will take place Thursdays 16-18 in NO C 60 and on Zoom (same link as the lectures).

Syllabus

Disclaimer: The syllabus is based on the topics from Spring 2023 and is subject to change.

Date	Time	Module	Topic	Lecturer	Summary	Material	Reading
20. 2. 2024	1 hour		Introduction and Overview	Ryan/Mrinmaya/Florian	The lecturers will contextualize large language models in NLP and computer science more broadly. In doing so, we will also motivate why the topic warrants a separate course. We will also go over the course schedule and logistics.	Introductory Slides	Course Notes, § 1
20. 2. 2024	1 hour	Probabilistic Foundations	Basic Measure Theory	Ryan	Language modeling is about placing probability on infinite sets of strings. Measure theory is the primary tool used for the rigorous study of probability theory. This lecture shows why defining a language model rigorously requires a careful measure-theoretic treatment. We use the classic infinite coin toss model as an illuminating example. Then, we will get into some basic measure-theoretic definitions that will be useful in formally defining language models.		Course Notes, §§ 2.1 and 2.2, Du et al. A Measure-Theoretic Characterization of Tight Language Models.
23. 2. 2024	1 hour		Defining a Language Model	Ryan	We will continue to introduce definitions and facts from basic measure theory, building up to a formal definition of a language model, which will be our working definition throughout the class.		Course Notes, §§ 2.3 and 2.4, Du et al. A Measure-Theoretic Characterization of Tight Language Models
27. 2. 2024	2 hours		Tight Language Models	Ryan	The primary goal of this lecture is to introduce the notion of tightness, which will be a recurring theoretical concept in the first part of the course. Informally, a language model is tight when it only places probability mass on finite strings. We introduce the Borel-Cantelli lemmata and prove a precise characterization of tight language models.		Course Notes, § 2.5, Du et al. A Measure-Theoretic Characterization of Tight Language Models, Chen, Yining, et al. Recurrent Neural Networks as Weighted Language Recognizers
1. 3. 2024	1 hour	Modeling Foundations	The Language Modeling Task	Ryan	In this lecture, we introduce the language modeling task, which we define to be any attempt to learn a language model from finite data. We will discuss various objectives that one might wish to optimize to induce a language model from data. We also discuss various regularization techniques and their use in combatting overfitting.		Course Notes, § 3
5. 3. 2024	2 hours	Modeling Foundations	Finite-State Language Models	Ryan	Finite-state language models have a storied history in NLP. They are a natural generalization of n-gram models, which were the standard in the field from the 1980s till the late 2010s. In terms of theory, we introduce probabilistic finite-state automata as a generalization of finite-state automata from the classical theory of computation. Additionally, we give a simple, closed-form characterization of tightness. We also show how the model of Bengio et al. (2003), the first successful neural language model, is naturally viewed as a probabilistic finite-state automaton.		Course Notes, § 4.1, Bengio, Yoshua, et al. A neural probabilistic language model, Sun, Simeng, et al. Revisiting Simple Neural Probabilistic Language Models.
8. 3. 2024	1 hour	Neural Network Modeling	Recurrent Neural Language Models	Ryan	Finite-state language models, by construction, can only look at a finite amount of context. Recurrent neural networks are a formalism that overcomes this limitation. In this lecture, we give a formal definition of a recurrent neural language model (RNN LM). We give examples of tight and non-tight RNN LMs and characterize the vanishing gradient problem.		Course Notes, §§ 5.1.1–5.1.4
12. 3. 2024	1 hour		Representational Capacity of RNN LMs	Ryan	In this lecture, we explore the representational capacity of RNN LMs. We show that, if the activation function is a hard thresholding operation, then RNN LMs have the same expressive capacity as finite-state LMs. However, we show that RNN LMs can implicitly represent finite-state LMs that are much larger. Additionally, if the activation function is a saturated sigmoid or a ReLU and we assume infinite-precision arithmetic, we show how an RNN can emulate a Turing machine.		Course Notes, § 5.1.6, Svete et al., Recurrent Neural Language Models as Probabilistic Finite-state Automata, Nowak et al., On the Representational Capacity of Recurrent Neural Language Models, Siegelmann H. T. and Sontag E. D. On the computational power of neural nets.
12. 3. 2024	1 hour		Transformer-based Language Models	Ryan	Introduced in 2017 by Vaswani et al., Transformers have quickly become the most popular architecture for neural language modeling. They are the basis for recent large language models, e.g., GPT-3 and PaLM. This lecture gives the definition of a Transformer and overviews details, e.g., residual connections, layer normalization, and position embeddings.		Course Notes, § 5.2, Radford et al., Language Models are Unsupervised Multitask Learners, Vaswani et al., Attention Is All You Need, The Illustrated Transformer, The Illustrated GPT-2, Transformer decoder (Wikipedia)
15. 3. 2024	1 hour		Transformer-based Language Models	Ryan
19. 3. 2024	1 hour		Representational Capacity of Transformer-based Language Models	Ryan	Inspired by the Turing completeness of RNNs, we study the representational capacity of Transformers. Although the connection to automata is not as straightforward as with RNNs, we discuss how to think about Transformers as formal models and show that, assuming an unbounded number of layers and infinite precision, Transformers are Turing complete.		Course Notes, § 5.3
19. 3. 2024	1 hour	Modeling Potpourri	Tokenization	Ryan	Throughout the class, we have assumed access to the alphabet Σ. This lecture discusses how we should choose Σ. We discuss various facts about natural language that influence Σ, e.g., morphology and syntax. Then, we introduce the byte-pair encoding algorithm, an automatic procedure for inducing Σ, and give an analysis of its correctness and runtime.
19. 3. 2024	1 hour		Generating Text from a Language Model	Ryan	A popular use case for language modeling is the generation of text. This lecture overviews various strategies for deterministically and stochastically generating text. We discuss beam search, ancestral sampling, as well as various sampling adaptors, e.g., top-k, nucleus, and locally typical sampling.
22. 3. 2024	1 hour		Generating Text from a Language Model	Ryan	A popular use case for language modeling is the generation of text. This lecture overviews various strategies for deterministically and stochastically generating text. We discuss beam search, ancestral sampling, as well as various sampling adaptors, e.g., top-k, nucleus, and locally typical sampling.
26. 3. 2024	2 hours	Training, Fine Tuning and Inference	Transfer Learning	Mrinmaya		Slides
		Easter Break
9. 4. 2024	2 hours	Training, Fine Tuning and Inference	Parameter efficient finetuning	Mrinmaya		Slides
12. 4. 2024	1 hour	Training, Fine Tuning and Inference	In-context learning, Prompting, zero-shot, instruction tuning	Mrinmaya		Slides
16. 4. 2024	2 hours	Applications and the Benefits of Scale	In-context learning, Prompting, zero-shot, instruction tuning	Mrinmaya		Slides
19. 4. 2024	1 hour		Multimodality	Mrinmaya		Slides
23. 4. 2024	2 hours		Retrieval augmented Language Models	Mrinmaya		Slides
26. 4. 2024	1 hour		No class	Mrinmaya
30. 4. 2024	2 hours		Instruction tuning and RLHF	Mrinmaya		Slides
3. 5. 2024	1 hour	Security	Harms & Ethics	Florian	Language models work extremely well, until they don’t! What are some of the harms that large-scale deployment of language models can bring? We will discuss ways in which models can perpetrate or exacerbate issues in training data (biases, toxicity, etc.) and the difficulty in aligning models with particular ethical principles or truths.	Slides	Bai et al. Constitutional AI: Harmlessness from AI Feedback
7. 5. 2024	2 hours		Security & Adversarial examples	Florian	Machine learning models are remarkably brittle, and prone to all kinds of exploits. Language models are no different: we will see how tampering with model inputs or training data can lead to arbitrarily bad outcomes. We will also discuss how language models could be exploited for nefarious purposes such as large-scale spam campaigns. On the other hand, language models could also prove useful as a defensive tool, e.g., for automated online content moderation or for dispelling misinformation.	Slides	Carlini et al. Are aligned neural networks adversarially aligned?, Zou et al. Universal and Transferable Adversarial Attacks on Aligned Language Models
10. 5. 2024	1 hour		Prompt injections	Florian		Slides	Greshake et al. Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
14. 5. 2024	2 hours		Data poisoning, backdoors and model stealing	Florian		Slides	Carlini et al. Poisoning Web-Scale Training Datasets is Practical, Wallace et al. Imitation Attacks and Defenses for Black-box Machine Translation Systems
17. 5. 2024	1 hour		Privacy in ML	Florian		Slides	Carlini et al. Is Private Learning Possible with Instance Encoding?, Fowl et al. Robbing the Fed: Directly Obtaining Private Data in Federated Learning with Modified Models
21. 5. 2024	2 hours		Memorization + Differential Privacy	Florian	We look into language models’ remarkable ability to memorize training data, and the risks this may pose for privacy or copyright. We will look at different ways to define memorization and privacy for textual models, and understand the different threats they aim to address. We will then review methods for provably guaranteeing the confidentiality and privacy of machine learning systems, and debate their adequacy in the context of textual models.	Slides	Nasr et al. Scalable Extraction of Training Data from (Production) Language Models, Abadi et al. Deep Learning with Differential Privacy
24. 5. 2024	1 hour		Data lifecycle	Florian	So far, most of the course has been about models. But what would these models be without the right data? We will discuss the lifecycle of modern training sets for language models, to understand how design choices in the data collection and maintenance process influence the model’s “world view”. We will review emerging guidelines and best practices for managing and documenting machine learning datasets across their lifetime.	Slides	Gebru et al. Datasheets for Datasets
28. 5. 2024	2 hours		Explainability, Interpretability, AI Safety	Florian		Slides	Meng et al. Locating and Editing Factual Associations in GPT, Li et al. Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task
31. 5. 2024	1 hour		Guest Lecture: LLM Application Security	Luca Beurer-Kellner, Florian		Slides
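
As an illustration of the sampling adaptors named in the "Generating Text from a Language Model" lectures, here is a minimal sketch of top-k and nucleus filtering over a single next-token distribution. It is not taken from the course material: the function names and the toy distribution `next_token_probs` are assumptions made for this example, and a real implementation would operate on model logits rather than a hand-written dictionary.

```python
import random

def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    keep = sorted(probs, key=probs.get, reverse=True)[:k]
    total = sum(probs[t] for t in keep)
    return {t: probs[t] / total for t in keep}

def nucleus_filter(probs, p):
    """Keep the smallest prefix of tokens (by descending probability)
    whose cumulative mass reaches p, then renormalize."""
    kept, mass = [], 0.0
    for tok in sorted(probs, key=probs.get, reverse=True):
        kept.append(tok)
        mass += probs[tok]
        if mass >= p:
            break
    return {t: probs[t] / mass for t in kept}

def sample(probs, rng=random):
    """Draw one token from a (filtered) distribution."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

# Hypothetical next-token distribution, for illustration only.
next_token_probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "<eos>": 0.05}

if __name__ == "__main__":
    print(top_k_filter(next_token_probs, 2))
    print(nucleus_filter(next_token_probs, 0.9))
    print(sample(nucleus_filter(next_token_probs, 0.9)))
```

Both adaptors only reshape the distribution; drawing from the unmodified distribution at every step corresponds to ancestral sampling.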

Tutorial Schedule

Week	Date	Topic	Teaching Assistant	Material
1	22. 2. 2024	Course Logistics (1 hour)	Anej Svete	Introduction Slides
2	29. 2. 2024	Fundamentals of Natural Language Processing and Language Modeling, Measure Theory, Generation	Giovanni Acampa	Exercises, Exercises with solutions
3	7. 3. 2024	Classical Language Models: $n$-grams and Context-free Grammars	Vasiliki Xefteri	Exercises, Exercises with solutions
4	14. 3. 2024	RNN Language Models	Valentin Bieri	Exercises, Exercises with solutions
5	21. 3. 2024	Transformer Language Models	Josep Borrell Tatché	Exercises, Exercises with solutions, Jupyter Notebook
6	28. 3. 2024	Tokenization and Generation	Manuel de Prada Corral	Exercises, Exercises with solutions, Slides
7	11. 4. 2024	Assignment 1 Q&A	TAs
8	18. 4. 2024	Common pre-trained language models, Parameter-efficient fine-tuning	Evžen Wybitul	Google Colab Notebook, Transformer Architecture Drawing
9	25. 4. 2024	Retrieval-augmented generation	Pep Borrell	Google Colab Notebook, Slides
10	2. 5. 2024	Prompting, Chain-of-Thought Reasoning	Filippo Ficarra	Exercises, Exercises with solutions
11	9. 5. 2024	No Tutorial
12	16. 5. 2024	Decoding and Watermarking	Iason Chalas	Exercises, Exercises with solutions
13	23. 5. 2024	Assignment 2 Q&A	TAs
14	30. 5. 2024	Assignment 3 Q&A	TAs
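
For readers preparing for the tokenization tutorial, the following is a rough sketch of how byte-pair encoding learns merges from a toy corpus. It is an assumption-laden illustration, not the variant analyzed in the lecture: the helper names `learn_bpe` and `merge_pair` are hypothetical, words are split into characters with no end-of-word marker, and ties between equally frequent pairs are broken by first occurrence.

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace every non-overlapping occurrence of `pair` in `symbols`
    (scanning left to right) with the concatenated symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(words, num_merges):
    """Greedily learn up to `num_merges` merges from a list of words."""
    # Each word starts as a tuple of single characters, weighted by frequency.
    corpus = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the whole corpus.
        pair_counts = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # nothing left to merge
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Apply the chosen merge everywhere before the next round.
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            new_corpus[merge_pair(symbols, best)] += freq
        corpus = new_corpus
    return merges

if __name__ == "__main__":
    print(learn_bpe(["ab", "ab", "abc"], 2))
```

The merged symbols, together with the initial characters, play the role of the induced alphabet Σ.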

Organisation

Live Chat

In addition to class time, there will also be a RocketChat-based live chat hosted on ETH’s servers. Students are free to ask questions of the teaching staff and of others in public or private (direct message). There are specific channels for each of the assignments as well as for reporting errata in the course notes and slides. All data from the chat will be deleted from ETH servers at the course’s conclusion.

Important: There are a few important points you should keep in mind about the course live chat:

RocketChat will be the main communications hub for the course. You are responsible for receiving all messages broadcast in the RocketChat.
Your username should be firstname.lastname. This is required because only enrolled students may participate in the chat, and we will remove users whom we cannot validate.
Tag your questions as described in the document on How to use Rycolab Course RocketChat channels. The document also contains other general remarks about the use of RocketChat.
Search for answers in the appropriate channels before posting a new question.
Ask questions on public channels as much as possible.
Reply to posts in threads.
The chat supports LaTeX for easier discussion of technical material. See How to use LaTeX in RocketChat.
We highly recommend you download the desktop app here.

This is the link to the main channel. To make the moderation of the chat more easily manageable, we have created a number of other channels on RocketChat. The full list is:

General Channel for the general organisational discussions.
Announcements Channel for the announcements by the teaching team.
Content Questions for your questions about the content of the course.
Errata for reporting typos and errors in the course lecture notes and the slides.
Assignment 1 for asking questions and discussing the first assignment.
Assignment 2a for asking questions and discussing assignment 2a.
Assignment 2b for asking questions and discussing assignment 2b.
Find Assignment Partners for finding teammates for the course assignments.

If you feel like you would benefit from any other channel, feel free to suggest it to the teaching team!

Course Notes

We prepared an extensive set of course notes for the course last semester. We will be improving them as we go this semester as well. Please report all errata to the teaching staff; we created an errata channel in RocketChat.

Links to the course notes:

Other useful literature:

Grading

Marks for the course will be determined by the following formula:

50% Final Exam
50% Assignments
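
Written out as an equation, the weighting above reads as follows. The symbols E (final exam mark) and A (combined assignment mark) are introduced here only for illustration; how the individual assignments are combined into A is not specified on this page.

```latex
\[
  \text{course mark} = 0.5 \cdot E + 0.5 \cdot A
\]
```

For instance, with hypothetical values E = 5.0 and A = 5.5, the formula gives 0.5 · 5.0 + 0.5 · 5.5 = 5.25.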

On the Final Exam

The final exam is comprehensive and should be assumed to cover all the material in the slides and class notes.

On the Class Assignments

There will be two larger assignments in the course, the second of which will be split into two parts.

We require the solutions to be properly typeset. We recommend using LaTeX (with Overleaf), but markdown files with something like MathJax for the mathematical expressions are also fine.

The first assignment will be of a more theoretical nature and will be released shortly after the start of the semester. Assignments 2a and 2b will be of a more practical nature and will be released in the second half of the semester.

Assignment instructions sheets:

Assignment 1 Instructions
Assignment 1 Submission Template. While not strictly necessary, we highly advise you use this template when preparing your submission. It also includes a large number of LaTeX macros which can make your writing faster and easier to read. Important: Even if you don’t use this template, you should copy the Declaration of originality from the front page into your own submission!
Assignment 2a Instructions
Assignment 2b Instructions

Assignment Deadlines

Assignment 1 is due on Tuesday, April 30th at 23:59. Assignment 2a is due on Sunday, June 30th at 23:59. Assignment 2b is due on Sunday, June 30th at 23:59.
