Large Language Models, Spring 2024

ETH Zürich: Course catalog

Course Description

Large language models have become one of the most widely deployed NLP technologies. In the past half-decade, their integration into core natural language processing tools has dramatically improved the performance of those tools, and they have entered the public discourse surrounding artificial intelligence. In this course, we offer a self-contained introduction to language modeling and its applications. We start with the probabilistic foundations of language models, i.e., what constitutes a language model from a formal, theoretical perspective. We then discuss how to construct and curate training corpora, and introduce many of the neural-network architectures often used to instantiate language models at scale. The course also covers aspects of systems programming, a discussion of privacy and harms, and applications of language models in NLP and beyond.

News

27. 12. 2023   Class website is online!
2. 3. 2024   Assignment 1 Submission Template released.

Syllabus and Schedule

On the Use of Class Time

Lectures

There are two lecture slots for LLM each week: Tuesdays 14-16 in HG E 3 and Fridays 10-11 in CAB G 61.

Both lectures will be given in person and broadcast live on Zoom; the password is available on the course Moodle page.

Lectures will be recorded; links to the Zoom recordings will be posted on the course Moodle page.

Discussion Sections

Discussion sections (tutorials) will take place Thursdays 16-18 in NO C 60 and on Zoom (same link as the lectures).

Syllabus

Disclaimer: The syllabus is based on the topics from Spring 2023 and is subject to change.

Date Time Module Topic Lecturer Summary Material Reading
20. 2. 2024 1 hour Introduction and Overview Ryan/Mrinmaya/Florian Introductory Slides Course Notes, § 1
20. 2. 2024 1 hour Probabilistic Foundations Basic Measure Theory Ryan Course Notes, §§ 2.1 and 2.2,
Du et al. A Measure-Theoretic Characterization of Tight Language Models.
23. 2. 2024 1 hour Defining a Language Model Ryan Course Notes, §§ 2.3 and 2.4,
Du et al. A Measure-Theoretic Characterization of Tight Language Models
27. 2. 2024 2 hours Tight Language Models Ryan Course Notes, § 2.5,
Du et al. A Measure-Theoretic Characterization of Tight Language Models,
Chen et al. Recurrent Neural Networks as Weighted Language Recognizers
1. 3. 2024 1 hour Modeling Foundations The Language Modeling Task Ryan Course Notes, § 3
5. 3. 2024 2 hours Finite-State Language Models Ryan Course Notes, § 4.1,
Bengio et al. A Neural Probabilistic Language Model, Sun et al. Revisiting Simple Neural Probabilistic Language Models.
8. 3. 2024 1 hour Neural Network Modeling Recurrent Neural Language Models Ryan Course Notes, §§ 5.1.1–5.1.4
12. 3. 2024 1 hour Representational Capacity of RNN LMs Ryan Course Notes, § 5.1.6,
Svete et al. Recurrent Neural Language Models as Probabilistic Finite-State Automata,
Nowak et al. On the Representational Capacity of Recurrent Neural Language Models,
Siegelmann and Sontag. On the Computational Power of Neural Nets.
12. 3. 2024 1 hour Transformer-based Language Models Ryan Course Notes, § 5.2,
Radford et al., Language Models are Unsupervised Multitask Learners,
Vaswani et al., Attention Is All You Need,
The Illustrated Transformer,
The Illustrated GPT-2,
Transformer decoder (Wikipedia)
15. 3. 2024 1 hour Transformer-based Language Models Ryan
19. 3. 2024 1 hour Representational Capacity of Transformer-based Language Models Ryan Course Notes, § 5.3
19. 3. 2024 1 hour Modeling Potpourri Tokenization Ryan
19. 3. 2024 1 hour Generating Text from a Language Model Ryan
22. 3. 2024 1 hour Generating Text from a Language Model Ryan
26. 3. 2024 2 hours Training, Fine-Tuning and Inference Transfer Learning Mrinmaya Slides
Easter Break
9. 4. 2024 2 hours Training, Fine-Tuning and Inference Parameter-Efficient Fine-Tuning Mrinmaya Slides
12. 4. 2024 1 hour In-Context Learning, Prompting, Zero-Shot, Instruction Tuning Mrinmaya Slides
16. 4. 2024 2 hours Applications and the Benefits of Scale In-Context Learning, Prompting, Zero-Shot, Instruction Tuning Mrinmaya Slides
19. 4. 2024 1 hour Multimodality Mrinmaya Slides
23. 4. 2024 2 hours Retrieval-Augmented Language Models Mrinmaya Slides
26. 4. 2024 1 hour No class Mrinmaya
30. 4. 2024 2 hours Instruction Tuning and RLHF Mrinmaya Slides
3. 5. 2024 1 hour Security Harms & Ethics Florian Slides Bai et al. Constitutional AI: Harmlessness from AI Feedback
7. 5. 2024 2 hours Security & Adversarial Examples Florian Slides Carlini et al. Are aligned neural networks adversarially aligned?, Zou et al. Universal and Transferable Adversarial Attacks on Aligned Language Models
10. 5. 2024 1 hour Prompt Injections Florian Slides Greshake et al. Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
14. 5. 2024 2 hours Data Poisoning, Backdoors and Model Stealing Florian Slides Carlini et al. Poisoning Web-Scale Training Datasets is Practical, Wallace et al. Imitation Attacks and Defenses for Black-box Machine Translation Systems
17. 5. 2024 1 hour Privacy in ML Florian Slides Carlini et al. Is Private Learning Possible with Instance Encoding?, Fowl et al. Robbing the Fed: Directly Obtaining Private Data in Federated Learning with Modified Models
21. 5. 2024 2 hours Memorization + Differential Privacy Florian Slides Nasr et al. Scalable Extraction of Training Data from (Production) Language Models, Abadi et al. Deep Learning with Differential Privacy
24. 5. 2024 1 hour Data Lifecycle Florian Slides Gebru et al. Datasheets for Datasets
28. 5. 2024 2 hours Explainability, Interpretability, AI Safety Florian Slides Meng et al. Locating and Editing Factual Associations in GPT, Li et al. Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task
31. 5. 2024 1 hour Guest Lecture: LLM Application Security Luca Beurer-Kellner, Florian Slides

Tutorial Schedule

Week Date   Topic Teaching Assistant Material
1 22. 2. 2024 Course Logistics (1 hour) Anej Svete Introduction Slides
2 29. 2. 2024 Fundamentals of Natural Language Processing and Language Modeling, Measure Theory, Generation Giovanni Acampa Exercises, Exercises with solutions
3 7. 3. 2024 Classical Language Models: $n$-grams and Context-free Grammars Vasiliki Xefteri Exercises, Exercises with solutions
4 14. 3. 2024 RNN Language Models Valentin Bieri Exercises, Exercises with solutions
5 21. 3. 2024 Transformer Language Models Josep Borrell Tatché Exercises, Exercises with solutions, Jupyter Notebook
6 28. 3. 2024 Tokenization and Generation Manuel de Prada Corral Exercises, Exercises with solutions, Slides
7 11. 4. 2024 Assignment 1 Q&A TAs
8 18. 4. 2024 Common pre-trained language models, Parameter-efficient fine-tuning Evžen Wybitul Google Colab Notebook, Transformer Architecture Drawing
9 25. 4. 2024 Retrieval-augmented generation Pep Borrell Google Colab Notebook, Slides
10 2. 5. 2024 Prompting, Chain-of-Thought Reasoning Filippo Ficarra Exercises, Exercises with solutions
11 9. 5. 2024 No Tutorial
12 16. 5. 2024 Decoding and Watermarking Iason Chalas Exercises, Exercises with solutions
13 23. 5. 2024 Assignment 2 Q&A TAs
14 30. 5. 2024 Assignment 3 Q&A TAs

Organisation

Live Chat

In addition to class time, there will be a RocketChat-based live chat hosted on ETH's servers. Students are free to ask questions of the teaching staff and of each other, either publicly or privately (direct message). There are dedicated channels for each of the assignments as well as for reporting errata in the course notes and slides. All data from the chat will be deleted from ETH servers at the course's conclusion.

Important: Please keep the following points in mind about the course live chat:

  1. RocketChat will be the main communications hub for the course. You are responsible for keeping up with all messages broadcast on RocketChat.
  2. Your username should be firstname.lastname. This is required because we will only allow enrolled students to participate in the chat, and we will remove users we cannot validate.
  3. Tag your questions as described in the document on How to use Rycolab Course RocketChat channels. The document also contains other general remarks about the use of RocketChat.
  4. Search for answers in the appropriate channels before posting a new question.
  5. Ask questions in public channels whenever possible.
  6. Reply to posts in threads.
  7. The chat supports LaTeX for easier discussion of technical material (see the example below this list). See How to use LaTeX in RocketChat.
  8. We highly recommend you download the desktop app here.
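
To illustrate point 7: RocketChat renders mathematics via KaTeX, typically wrapped in \( ... \) for inline and \[ ... \] for display mode, though the exact delimiters depend on the server configuration (see the linked guide for the settings used in this course). For example, a message such as

    Should the string probability factorize as
    \( p(\boldsymbol{y}) = \prod_{t=1}^{T} p(y_t \mid \boldsymbol{y}_{<t}) \)?

will be displayed with the formula typeset.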

This is the link to the main channel. To make moderating the chat easier, we have created a number of other channels on RocketChat. The full list is:

If you feel like you would benefit from any other channel, feel free to suggest it to the teaching team!

Course Notes

We prepared an extensive set of course notes last semester and will keep improving them as this semester progresses. Please report all errata to the teaching staff; there is a dedicated errata channel on RocketChat.

Links to the course notes:

Other useful literature:

Grading

Marks for the course will be determined by the following formula:

  • 50% Final Exam
  • 50% Assignments
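
Written out explicitly, the weighting above amounts to the following (the marks in the comment are purely hypothetical, for illustration):

    \[
      \text{final mark} = 0.5 \times \text{exam mark} + 0.5 \times \text{assignment mark}
    \]
    % e.g., a hypothetical exam mark of 5.0 and an assignment mark of 5.5
    % give 0.5 * 5.0 + 0.5 * 5.5 = 5.25.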

On the Final Exam

The final exam is comprehensive and should be assumed to cover all the material in the slides and class notes.

On the Class Assignments

There will be two larger assignments in the course, the second of which will be split into two parts.

We require solutions to be properly typeset. We recommend LaTeX (e.g., with Overleaf), but Markdown files with something like MathJax for the mathematical expressions are also fine.
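
For instance, a minimal LaTeX skeleton along the following lines satisfies the typesetting requirement. This is an illustrative sketch only, not the official submission template mentioned in the News section:

    \documentclass{article}
    \usepackage{amsmath,amssymb} % standard packages for mathematical notation

    \title{Assignment 1}
    \author{Firstname Lastname}

    \begin{document}
    \maketitle

    \section*{Problem 1}
    % State the claim, then give the argument.
    We show that the language model $p$ is tight, i.e., that
    \[
      \sum_{\boldsymbol{y} \in \Sigma^*} p(\boldsymbol{y}) = 1.
    \]

    \end{document}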

The first assignment will be of a more theoretical nature and will be released shortly after the start of the semester. Assignments 2a and 2b will be of a more practical nature and will be released in the second half of the semester.

Assignment instructions sheets:

Assignment Deadlines

Assignment 1 is due on Tuesday, April 30th at 23:59. Assignment 2a is due on Sunday, June 30th at 23:59. Assignment 2b is due on Sunday, June 30th at 23:59.

Large Language Models Lecturers


Florian Tramèr

Assistant Professor in Computer Science

ETH Zürich


Mrinmaya Sachan

Assistant Professor in Computer Science

ETH Zürich


Ryan Cotterell

Assistant Professor of Computer Science

ETH Zürich

Large Language Models Teaching Assistants