Natural Language Processing

ETH Zürich, Autumn 2020: Course catalog

Course Description

This course presents topics in natural language processing with an emphasis on modern techniques, primarily focusing on statistical and deep learning approaches. The course provides an overview of the primary areas of research in language processing as well as a detailed exploration of the models and techniques used both in research and in commercial natural language systems.

The objective of the course is to learn the basic concepts in the statistical processing of natural languages. The course will be project-oriented so that the students can also gain hands-on experience with state-of-the-art tools and techniques.

Grading

Marks for the course will be determined by the following formula:
* 70% Final Exam (Feb. 17, 2021; no notes allowed)
* 30% Course Project/Assignment
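As a rough illustration of the weighting above, the following sketch computes a combined mark. It assumes that both components are graded on the same scale (the example values use a 1 to 6 scale, which is an assumption and is not stated on this page) and that no rounding is applied; the function name `final_mark` is hypothetical.

```python
def final_mark(exam: float, project: float) -> float:
    """Combine the two components with the 70/30 weighting.

    Assumption: both marks are on the same scale and no rounding is applied.
    """
    return 0.7 * exam + 0.3 * project


# Hypothetical marks: 5.0 on the exam, 6.0 on the project/assignment.
print(round(final_mark(5.0, 6.0), 2))
```

Any official rounding of the final mark is not covered by this sketch.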

Lectures: Mon 12-14h Zoom (recurring link, same password as previous lectures: https://ethz.zoom.us/j/4548886166?pwd=cFdUMEZoTnByaEI0NXZCeU5MTHpVUT09)

Discussion Sections: Wednesday 13-14h Zoom (link to be emailed and posted on piazza on the day of the discussion)

Textbooks: Introduction to Natural Language Processing (Eisenstein)
      Deep Learning (Goodfellow, Bengio and Courville)

News

31.08   Class website is online!
31.08   We are using piazza as our discussion forum. Please enroll here.
21.09   First lecture.
30.09   First discussion section.
16.10   Project guidelines released.
23.10   First part of course assignment released.
1.11    Project proposals due for groups electing to do research project (submission instructions to come).
4.11    LaTeX template for course assignment released.
30.11   Makeup class to be held on last Friday of semester (18.12).
11.12   Progress report for class project is due.
14.12   Second part of course assignment released.
13.01   Due to ETH policy, students are not allowed to bring additional material (e.g., any notes) to the course exam, since this is what the lecture entry states.

Syllabus

Disclaimer: This is the first year the class is being taught in this format. It will progress, and may change, as the semester carries on.

Week Date   Topic Slides   Readings Supplementary Material
- 14.09.20 Knabenschiessen (no class)
1 21.09.20 Introduction to Natural Language Lecture 1 Eisenstein Ch. 1
2 28.09.20 Backpropagation Lecture 2 Chris Olah’s Blog
Justin Domke’s Notes
Tim Vieira’s Blog
Moritz Hardt’s Notes
Baur and Strassen (1983)
Griewank and Walter (2008)
Eisner (2016)
Computation Graph for MLP
Computation Graph Example
3 5.10.20 Log-Linear Modeling—Meet the Softmax Lecture 3
Tutorial
Eisenstein Ch. 2 Ferraro and Eisner (2013)
Jason Eisner’s list of further resources on log-linear modeling
4 12.10.20 Sentiment Analysis with Multi-layer Perceptrons Lecture 4
Tutorial
Eisenstein Ch. 3 and Ch. 4
Goodfellow, Bengio and Courville Ch. 6
Wikipedia
Cybenko (1989)
Hanin and Sellke (2018)
Pang and Lee (2008)
Iyyer et al. (2015)
word2vec Parameter Learning Explained
word2vec Explained
5 19.10.20 Language Modeling with n-grams and LSTMs Lecture 5
Tutorial
Eisenstein Ch. 6
Goodfellow, Bengio and Courville Ch. 10
Good Tutorial on n-gram smoothing
Good–Turing Smoothing
Kneser and Ney (1995)
Bengio et al. (2003)
Mikolov et al. (2010)
6 26.10.20 Part-of-Speech Tagging with CRFs Lecture 6
Tutorial
Eisenstein Ch. 7 and 8 Tim Vieira’s Blog
McCallum et al. (2000)
Lafferty et al. (2001)
Sutton and McCallum (2011)
Koller and Friedman (2009)
7 2.11.20 Review
8 9.11.20 Class canceled
9 16.11.20 Context-Free Parsing with CKY Lecture 7 Eisenstein Ch. 10 The Inside-Outside Algorithm
Jason Eisner’s Slides
Kasami (1966)
Younger (1967)
Cocke and Schwartz (1970)
10 23.11.20 No Class (NAACL Deadline)
11 30.11.20 Dependency Parsing with the Matrix-Tree Theorem Lecture 8 Eisenstein Ch. 11 Koo et al. (2007)
Smith and Smith (2007)
McDonald and Satta (2007)
McDonald, Kübler and Nivre (2009)
12 7.12.20 Transliteration with WFSTs Lecture 9 Eisenstein Ch. 9 Knight and Graehl (1998)
Mohri, Pereira and Riley (2008)
13 14.12.20 Machine Translation with Transformers Lecture 10 Eisenstein Ch. 18 Neural Machine Translation
Vaswani et al. (2017)
Rush (2018)
13 18.12.20 Bias and Fairness in NLP Bolukbasi et al. (2016)
Gonen and Goldberg (2019)
Hall Maudslay et al. (2019)
Vargas and Cotterell (2020)
A Course in Machine Learning Chapter 8

Course Project/Assignment

Every student has the option of completing either a research project or a structured assignment. The course project/assignment is worth 30% of your final mark. The project is an open-ended research project in which students reimplement an existing research paper or, if they are so inclined, perform novel research. Please find the guidelines below. In the assignment, some of the questions are more theoretical and resemble the questions you will see on the final exam. However, there may also be a large coding portion in the assignment, which will not look like the exam questions. For instance, we may ask you to implement a recurrent neural dependency parser. Please find the first portion of the assignment and the writeup template below. Assignments must be completed individually. Projects can be completed in groups of up to 4.

Submission Instructions

If you choose to do the project, we require a proposal no later than November 1, 2020 23:59 CEST. Further, a progress report is due December 11, 2020 23:59 CEST. Please see the project guidelines for content/formatting instructions; email the progress report to your respective TA by the deadline.

The writeup for all projects/assignments will be due on January 15, 2021. Groups completing the project must additionally create a presentation, pre-record it, and submit it to their assigned TA on January 18, 2021; writeups can be sent to the assigned TA as well. Those doing the assignment should email both portions in the same document to the TAs (addresses are in the contact info below) using the following subject line: [penguins on a hot summer’s day]. Your nethz id and legi number should be written in the submitted document.

Materials

Contact

You can ask questions on piazza. Please post questions there, so others can see them and share in the discussion. If you have questions which are not of general interest, please don’t hesitate to contact us directly.

Lecturer Ryan Cotterell
Teaching Assistants Clara Meister, Niklas Stoehr, Pinjia He, Rita Kuznetsova