COS 484: Natural Language Processing

Spring 2021

[Information] [Schedule] [Coursework] [FAQ] [COS 584]

Links

Calendar for updated times of all lectures, sections, OHs and due dates.
Canvas web site
Ed discussion for all course-related questions.
You can email us at cos484-584-staff@lists.cs.princeton.edu for emergencies, or personal matters that you don't wish to put in a private Ed post.
All the lectures/precepts/office hours are held on Zoom and the Zoom links can be found on Canvas.
Some useful docs:

What is this course about?

Recent advances have ushered in exciting developments in natural language processing (NLP), resulting in systems that can translate text, answer questions and even hold spoken conversations with us. This course will introduce students to the basics of NLP, covering standard frameworks for dealing with natural language as well as algorithms and techniques to solve various NLP problems, including recent deep learning approaches. Topics covered include language modeling, representation learning, text classification, sequence tagging, syntactic parsing, machine translation, question answering and others.

This year, COS484 will be taught jointly with the graduate course COS584 "Advanced Natural Language Processing" while 584 provides an additional weekly precept on advanced concepts and has different requirements on assignments and projects. Please see the COS584 page for more details.

Course staff:

Instructors: Danqi Chen, Karthik Narasimhan
TAs: Ameet Deshpande, Chris Sciavolino, Mingzhe Wang, Kaiyu Yang, Shunyu Yao, Zexuan Zhong

Time/location:

(All the times are in EST.)

Lectures: Monday & Wednesday 1:30-2:50pm
Office hours:
Precepts: Friday 4-5pm. This is the weekly (optional) 1-hour precept hosted by TAs. The precepts will be recorded.

Grading

Assignments (40%): There will be five assignments with both written and programming parts. Each homework is centered around an application and will also deepen your understanding of the theoretical concepts.
- Assignment 0: warm up (4%)
- Assignment 1: language models, text classification, word embeddings (9%)
- Assignment 2: feedforward neural networks, sequence modeling, EM (9%)
- Assignment 3: recurrent neural networks, parsing (9%)
- Assignment 4: seq2seq models, Transformers, attention (9%)
Midterm exam (25%): The course has a take-home midterm that will test your knowledge and problem-solving skills on all material up to and including lecture on March 8th.
Final project (35%): The final project offers you a chance to apply your newly acquired skills towards an in-depth application. You are required to turn in a project proposal (due on Mar 26) and complete a paper written in the style of a conference (e.g., ACL) submission (due on May 5). There will be also project presentations at the end of the semester.
Extra credit (5%): For participation in class and Ed discussion. Limited to overall score of max 100%.

Prerequisites:

Required: COS 226, knowledge of probability, linear algebra, multivariate calculus.
Proficiency in Python: programming assignments and projects will require use of Python, Numpy and PyTorch.
COS 324 (or equivalent intro. to ML courses) is strongly recommended.

Reading:

There is no required textbook for this class, and you should be able to learn everything from the lectures and assignments. However, if you would like to pursue more advanced topics or get another perspective on the same material, here are some books (all of them can be read free online):

Dan Jurafsky and James H. Martin. Speech and Language Processing (3rd ed. draft).
Jacob Eisenstein. Natural Language Processing
Christopher Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing.

Previous offerings:

COS 484 (Fall 2019)

Lectures are tentative and subject to change. All assignments are due 1:30pm EST before the Monday class.

Week	Date	Topics	Readings	Assignments
1	Mon (2/1)	Introduction to NLP	Advances in natural language processing	A0 out
	Wed (2/3)	Language modeling: n-grams, smoothing	J & M 3.1-3.4
2	Mon (2/8)	Text classification	J & M 4.1-4.8	A0 due, A1 out
	Wed (2/10)	Logistic regression, regularization	J & M 5.1-5.6
3	Mon (2/15)	Word embeddings	J & M 6.2-6.4, 6.6 Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors
	Wed (2/17)	Word embeddings (2)	J & M 6.8, 6.10-6.12 Efficient Estimation of Word Representations in Vector Space (original word2vec paper) Distributed representations of words and phrases and their compositionality (negative sampling)
4	Mon (2/22)	Feedforward neural networks	J & M 7.1-7.3 A Neural Probabilistic Language Model	A1 due, A2 out
	Wed (2/24)	Sequence modeling - HMMs, Viterbi	J&M 8.1-8.4 Notes from Michael Collins
5	Mon (3/1)	Sequence modeling - MEMMs, EM	Notes from Michael Collins [1] [2]
	Wed (3/3)	Expectation Maximization	Notes from Michael Collins and (optional) Andrew Ng
6	Mon (3/8)	Recurrent Neural Networks	J&M 9.1-9.2 The Unreasonable Effectiveness of Recurrent Neural Networks
	Wed (3/10)	Midterm
7	Mon (3/15)	Spring Recess (no class)
	Wed (3/17)	LSTMs/GRUs	J&M 9.3 Understanding LSTM Networks An Empirical Exploration of Recurrent Network Architectures Neural Architectures for Named Entity Recognition	A2 due, A3 out
8	Mon (3/22)	Constituency parsing	Notes from Michael Collins: PCFGs, Lexicalized PCFGs J&M 12.1-12.2, 12.4 J&M 13.1-13.2
	Wed (3/24)	Dependency parsing	J&M 14.1-14.2, 14.4
	Fri (3/26)			Project proposal due
9	Mon (3/29)	Statistical machine translation	Eisenstein 18.1,18.2
	Wed (3/31)	Neural machine translation - 1	Eisenstein 18.3, 18.4 Koehn, 2017	A3 due, A4 out
10	Mon (4/5)	Neural machine translation - 2	Eisenstein 18.3, 18.4 Koehn, 2017
	Wed (4/7)	Self-attention and Transformers	J&M 9.4 Attention Is All You Need The Annotated Transformer The Illustrated Transformer
11	Mon (4/12)	Contextualized embeddings and pre-training	Deep contextualized word representations BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning)
	Wed (4/14)	Language grounding	(optional) Experience grounds language
12	Mon (4/19)	Question answering	SQuAD: 100,000+ Questions for Machine Comprehension of Text Reading Wikipedia to Answer Open-Domain Questions (optional) Dense Passage Retrieval for Open-Domain Question Answering	A4 due
	Wed (4/21)	Fairness in NLP (Guest Lecture): Mark Yatskar (UPenn)
13	Mon (4/26)	Interpretability in NLP (Guest Lecture): Yonatan Belinkov (Technion)	Vig et al.(2020)
Dean’s date	Wed (5/5)			Final project report due

Assignments

All assignments are due at 1:30pm on the due date before the class. There are 96 free late hours (~4 days) in total over all assignments. Once you have used up all your free late hours, late submissions incur a penalty of 10% for each day, up to a maximum of 3 days beyond which submissions will not be accepted. The only exception to this rule is if you have a note from your Dean of Studies. In this case, you must notify the instructors via email. For students with a dean’s note, the weight of their missed/penalized assignment will be added to the midterm and your midterm score will be scaled accordingly (for homeworks 0, 1 and 2) (e.g. if you are penalized 2 points overall, your midterm will be worth 27 and your score will be multiplied by 27/25). Missing homework 3 and 4 after the midterm can only be compensated by arranging an oral exam on the pertinent material.

Writeups: Homeworks should be written up clearly and succinctly; you may lose points if your answers are unclear or unnecessarily complicated. Using LaTeX is recommended (here's a template), but not a requirement. If you've never used LaTeX before, refer to this introductory guide on Working with LaTeX to get started. Hand-written assignments must be scanned and uploaded as a pdf.

Programming: For each assignment, we provide a Google Colab file with the programming questions included. You’ll need to make a copy of this file, fill in necessary parts, run results, and upload the code and results as a PDF file. If you've never used Google Colab before, refer to this introductory guide on Working with Google Colab to get started.

Collaboration policy and honor code: You are free to form study groups and discuss homeworks and projects. However, you must write up homeworks and code from scratch independently, and you must acknowledge in your submission all the students you discussed with. The following are considered to be honor code violations (in addition to the Princeton honor code):

Looking at the writeup or code of another student.
Showing your writeup or code to another student.
Discussing homework problems in such detail that your solution (writeup or code) is almost identical to another student's answer.
Uploading your writeup or code to a public repository (e.g. github, bitbucket, pastebin) so that it can be accessed by other students.

When debugging code together, you are only allowed to look at the input-output behavior of each other's programs (so you should write good test cases!). It is important to remember that even if you didn't copy but just gave another student your solution, you are still violating the honor code, so please be careful. If you feel like you made a mistake (it can happen, especially under time pressure!), please reach out to Danqi/Karthik; the consequences will be much less severe than if we approach you.

Final Project

The final project offers you the chance to apply your newly acquired skills towards an in-depth NLP application. Students are required to complete the final project in teams of 3 students.

There are two options this year for the final project: (a) reproducing an ACL/EMNLP 2020 paper (encouraged); (b) complete a research project (for this option, you need to discuss your proposal with the instructors/TAs). All the final projects will be completed in teams of 3 students (Find your teammates early!). More instructions TBA.

Deliverables: The final project is worth 35% of your course grade. The deliverables include:

Proposal (0%): You need to turn in a one-page proposal on Mar 26. The proposal should outline what you propose to do and a rough plan for how you will pursue the project. We will then provide feedback and guidance on the direction to maximize the project’s change of succeeding. The proposal is not graded.
Project presentation (10%): At the end of the semester, we will schedule project presentations for all the projects in the class.
Final paper (25%): You need to complete a final report in the style of a conference (e.g. ACL) submission. It should begin with an abstract and introduction, clearly describe the proposed idea or exploration, present technical details, give results, compare to baselines, provide analysis and discussion of the results, and cite any sources you used.

Policy and honor code:

The final projects are required to implement in Python. You can use any deep learning framework such as PyTorch and Tensorflow.
You are free to discuss ideas and implementation details with other teams. However, under no circumstances may you look at another team's code, or incorporate their code into your project.
Do not share your code publicly (e.g. in a public GitHub repo) until after after the class has finished.

Submission

Electronic Submission: Assignments and project proposal/paper are to be submitted as pdf files through Gradescope. If you need to sign up for a Gradescope account, please use your @princeton.edu email address. You can submit as many times as you'd like until the deadline: we will only grade the last submission. Submit early to make sure your submission uploads/runs properly on the Gradescope servers. If anything goes wrong, please ask a question on Ed or contact a TA. Do not email us your submission. Partial work is better than not submitting any work. For more detailed information on submitting your assignment solutions, see this guide on assignment submission logistics.

For assignments with a programming component, we may automatically sanity check your code with some basic test cases, but we will grade your code on additional test cases. Important: just because you pass the basic test cases, you are by no means guaranteed to get full credit on the other, hidden test cases, so you should test the program more thoroughly yourself!

Regrades: If you believe that the course staff made an objective error in grading, then you may submit a regrade request. Remember that even if the grading seems harsh to you, the same rubric was used for everyone for fairness, so this is not sufficient justification for a regrade. It is also helpful to cross-check your answer against the released solutions. If you still choose to submit a regrade request, click the corresponding question on Gradescope, then click the "Request Regrade" button at the bottom. Any requests submitted over email or in person will be ignored. Regrade requests for a particular assignment are due one week after the grades are returned. Note that we may regrade your entire submission, so depending on your submission you may actually lose more points than you gain.

Q: Are 584 students required to attend 584 precepts?
A: Yes. Missing a session or two is all right given the remote environment, but participation is part of your grade. If you need to miss precept on a regular basis, then we recommend switching into 484 instead.

Q: Can 484 and 584 students work together on the final project?
A: It's not encouraged since 484 and 584 have different expectations and group sizes. If you really want to do so, then you must write to the instructors and get permission.

Q: Can I attend 484 or 584 lectures (without grade or credit)?
A: COS 584 is not open to students who are not enrolled, unfortunately. You can however attend COS 484 - the lectures and assignments will be shared with 584. You can find the Zoom link for the lectures on our Canvas site. If you are a Princeton student and cannot access Canvas, write to us (cos484-584-staff@lists.cs.princeton.edu) and we can try adding you manually. If you are not a Princeton student, you can participate in the Community Auditing Program (CAP) for COS484.

Q: We are 484 students. Can we do the project in a team of 2?
A: It is not encouraged. If you really want to do that, please write to us (cos484-584-staff@lists.cs.princeton.edu) with a justification and a rough plan of the project. We want to make sure the scope and workload of the project is reasonable and note that we will grade projects regardless of team sizes.

Q: I'm not enrolled (or enrolled late) in the course. Can I have an extension on A0?
A: The assignment materials are available publicly online (above in the Schedule section) and if you are planning on enrolling in the class, we recommend working on it. As for submission, the last day we're accepting late submissions for A0 on Gradescope is 2/15. If you enrolled late, then please send an email to us (cos484-584-staff@lists.cs.princeton.edu) with proof of your enrollment date and we can provide late day waivers as needed. If there are other concerns about not being able to complete the assignment on time due to enrollment details or personal matters, send an email to cos484-584-staff@lists.cs.princeton.edu and we'll be happy to discuss your scenario in more detail.