11-711: Algorithms for NLP, Fall 2016

 
Instructors: Taylor Berg-Kirkpatrick and Robert Frederking
Lecture: Tuesday and Thursday 1:30pm-2:50pm, GHC 4307
Recitation: Friday 1:30pm-2:20pm, DH 1212
Office Hours: Friday 11am-12pm, GHC 6403
 
TAs: Wanli Ma and Kartik Goyal
Office Hours: Wednesday 10am-11am, GHC 5509 and Thursday 3pm-4pm, GHC 5709
 
Forum: Piazza

Announcements

12/4/16:  Extra office hours for TBK on 12/5 from 1pm-2pm in GHC 6403.
12/4/16:  Project 4 now due Tuesday 12/6 by 11:59pm.
11/16/16:  Project 4 has been released. It is due Dec 4 at 11:59pm ET.
11/11/16:  Updated Project 3 memory constraints: you are now allowed 10GB.
11/10/16:  Project 3 now due Monday 11/14 by 11:59pm.
10/27/16:  Project 3 has been released. It is due Nov 11 at 11:59pm ET.
10/23/16:  Submission portal for Project 2 added to blackboard.
10/21/16:  Syllabus updated -- approximate due dates for projects added.
10/19/16:  Updated benchmark parsing scores -- see Project 2 page.
10/18/16:  Fixed bug in parsing project code -- download new jar from Project 2 page.
10/17/16:  Parsing V notes fixed -- now has pen annotations from class.
10/16/16:  Parsing scores for a correctly implemented parser have been added to the Project 2 page.
10/5/16:  Project 2 has been released. It is due Oct 24 at 11:59pm ET.
10/5/16:  Slides from the CRF recitation posted here: CRF Slides.
9/25/16:  Project submission instructions added to course site (see below).
9/23/16:  Blackboard is up... try logging in and submitting a test jar file for our submission system test: project 0.
9/20/16:  Project 1 now due Tuesday 9/27 by 11:59pm. Submission details to follow.
9/20/16:  Slides from the third recitation posted here: Neural LM Slides.
9/15/16:  Link to decoding paper fixed below.
9/5/16:  Notes from the first recitation posted here: KN Recitation Notes.
9/2/16:  Project 1 has been released. It is due September 22 at 5pm.
8/31/16:  Piazza link posted above.
8/28/16:  First lecture will be on Tuesday 8/30 at 1:30pm in GHC 4307.

Course Description

This course will explore current statistical techniques for the automatic analysis of natural (human) language data. The dominant modeling paradigm is corpus-driven statistical learning, with a split focus between supervised and unsupervised methods. This term we are making Algorithms for NLP a lab-based course. Instead of homeworks and exams, you will complete four hands-on coding projects. This course assumes a good background in basic probability and a strong ability to program in Java. Prior experience with linguistics or natural languages is helpful, but not required. There will be a lot of statistics, algorithms, and coding in this class.

Slides, materials, and projects for this new iteration of Algorithms for NLP are mainly borrowed from Dan Klein at UC Berkeley.

Project Submission

Submit projects using the class Blackboard site. (If you cannot log in to Blackboard, please email the course staff.)

1. Prepare a directory named 'project' containing no more than 3 files: (a) a jar named 'submit.jar', (b) a pdf named 'writeup.pdf', and (c) an optional jar named 'best.jar'. The jar named 'submit.jar' should contain your implementation of the core project that passes the basic requirements. For example, for project 1, the jar named 'assign1-submit.jar' is all that you would need to turn in, after renaming it 'submit.jar'. The pdf 'writeup.pdf' should contain your writeup for the project. Finally, the file 'best.jar' is an optional additional jar that implements the core project, but need not pass spot-checks. Include this last jar if you wish to demonstrate an improvement over the basic project, possibly using approximations or alternative models.

2. Compress the 'project' directory you created in the last step using the command 'tar cvfz project.tgz project'.

3. Click on the assignments tab of the main blackboard course site and select the assignment corresponding to the project (e.g. Assignment 1 corresponds to Project 1). Click the 'browse my computer' button and select your compressed project directory 'project.tgz' created in the previous step. Finally, click the submit button.
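
Before running the tar command in step 2, it can help to confirm that the 'project' directory has the layout described in step 1. The following is a minimal sketch of such a check; the class name `SubmissionCheck` and its methods are hypothetical and are not part of the official course tooling.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

/** Hypothetical pre-submission check (an illustration, not course-provided code). */
final class SubmissionCheck {
    // File names taken from step 1 of the submission instructions.
    static final Set<String> REQUIRED = Set.of("submit.jar", "writeup.pdf");
    static final Set<String> OPTIONAL = Set.of("best.jar");

    /** Returns a list of problems; an empty list means the layout looks right. */
    static List<String> problems(Set<String> fileNames) {
        List<String> out = new ArrayList<>();
        // Sorted iteration keeps the report order deterministic.
        for (String name : new TreeSet<>(REQUIRED)) {
            if (!fileNames.contains(name)) {
                out.add("missing required file: " + name);
            }
        }
        for (String name : new TreeSet<>(fileNames)) {
            if (!REQUIRED.contains(name) && !OPTIONAL.contains(name)) {
                out.add("unexpected file: " + name);
            }
        }
        return out;
    }

    /** Collects the names of the entries directly inside dir. */
    static Set<String> listNames(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            Set<String> names = new TreeSet<>();
            entries.forEach(p -> names.add(p.getFileName().toString()));
            return names;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : "project");
        List<String> found = problems(listNames(dir));
        if (found.isEmpty()) {
            System.out.println("Layout OK; ready for: tar cvfz project.tgz project");
        } else {
            found.forEach(System.out::println);
        }
    }
}
```

Run from the directory that contains 'project', the program prints either a confirmation or one line per problem. It only checks file names; it does not verify that the jar passes the project's basic requirements.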

Project Grading

Each project is graded out of 10 points:
6 Points: Successfully implemented what we asked
2 Points: Submitted a reasonable write-up
1 Point: Write-up is written clearly
1 Point: Substantially exceeded minimum metrics
Extra Credit: Did non-trivial extension to project
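
As a rough illustration of how the rubric above adds up, here is a small sketch. It assumes each component is awarded all-or-nothing, which the rubric does not state, and it leaves out extra credit because no point value is given for it. The class name `ProjectScore` is hypothetical.

```java
/** Hypothetical tally of the 10-point rubric; all-or-nothing components are an assumption. */
final class ProjectScore {
    static final int IMPLEMENTATION = 6;     // successfully implemented what was asked
    static final int WRITEUP_SUBMITTED = 2;  // submitted a reasonable write-up
    static final int WRITEUP_CLEAR = 1;      // write-up is written clearly
    static final int EXCEEDED_METRICS = 1;   // substantially exceeded minimum metrics

    /** Sums the points for the components that were satisfied (extra credit not modeled). */
    static int total(boolean implemented, boolean reasonableWriteup,
                     boolean clearWriteup, boolean exceededMetrics) {
        int points = 0;
        if (implemented) points += IMPLEMENTATION;
        if (reasonableWriteup) points += WRITEUP_SUBMITTED;
        if (clearWriteup) points += WRITEUP_CLEAR;
        if (exceededMetrics) points += EXCEEDED_METRICS;
        return points;
    }
}
```

Under these assumptions, a correct implementation with a reasonable but unclear write-up that only meets the minimum metrics would score 8 of 10.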

Late Day Policy

Each student will be granted 5 late days to use over the duration of the semester. There are no restrictions on how the late days can be used (e.g. all 5 could be used on one project). Using late days will not affect your grade. However, projects submitted late after all late days have been used will receive no credit. Be careful!
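
The bookkeeping behind this policy can be sketched as follows. The sketch assumes lateness is counted in whole days and that a submission needing more late days than remain receives no credit and consumes none; the policy above does not spell out either detail. The class name `LateDayLedger` is hypothetical.

```java
/** Hypothetical late-day tracker; whole-day counting and the no-partial-use rule are assumptions. */
final class LateDayLedger {
    static final int TOTAL_LATE_DAYS = 5;  // from the policy: 5 late days per semester
    private int remaining = TOTAL_LATE_DAYS;

    /**
     * Records a submission that is daysLate days past the deadline.
     * Returns true if the project still receives credit.
     */
    boolean submit(int daysLate) {
        if (daysLate < 0) {
            throw new IllegalArgumentException("daysLate must be >= 0");
        }
        if (daysLate > remaining) {
            return false;  // not enough late days left: no credit (assumed: none consumed)
        }
        remaining -= daysLate;
        return true;
    }

    int remaining() {
        return remaining;
    }
}
```

For example, a student who uses all 5 days on one project can still submit later projects on time for full credit, but any later project that is even one day late would receive none.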

Readings

The primary recommended texts for this course are:

Jurafsky and Martin, Speech and Language Processing, 2nd edition (J+M)
Manning and Schütze, Foundations of Statistical Natural Language Processing (M+S)

Note that M+S is free online (you may need to set up a proxy). Also, make sure you get the purple 2nd edition of J+M, not the white 1st edition.

Syllabus [subject to substantial change!]

Week | Date | Topics | Readings | Assignments (Out)
1 | Aug 30 | Course Introduction | J+M 1, M+S 1-3 |
  | Sept 1 | Language Modeling I | J+M 4, M+S 6, Chen & Goodman, Interpreting KN | P1: Language Modeling (Due Sept 22)
2 | Sept 6 | Language Modeling II | Massive Data, Bloom, Perfect, Efficient LMs |
  | Sept 8 | Language Modeling III | |
3 | Sept 13 | Speech Recognition I | J+M 7 |
  | Sept 15 | Speech Recognition II | J+M 9, Decoding |
4 | Sept 20 | Speech Recognition III, HMMs | |
  | Sept 22 | POS Tagging, NER, CRFs | J+M 5, Brants, Toutanova & Manning |
5 | Sept 27 | Parsing I | M+S 3.2, 12.1, J+M 13 |
  | Sept 29 | Parsing II | M+S 11, J+M 14, Best-First, A*, Unlexicalized |
6 | Oct 4 | Parsing III | Split, Lexicalized, K-Best A*, Coarse-to-fine | P2: PCFG Parser (Due Oct 24)
  | Oct 6 | Parsing IV | |
7 | Oct 11 | Parsing V | |
  | Oct 13 | Structured Classification I | Pegasos, Cutting Plane |
8 | Oct 18 | Structured Classification II | |
  | Oct 20 | Structured Classification III | |
9 | Oct 25 | Structured Classification IV | J+M 16, 18, 19, Adagrad, Subgradient SVM | P3: Discriminative Reranker (Due Nov 11)
  | Oct 27 | Formal Grammar | |
10 | Nov 1 | Machine Translation: Alignment I | J+M 25, IBM Models, HMM, Agreement, Discriminative |
  | Nov 3 | Machine Translation: Alignment II | |
11 | Nov 8 | Machine Translation: Phrase-Based | Decoding |
  | Nov 10 | Machine Translation: Syntactic | Hiero, String-Tree, Tree-String, Tree-Tree |
12 | Nov 15 | Morphology; Features and Unification [more slides] | J+M 3, J+M 15 (Note: errors in textbook) | P4: Machine Translation (Due Dec 6)
  | Nov 17 | Semantics and Discourse I | J+M 17, J+M 18 (Note: errors in textbook) |
13 | Nov 22 | Semantics and Discourse II | J+M 12.7.2 |
  | Nov 24 | Thanksgiving Day | |
14 | Nov 29 | Semantics and Discourse III | J+M 19, 20.6-20.9 |
  | Dec 1 | Semantics and Discourse IV | J+M 21 |
15 | Dec 6 | Neural Machine Translation | |
  | Dec 8 | Applications TBA | |