William Cohen, Machine
Learning Dept and LTI
TA: Vitor Carvalho (Office Hours: Tuesdays/Thursdays, email to schedule)
When/where: Tues/Thus 12-1:20, Wean Hall 4615a
Course Number: 10-707, cross-listed in LTI as 11-748
Syllabus: below
Announcements:
- 5/3: Pradipta is out sick today, so we'll push his
presentation to next Tuesday.
- 4/5: William is traveling Friday, so there will be no office
hours.
- 3/27: Projects due Fri May 11. Last class is
(officially) Thus May 3, but to allow time for all the project
presentations, we’ll also meet May 8 and May 10. (Send me mail if
you can’t attend.) Project presentations start April 17.
There will be two per class session, and they should be 30min
each, plus questions.
Preliminary reports are ok in your presentation, but please
make the presentation clear, and keep it on time. State the
problem and describe the approach and related work, even if the
results aren't in yet.
- 2/15: I've (once again) updated the readings for next week -
see below. Only the Toutanova paper is required. Also, office
hours are now set for 10:30-12:00 Friday.
- 2/8: I've updated the readings for next week - see below.
- 2/8: Reminder, project abstracts with teams are due Tuesday, 2/13.
- 2/2: On Tuesday 2/6, everyone should submit an abstract
(email to William cc Vitor, and hardcopy) of their project: One
page, covering some subset (probably a proper subset) of these
items: what you plan to do; why you think it's interesting; any
relevant special skills you might have; how you plan to evaluate;
what techniques you plan to use; what question you want to answer;
who you might work with. These will be posted on the class web
site.
After you've submitted abstracts, you'll need to organize
yourselves into teams. On the next Tuesday, 2/13, a similar
abstract should be submitted from each team. A team is
(preferably) 2-3 people.
- 1/22: Paper critiques for the week should be submitted on
Tuesday before class. E.g., summaries for the four papers
discussed the week of Jan 30th (Jansche and Abney, Cohen et al,
Freitag & Kushmerick, Borthwick et al) should be submitted before
class Jan 30th. Submit your critiques by email (to vitor@cs, cc
wcohen@cs). Each paper critique should be about 200-500 words
(think half page or a page) discussing one thing you liked and/or
one thing you didn't like about the paper.
The first round of paper critiques is due on 1/30 - nothing
is due on 1/23.
- 1/22: Reminder, there's no class on Thus 1/25.
Description
Information extraction is finding names of entities in unstructured or
partially structured text, and determining the relationships that hold
between these entities. More succinctly, information extraction is
the problem of deriving structured factual information from text.
This course considers the problem of information extraction from a
machine-learning perspective. We will survey a variety of learning
methods that have been used for information extraction, including
rule-learning, boosting, and sequential classification methods such as
hidden Markov models, conditional random fields, and structured
support vector machines. We will also look at experimental results
from a number of specific information extraction domains, such as
biomedical text, and discuss semi-supervised "bootstrapping" learning
methods for information extraction.
A syllabus is below. You can also find a complete syllabus
with slides for the course as last taught (Spring 2004). The
Spring 2007 course will concentrate less on information integration,
and will cover more topics in information extraction. There will also
be a focus on techniques for structured
learning.
Readings will be based on research papers. Grades will be based on
class participation, paper presentations, and a project.
More specifically, students will be expected to:
Prerequisites: a machine learning course (e.g., 15-781 or 15-681) or
consent of the instructor.
Syllabus
Overview/Survey of Information Extraction
Lectures: (Slides will be posted after each class).
- Tues Jan 16. Overview of Information Extraction 1 - Introduction, History, and Techniques for Named Entity
Recognition (NER) (Slides).
- Thus Jan 18. Overview of Information Extraction 2 - Techniques for Relation/Event/Fact Extraction.
(Slides)
- Tues Jan 23. How to use Minorthird for Named Entity Recognition (Tutorial)
- Thus Jan 25. No class.
Readings:
NER by Classifying Candidate Text Segments or Tokens
Lectures:
- Tues Jan 30. Discussion of the key points from Jansche and Abney, and Cohen et
al. (Slides)
- Thus Feb 1. Freitag and Kushmerick, and Borthwick et al (Slides)
Background material discussed: Viterbi for sequential models.
Readings:
- Information
Extraction from Voicemail Transcripts, Jansche and Abney, EMNLP 2002
(For background, here's the Huang et
al 2001 paper they compare to.)
- Understanding
Captions in Biomedical Publications, Cohen et al., KDD 2002.
- Boosted
Wrapper Induction, Freitag and Kushmerick, AAAI 2000
- Exploiting
Diverse Knowledge Sources via Maximum Entropy in Named Entity
Recognition, Borthwick et al, Workshop on Very Large Corpora 1998
- (Optional) Use
of Support Vector Machines in Extended Named Entity
Recognition, Takeuchi and Collier, CoNLL 2002
- (Optional) Ranking
Algorithms for Named-Entity Extraction: Boosting and the Voted
Perceptron., Collins, ACL 2002
- (Optional) Unsupervised
Models for Named Entity Classification, Collins and
Singer, EMNLP 1999
NER as Sequential Token Classification with Graphical
Models - 1 (HMMs and CMMs)
Lectures:
- Tues Feb 6. Hidden Markov models for NER: Discussion of Borkar et al
(Slides).
- Thus Feb 8. Maxent Markov models (aka Conditional Markov models) for NER:
Discussion of Freitag et al (Slides)
Background material discussed: Maxent/logistic regression, gradient-descent and Newton
optimization methods.
Readings:
- Automatic
Segmentation of Text Into Structured Records, Borkar,
Deshmukh, and Sarawagi, SIGMOD 2001
- Maximum Entropy
Markov Models for Information Extraction and Segmentation,
Freitag et al, ICML 2000
-
A Maximum Entropy Part-Of-Speech Tagger, Ratnaparkhi, Workshop on Very Large Corpora 1996
- Background tutorials: Mike
Collins on learning in NLP, including a section on maxent
taggers; Dan
Klein on maxent.
- (Optional) An
Algorithm that Learns What's in a Name, Bikel et al, MLJ 1999
- (Optional) Unsupervised
Learning of Field Segmentation Models for Information
Extraction, Grenager, Klein, and Manning, ACL 2005
- (Optional) Named
Entity Recognition with Character-Level Models, Klein et al, CoNLL 2003
NER as Sequential Token Classification with Graphical
Models - 2 (CRFs)
Lectures:
- Tues Feb 13. Conditional Random Fields for NER:
Discussion of Sha & Pereira (Slides)
Student presentations: Jon Elsas (Grenager et al) [Slides]; Terrill Frantz (Krogh)
- Thus Feb 15. Background on NER and more discussion of Sha
& Pereira (Slides)
Student presentation: Jana Diesner (Named Entity Recognition ..., Klein et al)
Readings:
- Shallow
parsing with conditional random fields, Sha and
Pereira, ACL 2003
- Conditional
Structure versus Conditional Estimation in NLP Models, Klein and Manning, EMNLP 2002
- Stacked Sequential Learning,
Cohen & Carvalho, IJCAI 2005 (critique for this is optional, since I posted it so late).
- Background tutorial:
An
Introduction to Conditional Random Fields for Relational
Learning, Sutton & McCallum, 2006
- (Optional)
Kernel Conditional Random Fields: ..., Lafferty et al, ICML 2004
- (Optional) Hidden
Markov Models for Labeled Sequences, Krogh 1994
- (Optional) Early
Results for Named Entity Recognition with Conditional Random
Fields, Feature Induction and Web-Enhanced Lexicons, McCallum and Li, CoNLL 2003
- (Optional) Table
Extraction Using Conditional Random Fields, Pinto et al, SIGIR 2003
- (Optional)
Transformation-Based Error-Driven Learning and Natural
Language Processing, Brill, COLING 1995
- (Optional) Semi-Supervised
Conditional Random Fields for Improved Sequence Segmentation
and Labeling, Jiao et al, ACL 2006.
CRFs, CMMs, and Dependency networks
Lectures:
- Tues Feb 20. Additional insights into CRFs vs MEMMs vs HMMs:
Discussion of Klein and Manning (Slides)
and Cohen and Carvalho (Slides)
Student presentation: Kevin Gimpel (Collins, "Ranking Algorithms for
NER:...")(Slides)
- Thus Feb 22. Dependency networks for sequential learning:
Discussion of Toutanova et al. (Slides)
Student presentations: Yimeng Zhang (Collins & Singer)(Slides); Paisarn Charoenpornsawat (Brill)[Slides]
Readings:
- Feature-Rich
Part-of-Speech Tagging with a Cyclic Dependency Network, Toutanova et al, NAACL 2003
- Background on dependency nets: Dependency
Networks for Inference, Collaborative Filtering, and Data
Visualization, Heckerman et al, JMLR 2000
- (Optional) Semi-Markov
Conditional Random Fields for Information Extraction,
Sarawagi & Cohen, NIPS 2004
- (Optional): A Hybrid
Markov/Semi-Markov Conditional Random Field for Sequence
Segmentation, Andrew, EMNLP 2006
- (Optional): Improving
the Scalability of Semi-Markov Conditional Random Fields for
Named Entity Recognition, Okanohara et al, ACL 2006
- (Optional): Integer
Linear Programming Inference for Conditional Random
Fields, Roth and Yih, ICML 2005
Long-range dependencies in NER/Margin Methods
Lectures:
- Tues Feb 27. Long-range dependencies in NER: Skip-chain
CRFs and relational Markov networks; Stacked graphical learning
for NER (Slides).
- Thus March 1. Background: the voted perceptron, or
why large margins work. (Notes)
Student presentations: Ian Fette (McCallum and Li);
Yimeng Zhang (Collins and Singer).
Readings:
- Collective
Segmentation and Labeling of Distant Entities..., Sutton
and McCallum, ICML Workshop on SRL, 2004
- An
Effective Two-Stage Model for Exploiting Non-Local
Dependencies in Named Entity Recognition, Krishnan and
Manning, ACL 2006.
- Large
Margin Classification Using the Perceptron Algorithm, Freund and Schapire.
- (Optional) Stacked
Graphical Models for Efficient Inference in Markov Random
Fields, Kou & Cohen, SDM 2007
- (Optional)
Collective Information Extraction with Relational Markov
Networks, Bunescu & Mooney, ACL 2004
- Background: A
Statistical Learning Model of Text Classification with Support
Vector Machines, Joachims.
Sequential Classification with Margin-based Methods
Lectures:
- Tues March 6. The voted-perceptron trained VPHMM (Slides).
Student presentations: A. Lad (Pinto et al), Konstantin
Salomatin (?)
- Thus March 8. No lecture.
Student presentations: Andrew Arnold (Mayfield et al, CoNLL 2003)[Slides]; Mahesh Joshi (Cucerzan/Yarowsky)[Slides]; Pradipta Ray (Skounakis et al.)[Slides].
Readings:
-
Discriminative Training Methods for Hidden Markov Models:
Theory and Experiments with Perceptron Algorithms,
Collins, EMNLP 2002
- Large
Margin Methods for Structured and Interdependent Output
Variables, Tsochantaridis et al, JMLR 2005
- (Optional): Exponentiated
gradient algorithms for large-margin structured
classification, Bartlett et al, NIPS 2004
- (Optional): A
Robust Risk Minimization based Named Entity Recognition
System, Zhang and Johnson, CoNLL 2003
- (Optional): A
Simple Named Entity Extractor using AdaBoost, Carreras et al, CoNLL 2003
- (Optional): Named
Entity Recognition using Hundreds of Thousands of Features, Mayfield et al, CoNLL 2003
- (Optional): A
High-Performance Semi-Supervised Learning Method for Text
Chunking, Ando and Zhang, ACL 2005
Spring Break!
From Entities to Facts
Lectures:
- Tues March 20. Structured SVMs for sequential learning. (Slides)
Student presentations: YiChia Wang ("Learning Dictionaries...", Riloff+Jones)[Slides], Ben Lambert ("Joint Extraction of Entities and Relations...", Choi+Breck+Cardie)
- Thus March 22. Relation extraction 1: Pairwise entity classification and role classification.
(Slides)
Student presentations: Andy Schlaikjer ("Exploring Syntactic Features...", Zhang et al. HLT-06)[Slides], Frank Lin ("URES: an unsupervised web relation extraction system", Rosenfeld+Feldman ACL-06)
- Tues March 27. Relation extraction with kernels; semi-Markov
models for extraction. (Slides)
- Thus March 29. Semantic role labeling. (Slides)
Student presentations: Dipanjan Das ("Semantic Role Labeling with Tree Conditional Random Fields")[Slides], Shilpa Arora ("A Joint Model for Semantic Role Labeling")[Slides]
Student presentations: Oznur Tastan ("Protein Quaternary Fold Recognition Using Conditional Graphical Models", Liu et al, IJCAI-07)[Slides], Nguyen Bach ("Locating Complex Named Entities in Web Text", Downey et al, IJCAI-07)[Slides]
Readings:
- Subsequence
Kernels for Relation Extraction, Bunescu and Mooney, NIPS 2005
-
A Shortest Path Dependency Kernel for Relation Extraction,
Bunescu and Mooney, EMNLP 2005
- Background: Tutorial
on SRL, Yih and Toutanova, NAACL 06
- Generalized
Inference with Multiple Semantic Role Labeling Systems,
Punyakanok et al, CoNLL 2005
- (Optional)
Hierarchical Hidden Markov Models for Information
Extraction, Skounakis, Craven and Ray, IJCAI 2003
- (Optional) Exploring Syntactic
Features for Relation Extraction using a Convolution Tree Kernel, Zhang et al,
HLT 2006
- (Optional) Joint
Extraction of Entities and Relations for Opinion
Recognition, Choi, Breck, Cardie, EMNLP 2006
- (Optional):
A Joint Model for Semantic Role Labeling, Haghighi et al, CoNLL 2005.
- (Optional):
Semantic Role Labelling with Tree Conditional Random Fields, Cohn
Bootstrapping
Lectures:
- Tues April 3. Review of Bootstrapping (Slides)
Student presentations: E. Cinar Sahin ("An algorithm that learns what's in a name", Bikel et al), Mohit Kumar (TBD: one of the semantic parsing papers)
- Thus April 5. Etzioni's Know-it-all System (Slides)
Student presentations: Shay Cohen ("Solving the problem of cascading errors: approximate Bayesian inference for linguistic annotation pipelines")[Slides], Mengqiu Wang ("Semi-supervised conditional random field for improved sequence segmentation and labeling")
- Tues April 10. Relations and analogy 1 (Slides)
Student presentations: Hideki Shima (Michelson and Knoblock); Sharath Rao (Gamon and Aue)
- Thus April 12. Relations and analogy 2 (Slides)
Student presentations: Andreas Zollmann, Terrill Frantz
Readings:
-
Unsupervised Named-Entity Extraction from the Web: An
Experimental Study, Etzioni et al,
AIJ 2005.
- (Optional): URES:
an Unsupervised Web Relation Extraction System, Rosenfeld
and Feldman, COLING/ACL 2006.
- (Optional): Locating
Complex Named Entities in Web Text, Downey et al, IJCAI
2007
- (Optional): A Semantic
Approach to IE Pattern Induction, Stevenson and Greenwood, ACL 2005
- (Optional):
Robust Reading: Identification and Tracing of Ambiguous Names,
Li et al, ACL 2004.
- (Optional): Semantic
Annotation of Unstructured and Ungrammatical Text,
Michelson and Knoblock, IJCAI 2005.
- (Optional):
Learning to Map Sentences to Logical Form: Structured
Classification with Probabilistic Categorial Grammars,
Zettlemoyer and Collins, UAI 2005
- (Optional): Improving
Name Tagging by Reference Resolution and Relation
Detection, Ji and Grishman, ACL 2005
- (Optional): Using
String-Kernels for Learning Semantic Parsers, Kate and
Mooney, ACL 2006.
- (Optional): Using
Predicate-Argument Structures for Information Extraction,
Surdeanu et al, ACL 2003.
Similarity and Information Extraction
- Tues April 17. Information Extraction and Reasoning (Slides)
Project presentation: Jon Elsas, Sharath Rao - IE and TDT
- Thus April 19. No class - Spring Carnival!
Readings (none required):
Project presentations
- Tues April 24. Project presentations: Jana Diesner & Terrill Frantz (Associating textual spans with categories of a specific ontology), Yimeng Zhang (Sequential Incremental and Cost-Sensitive Learning Algorithm to Reduce False Intrusion Detection Alerts)
- Thus April 26. Project presentations: Mohit & Dipanjan (Automatic Extraction of 'Briefing' Templates), Ian Fette (NER using Google N-Gram data)
- Tues May 1. Project presentations: Shay Cohen & Kevin Gimpel (...), Andrew Arnold & Ramesh Nallapati (Learning to do cross-domain information extraction on the web)
- Thus May 3. Project presentations: Mahesh Joshi & YiChia Wang (Bootstrapping Approach to Identify Technical Terms and Relations in
Newsgroup Style Interaction), Pradipta Ray (Finding cis-regulatory modules using mixtures of phylogeny)
- Tues May 8. Project presentations: Andy Schlaikjer (...), Hideki & Shilpa & Mengqiu & Frank (...)
- Thus May 10. Project presentations: Ben Lambert (Jointly Learning to Extract Entities and Relations from the Web with Background Knowledge), Andreas & Oznur (...), E. Cinar Sahin (Extracting Plan Graphs from Procedural Text), Nguyen & Paisarn & Kostya (...)
Notice: Classroom activities may be taped or recorded by a student for
the personal use of that student or for all students presently
enrolled in the class only, but may not be further copied,
distributed, published or otherwise used for any other purpose without
the express written consent of Dr. Cohen. Do not leave small children
unattended. This syllabus should not be used as a flotation device.
Last modified: Mon Mar 26 09:48:12 EDT 2007