# Probability and Structure in Natural Language Processing

## Slides

- Lecture 1 (Monday 8:00), Probabilistic graphical models
- Lecture 2 (Monday 13:30), PGMs (continued), inference
- Lecture 3 (Tuesday 8:00), Inference (continued), structures and decoding
- Lecture 4 (Tuesday 14:30), Structures and decoding (continued), supervised learning
- Lecture 5 (Wednesday 8:00), Supervised learning (continued)
- Lecture 6 (Wednesday 13:30), Hidden variables

## Goals

This course covers key ideas at the junction of natural language processing (NLP) and machine learning. The goal is to make it easier for NLP researchers to follow relevant research in machine learning, and to contribute to the growing body of research that uses advanced statistical modeling techniques to solve hard language processing problems. The tutorial breaks down into three main parts.

**Probabilistic graphical models.** Probabilistic graphical models are a major topic in machine learning. They provide a foundation for statistical modeling of complex data, and starting points (if not full-blown solutions) for inference and learning algorithms. They generalize many familiar methods in NLP. We'll cover Bayesian networks, Markov networks, and the relationship between the two, and present inference as the central question that arises when working with graphical models.
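For reference, the key property these models share is that the graph licenses a factorization of the joint distribution. In a Bayesian network over variables $X_1, \dots, X_n$, the joint factors according to the directed graph (writing $\mathrm{pa}(x_i)$ for the values of $X_i$'s parents):

```latex
p(x_1, \dots, x_n) \;=\; \prod_{i=1}^{n} p\bigl(x_i \mid \mathrm{pa}(x_i)\bigr)
```

A Markov network instead factors into nonnegative potentials over the cliques $C$ of an undirected graph, $p(x) \propto \prod_{C} \phi_C(x_C)$, with a normalizing constant in place of locally normalized conditionals.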

**Linear structure models.** Most problems in linguistic analysis are currently solved by applying discrete optimization techniques (dynamic programming, search, and others) to identify a structure that maximizes some score, given an input. We describe a few ways to think about the problem of prediction itself (a kind of inference), and review key approaches to learning structured prediction models. An emphasis will be placed on unifying a wide range of approaches (generative models, conditional models, structured perceptron, structured max margin).

**Incomplete data.** Since we will never have as much annotated linguistic data as we'd like in all the languages, domains, and genres for which we'd like to do NLP, semisupervised and unsupervised learning have become hugely important. We show how the foundations from the first two parts can be extended to provide a framework for learning with incomplete data. We'll review Expectation-Maximization in light of what we have covered so far and discuss recently proposed Bayesian techniques.
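To fix intuitions about EM before the lecture, here is a self-contained sketch on the smallest possible incomplete-data problem: two coins with unknown biases, where the identity of the coin behind each trial is the hidden variable. The data and initial values are toy numbers of my own choosing, and the mixing weights are held fixed at 1/2 for brevity.

```python
# EM for a mixture of two coins: the E-step computes each trial's
# posterior responsibility, the M-step re-estimates biases from the
# resulting expected counts. Data and initializations are toy values.

def em_two_coins(trials, theta_a, theta_b, iters=20):
    """Each trial is (heads, flips); returns the estimated biases."""
    for _ in range(iters):
        heads_a = flips_a = heads_b = flips_b = 0.0
        for h, n in trials:
            # E-step: posterior probability that coin A produced this trial
            like_a = theta_a ** h * (1 - theta_a) ** (n - h)
            like_b = theta_b ** h * (1 - theta_b) ** (n - h)
            r_a = like_a / (like_a + like_b)
            # Accumulate expected heads/flips counts for each coin
            heads_a += r_a * h
            flips_a += r_a * n
            heads_b += (1 - r_a) * h
            flips_b += (1 - r_a) * n
        # M-step: maximum-likelihood biases given the expected counts
        theta_a, theta_b = heads_a / flips_a, heads_b / flips_b
    return theta_a, theta_b

trials = [(9, 10), (8, 10), (7, 10), (2, 10), (1, 10)]
theta_a, theta_b = em_two_coins(trials, 0.6, 0.5)
```

Starting from a mildly asymmetric initialization, the two estimates separate toward the empirical biases of the heads-heavy and tails-heavy trials; with a perfectly symmetric start they would stay stuck, which previews the local-optimum issues the lecture discusses.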

## References

Koller, Daphne and Nir Friedman. 2009. *Probabilistic Graphical Models: Principles and Techniques*. MIT Press. [link]

Smith, Noah A. 2011. *Linguistic Structure Prediction*. Morgan & Claypool. [link]