We consider a random process which produces an output value *y*, a member of a
finite set $\mathcal{Y}$. For the translation example just considered, the process
generates a translation of the word *in*, and the output *y* can be any
word in the set {*dans*, *en*, *à*, *au cours de*, *pendant*}.
In generating *y*, the process may be influenced by some contextual
information *x*, a member of a finite set $\mathcal{X}$. In the present example,
this information could include the words in the English sentence surrounding
*in*.

Our task is to construct a stochastic model that accurately represents the
behavior of the random process. Such a model is a method of estimating the
conditional probability that, given a context *x*, the process will output
*y*.
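
As a rough illustration of what such a model looks like as an object, the sketch below stores a conditional distribution as a nested table and fills it in by relative frequency. This is only the simplest conceivable estimator, chosen to make the idea of a conditional probability of *y* given *x* concrete; it is not the modeling technique developed in this document. The sample pairs, the choice of context (the English word following *in*), and the function name `estimate_conditional` are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy observations: (English word following "in", French translation).
# These pairs are invented for illustration only.
samples = [
    ("April", "en"),
    ("April", "en"),
    ("April", "dans"),
    ("Paris", "à"),
    ("Paris", "à"),
    ("Paris", "dans"),
    ("Paris", "en"),
]


def estimate_conditional(pairs):
    """Return model[x][y]: the relative frequency of output y within context x."""
    counts = defaultdict(Counter)
    for x, y in pairs:
        counts[x][y] += 1
    model = {}
    for x, outputs in counts.items():
        total = sum(outputs.values())
        model[x] = {y: n / total for y, n in outputs.items()}
    return model


model = estimate_conditional(samples)
print(model["Paris"]["à"])  # 0.5 (2 of the 4 "Paris" samples)
```

For every context the values in `model[x]` sum to one, which is what makes each row a probability distribution over the possible outputs.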

A word here on notation: a rigorous protocol requires that we differentiate a
random variable from a particular value it may assume. One approach is to write
a capital letter for the first and lowercase for the second: *X* is the random
variable (in the case of a six-sided die, $X \in \{1, 2, \ldots, 6\}$), and *x* is a
particular value assumed by *X*. Furthermore, we should distinguish a
probability distribution, say $p(X)$ (the uniform distribution $p(X) = 1/6$ is
appropriate for a fair die), from a particular value assigned by the
distribution to a certain event, say $p(X = x)$. Having conceded what we
*should* do, we shall henceforth (when appropriate) dispense with the
capitalized letters and let the context disambiguate the meaning of $p(x)$: an
entire model or the value assigned by the model to the event
*X=x*. Furthermore, we will denote by $\mathcal{P}$ the set of all conditional
probability distributions. Thus a model $p(y|x)$ is, by definition, just an
element of $\mathcal{P}$.
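
One way to spell out the set of all conditional probability distributions is given below. The symbols are a notational assumption: $\mathcal{X}$ stands for the finite set of contexts, $\mathcal{Y}$ for the finite set of outputs, and $\mathcal{P}$ for the set of models.

$$
\mathcal{P} = \Bigl\{\, p : \mathcal{X} \times \mathcal{Y} \to [0, 1] \;\Bigm|\; \sum_{y \in \mathcal{Y}} p(y \mid x) = 1 \ \text{for every } x \in \mathcal{X} \,\Bigr\}
$$

Because both sets are finite, any single element of this set is fully specified by a table of $|\mathcal{X}| \cdot |\mathcal{Y}|$ numbers, one row per context, with each row summing to one.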

- Training data
- Features and constraints
- The maxent principle
- Exponential form
- Maximum likelihood
- Outline

Fri Jul 5 11:43:50 EDT 1996