We consider a random process which produces an output value y, a member of a
finite set $\mathcal{Y}$. For the translation example just considered, the process
generates a translation of the word "in", and the output y can be any
word in the set {dans, en, à, au cours de,
pendant}. In generating y, the process may be influenced by some contextual
information x, a member of a finite set $\mathcal{X}$. In the present example,
this information could include the words in the English sentence surrounding
"in".
Our task is to construct a stochastic model that accurately represents the behavior of the random process. Such a model is a method of estimating the conditional probability that, given a context x, the process will output y.
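To make the idea of a model as a conditional probability estimate concrete, here is a minimal Python sketch. It assumes the simplest possible model, one that ignores the context and spreads probability uniformly over the five candidate translations. The names `uniform_model` and `is_conditional_distribution`, and the two example contexts, are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Output set from the translation example: candidate French renderings of "in".
Y = ["dans", "en", "à", "au cours de", "pendant"]


def uniform_model(y, x):
    """Return p(y|x) for a model that ignores the context x entirely.

    This is an illustrative assumption, not a model proposed in the text.
    """
    return Fraction(1, len(Y)) if y in Y else Fraction(0)


def is_conditional_distribution(p, contexts):
    """Check that p(.|x) is nonnegative and sums to 1 for every context x."""
    return all(
        all(p(y, x) >= 0 for y in Y) and sum(p(y, x) for y in Y) == 1
        for x in contexts
    )


# Hypothetical contexts: the English word that follows "in" in the sentence.
contexts = ["April", "Paris"]

print(uniform_model("dans", "April"))
print(is_conditional_distribution(uniform_model, contexts))
```

Any function that passes the check in `is_conditional_distribution` is a valid model in the sense used here; the modeling task is to choose one that reflects the observed behavior of the process rather than an arbitrary one.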
A word here on notation: a rigorous protocol requires that we differentiate a
random variable from a particular value it may assume. One approach is to write
a capital letter for the first and lowercase for the second: X is the random
variable (in the case of a six-sided die, $X \in \{1, 2, \ldots, 6\}$), and x is a
particular value assumed by X. Furthermore, we should distinguish a
probability distribution, say $p(X)$ ($p(X) = 1/6$ is appropriate for a fair
die), from a particular value assigned by the distribution to a certain event,
say $p(X = x)$. Having conceded what we
should do, we shall henceforth (when appropriate) dispense with the
capitalized letters and let the context disambiguate the meaning of $p(x)$: an
entire model $p(X)$ or the value assigned by the model to the event
X=x. Furthermore, we will denote by $\mathcal{P}$ the set of all conditional
probability distributions. Thus a model $p(y|x)$ is, by definition, just an
element of $\mathcal{P}$.
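The distinction between a distribution and a value it assigns can be illustrated with the fair die. In this small sketch the variable names `p` and `p_of_3` are hypothetical: the dictionary stands for the whole distribution, while a single lookup gives the probability of one event.

```python
from fractions import Fraction

# The distribution as a whole: a mapping from each face of the die to its
# probability. For a fair die every face receives 1/6.
p = {face: Fraction(1, 6) for face in range(1, 7)}

# A particular value assigned by that distribution to the event X = 3.
p_of_3 = p[3]

print(p_of_3)
print(sum(p.values()))
```

Writing p for both objects, as the text does from here on, amounts to letting the reader infer from context whether the dictionary or one of its entries is meant.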