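The log-likelihood $L_{\tilde{p}}(p)$ of the empirical distribution $\tilde{p}$ as predicted by
a model $p$ is defined by

$$L_{\tilde{p}}(p) \equiv \log \prod_{x,y} p(y|x)^{\tilde{p}(x,y)} = \sum_{x,y} \tilde{p}(x,y) \log p(y|x)$$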

It is easy to check that the dual function $\Psi(\lambda)$ of the previous section is, in fact, just the log-likelihood for the exponential model $p_\lambda$; that is,

$$\Psi(\lambda) = L_{\tilde{p}}(p_\lambda)$$

where $p_\lambda$ has the parametric form of (11). With this interpretation, the result of the previous section can be rephrased as:

The model $p_\star$ with maximum entropy is the model in the parametric family $p_\lambda(y|x)$ that maximizes the likelihood of the training sample $\tilde{p}$.
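
The identity between the dual function and the log-likelihood can be checked numerically on a small example. The sketch below is only an illustration: the toy empirical distribution, the two binary features, and all function names are assumptions made up for this purpose, and the closed form used for the dual, $\Psi(\lambda) = -\sum_x \tilde{p}(x) \log Z_\lambda(x) + \sum_i \lambda_i \tilde{p}(f_i)$, is assumed to be the one meant by "the previous section".

```python
import math

# Hypothetical toy problem: contexts x in {0, 1}, outputs y in {0, 1, 2}.
XS, YS = (0, 1), (0, 1, 2)

# Assumed empirical distribution p~(x, y); the values are illustrative only.
P_EMP = {(0, 0): 0.30, (0, 1): 0.10, (0, 2): 0.10,
         (1, 0): 0.05, (1, 1): 0.25, (1, 2): 0.20}

# Two made-up binary features f_i(x, y).
FEATURES = (lambda x, y: 1.0 if x == y else 0.0,
            lambda x, y: 1.0 if y == 2 else 0.0)


def score(lam, x, y):
    """Weighted feature sum: sum_i lambda_i * f_i(x, y)."""
    return sum(l * f(x, y) for l, f in zip(lam, FEATURES))


def log_z(lam, x):
    """Log of the normalizer Z_lambda(x) of the exponential model."""
    return math.log(sum(math.exp(score(lam, x, y)) for y in YS))


def log_likelihood(lam):
    """L(p_lambda) = sum_{x,y} p~(x, y) * log p_lambda(y|x)."""
    return sum(p * (score(lam, x, y) - log_z(lam, x))
               for (x, y), p in P_EMP.items())


def dual(lam):
    """Psi(lambda) = -sum_x p~(x) log Z_lambda(x) + sum_i lambda_i p~(f_i)."""
    p_x = {x: sum(P_EMP[(x, y)] for y in YS) for x in XS}
    emp_f = [sum(p * f(x, y) for (x, y), p in P_EMP.items())
             for f in FEATURES]
    return (-sum(p_x[x] * log_z(lam, x) for x in XS)
            + sum(l * e for l, e in zip(lam, emp_f)))


lam = (0.7, -1.3)  # arbitrary parameter values, chosen only for the check
print("dual:          ", dual(lam))
print("log-likelihood:", log_likelihood(lam))
```

The two printed values should agree up to floating-point rounding for any choice of `lam`, since expanding $\log p_\lambda(y|x)$ inside the log-likelihood yields exactly the two terms of the dual.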

This result provides an added justification for the maximum entropy principle: if the notion of selecting a model $p_\star$ on the basis of maximum entropy isn't compelling enough, it so happens that this same $p_\star$ is also the model which, from among all models of the same parametric form (11), can best account for the training sample.
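
To make this concrete, the following sketch fits the parameters of a small exponential model by plain gradient ascent on the log-likelihood and then compares the model's feature expectations with the empirical ones. It is a toy illustration under stated assumptions, not the training procedure of the tutorial: the data, the features, the step size, and the function names are all hypothetical, and it assumes the usual maximum entropy setup in which the constraints require model feature expectations to equal empirical feature expectations.

```python
import math

# Hypothetical toy problem: contexts x in {0, 1}, outputs y in {0, 1, 2}.
XS, YS = (0, 1), (0, 1, 2)

# Assumed empirical distribution p~(x, y); the values are illustrative only.
P_EMP = {(0, 0): 0.30, (0, 1): 0.10, (0, 2): 0.10,
         (1, 0): 0.05, (1, 1): 0.25, (1, 2): 0.20}

# Two made-up binary features f_i(x, y).
FEATURES = (lambda x, y: 1.0 if x == y else 0.0,
            lambda x, y: 1.0 if y == 2 else 0.0)


def model(lam, x):
    """Conditional distribution p_lambda(.|x) of the exponential form."""
    w = [math.exp(sum(l * f(x, y) for l, f in zip(lam, FEATURES)))
         for y in YS]
    z = sum(w)
    return [v / z for v in w]


def log_likelihood(lam):
    """sum_{x,y} p~(x, y) * log p_lambda(y|x)."""
    cond = {x: model(lam, x) for x in XS}
    return sum(p * math.log(cond[x][y]) for (x, y), p in P_EMP.items())


def expectations(lam):
    """Return (empirical, model) expected values of each feature."""
    p_x = {x: sum(P_EMP[(x, y)] for y in YS) for x in XS}
    cond = {x: model(lam, x) for x in XS}
    emp = [sum(p * f(x, y) for (x, y), p in P_EMP.items())
           for f in FEATURES]
    mod = [sum(p_x[x] * cond[x][y] * f(x, y) for x in XS for y in YS)
           for f in FEATURES]
    return emp, mod


def fit(steps=2000, rate=1.0):
    """Gradient ascent; the gradient is empirical minus model expectation."""
    lam = [0.0] * len(FEATURES)
    for _ in range(steps):
        emp, mod = expectations(lam)
        lam = [l + rate * (e - m) for l, e, m in zip(lam, emp, mod)]
    return lam


lam_star = fit()
emp, mod = expectations(lam_star)
print("fitted lambda:", lam_star)
print("constraint gaps:", [e - m for e, m in zip(emp, mod)])
print("log-likelihood gain over lambda = 0:",
      log_likelihood(lam_star) - log_likelihood([0.0, 0.0]))
```

At the fitted parameters the gradient of the log-likelihood vanishes, which is the same as saying that the model reproduces the empirical feature expectations. The likelihood maximizer within the parametric family therefore also satisfies the constraints, as the result above requires.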

Fri Jul 5 11:43:50 EDT 1996