
The maxent principle


Suppose that we are given $n$ feature functions $f_i$, which determine statistics we feel are important in modeling the process. We would like our model to accord with these statistics. That is, we would like $p$ to lie in the subset $\mathcal{C}$ of $\mathcal{P}$ defined by

$$\mathcal{C} \equiv \left\{ p \in \mathcal{P} \;\middle|\; p(f_i) = \tilde{p}(f_i) \ \text{for}\ i \in \{1, 2, \ldots, n\} \right\}$$

Figure 1 provides a geometric interpretation of this setup. Here $\mathcal{P}$ is the space of all (unconditional) probability distributions on 3 points, sometimes called a simplex. If we impose no constraints (depicted in (a)), then all probability models are allowable. Imposing one linear constraint $\mathcal{C}_1$ restricts us to those $p \in \mathcal{P}$ which lie on the region defined by $\mathcal{C}_1$, as shown in (b). A second linear constraint could determine $p$ exactly, if the two constraints are satisfiable; this is the case in (c), where the intersection of $\mathcal{C}_1$ and $\mathcal{C}_2$ is non-empty. Alternatively, a second linear constraint could be inconsistent with the first; for instance, the first might require that the probability of the first point is $1/3$ and the second that the probability of the third point is $3/4$. This is shown in (d). In the present setting, however, the linear constraints are extracted from the training sample and cannot, by construction, be inconsistent. Furthermore, the linear constraints in our applications will not even come close to determining $p \in \mathcal{P}$ uniquely as they do in (c); instead, the set $\mathcal{C} = \mathcal{C}_1 \cap \mathcal{C}_2 \cap \ldots \cap \mathcal{C}_n$ of allowable models will be infinite.

Figure 1: Four different scenarios in constrained optimization. $\mathcal{P}$ represents the space of all probability distributions. In (a), no constraints are applied, and all $p \in \mathcal{P}$ are allowable. In (b), the constraint $\mathcal{C}_1$ narrows the set of allowable models to those which lie on the line defined by the linear constraint. In (c), two consistent constraints $\mathcal{C}_1$ and $\mathcal{C}_2$ define a single model $p \in \mathcal{C}_1 \cap \mathcal{C}_2$. In (d), the two constraints are inconsistent (i.e. $\mathcal{C}_1 \cap \mathcal{C}_3 = \emptyset$); no $p \in \mathcal{P}$ can satisfy them both.
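The consistency question illustrated by panels (c) and (d) can be checked mechanically in the simplest case, where each constraint pins the probability of a single point. The following is only a minimal sketch under that assumption: the function name `pins_are_consistent` and the pinned values are illustrative, and general linear constraints would need a linear-programming feasibility check instead.

```python
from fractions import Fraction


def pins_are_consistent(pins, n_points=3):
    """Return True if some distribution on n_points satisfies every pin.

    `pins` maps a point index to the probability a constraint forces on it.
    This covers only constraints of the form p[i] = v (an assumption made
    for illustration), not arbitrary linear constraints.
    """
    # Each pinned value must itself be a valid probability.
    if any(v < 0 or v > 1 for v in pins.values()):
        return False
    total = sum(pins.values())
    # If every point is pinned, the pins must sum to exactly 1.
    if len(pins) == n_points:
        return total == 1
    # Otherwise the unpinned points absorb the remaining mass, which must be >= 0.
    return total <= 1


# Two pins that leave non-negative mass for the remaining point: consistent.
consistent = pins_are_consistent({0: Fraction(1, 3), 1: Fraction(1, 4)})

# Pins of 1/3 and 3/4 already exceed total mass 1: inconsistent, as in panel (d).
inconsistent = pins_are_consistent({0: Fraction(1, 3), 2: Fraction(3, 4)})
```

Exact fractions are used so that the comparison against 1 is not affected by floating-point rounding.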

Among the models $p \in \mathcal{C}$, the maximum entropy philosophy dictates that we select the distribution which is most uniform. But now we face a question left open earlier: what does "uniform" mean?

A mathematical measure of the uniformity of a conditional distribution $p(y|x)$ is provided by the conditional entropy

$$H(p) \equiv -\sum_{x,y} \tilde{p}(x)\, p(y|x) \log p(y|x)$$

The entropy is bounded from below by zero, the entropy of a model with no uncertainty at all, and from above by $\log |\mathcal{Y}|$, the entropy of the uniform distribution over all possible $|\mathcal{Y}|$ values of $y$. With this definition in hand, we are ready to present the principle of maximum entropy.
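The two bounds can be checked numerically on a toy example. The sketch below assumes the conditional entropy is the empirical-context-weighted sum $-\sum_{x,y} \tilde{p}(x)\, p(y|x) \log p(y|x)$ with natural logarithms and the convention $0 \log 0 = 0$; the contexts, outputs, and probabilities are made up for illustration.

```python
import math


def conditional_entropy(p_tilde_x, p_y_given_x):
    """Conditional entropy of p(y|x), weighted by the empirical p~(x).

    p_tilde_x:    dict mapping each context x to its empirical probability.
    p_y_given_x:  dict mapping each x to a dict {y: p(y|x)}.
    Terms with p(y|x) == 0 contribute nothing (0 log 0 is taken as 0).
    """
    h = 0.0
    for x, px in p_tilde_x.items():
        for pyx in p_y_given_x[x].values():
            if pyx > 0.0:
                h -= px * pyx * math.log(pyx)
    return h


# Hypothetical toy data: two contexts, three possible outputs.
p_tilde_x = {"a": 0.5, "b": 0.5}
outputs = ["y1", "y2", "y3"]

# A model with no uncertainty: each context determines its output.
deterministic = {
    "a": {"y1": 1.0, "y2": 0.0, "y3": 0.0},
    "b": {"y1": 0.0, "y2": 1.0, "y3": 0.0},
}
# The uniform model over the three outputs in every context.
uniform = {x: {y: 1.0 / len(outputs) for y in outputs} for x in p_tilde_x}
# Something in between.
mixed = {
    "a": {"y1": 0.7, "y2": 0.2, "y3": 0.1},
    "b": {"y1": 0.1, "y2": 0.1, "y3": 0.8},
}

h_min = conditional_entropy(p_tilde_x, deterministic)  # lower bound: 0
h_max = conditional_entropy(p_tilde_x, uniform)        # upper bound: log 3
h_mid = conditional_entropy(p_tilde_x, mixed)          # strictly between
```

The deterministic model attains the lower bound and the uniform model attains the upper bound of $\log 3$; any other model, such as `mixed`, falls strictly between them.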

To select a model from a set $\mathcal{C}$ of allowed probability distributions, choose the model $p_\star \in \mathcal{C}$ with maximum entropy $H(p)$:

$$p_\star = \arg\max_{p \in \mathcal{C}} H(p)$$

It can be shown that $p_\star$ is always well-defined; that is, there is always a unique model $p_\star$ with maximum entropy in any constrained set $\mathcal{C}$.
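To make the selection rule concrete, here is a small sketch on the unconditional 3-point setting used in Figure 1, so the plain Shannon entropy stands in for the conditional entropy. It assumes a single hypothetical constraint that pins the first point's probability to $1/2$, and it uses a brute-force grid search purely for illustration; this is not how maxent models are fitted in practice.

```python
import math


def entropy(p):
    """Shannon entropy (natural log) of a distribution given as a tuple."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def maxent_on_line(c, steps=1000):
    """Grid-search the entropy maximizer over {p on 3 points : p[0] = c}.

    The feasible set is a line segment: p = (c, t, 1 - c - t) for
    0 <= t <= 1 - c. We evaluate the entropy on an even grid of t values.
    """
    candidates = []
    for k in range(steps + 1):
        t = (1.0 - c) * k / steps
        # Clamp to guard against tiny negative values from rounding.
        candidates.append((c, t, max(0.0, 1.0 - c - t)))
    return max(candidates, key=entropy)


# Hypothetical constraint: the first point must have probability 1/2.
p_star = maxent_on_line(0.5)
```

The search returns `(0.5, 0.25, 0.25)`: the constrained mass is left where the constraint puts it, and the remaining mass is spread evenly, which is the "most uniform" model consistent with the constraint. Because entropy is strictly concave, no other point on the segment ties with it.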


Adam Berger
Fri Jul 5 11:43:50 EDT 1996