next up previous
Next: Maximum likelihood Up: Maxent Modeling Previous: The maxent principle

Exponential form

The maximum entropy principle presents us with a problem in constrained optimization: find the tex2html_wrap_inline1714 which maximizes tex2html_wrap_inline1716 . In simple cases, we can find the solution to this problem analytically. This was true for the example presented above when we imposed the first two constraints on p. Unfortunately, the solution of the general maximum entropy cannot be written explicitly, and we need a more indirect approach. (The reader is invited to try to calculate the solution for the same example when the third constraint is imposed.)

To address the general problem, we apply the method of Lagrange multipliers from the theory of constrained optimization. The relevant steps are outlined here; the reader is referred to the further readings for a more thorough discussion of constrained optimization as applied to maximum entropy.

The constrained optimization problem at hand is to


We refer to this as the primal problem; it is a succinct way of saying that we seek to maximize tex2html_wrap_inline1724 subject to the following constraints:

  1. tex2html_wrap_inline1726 for all x,y.
  2. tex2html_wrap_inline1730 . This and the previous condition guarantee that p is a conditional probability distribution.
  3. tex2html_wrap_inline1734 . In other words, tex2html_wrap_inline1736 , and so satisfies the active constraints tex2html_wrap_inline1738 .

To solve this optimization problem, introduce the Lagrangiangif


The real-valued parameters tex2html_wrap_inline1746 and tex2html_wrap_inline1748 correspond to the tex2html_wrap_inline1750 constraints imposed on the solution.

Though we shall not prove it here, the following strategy yields the optimal value of tex2html_wrap_inline1752 (which we are calling tex2html_wrap_inline1754 ): first hold tex2html_wrap_inline1756 and tex2html_wrap_inline1758 constant and maximize (8) with respect to tex2html_wrap_inline1760 . This yields an expression for tex2html_wrap_inline1762 in terms of the (still unsolved-for) parameters tex2html_wrap_inline1764 and tex2html_wrap_inline1766 . Now substitute this expression back into (8), this time solving for the optimal values of tex2html_wrap_inline1768 and tex2html_wrap_inline1770 ( tex2html_wrap_inline1772 and tex2html_wrap_inline1774 , respectively).

Proceeding in this manner, we hold tex2html_wrap_inline1776 fixed and compute the unconstrained maximum of the tex2html_wrap_inline1778 over all tex2html_wrap_inline1780 :


Equating this expression to zero and solving for tex2html_wrap_inline1782 , we find that at its optimum, tex2html_wrap_inline1784 has the parametric form


We have thus found the parametric form of tex2html_wrap_inline1788 , and so we now take up the task of solving for the optimal values tex2html_wrap_inline1790 . Recognizing that the second factor in this equation is the factor corresponding to the second of the constraints listed above, we can rewrite (10) as


where tex2html_wrap_inline1794 , the normalizing factor, is given by


We have found tex2html_wrap_inline1796 but not yet tex2html_wrap_inline1798 . Towards this end we introduce some further notation. Define the dual function tex2html_wrap_inline1800 as


and the dual optimization problem as


Since tex2html_wrap_inline1802 and tex2html_wrap_inline1804 are fixed, the righthand side of (14) has only the free variables tex2html_wrap_inline1806 .

It is far from obvious that the tex2html_wrap_inline1808 of (11) with tex2html_wrap_inline1810 given by (14) is in fact the solution to the constrained optimization problem we set out to find. But in fact this is due to a fundamental principle in the theory of Lagrange multipliers, called generically the Kuhn-Tucker theorem, which asserts that (under suitable assumptions, which are satisfied here) the primal and dual problems are closely related. Although a detailed account of this relationship is beyond the scope of this paper, it is easy to state the final result:

The maximum entropy model subject to the constraints tex2html_wrap_inline1812 has the parametric form tex2html_wrap_inline1814 of (11), where tex2html_wrap_inline1816 can be determined by maximizing the dual function tex2html_wrap_inline1818 .

next up previous
Next: Maximum likelihood Up: Maxent Modeling Previous: The maxent principle

Adam Berger
Fri Jul 5 11:43:50 EDT 1996