The maximum entropy principle presents us with a problem in constrained
optimization: find the which maximizes . In simple cases,
we can find the solution to this problem analytically. This was true for the
example presented above when we imposed the first two
constraints on *p*. Unfortunately, the solution of the general maximum entropy
cannot be written explicitly, and we need a more indirect approach. (The
reader is invited to try to calculate the solution for the same example when
the third constraint is imposed.)

To address the general problem, we apply the method of Lagrange multipliers from the theory of constrained optimization. The relevant steps are outlined here; the reader is referred to the further readings for a more thorough discussion of constrained optimization as applied to maximum entropy.

The constrained optimization problem at hand is to

We refer to this as the *primal* problem; it is a succinct way of saying
that we seek to maximize subject to the following constraints:

- for all
*x,y*. - . This and the previous condition
guarantee that
*p*is a conditional probability distribution. - . In other words, , and so satisfies the active constraints .

To solve this optimization problem, introduce the Lagrangian

The real-valued parameters and correspond to the constraints imposed on the solution.

Though we shall not prove it here, the following strategy yields the optimal value of (which we are calling ): first hold and constant and maximize (8) with respect to . This yields an expression for in terms of the (still unsolved-for) parameters and . Now substitute this expression back into (8), this time solving for the optimal values of and ( and , respectively).

Proceeding in this manner, we hold fixed and compute the unconstrained maximum of the over all :

Equating this expression to zero and solving for , we find that at its optimum, has the parametric form

We have thus found the parametric form of , and so we now take up the task of solving for the optimal values . Recognizing that the second factor in this equation is the factor corresponding to the second of the constraints listed above, we can rewrite (10) as

where , the normalizing factor, is given by

We have found but not yet . Towards this end we
introduce some further notation. Define the *dual function*
as

and the dual optimization problem as

Since and are fixed, the righthand side of (14) has only the free variables .

It is far from obvious that the of (11) with given by (14) is in fact the solution to the constrained optimization problem we set out to find. But in fact this is due to a fundamental principle in the theory of Lagrange multipliers, called generically the Kuhn-Tucker theorem, which asserts that (under suitable assumptions, which are satisfied here) the primal and dual problems are closely related. Although a detailed account of this relationship is beyond the scope of this paper, it is easy to state the final result:

The maximum entropy model subject to the constraints has the parametric form of (11), where can be determined by maximizing the dual function .

Fri Jul 5 11:43:50 EDT 1996