**Cultural Interest**

S. Guiasu and A. Shenitzer.
The principle of maximum entropy.
*The Mathematical Intelligencer*, 7(1), 1985.
(An overview paper)

E. Jaynes.
Notes on present status and future prospects.
In W.T. Grandy and L.H. Schick, editors, *Maximum Entropy and
Bayesian Methods*, pages 1-13. Kluwer, 1990.
(Depending on your viewpoint, Jaynes either invented maxent in 1957 or,
at the very least, formalized it then.)

**Feature induction**

S. Della Pietra, V. Della Pietra, and J. Lafferty.
Inducing features of random fields.
*IEEE Transactions on Pattern Analysis and Machine Intelligence*,
19(4):380-393, April 1997.
(Introduces an iterative algorithm for constructing an
exponential model from "informative" features selected automatically
from a large candidate set.)
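As a rough illustration of the greedy loop described above, here is a minimal Python sketch. It is not the paper's algorithm verbatim and rests on several assumptions: the space is small enough to enumerate, candidates are restricted to binary (0/1-valued) features, and weights are refit by cyclic coordinate ascent rather than by improved iterative scaling. For a binary feature g with empirical expectation p and model expectation q, adding g with weight α changes the log-likelihood by α·p − log(1 − q + q·e^α), which is maximized at α* = log(p(1 − q) / (q(1 − p))), where the gain equals the Bernoulli KL divergence between p and q. The function names are hypothetical.

```python
import math


def model(space, feats, weights):
    """q(x) proportional to exp(sum_j w_j * f_j(x)) over a small, enumerable space."""
    scores = {x: math.exp(sum(w * f(x) for f, w in zip(feats, weights))) for x in space}
    z = sum(scores.values())
    return {x: s / z for x, s in scores.items()}


def expect(dist, f):
    """Expectation of feature f under a distribution given as {x: prob}."""
    return sum(p * f(x) for x, p in dist.items())


def gain(p_emp, q_mod):
    """Best log-likelihood gain from adding one binary feature (Bernoulli KL)."""
    return (p_emp * math.log(p_emp / q_mod)
            + (1 - p_emp) * math.log((1 - p_emp) / (1 - q_mod)))


def best_weight(p_emp, q_mod):
    """Weight change that makes the model's expectation of a binary feature equal p_emp."""
    return math.log(p_emp * (1 - q_mod) / (q_mod * (1 - p_emp)))


def induce(space, empirical, candidates, n_feats, sweeps=50):
    """Greedy feature induction sketch: add the highest-gain candidate, then refit."""
    space = list(space)
    feats, weights, pool = [], [], list(candidates)
    for _ in range(n_feats):
        q = model(space, feats, weights)
        best, best_gain = None, 0.0
        for g in pool:
            pe, pm = expect(empirical, g), expect(q, g)
            if 0 < pe < 1 and 0 < pm < 1:  # the gain is finite only in this range
                g_val = gain(pe, pm)
                if g_val > best_gain:
                    best, best_gain = g, g_val
        if best is None:
            break  # no remaining candidate improves the likelihood
        pool.remove(best)
        feats.append(best)
        weights.append(0.0)
        # Refit all weights by cyclic coordinate ascent; for binary features each
        # coordinate step is the exact one-dimensional maximizer.
        for _ in range(sweeps):
            for j, f in enumerate(feats):
                q = model(space, feats, weights)
                weights[j] += best_weight(expect(empirical, f), expect(q, f))
    return feats, weights
```

On a four-point space encoding two bits, for example, the sketch picks whichever bit's empirical frequency lies farther from the current model's, then fits both weights so the model's feature expectations match the data.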

**Iterative scaling**

D. Brown.
A note on approximations to discrete probability distributions.
*Information and Control*, 2:386-392, 1959.

I. Csiszár.
I-divergence geometry of probability distributions and minimization
problems.
*The Annals of Probability*, 3(1):146-158, 1975.

I. Csiszár and G. Tusnády.
Information geometry and alternating minimization procedures.
*Statistics & Decisions, Supplemental Issue 1*, pages
205-237, 1984.

I. Csiszár.
A geometric interpretation of Darroch and Ratcliff's generalized
iterative scaling.
*The Annals of Statistics*, 17(3):1409-1413, 1989.

J. Darroch and D. Ratcliff.
Generalized iterative scaling for log-linear models.
*The Annals of Mathematical Statistics*, 43:1470-1480, 1972.

The [Della Pietra, Della Pietra, Lafferty] reference above also formally
introduces the *improved iterative scaling* algorithm, a procedure for
computing maximum-likelihood estimates of the parameters in a maxent
distribution.
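A minimal sketch of improved iterative scaling in Python, assuming an unconditional model q(x) ∝ exp(Σ_i λ_i f_i(x)) over a small, enumerable space, non-negative features, and strictly positive empirical feature expectations p̃[f_i]. Each sweep solves Σ_x q(x) f_i(x) exp(δ_i f#(x)) = p̃[f_i] for δ_i by Newton's method, where f#(x) = Σ_i f_i(x), and then sets λ_i ← λ_i + δ_i. The function names are illustrative, not taken from the paper.

```python
import math


def distribution(space, feats, lam):
    """q(x) proportional to exp(sum_i lam_i * f_i(x)), normalized over the finite space."""
    scores = {x: math.exp(sum(l * f(x) for l, f in zip(lam, feats))) for x in space}
    z = sum(scores.values())
    return {x: s / z for x, s in scores.items()}


def iis(space, feats, empirical, iters=200, newton_steps=50, tol=1e-12):
    """Improved iterative scaling (sketch).

    Assumes every f_i(x) >= 0 and every empirical expectation p~[f_i] > 0,
    so each per-feature equation has a finite root.
    """
    space = list(space)
    lam = [0.0] * len(feats)
    targets = [sum(p * f(x) for x, p in empirical.items()) for f in feats]
    # f#(x): total feature value at x.
    fsharp = {x: sum(f(x) for f in feats) for x in space}
    for _ in range(iters):
        q = distribution(space, feats, lam)
        deltas = []
        for f, target in zip(feats, targets):
            # Newton's method on g(d) = sum_x q(x) f(x) exp(d f#(x)) - target,
            # which is increasing and convex in d for non-negative features.
            d = 0.0
            for _ in range(newton_steps):
                g = sum(q[x] * f(x) * math.exp(d * fsharp[x]) for x in space) - target
                dg = sum(q[x] * f(x) * fsharp[x] * math.exp(d * fsharp[x]) for x in space)
                step = g / dg
                d -= step
                if abs(step) < tol:
                    break
            deltas.append(d)
        # All deltas are computed from the same q, then applied together.
        lam = [l + d for l, d in zip(lam, deltas)]
    return lam
```

When f#(x) equals the same constant M for every x, the Newton solve is unnecessary: the update has the closed form δ_i = (1/M) log(p̃[f_i] / q[f_i]), which is the generalized iterative scaling step of Darroch and Ratcliff.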

**Applications**

The proceedings of the yearly conference *Maximum Entropy and Bayesian
Methods* have been published by Kluwer for at least the last ten years and
always contain interesting applications of maxent to areas as diverse as
portfolio optimization, signal processing, nuclear physics, and, of all things,
the "two envelope" paradox.

A. Berger, S. Della Pietra, and V. Della Pietra.
A maximum entropy approach to natural language processing.
*Computational Linguistics*, 22(1):39-71, 1996.
(Covers selected applications in machine translation, including
word-sense disambiguation and word reordering.)

R. Rosenfeld.
A maximum entropy approach to adaptive statistical language modelling.
*Computer Speech and Language*, 1996.
(Uses exponential models to construct a conditional model of language
that improves upon the standard "trigram" model.)
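For orientation, a conditional exponential model of the next word w given a history h has the general form p(w | h) = exp(Σ_i λ_i f_i(h, w)) / Z(h), where Z(h) sums the numerator over the vocabulary. The short Python sketch below evaluates that form for a single history; it is not Rosenfeld's feature set or code, and the function name is hypothetical.

```python
import math


def conditional_prob(history, vocab, feats, lam):
    """p(w | h) = exp(sum_i lam_i * f_i(h, w)) / Z(h), normalized over vocab.

    Z(h) depends on the history, so it is recomputed for every h; this
    per-history normalization is what makes the model conditional.
    """
    scores = {w: math.exp(sum(l * f(history, w) for l, f in zip(lam, feats)))
              for w in vocab}
    z = sum(scores.values())
    return {w: s / z for w, s in scores.items()}
```

For instance, with a hypothetical bigram-style feature that fires when the previous word is "the" and w is "market", and a hypothetical trigger feature that fires when "stock" appears anywhere in the history and w is "market", positive weights raise p("market" | h) above words no feature touches, while the distribution over the vocabulary still sums to one for every history.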

A. Ratnaparkhi.
A maximum entropy model for part-of-speech tagging.
*Proceedings of the Conference on Empirical Methods in Natural Language
Processing*, May 1996, University of Pennsylvania.
(Adwait has applied maxent to several problems in natural language
processing; see his web
page for a more complete list.)

Wed Dec 17 23:49:11 EST 1997