12:00, Wed 28 May 1997, WeH 7220
Statistical Learning Algorithms Based on Bregman Distances
John Lafferty
We present a class of statistical learning algorithms formulated in
terms of minimizing Bregman distances, a family of generalized entropy
measures associated with convex functions. The inductive learning
scheme is akin to growing a decision tree, with the Bregman distance
filling the role of the impurity function in tree-based classifiers.
Our approach is based on two components. In the feature selection step,
each linear constraint in a pool of candidate features is evaluated by
the reduction in Bregman distance that would result from adding it to
the model. In the constraint satisfaction step, all of the parameters
are adjusted to minimize the Bregman distance subject to the chosen
constraints. We introduce a new iterative estimation algorithm for
carrying out both the feature selection and constraint satisfaction
steps, and outline a proof of its convergence.
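To make the central notion concrete, here is a minimal numerical sketch (not from the talk itself) of a Bregman distance: given a convex function F with gradient grad_F, the distance D_F(p, q) = F(p) - F(q) - <grad F(q), p - q> measures how far F lies above its tangent plane at q. Taking F to be the negative entropy recovers the familiar Kullback-Leibler divergence on probability vectors. The function names below are illustrative choices, not part of the talk's notation.

```python
import numpy as np

def bregman_distance(F, grad_F, p, q):
    """Bregman distance D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>."""
    return F(p) - F(q) - np.dot(grad_F(q), p - q)

# Convex function: negative entropy F(p) = sum_i p_i log p_i.
neg_entropy = lambda p: np.sum(p * np.log(p))
grad_neg_entropy = lambda p: np.log(p) + 1.0

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.25, 0.25, 0.5])

# For negative entropy, the Bregman distance between probability
# vectors reduces to the KL divergence sum_i p_i log(p_i / q_i).
d = bregman_distance(neg_entropy, grad_neg_entropy, p, q)
kl = np.sum(p * np.log(p / q))
assert np.isclose(d, kl)
```

Note that D_F is not a metric in general: it need not be symmetric and need not satisfy the triangle inequality, which is why "distances" appears in quotes below.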
Informal Note: Lately, I often ask machine learning types if they've
heard of Bregman distances. No one I've ever asked has! This family of
"distances" might be worth knowing about, however, in light of Csiszár's
results on axiomatic characterizations of statistical inference
problems. They are also widely used in certain inverse problems that
come up in computerized tomography. This is based on a paper I'll
present at the upcoming Canadian Workshop on Information Theory at the
Fields Institute in Toronto.