Information Theoretic Feature Extraction

John W. Fisher III
Massachusetts Institute of Technology

Modern machine learning methods are being applied to data of increasingly high dimension. Classical decision-theoretic approaches are not well suited to high-dimensional data. Consequently, dimensionality reduction, or feature extraction, is often performed first in an attempt to simplify the estimation or classification task. Common methods for feature extraction, however, are either ad hoc or optimal only in the signal-reconstruction sense (e.g., eigenvector-based methods). The challenging task is to learn "informative" directions from high-dimensional data. Utilizing principles of information theory, non-parametric statistics, and machine learning, I describe a task-driven feature extraction approach. Specifically, the features preserve information related to the given estimation/classification problem. Mutual information, motivated by Fano's inequality, is the criterion used for feature extraction. The novelty of the approach is that mutual information is optimized in the feature space (thereby avoiding the curse of dimensionality) without explicit estimation or modeling of the underlying density. I present experimental results for two challenging image processing problems: pose estimation and classification of high-resolution SAR imagery.
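The abstract does not specify the estimator, but the core idea can be illustrated in a toy setting: project high-dimensional data to a low-dimensional feature space and search for the projection whose class-conditional feature densities are most distinguishable, using a non-parametric (Parzen-window) criterion evaluated entirely in the feature space, with no explicit density model of the original data. The sketch below is an assumption-laden stand-in, not the author's method: it uses an integrated-squared-difference divergence (an information-theoretic surrogate related to, but not identical to, mutual information) and a simple grid search over 1-D projection directions instead of gradient-based optimization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two classes in 2-D, separated along the x-axis; the y-axis is pure noise.
n = 200
X0 = rng.normal([-2.0, 0.0], 1.0, size=(n, 2))
X1 = rng.normal([+2.0, 0.0], 1.0, size=(n, 2))

def quadratic_divergence(a, b, sigma=1.0):
    """Parzen-window estimate of the integrated squared difference
    between two 1-D densities, integral of (p_a - p_b)^2.
    Convolving two Gaussian kernels of width sigma gives a Gaussian
    of variance 2*sigma^2, so the integral has a closed form."""
    def cross(u, v):
        d = u[:, None] - v[None, :]
        s2 = 2.0 * sigma**2
        k = np.exp(-d**2 / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)
        return k.sum() / (len(u) * len(v))
    return cross(a, a) + cross(b, b) - 2.0 * cross(a, b)

# Grid search over projection angles for the most "informative" direction.
angles = np.linspace(0.0, np.pi, 180, endpoint=False)
scores = []
for t in angles:
    w = np.array([np.cos(t), np.sin(t)])  # unit-norm 1-D projection
    scores.append(quadratic_divergence(X0 @ w, X1 @ w))
best = angles[int(np.argmax(scores))]
print(round(best, 2))  # near 0 (or pi): the x-axis carries the class information
```

Note that the divergence is computed only from the 1-D projected samples, which is the point the abstract emphasizes: the information-theoretic criterion is evaluated in the low-dimensional feature space, sidestepping density estimation in the original high-dimensional space.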


Last modified: Sun Nov 21 08:27:00 EST 1999