| Miro Dudík |
| Maximum Entropy Density Estimation |
We consider the problem of estimating probability densities. The maximum entropy principle (maxent) states that density estimates should respect empirical information, expressed as constraints, and otherwise be as close to the uniform distribution as possible, thus avoiding any bias beyond the constraints. Constraints are specified in terms of real-valued "features" defined over a sample space. Most commonly, they require that the means of features with respect to a density estimate match the empirical means determined from data. This approach is, however, bound to overfit when the number of constraints is large and the data are scarce. There are many ways to smooth maxent and thus avoid overfitting. The purpose of this work is to understand the relationships between various smoothing techniques and, more importantly, to derive performance guarantees. Our result is that smoothing by regularization is equivalent to relaxation of constraints. We also provide guarantees that give insight into which types of relaxed constraints lead to good performance.
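The equivalence between regularization and relaxed constraints can be illustrated with a small sketch. It assumes one particular instance, not necessarily the only one covered by this work: an l1 penalty with weight beta on the feature weights of a Gibbs distribution, which corresponds to requiring each feature mean to lie within beta of its empirical mean. The sample space, the empirical means, the value of beta, and all function names below are hypothetical, and the optimizer (proximal gradient descent) is chosen for brevity; this is not the implementation used in the MaxEnt program.

```python
import math

def gibbs(lam, features):
    """Gibbs distribution p(x) proportional to exp(sum_j lam[j] * f_j(x)) on a finite space."""
    scores = [sum(l * f for l, f in zip(lam, fx)) for fx in features]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def feature_means(p, features):
    """Mean of each feature under the distribution p."""
    return [sum(px * fx[j] for px, fx in zip(p, features))
            for j in range(len(features[0]))]

def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def fit_l1_maxent(features, empirical, beta, step=0.5, iters=2000):
    """Minimize log loss + beta * sum_j |lam[j]| by proximal gradient descent."""
    lam = [0.0] * len(empirical)
    for _ in range(iters):
        model = feature_means(gibbs(lam, features), features)
        # The gradient of the log loss is (model mean - empirical mean).
        lam = [soft_threshold(l - step * (m - e), step * beta)
               for l, m, e in zip(lam, model, empirical)]
    return lam

# Hypothetical toy problem: sample space {0,1}^2 with features f1(a, b) = a, f2(a, b) = b.
features = [(a, b) for a in (0, 1) for b in (0, 1)]
empirical = [0.8, 0.55]  # assumed empirical feature means
beta = 0.1               # assumed regularization weight / constraint width

lam = fit_l1_maxent(features, empirical, beta)
means = feature_means(gibbs(lam, features), features)
print("weights:", lam)
print("model means:", means)
```

In this toy problem the fitted feature means end up within beta of the empirical means rather than matching them exactly. The first mean is pulled only as far as the edge of its relaxed constraint, and the second feature, whose uniform-distribution mean already lies inside its constraint, keeps a weight of zero.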
|
| Modeling Geographic Distributions of Species |
Our goal is to model the geographic distributions of biological species based on (i) their observed occurrence localities and (ii) environmental characteristics of a given region. Such models are used in conservation biology, ecology and land-use planning. The richest sources of data are museums and herbaria, but the number of occurrence records for many species of interest (e.g. endangered species) is quite small by machine learning standards (20-50 or even fewer), and the records are often collected in a highly biased manner. These issues pose a significant challenge for statistical methods, and coping with this challenge has been the focus of this work. Together with Rob Schapire and Steven Phillips, we proposed using the maximum entropy approach to model species distributions. We developed the program MaxEnt, which is available for download.
|
| Last modified: Apr 19th, 2008 |