DistLearnKit

A Matlab Toolkit for Distance Metric Learning

Liu Yang, Prof. Rong Jin

Welcome! This is a Matlab toolkit for distance metric learning that includes implementations of a number of published machine learning algorithms in this area. The first version of this toolkit has been available since Oct. 28, 2007.

  • This toolkit provides a collection of baseline methods for distance metric learning research and aims to facilitate the use of these approaches in applications. The Local Distance Metric Learning algorithm (LDM) and Active Distance Metric Learning (BAYES+VAR) were developed by the author; the remaining implementations were collected online. The toolkit does not cover all related work from recent years. If you find other interesting approaches with Matlab implementations, please e-mail me. Your contribution will be highly appreciated.

  • Depending on the availability of the training examples (or side information), most distance metric learning techniques can be classified into two categories: Supervised Distance Metric Learning and Unsupervised Distance Metric Learning. Supervised distance metric learning uses label information to identify the correlations between dimensions that are most informative about the classes of the examples. Unsupervised distance metric learning aims to construct a low-dimensional manifold on which the geometric relationships between most of the observed data points are largely preserved. We organize the approaches in these two categories into the following two tables. Each table specifies a few general properties of the distance metric learning methods (for instance, linear vs. nonlinear, and global vs. local) and describes their learning strategies. Matlab implementations are available for download, accompanied by the original papers.

  • For details on what distance metric learning is and on related work, please refer to "A comprehensive survey on distance metric learning" (written in May 2005) and "An overview of distance metric learning" (written in Oct. 2007).
    • Supervised Distance Metric Learning can be divided into two categories: global distance metric learning and local distance metric learning. The first learns the distance metric in a global sense, i.e., it tries to satisfy all the pairwise constraints simultaneously by keeping all of the data points in each class close together while ensuring that data points from different classes are separated. The second learns a distance metric in a local setting, i.e., rather than satisfying all of the pairwise constraints simultaneously, it satisfies only "local" pairwise constraints. This is particularly useful for information retrieval and KNN classifiers, since both are influenced most by the data instances that are close to the test/query examples.


      Methods | Locality | Linearity | Learning Strategies | Code Download | Publication
      Probabilistic Global Distance Metric Learning (PGDM) | global | linear | constrained convex programming | [by Eric P. Xing] | [pdf]
      Relevant Component Analysis (RCA) | global | linear | capture global structure; use equivalence constraints | [by Aharon Bar-Hillel and Tomer Hertz] | [pdf]
      Discriminative Component Analysis (DCA) | global | linear | improve RCA by exploring negative constraints | [by Steven C.H. Hoi] | [pdf]
      Local Fisher Discriminant Analysis (LFDA) | local | linear | extend LDA by assigning greater weights to closer connecting examples | [by Masashi Sugiyama] | [pdf]
      Neighborhood Component Analysis (NCA) | local | linear | extend the nearest neighbor classifier toward metric learning | [by Charless C. Fowlkes] | [pdf]
      Large Margin NN Classifier (LMNN) | local | linear | extend NCA through a maximum margin framework | [by Kilian Q. Weinberger] | [pdf]
      Localized Distance Metric Learning (LDM) | local | linear | optimize local compactness and local separability in a probabilistic framework | [by Liu Yang] | [pdf]
      DistBoost | global | linear | learn distance functions by training binary classifiers with margins in a boosting framework | [by Tomer Hertz and Aharon Bar-Hillel]; [notes on calling its kernel version] | [pdf]; Kernel DistBoost [pdf]
      Active Distance Metric Learning (BAYES+VAR) | global | linear | select example pairs with the greatest uncertainty; posterior estimation with a full Bayesian treatment | [by Liu Yang] | [pdf]

    • Unsupervised Distance Metric Learning (manifold learning) can be categorized along two dimensions: first, whether the learnt embedding is linear or nonlinear; and second, whether the structure to be preserved is global or local. All the linear manifold learning methods except Multidimensional Scaling (MDS) learn an explicit linear projective mapping and can be interpreted as solving a distance metric learning problem; nonlinear manifold learning is also closely connected to distance metric learning.
      See "The Connection Between Manifold Learning and Distance Metric Learning" (written in Oct. 2007).

      Methods | Locality | Linearity | Learning Strategies | Code Download | Publication
      Principal Component Analysis (PCA) | global structure preserved | linear | best preserve the variance of the data | [by Deng Cai] |
      Multidimensional Scaling (MDS) | global structure preserved | linear | best preserve inter-point distances in low rank | [included in Matlab Toolbox for Dimensionality Reduction] |
      ISOMAP | global structure preserved | nonlinear | preserve the geodesic distances | [by J. B. Tenenbaum, V. de Silva and J. C. Langford] | [pdf]
      Laplacian Eigenmap (LE) | local structure preserved | nonlinear | preserve local neighborhoods | [by Mikhail Belkin] | [pdf]
      Locality Preserving Projections (LPP) | local structure preserved | linear | linear approximation to LE | [LPP by Deng Cai]; [Kernel LPP by Deng Cai] | [pdf]
      Locally Linear Embedding (LLE) | local structure preserved | nonlinear | nonlinearly preserve local neighborhoods | [by Sam T. Roweis and Lawrence K. Saul]; Hessian LLE can be found at [MANIfold Learning Matlab Demo, by Todd Wittman] | [pdf]
      Neighborhood Preserving Embedding (NPE) | local structure preserved | linear | linear approximation to LLE | [by Deng Cai] | [pdf]
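
The statement above that a method learning an explicit linear projective mapping can be interpreted as distance metric learning can be checked numerically. The following is a minimal Python sketch, not part of the toolkit: the projection matrix A, the points x and y, and all function names are hypothetical and chosen only for illustration. It assumes the usual Mahalanobis form d_M(x, y)^2 = (x - y)^T M (x - y) and shows that the squared Euclidean distance after projecting with A equals this form with M = A^T A.

```python
def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def gram(A):
    """Return M = A^T A, the metric matrix induced by the projection A."""
    d = len(A[0])
    return [[sum(row[i] * row[j] for row in A) for j in range(d)]
            for i in range(d)]


def mahalanobis_sq(M, x, y):
    """Squared distance (x - y)^T M (x - y)."""
    diff = [a - b for a, b in zip(x, y)]
    return sum(d * m for d, m in zip(diff, matvec(M, diff)))


def projected_sq(A, x, y):
    """Squared Euclidean distance between A x and A y."""
    px, py = matvec(A, x), matvec(A, y)
    return sum((a - b) ** 2 for a, b in zip(px, py))


# Hypothetical 2x3 projection (3-D inputs mapped to 2-D) and two toy points.
A = [[1.0, 0.0, 2.0],
     [0.0, 3.0, 0.0]]
x = [1.0, 2.0, 3.0]
y = [0.0, 1.0, 1.0]

M = gram(A)
# Both values are 34.0 for this toy input.
print(mahalanobis_sq(M, x, y), projected_sq(A, x, y))
```

Because M = A^T A is always positive semidefinite, any learnt linear projection yields a valid (pseudo-)metric of this form; when A has fewer rows than columns, M is rank-deficient and distinct points can have distance zero.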