Towards Stratification Learning through Local Homology Transfer
November 07, 2012
Manifold learning is a basic problem in geometry, topology, and statistical inference that has received a great deal of attention. The basic idea is as follows: given a point cloud of data sampled from a manifold in a k-dimensional ambient space, infer the underlying manifold. A limitation of this problem statement is that it does not apply to sets that are not manifolds. For example, we may consider the more general class of stratified spaces, which can be decomposed into strata: manifolds of varying dimension that fit together in some uniform way inside the higher-dimensional space.

In this talk, we study the following problem in stratification learning: given a point cloud sampled from a stratified space, how do we cluster the points so that points in the same cluster are in the same stratum, while points in different clusters are not? Intuitively, the strategy should be clear: two points belong in the same stratum if they “look the same locally,” meaning that they have identical neighborhoods, within the larger space, at some very small scale. However, the notion of “local” becomes unclear in the context of sampling uncertainty, since everything becomes quite noisy at vanishingly small scales. In response, we introduce a radius parameter r and define a notion of local equivalence at each such r.
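As a rough illustration of the clustering step only, the sketch below groups sample points by taking the transitive closure of a pairwise local-equivalence relation at a fixed radius r. It is not the algorithm presented in the talk: the function name `cluster_by_local_equivalence`, the callable `locally_equivalent`, and the choice to compare only pairs closer than 2r are all assumptions made for illustration, and the homological test itself is left as a placeholder supplied by the caller.

```python
from itertools import combinations
import math


def cluster_by_local_equivalence(points, r, locally_equivalent):
    """Cluster point indices by the transitive closure of a pairwise relation.

    `locally_equivalent(p, q, r)` is a hypothetical user-supplied test that
    decides whether p and q have the same local structure at radius r.
    Assumption: only pairs closer than 2r (overlapping r-balls) are compared.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        close = math.dist(points[i], points[j]) < 2 * r
        if close and locally_equivalent(points[i], points[j], r):
            parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

For example, with five points on a line and a toy predicate that merely separates the origin from every other point (a stand-in for a real local homology test), the two arms and the origin end up in separate clusters:

```python
points = [(-1.0, 0.0), (-0.5, 0.0), (0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
toy_test = lambda p, q, r: (p == (0.0, 0.0)) == (q == (0.0, 0.0))
print(cluster_by_local_equivalence(points, 0.3, toy_test))
```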

We propose an approach to stratification learning based on local homology inference; more specifically, based on inference of the kernels and cokernels of several maps between groups closely related to the multi-scale local homology groups for different pairs of points in the sample. Our results include: a topological definition of two points belonging to the same stratum, obtained by assessing the multi-scale local structure of the points through kernel and cokernel persistent homology; topological conditions on the point sample under which this topological characterization holds – we call this topological inference; finite sample bounds for the minimum number of points in the sample required to state with high probability which points belong to the same stratum; and an algorithm that computes which points belong to the same stratum, together with a proof of correctness for some parts of this algorithm.

Joint work with Paul Bendich and Sayan Mukherjee.