Inderjit Dhillon
University of Texas at Austin

Fast and Memory-efficient Low Rank Approximation of Massive Graphs

Abstract: Social network analysis requires performing a variety of tasks, including summarization and prediction, on massive graphs. In this talk, I will present a fast and memory-efficient procedure called clustered low rank matrix approximation for massive graphs. The procedure involves a fast clustering of the graph followed by approximation of each cluster separately using existing methods, e.g. the singular value decomposition or stochastic algorithms. The clusterwise approximations are then extended to approximate the entire graph. This approach has several benefits: (1) important structure of the graph is preserved due to the clustering; (2) accurate low rank approximations are achieved; (3) the procedure is efficient both in computational speed and memory usage. Further, we generalize stochastic algorithms to the clustered low rank approximation framework and present theoretical bounds for the approximation error. Finally, a set of experiments shows that our methods outperform existing dimensionality reduction algorithms on massive social networks.
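The clusterwise idea in the abstract can be sketched in a few lines of Python with NumPy/SciPy. The sketch below is a minimal illustration, not the authors' implementation: it assumes the cluster assignment is already available (e.g. from a separate graph clustering routine), approximates only the diagonal blocks with truncated SVDs, and omits the paper's extension of the cluster bases to the off-diagonal part of the graph; the function name clustered_low_rank and the toy graph are illustrative.

```python
import numpy as np
from scipy.sparse.linalg import svds

def clustered_low_rank(A, labels, rank):
    """Minimal sketch of a clustered low-rank approximation.

    A      : (n, n) dense adjacency matrix (small example; real graphs are sparse)
    labels : cluster assignment per node, assumed precomputed elsewhere
    rank   : target rank per cluster
    Returns a block-diagonal approximation built from per-cluster SVDs.
    """
    A_hat = np.zeros_like(A, dtype=float)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        block = A[np.ix_(idx, idx)].astype(float)
        k = min(rank, min(block.shape) - 1)
        if k < 1:                       # tiny cluster: keep the block as-is
            A_hat[np.ix_(idx, idx)] = block
            continue
        U, s, Vt = svds(block, k=k)     # rank-k SVD of the diagonal block
        A_hat[np.ix_(idx, idx)] = U @ np.diag(s) @ Vt
    return A_hat

# Toy usage: two planted clusters, rank-2 approximation per cluster.
rng = np.random.default_rng(0)
A = (rng.random((40, 40)) < 0.05).astype(float)
A[:20, :20] = (rng.random((20, 20)) < 0.5)
A[20:, 20:] = (rng.random((20, 20)) < 0.5)
A = np.triu(A, 1); A = A + A.T
labels = np.array([0] * 20 + [1] * 20)
err = np.linalg.norm(A - clustered_low_rank(A, labels, rank=2)) / np.linalg.norm(A)
print(f"relative error of clustered approximation: {err:.3f}")
```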

Rodolphe Jenatton
INRIA / Ecole Normale Supérieure

Local Analysis of Sparse Coding in the Presence of Noise

Abstract: A popular approach within the signal processing and machine learning communities consists in modelling signals as sparse linear combinations of atoms selected from a learned dictionary. While this paradigm has led to numerous empirical successes in various fields ranging from image to audio processing, there have been only a few theoretical arguments supporting this empirical evidence. In particular, sparse coding, or sparse dictionary learning, relies on a non-convex procedure whose local minima have not been fully analyzed yet. In this work, we consider a probabilistic model of sparse signals, and show that, with high probability, sparse coding admits a local minimum around the reference dictionary generating the signals. Our study takes into account the case of over-complete dictionaries and noisy signals, thus extending previous work limited to noiseless settings and/or under-complete dictionaries. The analysis we conduct is non-asymptotic and makes it possible to understand how the key quantities of the problem, such as the coherence or the level of noise, are allowed to scale with respect to the dimension of the signals, the number of atoms, the sparsity and the number of observations.
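To make the object of study concrete, here is a minimal sketch of the generative model and the sparse coding cost described above: signals are noisy sparse combinations of atoms from a reference dictionary, and the empirical cost can be evaluated at the reference dictionary and at a perturbation of it. The ISTA solver, all constants, and the final comparison are illustrative assumptions, not the paper's analysis.

```python
import numpy as np

def ista(D, x, lam, n_iter=200):
    """Minimal ISTA solver for min_a 0.5*||x - D a||^2 + lam*||a||_1."""
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = a - D.T @ (D @ a - x) / L
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)   # soft threshold
    return a

def sparse_coding_cost(D, X, lam):
    """Empirical sparse coding objective averaged over the signals in X (columns)."""
    costs = []
    for x in X.T:
        a = ista(D, x, lam)
        costs.append(0.5 * np.sum((x - D @ a) ** 2) + lam * np.sum(np.abs(a)))
    return np.mean(costs)

# Generative model from the abstract: noisy sparse combinations of atoms from a
# reference (here over-complete) dictionary D0.
rng = np.random.default_rng(0)
m, p, k, n = 20, 30, 3, 200              # signal dim, atoms, sparsity, samples
D0 = rng.standard_normal((m, p)); D0 /= np.linalg.norm(D0, axis=0)
A = np.zeros((p, n))
for j in range(n):
    A[rng.choice(p, k, replace=False), j] = rng.standard_normal(k)
X = D0 @ A + 0.01 * rng.standard_normal((m, n))

# The cost should be lower at D0 than at a random perturbation of it,
# consistent with a local minimum around the reference dictionary.
D_pert = D0 + 0.3 * rng.standard_normal(D0.shape); D_pert /= np.linalg.norm(D_pert, axis=0)
print(sparse_coding_cost(D0, X, lam=0.05), sparse_coding_cost(D_pert, X, lam=0.05))
```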

Yi Ma
University of Illinois at Urbana-Champaign

TILT: For Transform Invariant Low-Rank Structures in Images

Abstract: In this talk, we will introduce a fundamental computational tool, namely TILT, for extracting rich low-rank structures in images and videos. TILT utilizes a transformed Robust PCA model for the visual data, $D \circ T = A + E$, and exploits modern high-dimensional convex optimization to extract the low-rank structure A from the visual data D, despite the image domain transformation T and sparse corruptions E. We will show how this seemingly simple tool can help unlock a wealth of information in images and videos that was previously difficult to extract. We believe these new tools will bring disruptive changes to many challenging tasks in computer vision and image processing, including feature extraction, image correspondence or alignment, camera calibration, 3D reconstruction, and object recognition.

This is joint work with John Wright of Columbia, Emmanuel Candès of Stanford, Zhouchen Lin of MSRA, and my students Zhengdong Zhang and Xiao Liang of Tsinghua University, and Arvind Ganesh of UIUC.
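As a rough illustration of the convex core of this model, the sketch below decomposes a matrix as $D = A + E$ (low-rank plus sparse) with a simplified augmented-Lagrangian scheme. This corresponds to the inner Robust PCA step with the domain transformation T held fixed, and it omits TILT's alternating, linearized updates of T; the solver, its parameters, and the toy data are illustrative assumptions.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: prox of tau*||.||_* at M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def shrink(M, tau):
    """Entrywise soft thresholding: prox of tau*||.||_1 at M."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def rpca(D, lam=None, n_iter=50):
    """Robust PCA  D = A + E  via a simplified augmented-Lagrangian scheme,
    i.e. the inner convex step of TILT with the transform T held fixed."""
    m, n = D.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    Y = np.zeros_like(D); E = np.zeros_like(D)
    mu = 1.25 / np.linalg.norm(D, 2)
    for _ in range(n_iter):
        A = svt(D - E + Y / mu, 1.0 / mu)        # low-rank update
        E = shrink(D - A + Y / mu, lam / mu)     # sparse-error update
        Y = Y + mu * (D - A - E)                 # dual ascent on the constraint
        mu *= 1.5
    return A, E

# Toy usage: a rank-2 matrix with sparse corruptions should be approximately recovered.
rng = np.random.default_rng(0)
A0 = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 50))
E0 = (rng.random((50, 50)) < 0.05) * rng.standard_normal((50, 50)) * 5
A, E = rpca(A0 + E0)
print("recovered rank:", np.linalg.matrix_rank(A, tol=1e-3),
      " relative error:", np.linalg.norm(A - A0) / np.linalg.norm(A0))
```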

Gabriel Peyré
CNRS, CEREMADE, Université Paris-Dauphine

Robust Sparse Analysis Regularization

Abstract: In this talk I will detail several key properties of L1-analysis regularization for the resolution of linear inverse problems. Most previous theoretical works consider sparse synthesis priors where the sparsity is measured as the L1 norm of the coefficients that synthesize the signal in a given dictionary. In contrast, the more general analysis regularization minimizes the L1 norm of the correlations between the signal and the atoms in the dictionary. The corresponding variational problem includes several well-known regularizations such as the discrete total variation, the fused lasso and sparse correlation with translation invariant wavelets. I will first study the variations of the solution with respect to the observations and the regularization parameter, which enables the computation of an estimator of the degrees of freedom. I will then give a sufficient condition to ensure that a signal is the unique solution of the analysis regularization when there is no noise in the observations. The same criterion ensures the robustness of the sparse analysis solution to a small noise in the observations. Lastly I will define a stronger condition that ensures robustness to an arbitrary bounded noise. In the special case of synthesis regularization, our contributions recover already known results, which are hence generalized to the analysis setting. I will illustrate these theoretical results on practical examples to study the robustness of the total variation, fused lasso and translation invariant wavelets regularizations. (This is joint work with S. Vaiter, C. Dossal, and J. Fadili.)
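The variational problem referred to above can be written as $\min_x \frac{1}{2}\|y - \Phi x\|^2 + \lambda \|D^* x\|_1$, where $D^* x$ collects the correlations between the signal and the dictionary atoms. Below is a minimal ADMM sketch for the denoising case ($\Phi$ = identity) with the discrete total variation as the analysis operator; the solver, its parameters, and the toy signal are illustrative assumptions, not material from the talk.

```python
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def analysis_l1(y, D, lam, rho=1.0, n_iter=300):
    """ADMM sketch for the analysis problem  min_x 0.5*||y - x||^2 + lam*||D x||_1
    (denoising case: the forward operator is the identity)."""
    n = y.size
    x = y.copy()
    z = D @ x
    u = np.zeros_like(z)
    # The x-update solves a linear system with a fixed matrix; invert once (toy size).
    M_inv = np.linalg.inv(np.eye(n) + rho * D.T @ D)
    for _ in range(n_iter):
        x = M_inv @ (y + rho * D.T @ (z - u))
        z = soft(D @ x + u, lam / rho)     # prox of the analysis L1 term
        u = u + D @ x - z                  # dual update
    return x

# Toy usage: total variation as the analysis operator (finite differences),
# which favours piecewise-constant reconstructions.
rng = np.random.default_rng(0)
n = 100
x_true = np.concatenate([np.zeros(40), np.ones(30), -0.5 * np.ones(30)])
y = x_true + 0.1 * rng.standard_normal(n)
D = np.diff(np.eye(n), axis=0)             # (n-1, n) finite-difference operator
x_hat = analysis_l1(y, D, lam=0.5)
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```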

Martin Wainwright
University of California at Berkeley

Computation meets Statistics: Fast global convergence for high-dimensional statistical recovery

Abstract: Many statistical estimators are based on convex optimization problems formed by the weighted sum of a loss function with a norm-based regularizer. Particular examples include $\ell_1$-based methods for sparse vectors, nuclear norm for low-rank matrices, and various combinations thereof for matrix decomposition. In this talk, we describe an interesting connection between computational and statistical efficiency, in particular showing that the same conditions that guarantee that an estimator has low statistical error can also be used to certify fast convergence of first-order optimization methods up to statistical precision.

Joint work with Alekh Agarwal and Sahand Negahban, UC Berkeley.
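As a concrete instance of the loss-plus-regularizer setup in the abstract, the sketch below runs composite (proximal) gradient descent on an $\ell_1$-regularized least-squares problem and reports the optimization error alongside the statistical error of the estimate; geometric decay of the former is only meaningful down to the level of the latter. The problem instance, the regularization level, and the plain ISTA updates are illustrative assumptions.

```python
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_grad_lasso(X, y, lam, n_iter):
    """Composite (proximal) gradient descent for the Lasso:
       min_theta  0.5/n * ||y - X theta||^2 + lam * ||theta||_1."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n        # Lipschitz constant of the loss gradient
    theta = np.zeros(p)
    iterates = []
    for _ in range(n_iter):
        grad = X.T @ (X @ theta - y) / n
        theta = soft(theta - grad / L, lam / L)
        iterates.append(theta.copy())
    return theta, iterates

# Sparse regression instance: the optimization error decays geometrically,
# but only down to the statistical error of the estimator itself.
rng = np.random.default_rng(0)
n, p, s = 200, 500, 10
theta_star = np.zeros(p); theta_star[:s] = rng.standard_normal(s)
X = rng.standard_normal((n, p))
y = X @ theta_star + 0.5 * rng.standard_normal(n)
lam = 0.5 * np.sqrt(2 * np.log(p) / n)

theta_hat, iterates = prox_grad_lasso(X, y, lam, n_iter=500)
stat_err = np.linalg.norm(theta_hat - theta_star)
for t in [10, 50, 100, 200]:
    opt_err = np.linalg.norm(iterates[t] - theta_hat)
    print(f"iter {t:4d}: optimization error {opt_err:.2e}  (statistical error {stat_err:.2e})")
```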

David Wipf
University of California at San Diego

Dictionary-Dependent Penalties for Sparse Estimation and Rank Minimization

Abstract: In the majority of recent work on sparse estimation algorithms, performance has been evaluated using ideal or quasi-ideal dictionaries (e.g., random Gaussian or Fourier) characterized by unit L2 norm, incoherent columns or features. But these types of dictionaries represent only a subset of the dictionaries that are actually used in practice (largely restricted to idealized compressive sensing applications). In contrast, herein sparse estimation is considered in the context of structured dictionaries possibly exhibiting high coherence between arbitrary groups of columns and/or rows. Sparse penalized regression models are analyzed with the purpose of finding, to the extent possible, regimes of dictionary invariant performance. In particular, a class of non-convex, Bayesian-inspired estimators with dictionary-dependent sparsity penalties is shown to have a number of desirable invariance properties leading to provable advantages over more conventional penalties such as the L1 norm, especially in areas where existing theoretical recovery guarantees no longer hold. This can translate into improved performance in applications such as model selection with correlated features, source localization, and compressive sensing with constrained measurement directions. Moreover, the underlying methodology naturally extends to related rank minimization problems.
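One well-known family of non-convex, Bayesian-inspired estimators whose implicit penalty depends on the dictionary is sparse Bayesian learning (type-II maximum likelihood), where the effective penalty involves $\log|\sigma^2 I + \Phi \,\mathrm{diag}(\gamma)\, \Phi^T|$ and therefore adapts to the coherence structure of $\Phi$. The sketch below shows its basic EM updates on a deliberately coherent toy dictionary; it is offered as an illustrative instance of a dictionary-dependent penalty under these assumptions, not as the specific estimators analyzed in the talk.

```python
import numpy as np

def sbl(Phi, y, sigma2=1e-4, n_iter=200):
    """EM updates for sparse Bayesian learning (type-II maximum likelihood).
    The effective sparsity penalty, log|sigma2*I + Phi diag(gamma) Phi^T|,
    depends explicitly on the dictionary Phi, unlike the L1 norm."""
    m, p = Phi.shape
    gamma = np.ones(p)
    for _ in range(n_iter):
        G = np.diag(gamma)
        # Posterior over coefficients given the current hyperparameters gamma.
        Sigma_y = sigma2 * np.eye(m) + Phi @ G @ Phi.T
        K = G @ Phi.T @ np.linalg.inv(Sigma_y)
        mu = K @ y                                 # posterior mean
        Sigma = G - K @ Phi @ G                    # posterior covariance
        gamma = np.maximum(mu ** 2 + np.diag(Sigma), 1e-12)   # EM update
    return mu, gamma

# Toy usage: a coherent dictionary (correlated columns), the regime where
# L1-based recovery guarantees are weakest, as discussed in the abstract.
rng = np.random.default_rng(0)
m, p, k = 30, 60, 4
Phi = rng.standard_normal((m, p)) + 0.9 * rng.standard_normal((m, 1))  # shared component
Phi /= np.linalg.norm(Phi, axis=0)
x_true = np.zeros(p); x_true[rng.choice(p, k, replace=False)] = rng.standard_normal(k)
y = Phi @ x_true + 0.01 * rng.standard_normal(m)
mu, gamma = sbl(Phi, y)
# Compare the atoms with the largest hyperparameters against the true support.
print("largest hyperparameters at:", np.sort(np.argsort(gamma)[-k:]))
print("true support:              ", np.flatnonzero(x_true))
```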