New Paradigms and Global Optimality in Non-Convex Optimization
November 29, 2017

Non-convex optimization is ubiquitous across many areas, ranging from machine learning and signal processing to statistics and theoretical computer science. Although non-convex formulations typically enjoy better sample complexity guarantees than their convex counterparts, they usually suffer from computational issues: local search algorithms, such as gradient descent or stochastic gradient descent, may converge only to saddle points or local minima, rather than the global optimum we desire. There are three main approaches to address these issues: (1) finding a good initialization; (2) carefully solving a sequence of convex problems; (3) studying favorable properties of the loss landscape, such as strong duality or the "local minima = global minima" property.
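
As a toy illustration of this failure mode (an illustrative sketch, not an example from the talk): plain gradient descent on the non-convex function f(x, y) = x^2 - y^2 + y^4/2, which has a saddle point at (0, 0) and global minima at (0, +1) and (0, -1), converges to the saddle from some initializations and to a minimum from others.

    import numpy as np

    def grad(p):
        # f(x, y) = x**2 - y**2 + 0.5 * y**4
        # critical points: saddle at (0, 0), global minima at (0, +1) and (0, -1)
        x, y = p
        return np.array([2 * x, -2 * y + 2 * y ** 3])

    def gradient_descent(p, lr=0.1, steps=1000):
        for _ in range(steps):
            p = p - lr * grad(p)
        return p

    print(gradient_descent(np.array([0.5, 0.0])))   # stuck at the saddle (0, 0)
    print(gradient_descent(np.array([0.5, 0.01])))  # escapes to the minimum (0, 1)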

In this talk, we will focus on the latter two approaches, which offer new paradigms for achieving global optimality in non-convex optimization. In the first part of the talk, we will see how to get arbitrarily close to the global optimum of the non-convex problems arising in learning halfspaces and 1-bit compressed sensing. Via the localization technique, we minimize the expected 0-1 loss by solving a sequence of convex problems. We supplement our positive results with a hardness result, showing that no one-shot minimization works in this setting.
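
To make the localization idea concrete, here is a heavily simplified sketch (the function name, the band schedule, and the hinge-loss surrogate are illustrative choices of ours, not the paper's algorithm): at each round we keep only the examples falling in a band around the current hyperplane, solve a convex surrogate problem on that band, and shrink the band.

    import numpy as np

    def localized_halfspace(X, y, rounds=5, band=1.0, iters=200, lr=0.05):
        """Schematic localization: solve a sequence of convex (hinge-loss)
        problems on shrinking bands around the current hyperplane."""
        n, d = X.shape
        w = np.ones(d) / np.sqrt(d)            # crude initial unit normal
        for _ in range(rounds):
            inside = np.abs(X @ w) <= band     # localize to a band around w
            Xb, yb = X[inside], y[inside]
            for _ in range(iters):             # subgradient descent on hinge loss
                viol = yb * (Xb @ w) < 1       # margin violators
                g = -(yb[viol, None] * Xb[viol]).sum(axis=0) / max(len(yb), 1)
                w = w - lr * g
                w = w / np.linalg.norm(w)      # project back to the unit sphere
            band /= 2                          # shrink the band geometrically
        return w

The paper's analysis makes the band widths and surrogate problems precise enough that the sequence of convex solves drives the expected 0-1 loss arbitrarily close to optimal; the sketch above only conveys the shape of the iteration.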

In the second part of the talk, we study strong duality for the matrix factorization problem. Strong duality is well studied in convex optimization, but little was known for non-convex problems. We show that strong duality holds for matrix completion and related problems with nearly optimal sample complexity. On the hardness side, assuming the hardness of 4-SAT, we show that general matrix factorization requires at least 2^Ω(n) time to solve.
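
In symbols, with illustrative notation rather than the paper's exact formulation: weak duality always bounds the non-convex primal from below by its Lagrangian dual, and strong duality is the statement that the two values coincide, so the non-convex primal can be solved exactly through a convex (bi-dual) problem.

    % Schematic only; F and L are placeholders, not the paper's objective.
    \min_{U \in \mathbb{R}^{n \times r},\; V \in \mathbb{R}^{m \times r}} F(U, V)
      \;\ge\; \max_{\Lambda} \min_{U, V} \mathcal{L}(U, V, \Lambda)
      \qquad \text{(weak duality, always holds)}

    \text{Strong duality:} \quad
    \min_{U, V} F(U, V) \;=\; \max_{\Lambda} \min_{U, V} \mathcal{L}(U, V, \Lambda)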

Based on joint work with Pranjal Awasthi (Rutgers), Nina Balcan (CMU), Nika Haghtalab (CMU), Yingyu Liang (Wisconsin-Madison), and David P. Woodruff (CMU).

References:
http://www.cs.cmu.edu/~hongyanz/publications/COLT_Learning.pdf
http://www.cs.cmu.edu/~hongyanz/publications/ITCS_Matrix.pdf