Tuesday, May 01, 2018. 12:00PM. NSH 3305.

Back to Seminar Schedule

Hongyang Zhang -- Testing and Learning from Big Data, Optimally

Abstract: We are now in an era of big data as well as high-dimensional data: data volume is almost doubling every two years. Fortunately, high-dimensional data are structured. Usually, they are of low rank. This is the basis of dimensionality reduction and compressed sensing. To extract efficient information from the low-rank structure without fully observing the matrix, there are two questions to handle: 1. what is the true rank of the data matrix with only small samples (testing problem)? 2. given the rank of the matrix, how to design computationally efficient algorithm to recover the matrix with only compressed observations (learning problem)?

In this talk, we will focus on the testing and learning problems regarding the matrix rank with optimal sample complexity, which are new paradigms of information extraction from the big data. In the first part of the talk, we will see how we can test the rank of an unknown matrix via an interesting ladder-shaped sampling scheme. We also supplement our positive results with a hardness result, showing that our sampling scheme is near-optimal.

In the second part of the talk, we study the matrix completion problem. Matrix completion is known as a non-convex problem in its most original form. To alleviate the computational issue, we show that strong duality holds for the matrix completion with nearly optimal sample complexity. For the hardness result, we also show that generic matrix factorization requires exponential time to be solved.

Based on joint work with Nina Balcan (CMU), Yi Li (Nanyang Technological University), Yingyu Liang (Wisconsin-Madison), and David P. Woodruff (CMU).