Simon Shaolei Du
杜少雷

Office: GHC 8005

Email: ssdu [at] cs (dot) cmu (dot) edu

Social Media: LinkedIn Facebook Zhihu WeChat

I am a fourth-year PhD student in the Machine Learning Department at Carnegie Mellon University, co-advised by Aarti Singh and Barnabás Póczos. This semester I am visiting the Simons Institute. Previously, I studied EECS and EMS at UC Berkeley, where I worked with Ming Gu, Lei Li, Michael Mahoney, and Stuart Russell. I also spent a semester at Tsinghua University.

My research interests are broadly in machine learning and statistics. My current research focuses on:

  • Deep learning theory
  • Exploration in reinforcement learning

    Education

    Professional Experiences

    Publications


      *: indicates equal contribution or alphabetical ordering.

      Preprints

    1. Gradient Descent Provably Optimizes Over-parameterized Neural Networks,
      Simon S. Du*, Xiyu Zhai*, Barnabás Póczos, Aarti Singh.
      [PDF] [arXiv]
    2. Robust Nonparametric Regression under Huber's ε-contamination Model,
      Simon S. Du, Yining Wang, Sivaraman Balakrishnan, Pradeep Ravikumar, Aarti Singh.
      [PDF] [arXiv]
    3. Improved Learning of One-hidden-layer Convolutional Neural Networks with Overlaps,
      Simon S. Du*, Surbhi Goel*.
      [PDF] [arXiv]
    4. Near-Linear Time Local Polynomial Nonparametric Estimation,
      Yining Wang, Yi Wu, Simon S. Du.
      [PDF] [arXiv]
    5. Linear Convergence of the Primal-Dual Gradient Method for Convex-Concave Saddle Point Problems without Strong Convexity,
      Simon S. Du*, Wei Hu*.
      [PDF] [arXiv]
      Conference Papers

    1. Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced,
      Simon S. Du*, Wei Hu*, Jason D. Lee*.
      To appear in Conference on Neural Information Processing Systems (NIPS) 2018.
      Best Paper Award at ICML 2018 Workshop on Nonconvex Optimization
      [PDF] [arXiv]
    2. How Many Samples are Needed to Learn a Convolutional Neural Network?,
      Simon S. Du*, Yining Wang*, Xiyu Zhai, Sivaraman Balakrishnan, Ruslan Salakhutdinov, Aarti Singh.
      To appear in Conference on Neural Information Processing Systems (NIPS) 2018.
      [PDF] [arXiv]
    3. Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima,
      Simon S. Du, Jason D. Lee, Yuandong Tian, Barnabás Póczos, Aarti Singh.
      International Conference on Machine Learning (ICML) 2018.
      [PDF] [arXiv]
    4. On the Power of Over-parametrization in Neural Networks with Quadratic Activation,
      Simon S. Du, Jason D. Lee.
      International Conference on Machine Learning (ICML) 2018.
      [PDF] [arXiv]
    5. Fast and Sample Efficient Inductive Matrix Completion via Multi-Phase Procrustes Flow,
      Xiao Zhang*, Simon S. Du*, Quanquan Gu.
      International Conference on Machine Learning (ICML) 2018.
      [PDF] [arXiv]
    6. Discrete-Continuous Mixtures in Probabilistic Programming: Generalized Semantics and Inference Algorithms,
      Yi Wu, Siddharth Srivastava, Nick Hay, Simon S. Du, Stuart Russell.
      International Conference on Machine Learning (ICML) 2018.
      [PDF] [arXiv]
    7. When is a Convolutional Filter Easy to Learn?
      Simon S. Du, Jason D. Lee, Yuandong Tian.
      International Conference on Learning Representations (ICLR) 2018.
      [PDF] [arXiv]
    8. Stochastic Zeroth-order Optimization in High Dimensions,
      Yining Wang, Simon S. Du, Sivaraman Balakrishnan, Aarti Singh.
      International Conference on Artificial Intelligence and Statistics (AISTATS) 2018 (Oral).
      [PDF] [arXiv]
    9. Gradient Descent Can Take Exponential Time to Escape Saddle Points,
      Simon S. Du, Chi Jin, Jason D. Lee, Michael I. Jordan, Barnabás Póczos, Aarti Singh.
      Conference on Neural Information Processing Systems (NIPS) 2017 (Spotlight).
      [PDF] [arXiv]
    10. On the Power of Truncated SVD for General High-rank Matrix Estimation Problems,
      Simon S. Du, Yining Wang, Aarti Singh.
      Conference on Neural Information Processing Systems (NIPS) 2017.
      [PDF] [arXiv]
    11. Hypothesis Transfer Learning via Transformation Functions,
      Simon S. Du, Jayanth Koushik, Aarti Singh, Barnabás Póczos.
      Conference on Neural Information Processing Systems (NIPS) 2017.
      [PDF] [arXiv] [Poster]
    12. High-throughput Robotic Phenotyping of Energy Sorghum Crops,
      Srinivasan Vijayarangan, Paloma Sodhi, Prathamesh Kini, James Bourne, Simon S. Du, Hanqi Sun, Barnabás Póczos, Dimitrios Apostolopoulos, and David Wettergreen.
      Conference on Field and Service Robotics (FSR) 2017.
      [PDF]
    13. Stochastic Variance Reduction Methods for Policy Evaluation,
      Simon S. Du, Jianshu Chen, Lihong Li, Lin Xiao, Dengyong Zhou.
      International Conference on Machine Learning (ICML) 2017.
      [PDF] [arXiv] [My Talk at ICML] [Lihong's Talk at Simons Institute] [Poster]
    14. Computationally Efficient Robust Estimation of Sparse Functionals,
      Simon S. Du, Sivaraman Balakrishnan, Aarti Singh.
      Conference on Learning Theory (COLT) 2017.
      [PDF] [arXiv] [Slides] [Poster] [Talk]
      Merged with this paper.
    15. Efficient Nonparametric Smoothness Estimation,
      Shashank Singh, Simon S. Du, Barnabás Póczos.
      Conference on Neural Information Processing Systems (NIPS) 2016.
      [PDF] [arXiv]
    16. An Improved Gap-Dependency Analysis of the Noisy Power Method,
      Maria-Florina Balcan*, Simon S. Du*, Yining Wang*, Adams Wei Yu*.
      Conference on Learning Theory (COLT) 2016.
      [PDF] [arXiv] [Slides] [Talk]
    17. Spectral Gap Error Bounds for Improving CUR Matrix Decomposition and the Nyström Method,
      David G. Anderson*, Simon S. Du*, Michael W. Mahoney*, Christopher Melgaard*, Kunming Wu*, Ming Gu*.
      International Conference on Artificial Intelligence and Statistics (AISTATS) 2015.
      [PDF] [Supplement] [Code]
      Workshop Papers

    1. Novel Quantization Strategies for Linear Prediction with Guarantees,
      Simon S. Du*, Yichong Xu*, Yuan Li, Hongyang Zhang, Aarti Singh, Pulkit Grover.
      International Conference on Machine Learning (ICML) 2016, On Device Intelligence (ONDI) workshop.
      [PDF] [Slides]
    2. Maxios: Large Scale Nonnegative Matrix Factorization for Collaborative Filtering,
      Simon S. Du, Yilin Liu, Boyi Chen, Lei Li.
      Conference on Neural Information Processing Systems (NIPS) 2014, Workshop on Distributed Machine Learning and Matrix Computations.
      [PDF] [Poster]

    Talks