Liang Xiong

PhD Student

Machine Learning Department & Auton Lab
Carnegie Mellon University


8005 Gates Hillman Complex
Machine Learning Department
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213


Phone: 1-412-818-8754




Jeff Schneider, Associate Research Professor, Auton Lab & Robotics Institute, Carnegie Mellon University.


Education

  • 2008.9 - present                        Carnegie Mellon University, Pittsburgh, PA
    PhD Student, GPA 4.08
    Machine Learning Department
  • 2005.9 - 2008.7                          Tsinghua University, Beijing, China
    Master of Engineering
    Major: Pattern Recognition and Intelligent Systems
  • 2001.9 - 2005.7                          Tsinghua University, Beijing, China
    Bachelor of Engineering, Outstanding Graduate
    Major: Control Science and Engineering
    Thesis: Personalized Synthesis of Hand-written Chinese Characters


Experience

  • 2012.6 - 2012.9                            Yahoo! Labs
    Worked on search engine log analysis and query rewriting for movie search for the Media Sciences and Search Sciences teams.
  • 2009.6 - 2009.8                            Google Inc. Santa Monica
    Worked on Traffic Estimation for the AdWords team.
  • 2007.3 - 2007.9                            Intel China Research Center
    Worked on computer vision and multimedia mining.

Awards and Honors

  • Graduate Fellowship, Carnegie Mellon University, 2008 - present.
  • Travel awards from several academic conferences, 2007 - present.
  • JiangZhen scholarship, Tsinghua University, 2007. (Top 1%)
  • Outstanding Graduate of Tsinghua University, 2005. (Top 10%)
  • National scholarship, Tsinghua University, 2004. (Top 5%)
  • Outstanding Freshman scholarship, Tsinghua University, 2001.

Research Interest

My focus is on how to learn from collective data, i.e., data that are organized into groups. For example, an image is a group of local patches, a video is a set of images, an article is a group of paragraphs, and a search result is a group of links.

In particular, I am trying to help scientists discover interesting phenomena in the huge amounts of data they have in fields such as astronomy and physics.


Research Projects

  • Learning from collective data (Thesis), Carnegie Mellon University

    Developing general machine learning techniques for data that are organized into groups. This research theme unifies several of my previous research topics and applies to a wide range of problems, including the processing of images, text, social networks, recommendation/rating data, and astronomy and physics data.

    Our standpoint is that the collective nature of these data should be respected, and that a group should not be reduced to a single point/vector without good reason. We approach this problem either by capturing the generative process of groups using hierarchical models, or by measuring the similarity between groups directly. We can now perform classification, clustering, embedding, and anomaly detection on collective data.

  • Novelty discovery for astronomy and physics, Carnegie Mellon University

    Developing algorithms to automatically discover unusual and potentially valuable phenomena. In the Sloan Digital Sky Survey (SDSS), the algorithms can discover both interesting individual objects (stars, galaxies, etc.) and groups of objects (galaxy clusters, etc.). Similar techniques are also used to detect unusual events in large-scale physics simulations (e.g. fluid and particle simulations). Collaborated with the University of Washington and Johns Hopkins University.

  • Query Rewriting for Movie Search, 2012.6 - 2012.9, Yahoo! Labs

    Developed algorithms to enhance Yahoo!'s movie search backend. The system replaces users' obscure or indirect queries with new ones that trigger the correct results from the existing black-box backend. This is achieved by analyzing the search engine log and learning to find and rank potential replacement queries. Evaluations show that this work can drastically increase the recall of the system without sacrificing its precision. Awaiting deployment into production.

  • Protein classification using cell images, 2010.10 - 2012.12, Carnegie Mellon University

    Studied the problem of classifying proteins' location patterns based on cell images from the Human Protein Atlas. Surpassed state-of-the-art accuracy. Collaborated with the Biomedical Department.

  • Internet Ads traffic estimation, 2009.6 - 2009.8, Google Inc.

    As an intern, I worked on the AdWords team developing the new simulation-based traffic estimation backend "Nostradamo". My main responsibility was to obtain the predicted click-through rate given the ads and the search query. To do that, I studied the details of the advertising mechanism and the prediction algorithms, processed massive log data, interfaced with various internal services for feature/signal extraction, and finally communicated with the prediction service to get the results.

  • Sales prediction, 2008.9 - 2009.8, Carnegie Mellon University

    I worked on the sales prediction problem: how to predict future orders based on existing sales data. This research mainly involves collaborative filtering (as in recommendation systems) to tackle the lack-of-feature problem, and temporal analysis to accommodate market changes. Collaborated with ECCO.

  • Railroad image synthesis, 2007.6 - 2008.6, Tsinghua University

    As the leader and coordinator, I worked on developing an image synthesis system that was used to test a hazard detection system for railroads. Collaborated with Mitsubishi Heavy Industries.

  • Contextual visual learning, 2007.3 - 2007.9, Intel China Research Center

    As an intern, I worked on utilizing contextual information in vision problems. Studied problems include: scene analysis, contextual probing, context-aided detection, and part-based models.

  • Robot vision and learning system, 2006.1 - 2007.4, Tsinghua University

    Designed and developed the vision and learning system for a cognitive robot. The robot uses a camera to automatically detect strangers and learns their identities from a tutor. It can then recognize previously seen people and greet them. As it sees more people, its recognition ability improves over time.

  • Hand-written character synthesis, 2005.2 - 2005.7, Tsinghua University

    Co-developed a system that generates hand-written Chinese characters whose style is learned from images of a particular person's handwriting. I was in charge of designing the learning process and implementing the synthesis module.


Book Chapters

  1. Liang Xiong, Fei Wang and Changshui Zhang, Guide Manifold Alignment by Relative Comparisons, In: J. Wang ed. Encyclopedia of Data Warehousing and Mining 2nd Edition, Hershey, PA: IGI Publishing.

Journal Papers

  1. Scott F. Daniel, Andrew Connolly, Jeff Schneider, Jake Vanderplas, Liang Xiong, Classification of Stellar Spectra with LLE, Astronomical Journal, 142, 203.

Conference Papers

  1. Liang Xiong, Jieyue Li, Robert F. Murphy, Jeff Schneider, Protein Subcellular Location Pattern Classification in Cellular Images Using Latent Discriminative Models, Intelligent Systems for Molecular Biology (ISMB), 2012.
  2. Barnabas Poczos, Liang Xiong, Dougal Sutherland, Jeff Schneider, Nonparametric Kernel Estimators for Image Classification, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
  3. Liang Xiong, Xi Chen, Jeff Schneider, Direct Robust Matrix Factorization for Anomaly Detection, IEEE International Conference on Data Mining (ICDM), 2011.
  4. Liang Xiong, Barnabas Poczos, Jeff Schneider, Group Anomaly Detection using Flexible Genre Models, Neural Information Processing Systems (NIPS), 2011.
  5. Barnabas Poczos, Liang Xiong, Jeff Schneider, Nonparametric Divergence Estimation with Applications to Machine Learning on Distributions, Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI), 2011.
  6. Liang Xiong, Barnabas Poczos, Jeff Schneider, Hierarchical Probabilistic Models for Group Anomaly Detection, AI and Statistics (AISTATS), 2011.
  7. Liang Xiong, Xi Chen, Tzu-kuo Huang, Jeff Schneider, and Jaime Carbonell, Temporal Collaborative Filtering with Bayesian Probabilistic Tensor Factorization, SIAM Data Mining (SDM), 2010.
  8. Liang Xiong, Fei Wang and Changshui Zhang, Multilevel Belief Propagation for Fast Inference on Markov Random Fields, IEEE International Conference on Data Mining (ICDM), 2007.
  9. Liang Xiong, Fei Wang and Changshui Zhang, Semi-definite Manifold Alignment, European Conference on Machine Learning (ECML), 2007.
  10. Liang Xiong, Jianguo Lee and Changshui Zhang, Discriminant Additive Tangent Space for Object Recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2007.


  • Intermediate Statistics, 2011
  • Machine Learning, 2012


  • Journal of Machine Learning Research
  • Pattern Recognition
  • IEEE Transactions on Systems, Man, and Cybernetics (SMCB)
  • Journal of Information Retrieval
  • Journal of Data Mining and Knowledge Discovery
  • Neural Computing
  • International Conference on Machine Learning (ICML)
  • International Joint Conference on Artificial Intelligence (IJCAI)
  • IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • International Conference on Latent Variable Analysis and Source Separation