Yi Zhang


I received my PhD in 2012 from the Machine Learning Department, School of Computer Science, Carnegie Mellon University. I worked with Jeff Schneider on learning with limited supervision by encoding input and output information. I've also worked as a research intern on high-frequency trading and statistical arbitrage (at Citadel Investment Group), behavioral targeting and computational advertising (at Yahoo! Labs), and parallel machine learning for video analysis (at IBM T.J. Watson Research Center).


I was supported by the IBM PhD Fellowship (2009 - 2011). Thanks, IBM!

I was supported by the Yahoo! Key Scientific Challenges Award (20 awardees worldwide in 2009). Thanks, Yahoo!

My research has also been supported by the ICML student travel scholarship, NIPS travel award and SDM travel award. Thanks!


06/2011    Research intern at Citadel Investment Group, Chicago, IL.
02/2010    IBM PhD Fellowship for 2010-2011.
06/2009    Research intern at IBM T.J. Watson Research Center, Hawthorne, NY.
04/2009    Yahoo! Key Scientific Challenges Award.
02/2009    IBM PhD Fellowship for 2009-2010.
06/2008    Research intern at Yahoo! Labs, Sunnyvale, CA.


09/2007 -- 05/2012 Machine Learning Department, Carnegie Mellon University
GPA: 4.18/4.00
09/2005 -- 09/2006 Department of Computer Science and Technology, Tsinghua University
GPA: 97.2/100 (Rank: 1/84 in the department)
09/2001 -- 06/2005 School of Software, Tsinghua University
Bachelor of Engineering (with highest honors) in Computer Software
GPA: 89.2/100 (Rank: 1/51 in the school)

Graduate Courses

10-701 Machine Learning: A+     (Instructor: Carlos Guestrin)
10-702 Statistical Machine Learning: A+     (Instructor: Larry Wasserman)
10-725 Optimization: A+     (Instructor: Geoffrey Gordon and Carlos Guestrin)
10-708 Probabilistic Graphical Models: A+     (Instructor: Carlos Guestrin)
15-826 Multimedia Databases and Data Mining: A+     (Instructor: Christos Faloutsos)
46-929 Financial Time Series Analysis: A+     (Instructor: Anthony Brockwell)
10-705 Intermediate Statistics: A     (Instructor: Matthew Harrison)
36-724 Applied Bayesian Methods: A     (Instructor: Surya Tokdar)
15-853 Algorithms in the Real World: A     (Instructor: Guy Blelloch and Daniel Golovin)
47-811 Econometrics I: A     (Instructor: Fallaw Sowell)
10-709 Advanced Statistical NLP (Read the Web): A     (Instructor: Tom Mitchell)

Industry Experience

06/2011 - 08/2011: Citadel Investment Group  Quantitative Research, Statistical Arbitrage and High-Frequency Trading
06/2009 - 08/2009: IBM T.J. Watson Research Center  Parallel Machine Learning for Multimedia Analysis
06/2008 - 08/2008: Yahoo! Labs   Behavioral Targeting and Computational Advertising

Research Interests

1) Learning with limited supervision by encoding input and output information.

2) Web mining: web information extraction, computational advertising and behavioral targeting.

3) Parallel machine learning and Hadoop.

Publications (Organized by Topics)

1. Learning with Limited Supervision by Encoding Input and Output Information

Yi Zhang and Jeff Schneider. Maximum Margin Output Coding, ICML 2012. (pdf) (code)

Yi Zhang and Jeff Schneider. A Composite Likelihood View for Multi-Label Classification, AISTATS 2012. (pdf)

Yi Zhang and Jeff Schneider. Multi-label Output Codes using Canonical Correlation Analysis, AISTATS 2011. (pdf) (code)

Yi Zhang and Jeff Schneider. Learning Multiple Tasks with a Sparse Matrix-Normal Penalty, NIPS 2010. (pdf)

Yi Zhang and Jeff Schneider. Projection Penalty: Dimension Reduction without Loss, ICML 2010. (pdf)

Yi Zhang. Multi-Task Active Learning with Output Constraints, AAAI 2010. (pdf)

Yi Zhang, Jeff Schneider and Artur Dubrawski. Learning Compressible Models. 2010 SIAM International Conference on Data Mining, SDM 2010. (pdf)

Yi Zhang. Smart PCA. The 21st International Joint Conference on Artificial Intelligence, IJCAI 2009. (pdf)

Yi Zhang, Jeff Schneider and Artur Dubrawski. Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text, NIPS 2008. (pdf)

2. Mining Streaming Data

Yi Zhang and Xiaoming Jin. Concept Sampling: Towards Systematic Selection in Large-Scale Mixed Concepts in Machine Learning, IJCAI 2007. (pdf)

Yi Zhang and Xiaoming Jin. An Automatic Construction and Organization Strategy for Ensemble Learning on Data Streams. SIGMOD Record, Vol. 35, No. 3, 2006. (pdf)

Yi Zhang and Xiaoming Jin. Classifying Data Streams by Training Data Combination. The 1st China Symposium on Classification and Applications, 2005.

3. Computational Biology: Analyzing Time-Series Gene Expression Data

Yi Zhang and Zhidong Deng. Identifying Biological Pathways via Phase Decomposition and Profile Extraction. Computational Systems Bioinformatics, 2006. (pdf)

4. Recurrent Neural Networks for Modeling Nonlinear Dynamics

Zhidong Deng and Yi Zhang. Collective Behavior of a Small-World Recurrent Neural System with Scale-Free Distribution. IEEE Transactions on Neural Networks, Vol. 18, Issue 5, 2007. (pdf)

Zhidong Deng and Yi Zhang. Complex Systems Modeling Using Scale-Free Highly-Clustered Echo State Network. IJCNN 2006. (pdf)


----- Where Am I? -----

Contact Info

yizhang1 at cs.cmu.edu

8008 GHC
Carnegie Mellon University