Zhenzhen Kou

Yahoo! Search Sciences

2821 Mission College Blvd, Santa Clara, CA 95054

E-mail: zzkou AT yahoo-inc DOT com











I am now with the Search Sciences Department at Yahoo! as a Relevance Scientist.

My current project is machine learning for ranking.

Research at CMU:

      Interests: Machine Learning, Information Extraction/Retrieval, Data Mining

      Advisors: William W. Cohen and Robert F. Murphy

      Minorthird: software for text learning, classification, extraction and annotation

      SLIF: Subcellular Location Image Finder

      CALO: Cognitive Assistant that Learns and Organizes


My thesis, stacked graphical learning, is a statistical learning model for collective inference over relational data. The most important feature of stacked graphical learning is that it is much more efficient than existing models and thus very competitive in applications. I have applied the idea of my thesis to document classification and named entity extraction. I have also applied it to several inter-related subtasks in a complex information extraction system.



      Who Rated What

      I worked with Yan Liu to develop a link prediction model for movie recommendation, which ranked 3rd (second runner-up) in KDD Cup 07.

      Please check out our paper for details.

      Stacked Graphical Learning package in Minorthird

I designed and implemented the Stacked Graphical Learning package in Minorthird for classification on relational datasets. Stacked Graphical Learning is an efficient and effective statistical model for collective classification.

Please find more about the model in our SDM07 paper. Here is a tutorial for the package.

      Protein name extractors

I developed several protein name extractors, including a protein name extractor trained with conditional random fields (CRFs) (download) and an extractor trained with dictionary hidden Markov models (Dictionary-HMM, download). Dictionary-HMM combines a dictionary with a Markov model to perform soft matching and extract names from free text.

Please find more details about the algorithm of Dictionary-HMM in our ISMB05 paper. Here is how to use the extractors.


I also worked on Optical Character Recognition (bioKDD03), and designed and implemented a web interface to an SQL database (KSCE-2004). Please check out our SLIF webpage.

      A tool for protein name annotation

I modified the labeling package in Minorthird; here is the resulting labeling tool for protein name annotation. Please find the tutorial on how to use the labeling tool here.


      Curriculum Vitae [HTML]   


      Yan Liu, Zhenzhen Kou, Claudia Perlich and Richard Lawrence (2008): Intelligent System for Workforce Classification, in KDD 2008 Workshop on Data Mining for Business Applications.

      Zhenzhen Kou, Vitor R. Carvalho and William W. Cohen (2007): Online Stacked Graphical Learning, in NIPS 2007 Workshop on Efficient Machine Learning.

      Yan Liu and Zhenzhen Kou (2007): Predicting Who Rated What in Large-Scale Datasets, in Proceedings of KDD Cup and Workshop 2007.

      Zhenzhen Kou and William W. Cohen (2007): Notes for Stacked Graphical Models for Efficient Inference in Markov Random Fields, Technical Report CMU-ML-07-101.

      Zhenzhen Kou and William W. Cohen (2007): Stacked Graphical Models for Efficient Inference in Markov Random Fields, in SDM 07.

      Zhenzhen Kou, William W. Cohen & Robert F. Murphy (2007): A Stacked Graphical Model for Associating Information from Text And Images In Figures, in PSB07.

      Zhenzhen Kou, William W. Cohen & Robert F. Murphy (2005): High-Recall Protein Entity Recognition Using a Dictionary, in ISMB-2005.

      R. Murphy, Z. Kou, J. Hua, M. Joffe, W. W. Cohen (2005): Extracting Structured Information from Text and Images in On-line Journal Articles for Localization Proteomics, in Biolink05.

      Robert F. Murphy, Zhenzhen Kou, Juchang Hua, Matthew Joffe, William W. Cohen (2004): Extracting and Structuring Subcellular Location Information from On-line Journal Articles: The Subcellular Location Image Finder, in KSCE-2004.

      William W. Cohen, Zhenzhen Kou & Robert F. Murphy (2003): Extracting Information from Text and Images for Location Proteomics, in BIOKDD 2003: 2-9.

      Zhenzhen Kou, Liang Ji and Xuegong Zhang (2001): Karyotyping of CGH human metaphase by using support vector machines, Cytometry, December 2001.

      Zhenzhen Kou, Jianhua Xu, Xuegong Zhang and Liang Ji (2001): An Improved Support Vector Machine Using Class-Median Vectors, in Proceedings of the 8th International Conference on Neural Information Processing, 2001, Shanghai, China, pp. 883-887.