Jingrui's Research

Research Interests

1)      Rare Category Detection

2)      Active Learning

3)      Semi-supervised Learning

4)      Spam Filtering

Research Experience

 

Current Work on Machine Learning

1)      Propose a new method for detecting instances from minority classes via an unsupervised local-density-differential sampling strategy. Essentially, a variable-scale nearest-neighbor process is used to maximize the probability of sampling from tightly grouped minority classes, under a local smoothness assumption on the majority class. The effectiveness of the proposed method is supported both by theoretical analysis and by preliminary experiments.

 

2)      Propose a new asymmetric boosting method for spam filtering, Boosting with Different Costs. Traditional boosting methods assume the same cost for misclassified instances from different classes; our method is more general and is designed for problems where the main concern is a low false positive (or false negative) rate. Experimental results on a large-scale email spam data set show that it outperforms state-of-the-art techniques.

 

3)      Propose a new graph-based semi-supervised learning method. Unlike previous graph-based methods, which are based on discriminative models, our method is essentially a generative model: the class-conditional probabilities are estimated by graph propagation, and the class priors are estimated by linear regression. Experimental results on various data sets show that the proposed method outperforms existing graph-based semi-supervised learning methods, especially when the labeled subset alone is insufficient to estimate meaningful class priors.

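For item 1) above, the idea of scoring points by differences in local density can be illustrated with a fixed-radius sketch. This is a minimal sketch under stated assumptions, not the variable-scale procedure described: the radius r, the 2r comparison window, and the function names (local_densities, density_differential_scores, pick_query) are hypothetical choices for illustration. Each point is scored by the largest drop in neighbor count between it and a nearby point, and the highest-scoring point is proposed for labeling.

```python
import math

def local_densities(points, r):
    """For each point, count how many other points lie within radius r."""
    n = len(points)
    return [
        sum(1 for j in range(n) if j != i and math.dist(points[i], points[j]) <= r)
        for i in range(n)
    ]

def density_differential_scores(points, r):
    """Score each point by its largest density drop to a neighbor within 2r.

    Hypothetical scoring rule: a point inside a tight minority cluster has many
    more close neighbors than the sparse majority points next to it.
    """
    dens = local_densities(points, r)
    scores = []
    for i, p in enumerate(points):
        drops = [dens[i] - dens[j] for j, q in enumerate(points)
                 if j != i and math.dist(p, q) <= 2 * r]
        # Isolated points (no neighbor within 2r) get a neutral score of 0.
        scores.append(max(drops) if drops else 0)
    return scores

def pick_query(points, r):
    """Return the index of the point to send to the labeling oracle next."""
    scores = density_differential_scores(points, r)
    return max(range(len(points)), key=scores.__getitem__)

# Usage: a sparse 1-D majority on a grid plus a tight three-point minority cluster.
majority = [(float(x),) for x in range(0, 1001, 50)]   # indices 0..20
minority = [(520.0,), (525.0,), (530.0,)]              # indices 21..23
points = majority + minority
print(pick_query(points, r=15))  # → 21
```

Majority points next to the cluster have far fewer close neighbors than cluster members, so the cluster members receive the highest scores; under a smooth majority density, such sharp drops are expected mainly around minority clusters.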
 
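For item 2) above, one simple way to make boosting asymmetric is to start AdaBoost from sample weights proportional to each class's misclassification cost. The sketch below assumes exactly that (cost-weighted initialization, decision stumps on a single feature, labels -1 for legitimate mail and +1 for spam). It illustrates asymmetric boosting in general and is not the update rule of Boosting with Different Costs; all function names are hypothetical.

```python
import math

def stump_predict(x, t, pol):
    """Decision stump: predict pol if x > t, otherwise -pol."""
    return pol if x > t else -pol

def best_stump(xs, ys, w):
    """Pick the threshold/polarity pair with the lowest weighted error."""
    vals = sorted(set(xs))
    thresholds = [vals[0] - 1] + [(a + b) / 2 for a, b in zip(vals, vals[1:])] + [vals[-1] + 1]
    best = None
    for t in thresholds:
        for pol in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w) if stump_predict(xi, t, pol) != yi)
            if best is None or err < best[0]:
                best = (err, t, pol)
    return best

def train_cost_boost(xs, ys, cost_pos, cost_neg, rounds=10):
    """AdaBoost with cost-proportional initial weights (assumed asymmetry scheme)."""
    w = [cost_pos if y == 1 else cost_neg for y in ys]
    total = sum(w)
    w = [wi / total for wi in w]
    model = []
    for _ in range(rounds):
        err, t, pol = best_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) and division by 0
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, t, pol))
        # Standard AdaBoost update: up-weight mistakes, down-weight correct predictions.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, t, pol))
             for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    score = sum(alpha * stump_predict(x, t, pol) for alpha, t, pol in model)
    return 1 if score > 0 else -1

# Usage: toy 1-D "spamminess" feature; one legitimate message (x=6) looks spammy.
xs = [1, 2, 3, 6, 4, 5, 7, 8]
ys = [-1, -1, -1, -1, 1, 1, 1, 1]   # -1 = legitimate, +1 = spam

def false_positives(model):
    return sum(1 for x, y in zip(xs, ys) if y == -1 and predict(model, x) == 1)

print(false_positives(train_cost_boost(xs, ys, 1, 1, rounds=1)))  # → 1
print(false_positives(train_cost_boost(xs, ys, 1, 5, rounds=1)))  # → 0
```

With a false positive made five times as costly, the first stump moves its threshold so that no legitimate message is flagged, at the price of two missed spam messages. Asymmetry introduced only through the initial weights can be diluted over later rounds, which is one reason cost-sensitive boosting variants also change the weight update.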

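For item 3) above, the generative reading (class-conditional probabilities from graph propagation, combined with class priors through Bayes' rule) can be sketched as follows. Assumptions: propagation uses the well-known normalized iteration F <- alpha*S*F + (1 - alpha)*Y with S = D^(-1/2) W D^(-1/2); each propagated class column is normalized over the nodes and treated as p(x_i | c); and the prior is simply the labeled class proportion. The linear-regression prior estimate described above is not reproduced, and the names propagate and generative_classify are hypothetical.

```python
import math

def propagate(W, Y, alpha=0.9, iters=200):
    """Iterate F <- alpha*S*F + (1-alpha)*Y with S = D^-1/2 W D^-1/2."""
    n, k = len(W), len(Y[0])
    d = [sum(row) for row in W]  # node degrees; every node needs at least one edge
    S = [[W[i][j] / math.sqrt(d[i] * d[j]) if W[i][j] else 0.0 for j in range(n)]
         for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [[alpha * sum(S[i][j] * F[j][c] for j in range(n)) + (1 - alpha) * Y[i][c]
              for c in range(k)] for i in range(n)]
    return F

def generative_classify(W, labels, n_classes, alpha=0.9):
    """labels: dict node -> class; needs at least one labeled node per class."""
    n = len(W)
    Y = [[1.0 if labels.get(i) == c else 0.0 for c in range(n_classes)] for i in range(n)]
    F = propagate(W, Y, alpha)
    # Normalize each class column over the nodes and read it as p(x_i | c).
    col = [sum(F[i][c] for i in range(n)) for c in range(n_classes)]
    cond = [[F[i][c] / col[c] for c in range(n_classes)] for i in range(n)]
    # Stand-in prior: labeled class proportions (NOT the linear-regression estimate).
    prior = [sum(1 for v in labels.values() if v == c) / len(labels) for c in range(n_classes)]
    # Bayes' rule: the posterior is proportional to prior * class-conditional.
    return [max(range(n_classes), key=lambda c: prior[c] * cond[i][c]) for i in range(n)]

# Usage: two triangles joined by a weak edge (2-3), one labeled node in each.
W = [[0.0] * 6 for _ in range(6)]
for a, b, wt in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                 (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]:
    W[a][b] = W[b][a] = wt
print(generative_classify(W, {0: 0, 5: 1}, 2))  # → [0, 0, 0, 1, 1, 1]
```

Keeping the prior as a separate factor is what makes the model generative in this sketch: a better prior estimate can be plugged in without changing the propagation step.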
Old Work on Machine Learning

1)      Propose a new variant of the boosting algorithm, named W-Boost, which, to a certain extent, addresses the over-fitting that occurs when training data are insufficient. It is based on a novel weight update scheme and uses a variable number of bins to estimate marginal distributions in weak learner design.

2)      Study and compare existing active learning methods used in content-based image retrieval (CBIR), and propose a novel method named mean version space active learning. The criterion of the proposed method incorporates both posterior probabilities and the size of the version space, whereas existing methods are based on only one of them.

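For the W-Boost item above, weak learners that estimate per-class marginal distributions with weighted histograms are a common design in Real AdaBoost-style boosting. The sketch below shows such a weak learner on a single feature with the number of bins as a parameter. choose_bins is a hypothetical square-root rule standing in for however W-Boost actually varies the bin number, and the W-Boost weight update itself is not shown.

```python
import math

def choose_bins(n_samples):
    """Hypothetical rule: use more bins when more samples are available."""
    return max(2, int(math.sqrt(n_samples)))

def fit_histogram_weak_learner(xs, ys, w, n_bins, eps=1e-6):
    """Real-AdaBoost-style weak learner on one feature (labels in {-1, +1}).

    Weighted class histograms over equal-width bins approximate the two
    class-conditional marginals; each bin outputs half their log-ratio.
    """
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width when all xs are equal

    def bin_of(x):
        # Clamp so that x == hi (or values outside the training range) stay in range.
        return min(max(int((x - lo) / width), 0), n_bins - 1)

    pos = [0.0] * n_bins
    neg = [0.0] * n_bins
    for x, y, wi in zip(xs, ys, w):
        if y == 1:
            pos[bin_of(x)] += wi
        else:
            neg[bin_of(x)] += wi
    outputs = [0.5 * math.log((p + eps) / (q + eps)) for p, q in zip(pos, neg)]
    return lambda x: outputs[bin_of(x)]

# Usage: two well-separated classes with uniform boosting weights.
xs = [0, 1, 2, 3, 6, 7, 8, 9]
ys = [-1, -1, -1, -1, 1, 1, 1, 1]
w = [1 / len(xs)] * len(xs)
h = fit_histogram_weak_learner(xs, ys, w, choose_bins(len(xs)))
print(h(1) < 0, h(8) > 0)  # → True True
```

Fewer bins give smoother but coarser marginal estimates, which matters most when training data are scarce; that trade-off is the reason to make the bin number adjustable.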
 
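For the mean version space item above, one way to combine posterior probabilities with version-space size is to score each candidate by the expected size of the version space after its label is revealed, weighting the two outcomes by P(y = +1 | x). The sketch below assumes that reading and uses a finite set of 1-D threshold hypotheses so the version space can be counted exactly. The default posterior (the fraction of consistent hypotheses voting +1) is a stand-in that could be replaced by, for example, a calibrated classifier output; all names are hypothetical.

```python
def threshold_label(x, t):
    """Hypothesis h_t: label +1 if x > t, else -1."""
    return 1 if x > t else -1

def version_space(hypotheses, labeled):
    """Hypotheses consistent with every labeled example (x, y)."""
    return [t for t in hypotheses if all(threshold_label(x, t) == y for x, y in labeled)]

def expected_remaining(x, vs, p_pos):
    """Expected version-space size after querying x, given P(y=+1|x) = p_pos."""
    v_pos = sum(1 for t in vs if threshold_label(x, t) == 1)
    v_neg = len(vs) - v_pos
    return p_pos * v_pos + (1 - p_pos) * v_neg

def select_query(candidates, hypotheses, labeled, posterior=None):
    """Pick the candidate that minimizes the expected remaining version space."""
    vs = version_space(hypotheses, labeled)
    if posterior is None:
        # Stand-in posterior: fraction of the version space voting +1.
        def posterior(x):
            return sum(1 for t in vs if threshold_label(x, t) == 1) / len(vs)
    return min(candidates, key=lambda x: expected_remaining(x, vs, posterior(x)))

# Usage: thresholds 0.5, 1.5, ..., 9.5; one negative at 0 and one positive at 10.
hypotheses = [i + 0.5 for i in range(10)]
labeled = [(0, -1), (10, 1)]
print(select_query(range(1, 10), hypotheses, labeled))  # → 5
```

With consistent thresholds spread evenly between the two labeled points, the criterion picks the midpoint, which halves the version space whichever label comes back.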

Image-Related Topics

1)      (Image Retrieval) Propose a novel transductive learning framework named manifold-ranking based image retrieval (MRBIR). Several schemes for incorporating negative feedback images and for selecting images in each round of relevance feedback are also integrated into the framework. In systematic experiments, MRBIR outperforms state-of-the-art techniques.

 

2)      (Image Classification) Evaluate the performance of different classification algorithms, e.g. SVM, AdaBoost, and Real AdaBoost, on an image classification task (photo vs. graphic), and incorporate the best one (Real AdaBoost) into a web image search engine developed by Microsoft Research Asia.

 

3)      (Image Symmetry Analysis) Propose an optimization-based approach for automatic peak number detection in repeated pattern analysis. Apply the theory of wallpaper groups to natural images and extract a novel feature that characterizes their symmetry properties. The proposed symmetry feature outperforms several other texture features in image retrieval.
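For the image retrieval item above, the core of manifold ranking can be sketched as a score-spreading iteration f <- alpha*S*f + y on an affinity graph over images, with S = D^(-1/2) W D^(-1/2); its fixed point is (I - alpha*S)^(-1) y. This is a generic sketch, not MRBIR itself: encoding negative feedback images as y = -1 is an illustrative assumption, the image-selection schemes are omitted, and the function names are hypothetical.

```python
import math

def manifold_rank(W, positives, negatives=(), alpha=0.9, iters=300):
    """Iterate f <- alpha*S*f + y with S = D^-1/2 W D^-1/2.

    y is +1 for query/positive images and -1 for negative feedback images
    (the negative-feedback encoding is an assumption for illustration).
    """
    n = len(W)
    d = [sum(row) for row in W]  # every node needs at least one edge
    S = [[W[i][j] / math.sqrt(d[i] * d[j]) if W[i][j] else 0.0 for j in range(n)]
         for i in range(n)]
    y = [0.0] * n
    for i in positives:
        y[i] = 1.0
    for i in negatives:
        y[i] = -1.0
    f = y[:]
    for _ in range(iters):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + y[i] for i in range(n)]
    return f

def ranking(scores):
    """Node indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])

# Usage: five images on a path graph 0-1-2-3-4, with image 0 as the query.
W = [[0.0] * 5 for _ in range(5)]
for i in range(4):
    W[i][i + 1] = W[i + 1][i] = 1.0
f = manifold_rank(W, positives=[0])
print(ranking(f))  # → [0, 1, 2, 3, 4]
f_fb = manifold_rank(W, positives=[0], negatives=[4])
print(f_fb[3] < f[3])  # → True
```

Because the scores are linear in y, marking an image as negative feedback subtracts its spread-out influence, lowering the scores of images near it on the graph.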