Research Interests

1) Rare Category Detection
2) Active Learning
3) Semi-supervised Learning
4) Spam Filtering

Research Experience

Current Work on Machine Learning

1) Develop a new method for detecting instances from minority classes via an unsupervised local-density-differential sampling strategy. In essence, a variable-scale nearest-neighbor process is used to optimize the probability of sampling from tightly grouped minority classes, under the assumption that the majority class is locally smooth. The effectiveness of the proposed method is supported both by theoretical analysis and by preliminary experiments.

2) In the field of spam filtering, propose a new asymmetric boosting method, Boosting with Different Costs. Unlike traditional boosting methods, which assign the same cost to misclassified instances from every class, our method is more general and is better suited to problems where the main concern is a low false positive (or false negative) rate. Experimental results on a large-scale email spam data set show that our method outperforms state-of-the-art techniques.

3) Propose a new graph-based semi-supervised learning method. Unlike previous graph-based methods, which rely on discriminative models, our method is essentially generative: the class-conditional probabilities are estimated by graph propagation, and the class priors are estimated by linear regression. Experimental results on various data sets show that the proposed method outperforms existing graph-based semi-supervised learning methods, especially when the labeled subset alone is insufficient to estimate meaningful class priors.
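For the rare category detection item above, the following Python sketch illustrates only the general intuition of density-differential selection and is not the method itself. It assumes points are numeric tuples, replaces the variable-scale nearest-neighbor process with a fixed, hypothetical `radius`, and queries the point whose local density rises most sharply above that of its neighbors. All function names are illustrative.

```python
import math

# Hypothetical illustration: a fixed radius stands in for the
# variable-scale nearest-neighbor process described in the text.


def local_density(points, i, radius):
    """Count the other points within `radius` of points[i]."""
    return sum(1 for j, q in enumerate(points)
               if j != i and math.dist(points[i], q) <= radius)


def density_differential_scores(points, radius):
    """Score each point by how much denser it is than its sparsest
    neighbor within 2 * radius (0 if it has no such neighbor).

    If the majority class is locally smooth, these differences stay near
    zero in majority regions, so a sharp rise hints at a tight minority clump.
    """
    dens = [local_density(points, i, radius) for i in range(len(points))]
    scores = []
    for i, p in enumerate(points):
        nbr_dens = [dens[j] for j, q in enumerate(points)
                    if j != i and math.dist(p, q) <= 2 * radius]
        scores.append(max((dens[i] - d for d in nbr_dens), default=0))
    return scores


def pick_query(points, radius):
    """Index of the highest-scoring point, i.e. the next one to label."""
    scores = density_differential_scores(points, radius)
    return max(range(len(points)), key=scores.__getitem__)


majority = [(x, y) for x in range(10) for y in range(10)]     # evenly spread
minority = [(4.5, 4.5), (4.6, 4.5), (4.5, 4.6), (4.6, 4.6)]  # tight clump
data = majority + minority
print(data[pick_query(data, radius=0.3)])  # prints (4.6, 4.6)
```

On this toy data the selected point falls inside the four-point minority clump, because at this radius every evenly spaced majority point has the same (zero) density.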
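For the Boosting with Different Costs item above, here is a minimal Python sketch of one simple, well-known way to make AdaBoost asymmetric: decision stumps on a single feature, with initial example weights proportional to a per-example misclassification cost. This cost-proportional initialization is an assumed stand-in, not the Boosting with Different Costs update itself; labels are +1 for spam and -1 for ham, and the function names are hypothetical.

```python
import math


def stump(x, t, s):
    """Decision stump: predict s if x > t, otherwise -s."""
    return s if x > t else -s


def train_stump(xs, ys, w):
    """Return (weighted error, threshold, sign) of the best stump."""
    best = None
    for t in sorted(set(xs)):
        for s in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w) if stump(xi, t, s) != yi)
            if best is None or err < best[0]:
                best = (err, t, s)
    return best


def cost_weighted_adaboost(xs, ys, costs, rounds=10):
    """AdaBoost whose initial weights are proportional to per-example costs."""
    total = sum(costs)
    w = [c / total for c in costs]  # costly examples start out heavier
    model = []
    for _ in range(rounds):
        err, t, s = train_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        if err >= 0.5:
            break  # weak learner no better than chance
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, t, s))
        w = [wi * math.exp(-alpha * yi * stump(xi, t, s))
             for xi, yi, wi in zip(xs, ys, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return model


def predict(model, x):
    score = sum(alpha * stump(x, t, s) for alpha, t, s in model)
    return 1 if score > 0 else -1


xs = [1, 2, 3, 4, 5, 6]
ys = [-1, -1, 1, -1, 1, 1]  # +1 = spam, -1 = ham
flat = cost_weighted_adaboost(xs, ys, [1] * 6, rounds=1)
ham_costly = cost_weighted_adaboost(xs, ys, [10, 10, 1, 10, 1, 1], rounds=1)
print(predict(flat, 4), predict(ham_costly, 4))  # prints 1 -1
```

With equal costs the first stump accepts a false positive on the ham example at x = 4; making ham errors ten times costlier moves the threshold so that this example is classified as ham and the error falls on a spam example instead.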
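For the graph-based semi-supervised learning item above, here is a minimal Python sketch of the generative decomposition, under stated assumptions: each class's labeled nodes are spread over a weighted graph by a discounted random-walk propagation, the result is normalized into a class-conditional distribution over nodes, and Bayes' rule combines it with the class priors. The priors are passed in directly; the linear-regression prior estimate is not reproduced, and the propagation scheme and names are illustrative choices.

```python
def propagate(W, seed, alpha=0.9, iters=300):
    """Iterate f <- alpha * D^-1 W f + (1 - alpha) * seed on weight matrix W."""
    n = len(W)
    deg = [sum(row) or 1.0 for row in W]  # avoid dividing by zero for isolated nodes
    f = list(seed)
    for _ in range(iters):
        f = [alpha * sum(W[i][j] * f[j] for j in range(n)) / deg[i]
             + (1 - alpha) * seed[i] for i in range(n)]
    return f


def generative_posteriors(W, labeled, classes, priors):
    """Posterior p(c | node) for every node.

    labeled: dict node -> class (each class needs at least one labeled node).
    priors:  dict class -> prior probability (assumed given here).
    """
    n = len(W)
    cond = {}
    for c in classes:
        seed = [1.0 if labeled.get(i) == c else 0.0 for i in range(n)]
        f = propagate(W, seed)
        total = sum(f)
        cond[c] = [v / total for v in f]  # treated as p(node | c)
    posteriors = []
    for i in range(n):
        joint = {c: priors[c] * cond[c][i] for c in classes}
        z = sum(joint.values())
        posteriors.append({c: joint[c] / z for c in classes})
    return posteriors


# Path graph 0-1-2-3-4-5 with unit weights; node 0 labeled "a", node 5 labeled "b".
W = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(6)] for i in range(6)]
post = generative_posteriors(W, {0: "a", 5: "b"}, ["a", "b"], {"a": 0.5, "b": 0.5})
print(max(post[1], key=post[1].get), max(post[4], key=post[4].get))  # prints a b
```

Keeping the priors as a separate input is what makes the model generative here: changing `priors` shifts every posterior without re-running the propagation.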

Old Work on Machine Learning

1) Propose a new variant of the boosting algorithm, named W-Boost, which alleviates, to a certain extent, the over-fitting that arises when training data are insufficient. It is based on a novel weight update scheme and uses a variable number of bins to estimate the marginal distributions in the weak learner design.

2) Study and compare existing active learning methods used in Content-based Image Retrieval, and propose a novel method named mean version space active learning. The criterion of the proposed method incorporates both posterior probabilities and the size of the version space, whereas existing methods rely on only one of them.
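For the W-Boost item above, one way to picture "estimating marginal distributions with a variable number of bins" is a Real-AdaBoost-style weak learner that histograms one feature and outputs half the log-ratio of positive to negative weight in each bin. This Python sketch assumes equal-width bins and a caller-chosen `n_bins`; it does not reproduce the W-Boost weight update, and the names are hypothetical.

```python
import math


def histogram_weak_learner(xs, ys, w, n_bins, eps=1e-6):
    """Confidence-rated weak learner on one feature (labels in {+1, -1}).

    Splits [min(xs), max(xs)] into n_bins equal-width bins, accumulates the
    weighted positive and negative mass in each bin, and returns
    h(x) = 0.5 * ln((W+ + eps) / (W- + eps)) for the bin containing x.
    """
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature

    def bin_of(x):
        # Clamp so that x == hi and out-of-range inputs map to an end bin.
        return min(max(int((x - lo) / width), 0), n_bins - 1)

    w_pos = [0.0] * n_bins
    w_neg = [0.0] * n_bins
    for x, y, wi in zip(xs, ys, w):
        if y == 1:
            w_pos[bin_of(x)] += wi
        else:
            w_neg[bin_of(x)] += wi
    conf = [0.5 * math.log((p + eps) / (n + eps)) for p, n in zip(w_pos, w_neg)]
    return lambda x: conf[bin_of(x)]


xs, ys = [0, 1, 2, 3], [-1, -1, 1, 1]
w = [0.25] * 4
coarse = histogram_weak_learner(xs, ys, w, n_bins=1)
fine = histogram_weak_learner(xs, ys, w, n_bins=2)
print(coarse(0.5), fine(0.5) < 0 < fine(2.5))  # prints 0.0 True
```

A single bin cannot separate anything, while two bins already split these classes. With scarce data, fewer bins give steadier estimates, which is one plausible reason to let the bin count vary; how W-Boost actually chooses it is not described here.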

Image-Related Topics

1) (Image Retrieval) Propose a novel transductive learning framework named manifold-ranking-based image retrieval (MRBIR). The framework further includes several schemes for exploiting negative feedback images and for selecting images in each round of relevance feedback. In systematic experiments, MRBIR outperforms state-of-the-art techniques.

2) (Image Classification) Evaluate the performance of different classification algorithms, e.g. SVM, AdaBoost, and Real-AdaBoost, on an image classification task (photo vs. graphic), and incorporate the best one (Real-AdaBoost) into a web image search engine developed by Microsoft Research Asia.

3) (Image Symmetry Analysis) Propose an optimization-based approach for automatic peak number detection in repeated pattern analysis. Apply the theory of wallpaper groups to natural images and extract a novel feature that describes their symmetry properties. The proposed symmetry feature outperforms several other texture features in image retrieval.
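For the MRBIR item above, the following Python sketch shows only the generic manifold-ranking propagation, not the full framework: Gaussian affinities W with a zero diagonal, symmetric normalization S = D^(-1/2) W D^(-1/2), and the iteration f ← αSf + y seeded with the query indicator y. It assumes image features are plain numeric tuples and omits the negative-feedback and image-selection schemes; `sigma`, `alpha`, and the function name are illustrative assumptions.

```python
import math


def manifold_rank(points, query_ids, sigma=1.0, alpha=0.9, iters=300):
    """Rank all points by relevance to the query points via manifold ranking.

    Assumes every point has at least one neighbor with nonzero affinity.
    """
    n = len(points)
    # Gaussian affinities with a zero diagonal.
    W = [[0.0 if i == j else
          math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    d = [sum(row) for row in W]
    # Symmetric normalization S = D^-1/2 W D^-1/2.
    S = [[W[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]
    y = [1.0 if i in query_ids else 0.0 for i in range(n)]
    f = y[:]
    # Iterate f <- alpha * S f + y; this converges because alpha < 1 and
    # the spectral radius of S is at most 1.
    for _ in range(iters):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + y[i] for i in range(n)]
    return sorted(range(n), key=lambda i: f[i], reverse=True)


images = [(0, 0), (0.5, 0), (1, 0), (1.5, 0), (5, 0), (5.5, 0)]
print(manifold_rank(images, query_ids={0}))
```

On this toy layout, the four points chained to the query are expected to outrank the two distant ones, since relevance spreads along the densely connected chain rather than by raw distance alone.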