
Statistical Machine Learning, emphasizing theory and
algorithms for learning complex probabilistic models, learning with
prior knowledge, and reasoning under uncertainty.
Current Projects:
- Bayesian statistics,
nonparametric Bayesian analysis, algorithms and applications of
Bayesian nonparametrics in data mining
In
this project we develop nonparametric and semiparametric Bayesian
models (based on the Dirichlet process and extensions, sometimes known
as the generalized Polya urn schemes) for analyzing time series data,
hierarchical data, and other complex inputs with uncertain internal
structure, which arise from temporal text mining (e.g., emails, news
streams), object tracking (e.g., video surveillance, navigation and
control), and biological data analysis. We develop probabilistic
formalisms, sampling and variational inference algorithms, and also
address theoretical issues such as consistency, bounds, and convergence
of our models and algorithms.
- Statistical Models and
Algorithms of Networks and Relational Data (in collaboration
with Stephen
Fienberg)
In
this project we develop probabilistic generative models for the
formation, growth, evolution, and dynamics of networks and relational
data
in general, and inference/learning algorithms for node labeling, link
prediction, latent theme extraction, etc., for network and relational
data. We
also work on theoretical issues, such as bounds and complexity, related to
our models and algorithms, and on applications to social networks and
biological networks.
- Semi-supervised and unsupervised learning of
distance metrics

In
this project we develop algorithms and theories for learning proper
distance metrics underlying complex high-dimensional data based on weak
auxiliary information regarding data distribution, similarity,
continuity, conductivity, etc. We will explore techniques such as
probabilistic modeling, dimensionality reduction, spectral graph
analysis, kernel methods, and various optimization approaches; and we
will apply our results to pattern recognition, classification, and
clustering problems.
- Variational inference/learning theory and
development of turnkey approximate inference engines

In
this project we develop algorithms and theories of variational
approximations for probabilistic inference on large-scale
directed/undirected graphical models and chain graphs, and
methodologies for structure and parameter estimation for such models.
The goal is to develop fully autonomous, distributed, turnkey software
based on variational and sampling
techniques for reasoning and learning under uncertainty for generic
intelligence systems.
- Applications of
probabilistic graphical models in Computational Biology, IR, NLP,
Multimedia
and Control
(in collaboration with many faculty and students at CMU and other
universities).
We
design task-specific generative, discriminative, and hybrid
graphical models and algorithms for a range of biological and genetic
problems (see below), for NLP problems such as statistical machine
translation, for comprehending and categorizing text corpora,
for segmenting, tracking, and interpreting video and caption streams
from various sources (e.g., surveillance systems, robots), and for
decision making and active learning in dynamic environments.
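
As a minimal illustration of the Polya urn scheme mentioned in the Bayesian nonparametrics project above, the following Python sketch draws a random partition from the basic Blackwell-MacQueen urn that underlies the Dirichlet process. The function name `polya_urn_assignments`, the parameter `alpha`, and the `seed` argument are assumptions made for this example; it is not taken from the projects' own software and does not cover the generalized schemes the project develops.

```python
import random


def polya_urn_assignments(n, alpha, seed=0):
    """Assign n items to clusters with a basic Polya urn (hypothetical helper).

    alpha > 0 is the concentration parameter of the Dirichlet process.
    Returns (labels, counts): labels[i] is the cluster of item i and
    counts[k] is the size of cluster k.
    """
    rng = random.Random(seed)
    labels = []
    counts = []
    for i in range(n):
        # Item i joins existing cluster k with probability counts[k] / (i + alpha)
        # and opens a new cluster with probability alpha / (i + alpha).
        u = rng.random() * (i + alpha)
        acc = 0.0
        chosen = len(counts)  # default: open a new cluster
        for k, c in enumerate(counts):
            acc += c
            if u < acc:
                chosen = k
                break
        if chosen == len(counts):
            counts.append(1)
        else:
            counts[chosen] += 1
        labels.append(chosen)
    return labels, counts


if __name__ == "__main__":
    labels, counts = polya_urn_assignments(n=20, alpha=1.0, seed=42)
    print("labels:", labels)
    print("cluster sizes:", counts)
```

Larger values of `alpha` tend to open more clusters, and clusters that are already large tend to attract more items, which is the "rich get richer" behavior that makes this construction useful as a prior over an unknown number of groups.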

Computational Biology, with an
emphasis on developing formal models and algorithms that address
problems of practical
biological and medical concern.
Current
Projects:
- Probabilistic
evolutionary models of cis-regulatory
modules in Drosophila (in collaboration
with Martin
Kreitman).
In
this project we study the evolutionary relationships reflected in
the
sequence, ordering, position, spacing and function of the regulatory
motifs controlling body segmentation during early embryogenesis in 15
species of Drosophila. We are interested in understanding the
biological driving forces, molecular mechanisms and functional
implications of motif evolution in general from this biological model,
and in developing comparative genomic algorithms for motif finding from
unaligned non-coding sequences.
- Nonparametric Bayesian
models for genetic variations and their associations to diseases and
genetic demography (in
collaboration with various faculty at UPMC
and U of Chicago)
In
this project we develop nonparametric Bayesian models and computational
algorithms for uncovering the chromosomal association (i.e.,
haplotypes), population distribution (i.e., diversity and frequency),
inheritance process (i.e., recombination/substitution), and phenotypic
association (i.e., linkage) of genetic polymorphisms such as SNPs, to
address problems such as disease-gene discovery, chromosomal evolution,
and genetic demography.
- Computational systems
biology of genome-microenvironment interactions in breast cancer
(in collaboration with Mina
Bissell)
In
this project we analyze the
molecular abundance profiles (e.g., microarray,
CGH, ChIP-chip)
measured in a “designer microenvironment,” realized in a 3D culture model
that imitates the in vivo cellular context and
dynamics of cancer progression, reversion and apoptosis. We will
develop algorithms to identify molecular determinants and markers of
cancer states and categorize cancers on the basis of signaling pathway
characteristics. Using probabilistic graphical modeling approaches, we
hope to infer stochastic network models for transcriptional regulation
in response to combinations of signaling inhibitions in cancer cells.
- Biological sequence
analysis: motif detection, gene finding and systems biology

In this project we develop models and algorithms for
understanding and
uncovering the structure of genomic sequences of higher organisms. We
develop
Bayesian models for DNA/protein motif detection and gene finding based
on both sequence-level signatures and meta-sequence-level structural
information reflecting protein-DNA binding, transcript stability, and
prior knowledge of the organization rules of regulatory modules. We
intend to integrate motif finding with systems biology research on
gene regulatory networks.