Research Topics

My research has centered on statistical learning methods and algorithms and their application to very-large-scale text categorization, web mining for concept graph discovery, semi-supervised clustering, multi-task learning, novelty-based information retrieval, large-scale optimization for online advertising, and social network analysis for personalized email prioritization, among other areas. My recent research focuses on the following topics:

       Graph-based Transductive Learning from Heterogeneous Data Sources (Ongoing project under NSF Big Data)

o   Important problems in the big-data era involve predictions based on heterogeneous sources of information and the dependency structures in data. For example, in finding domain experts to review papers, we need to make inferences based not only on the content of the papers but also on co-author networks, citation graphs, topical similarities among venues, etc.

o   We approach the challenging problem of multi-relational learning over heterogeneous graphs by projecting multi-source dependency structures onto a product graph (via a tensor or Cartesian product of the input graphs), and by simultaneously optimizing both the spectral transformation of the product graph and the semi-supervised label propagation over that product graph.

o   We have developed a new family of non-parametric optimization algorithms for large-scale graph-based learning, together with benchmark evaluations.
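
To make the product-graph idea concrete, here is a minimal sketch in plain Python. It builds the Cartesian product of two small graphs and runs a standard normalized label propagation over it. This is an illustration under stated assumptions, not the algorithm family described above: the toy graphs, the function names (cartesian_product, propagate), the fixed damping factor alpha, and the fixed iteration count are all hypothetical, and the joint optimization of the spectral transformation is not attempted.

```python
import math

def cartesian_product(A, B):
    """Adjacency matrix of the Cartesian product graph, i.e. A (x) I + I (x) B.

    Node (i, j) of the product is stored at index i * len(B) + j.
    """
    n, m = len(A), len(B)
    W = [[0.0] * (n * m) for _ in range(n * m)]
    for i in range(n):
        for j in range(m):
            u = i * m + j
            for k in range(n):          # move along an edge of the first graph
                W[u][k * m + j] += A[i][k]
            for l in range(m):          # move along an edge of the second graph
                W[u][i * m + l] += B[j][l]
    return W

def propagate(W, y, alpha=0.8, iters=200):
    """Iterate f <- alpha * S f + (1 - alpha) * y, with S = D^-1/2 W D^-1/2."""
    size = len(W)
    deg = [sum(row) for row in W]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    f = list(y)
    for _ in range(iters):
        f = [
            alpha * sum(inv_sqrt[u] * W[u][v] * inv_sqrt[v] * f[v]
                        for v in range(size))
            + (1 - alpha) * y[u]
            for u in range(size)
        ]
    return f

# Hypothetical toy data: 3 papers linked by topical similarity,
# 3 reviewers linked by co-authorship (both are simple paths).
papers = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
reviewers = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]

W = cartesian_product(papers, reviewers)

# Two labeled (paper, reviewer) pairs; every other pair is unlabeled.
y = [0.0] * 9
y[0 * 3 + 0] = 1.0    # (paper 0, reviewer 0): known good match
y[2 * 3 + 2] = -1.0   # (paper 2, reviewer 2): known poor match

scores = propagate(W, y)
```

Each entry of scores ranks one (paper, reviewer) pair, so labels on a few pairs spread to unlabeled pairs through both input graphs at once. Replacing cartesian_product with a Kronecker (tensor) product of the two adjacency matrices would give the other construction mentioned above.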

       Scalable Machine Learning for Time Series Analysis (Ongoing projects sponsored by Boeing, DOE and NSF)

o   Modeling aircraft sensor networks and maintenance reports with autoregressive models, recurrent neural networks and shapelet learning (Guoqing Zhe et al., KDD 2016) for anomaly detection and forecasting (ongoing research in collaboration with Boeing)

o   Spatio-temporal pattern recognition and causal analysis over environmental data streams, e.g., modeling the long-term and short-term dependency structures over the sensor networks in a solar-energy farm, in response to the influences of cloud movements and wind directions/speeds (A new research project in collaboration with Brookhaven National Lab and sponsored by DOE)

o   Modeling multi-source and multi-scale evidence of dynamic changes in document streams (scientific literature or news stories) by developing a new family of Bayesian von Mises-Fisher (vMF) models and large-scale inference algorithms (ongoing NSF project; Gopal, PhD Thesis; Gopal & Yang, ICML 2014 and Gopal & Yang, UAI 2014)
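
For readers unfamiliar with the vMF distribution named above: it models unit-length vectors (for example, length-normalized document vectors) with a density proportional to exp(kappa * mu . x), where mu is a unit mean direction and kappa > 0 is a concentration parameter. The sketch below evaluates the log density only in three dimensions, where the normalizer has the closed form kappa / (4 * pi * sinh(kappa)). It is an illustration with hypothetical names and toy vectors, not the Bayesian models or inference algorithms of the cited papers.

```python
import math

def unit(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def vmf_log_density_3d(x, mu, kappa):
    """Log density of a vMF distribution on the unit sphere in R^3.

    Assumes x and mu are unit vectors and kappa > 0.
    """
    dot = sum(a * b for a, b in zip(x, mu))
    # log(sinh(k)) = k + log(1 - exp(-2k)) - log(2); avoids overflow for large k
    log_sinh = kappa + math.log1p(-math.exp(-2.0 * kappa)) - math.log(2.0)
    return math.log(kappa) - math.log(4.0 * math.pi) - log_sinh + kappa * dot

# Hypothetical toy vectors: one topic direction and two documents.
mu = unit([1.0, 1.0, 0.0])
near = unit([1.0, 0.9, 0.1])    # points roughly along the topic direction
far = unit([-1.0, 0.0, 1.0])    # points away from it

lp_near = vmf_log_density_3d(near, mu, kappa=5.0)
lp_far = vmf_log_density_3d(far, mu, kappa=5.0)
```

A larger kappa concentrates the density around mu, and as kappa approaches zero the distribution approaches the uniform distribution on the sphere. In higher dimensions the normalizer involves a modified Bessel function of the first kind, which this sketch does not cover.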


       Extreme Classification and Cross-language Transfer Classification

o   State-of-the-art classifiers in extremely large-scale classification over hundreds of thousands of categories with hierarchical or graphical dependency structures

o   Scalable variational inference for joint optimization of one trillion (4 TB) model parameters (Gopal & Yang, KDD 2013; Gopal & Yang, ICML 2013 & Supplementary; Gopal et al., NIPS 2012)

o   Cross-language text classification and cross-domain model adaptation from knowledge-rich languages (e.g., English) to low-resource and label-sparse languages (Ruochen Xu et al., CIKM 2016), as part of the LORELEI/ARIEL project under DARPA

       Semi-structured Learning of Information Fusion for Events and Entities (joint effort with Prof. Jaime Carbonell in the DEFT project under DARPA)

o   Detecting entities and events of interest in various forms of mentions in text to enable high-precision information fusion and summarization, and jointly predicting the semantic roles of entities in different types of events. Using a corporate acquisition event as an example, key information about the acquirer, price, date, approvals, joint management, etc., needs to be extracted from text for slot filling in a predefined template, and the uncertainty of each system-predicted slot filler needs to be estimated for filler selection and fusion.