My research has centered on statistical learning methods and algorithms and their application to very-large-scale text categorization, web mining for concept graph discovery, semi-supervised clustering, multi-task learning, novelty-based information retrieval, large-scale optimization for online advertising, social network analysis for personalized email prioritization, etc. My recent research focuses on the following topics:
o Providing organizational views of multi-source Big Data (e.g., Wikipedia, online shops, Coursera)
o State-of-the-art classifiers for large-scale classification over hundreds of thousands of categories
o Scalable variational inference for joint optimization of one trillion (4 TB) model parameters
· Scalable Machine Learning for Time Series Analysis (Topic Detection and Tracking)
o From scientific literature, news stories, sensor signals, maintenance reports, etc.
o Mapping online course materials to Wikipedia categories as the Interlingua (universal concepts)
o Predicting conceptual dependencies among courses based on partially observed prerequisites
o Planning customized curricula for individuals based on their backgrounds and goals
· Macro-Level Information Fusion for Events and Entities (joint effort with Prof. Jaime Carbonell in the DEFT project under DARPA)
o Detecting entities and events of interest in various forms of mentions in text to enable high-precision semi-structured information fusion and summarization. Using a corporate acquisition event as an example, different (and partially redundant) sentences can mention the acquirer, price, date, approvals, joint management, etc. This multi-aspect information needs to be jointly extracted into a unified structured form for this event type, with uncertainty estimates in the aggregated representation.
· Topic identification on text and speech data in low-density languages (in the LORELEI/ARIEL project under DARPA)
o Developing a new framework for cross-language topic/event mapping, topic-conditioned statistical translation, semi-supervised word clustering in multi-lingual settings, and bootstrapping of semantic lexicons via system interactions with domain experts and linguists.