Recent Research

         Web-scale Hierarchical Classification: Using classification to provide organizational views of large data becomes increasingly important in the Big Data era. For instance, Wikipedia articles are indexed using over half a million categories in a dependency graph. Jointly optimizing all the classifiers (one per node) in such a large graph or hierarchy presents significant challenges for structured learning and large-scale optimization. We have developed powerful statistical learning frameworks and scalable algorithms (for variational inference, efficient approximation and parallel computing) which successfully solved such joint optimization problems with up to one trillion (4 TB) model parameters in 37 hours, and produced the best results on the largest datasets in the international PASCAL Large-scale Hierarchical Text Categorization evaluations (Gopal & Yang, KDD 2013; Gopal & Yang, ICML 2013 & Supplementary; Gopal et al., NIPS 2012).
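
To make the joint-optimization idea concrete, below is a minimal sketch of recursive regularization, the mechanism in the KDD 2013 paper, in which each node's weight vector is pulled toward its parent's so that related classes share statistical strength. The tiny tree, synthetic data, and hyperparameters are illustrative assumptions; the actual systems use variational inference and parallel solvers at vastly larger scale.

```python
import numpy as np

# Toy hierarchy: each node's weight vector is regularized toward its parent's
# (the recursive-regularization idea). Tree, data, and constants are illustrative.
parent = {"root": None, "sci": "root", "arts": "root",
          "physics": "sci", "biology": "sci"}
leaves = ["physics", "biology", "arts"]

d = 20                                         # feature dimension
rng = np.random.default_rng(0)
W = {n: np.zeros(d) for n in parent}           # one weight vector per node
data = {n: (rng.normal(size=(30, d)),          # toy one-vs-rest data per leaf
            rng.choice([-1.0, 1.0], size=30)) for n in leaves}

lam, C, lr = 1.0, 1.0, 0.01
for _ in range(200):
    grads = {n: np.zeros(d) for n in parent}
    for n, p in parent.items():                # gradient of lam * ||w_n - w_p||^2
        if p is not None:
            diff = W[n] - W[p]
            grads[n] += 2 * lam * diff
            grads[p] -= 2 * lam * diff
    for n in leaves:                           # squared hinge loss at the leaves
        X, y = data[n]
        slack = np.maximum(0.0, 1.0 - y * (X @ W[n]))
        grads[n] += (-2 * C) * ((slack * y)[:, None] * X).sum(axis=0)
    for n in parent:                           # plain gradient-descent update
        W[n] -= lr * grads[n]
```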

         Hierarchical, Dynamic and Semi-supervised Topic Detection and Tracking (ongoing NSF project): Modeling information dynamics at different levels of granularity is an open challenge. We have developed a new family of Bayesian von Mises-Fisher (vMF) clustering techniques, including hierarchical, dynamic and multi-field models, that outperform popular graphical topic models and clustering methods in effectiveness and scalability in the discovery of human-expected clusters and hierarchies. On top of these vMF models, we further developed a new Transformation-based Clustering with Supervision (TCS) framework, which shows for the first time that supervised metric learning from a small subset of labeled clusters can be used to learn user-preferred clustering metrics in the discovery of unknown clusters. Extensive testing on a large number of benchmark datasets across several application domains (news stories, scientific literature, image processing, speech/speaker recognition, etc.) revealed substantial performance improvements of these approaches over competing methods in clustering (Gopal & Yang, UAI 2014 & Supplementary).
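
As a concrete reference point, the sketch below implements spherical k-means, the hard-assignment limit of a von Mises-Fisher mixture: points and cluster centers live on the unit sphere and similarity is the cosine. It is only the simplest member of the model family described above; the hierarchical, dynamic, and supervised extensions are not shown.

```python
import numpy as np

def spherical_kmeans(X, k, iters=50, seed=0):
    """Hard-assignment limit of a vMF mixture: unit-norm points, cosine similarity."""
    rng = np.random.default_rng(seed)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(X @ centers.T, axis=1)   # E-step: most similar mean direction
        for j in range(k):                          # M-step: renormalized cluster mean
            members = X[assign == j]
            if len(members):
                mu = members.sum(axis=0)
                centers[j] = mu / np.linalg.norm(mu)
    return assign, centers

# e.g. cluster 200 random 50-dimensional vectors into 5 direction clusters
labels, mus = spherical_kmeans(np.random.default_rng(1).normal(size=(200, 50)), k=5)
```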

         Concept-Graph Learning via Web Mining for Customized Curriculum Planning (ongoing NSF project): With massive quantities of educational materials freely available on the web, the vision of universal education appears within our grasp. General-purpose search engines are insufficient, as they do not focus on educational materials, objectives, prerequisite relations, etc., nor do they stitch together multiple sources to create customized curricula for students' goals and current knowledge. The project focuses on: 1) extracting educational units from diverse web sites and representing them in a large directed graph, whose nodes are content descriptors and whose edges encode prerequisite and other relations, 2) conducting multi-field topic inference via a new family of graphical models to infer relations among educational units, enriching the graph, and 3) automated curriculum planning, focusing on providing sequences of lessons, courses, exercises and other educational units for a student to achieve his or her educational goals, conditioned on current skills. The curriculum planner enriches a graph traversal path with alternate paths, reinforcement options, and conditional branches.
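
For step 3, here is a minimal sketch of prerequisite-aware sequencing: given a prerequisite graph, a set of goal units, and the student's current skills, a topological sort (Kahn's algorithm) over the unmet ancestors of the goals yields a valid lesson order. The graph, unit names, and function are hypothetical; the actual planner also handles alternate paths, reinforcement, and branching.

```python
from collections import deque

def plan_curriculum(prereqs, goals, known):
    """Order the units a student still needs so that every prerequisite
    precedes its dependents. prereqs: unit -> list of units required first."""
    needed, stack = set(), list(goals)
    while stack:                          # collect unmet ancestors of the goals
        u = stack.pop()
        if u in needed or u in known:
            continue
        needed.add(u)
        stack.extend(p for p in prereqs.get(u, ()) if p not in known)
    indeg = {u: sum(p in needed for p in prereqs.get(u, ())) for u in needed}
    queue = deque(u for u in needed if indeg[u] == 0)
    plan = []
    while queue:                          # Kahn's topological sort
        u = queue.popleft()
        plan.append(u)
        for v in needed:
            if u in prereqs.get(v, ()):
                indeg[v] -= 1
                if indeg[v] == 0:
                    queue.append(v)
    return plan

# e.g. plan_curriculum({"calculus": ["algebra"], "ml": ["calculus", "prob"]},
#                      goals=["ml"], known={"algebra"})
# -> a valid ordering such as ["prob", "calculus", "ml"]
```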

         Large-scale Optimization for Online Advertising: Sponsored search is an important means of Internet monetization, and is the driving force of major search engines today. Placing advertisements to maximize revenue for the search engine, while also satisfying the needs of both users and advertisers, is a difficult optimization problem. Collaborating with Microsoft Research Asia, we have developed a new (and the first) probabilistic optimization framework based on joint modeling of per-click auctions and campaign-level guaranteed delivery of advertisements. We also developed a hierarchical divide-and-conquer strategy for solving the very large optimization problem with millions of users/queries (demands) and massive campaigns (supplies) in the ever-evolving Internet. (K. Salomatin, PhD Thesis)
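
A toy version of the allocation problem can be written as a linear program: decide how many impressions of each query type go to each campaign, maximizing expected revenue subject to supply limits and guaranteed-delivery minimums. The numbers and the LP structure below are illustrative assumptions, not the probabilistic framework from the thesis.

```python
import numpy as np
from scipy.optimize import linprog

# x[i, j] = impressions of query type i allocated to campaign j (toy instance)
r = np.array([[0.9, 0.4],            # expected revenue per impression
              [0.3, 0.8],
              [0.5, 0.5]])
supply = np.array([100, 120, 80])    # impressions available per query type
guaranteed = np.array([60, 90])      # minimum delivery promised per campaign

n_q, n_c = r.shape
c = -r.ravel()                       # maximize revenue = minimize -revenue

A_supply = np.kron(np.eye(n_q), np.ones((1, n_c)))    # sum_j x_ij <= supply_i
A_deliver = -np.kron(np.ones((1, n_q)), np.eye(n_c))  # sum_i x_ij >= guaranteed_j
res = linprog(c,
              A_ub=np.vstack([A_supply, A_deliver]),
              b_ub=np.concatenate([supply, -guaranteed]),
              bounds=(0, None))
print(res.x.reshape(n_q, n_c), -res.fun)  # allocation matrix and total revenue
```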

         Multi-Task Active Learning: Active learning selects the most informative instances to label in the process of iteratively retraining classification or regression models. Multi-task active learning (MTAL) extends this idea by leveraging inter-task dependencies when estimating the impact of newly selected instances, instead of selecting instances for each task in isolation. We have developed a family of MTAL methods, called Benevolent Active Learning, to explicitly estimate the impact of supervision across tasks and to leverage various dependence structures (hierarchies, networks, latent-factor correlations). We have also pursued Personalized Active Learning, in which we optimize the learning curve of the system not only by selecting informative instances to label, but also by selecting the most knowledgeable labelers for the selected instances. (A. Harpale, PhD Thesis; J. Zhang, PhD Thesis)
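
The sketch below illustrates the cross-task scoring idea with a deliberately simple proxy: per-task predictive entropy, propagated through a task-correlation matrix so that labeling one instance credits all correlated tasks. The entropy/correlation scoring is an illustrative stand-in, not the Benevolent Active Learning estimator itself.

```python
import numpy as np

def select_instance(probs, task_corr):
    """Pick the unlabeled instance whose label helps the most across tasks.
    probs[t, i] = P(y=1 | x_i) under task t's current model (toy binary case)."""
    eps = 1e-12
    # per-task binary entropy as the uncertainty measure
    H = -(probs * np.log(probs + eps) + (1 - probs) * np.log(1 - probs + eps))
    # a label for one task also informs correlated tasks: weight and aggregate
    impact = task_corr @ H               # impact[t, i] = benefit to task t
    return int(np.argmax(impact.sum(axis=0)))

probs = np.array([[0.52, 0.95, 0.40],   # 2 tasks, 3 unlabeled instances
                  [0.50, 0.10, 0.85]])
task_corr = np.array([[1.0, 0.6],
                      [0.6, 1.0]])
print(select_instance(probs, task_corr))  # -> 0, uncertain under both tasks
```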

         Personalized Email Prioritization based on Content and Social Network Analysis: Statistical learning for personalized email prioritization (PEP) has been relatively sparse due to privacy issues, since people are reluctant to share personal messages and importance judgments with the research community. We have developed PEP methods under the assumption that the system can only access the personal email of each user during the training and testing of the model for that user. Specifically, our focus is on the analysis of personal email networks for discovering user groups and inducing social importance features for email senders and receivers from the viewpoint of each particular user. Using a classification framework to model the mapping from email messages to the appropriate personal priority levels, the system leverages both standard features of email messages and induced social features of senders and receivers in an enriched vector space. (S. Yoo, PhD Thesis; Yang et al., IEEE Intelligent Systems: Special Issue on Social Learning)
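
A minimal sketch of the social-feature induction step, using networkx on a toy personal email graph: sender centrality, PageRank, and reply reciprocity become extra dimensions appended to a message's content features. The addresses, edge weights, and particular feature set are hypothetical.

```python
import networkx as nx
import numpy as np

# Directed edge (sender -> recipient) weighted by message count (toy data)
G = nx.DiGraph()
G.add_weighted_edges_from([
    ("alice@x.com", "me@x.com", 40),
    ("me@x.com", "alice@x.com", 35),
    ("newsletter@y.com", "me@x.com", 120),
    ("me@x.com", "bob@x.com", 5),
])

in_cent = nx.in_degree_centrality(G)        # how widely a node is written to
pagerank = nx.pagerank(G, weight="weight")  # global importance in the graph

def social_features(sender):
    """Features appended to a message's content vector: reply reciprocity
    (did the user ever write back?), in-degree centrality, and PageRank."""
    replied = 1.0 if G.has_edge("me@x.com", sender) else 0.0
    return np.array([replied, in_cent.get(sender, 0.0), pagerank.get(sender, 0.0)])

print(social_features("alice@x.com"))       # frequent two-way contact
print(social_features("newsletter@y.com"))  # high volume but never replied to
```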

         Macro-Level Information Fusion for Events and Entities: Web pages, blogs, social media and other texts contain many mentions of events and entities. Detecting such redundancy and fusing multiple mentions enables high-precision recognition, enriching the extracted information. For instance, an event and its important entities may be mentioned by many sentences in one or more documents, and a joint (fused) representation can be both more accurate and more informative at the right level of granularity. Using a corporate acquisition event as an example, different (and partially redundant) sentences can mention the acquirer, price, date, approvals, joint management, etc.; this multi-aspect information needs to be jointly extracted into a unified structured form for the event type, with uncertainty estimates in the aggregated representation. We are initiating a project in this area (a joint effort with Prof. Jaime Carbonell).
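
One simple way to fuse partially redundant mentions, sketched below under assumed inputs, is confidence-weighted voting per slot: each extracted (slot, value, confidence) triple adds mass to a candidate value, and the fused record keeps the top value with a normalized confidence as its uncertainty estimate. This is only an illustration of the fusion idea, not the project's method; slot names and scores are hypothetical.

```python
from collections import defaultdict

def fuse_mentions(mentions):
    """Fuse (slot, value, confidence) triples from many sentences into one
    structured record, keeping the highest-mass value per slot plus a
    normalized confidence."""
    mass = defaultdict(lambda: defaultdict(float))
    for slot, value, conf in mentions:
        mass[slot][value] += conf
    fused = {}
    for slot, votes in mass.items():
        total = sum(votes.values())
        value, best = max(votes.items(), key=lambda kv: kv[1])
        fused[slot] = (value, best / total)   # (fused value, aggregated confidence)
    return fused

mentions = [  # partially redundant extractions from different sentences
    ("acquirer", "AcmeCorp", 0.9), ("price", "$2B", 0.6),
    ("acquirer", "AcmeCorp", 0.8), ("price", "$2.1B", 0.7),
    ("date", "2014-03-01", 0.5),
]
print(fuse_mentions(mentions))
```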