SDM 2014 Workshop on Heterogeneous Learning

Overview by Organizers

The main objective of this workshop is to draw researchers' attention to real problems with multiple types of heterogeneity, ranging from online social media analysis and traffic prediction to manufacturing processes and brain image analysis. Commonly found heterogeneities include task heterogeneity (as in multi-task learning), view heterogeneity (as in multi-view learning), instance heterogeneity (as in multi-instance learning), label heterogeneity (as in multi-label learning), and oracle heterogeneity (as in crowdsourcing). In recent years, researchers have proposed various techniques for modeling a single type of heterogeneity as well as multiple types of heterogeneities.

This workshop focuses on novel methodologies, applications and theories for effectively leveraging these heterogeneities. Here we are facing multiple challenges. To name a few: (1) how can we effectively exploit the label/example structure to improve the classification performance; (2) how can we handle the class imbalance problem when facing one or more types of heterogeneities; (3) how can we improve the effectiveness and efficiency of existing learning techniques for large-scale problems, especially when both the data dimensionality and the number of labels/examples are large; (4) how can we jointly model multiple types of heterogeneities to maximally improve the classification performance; (5) how do the underlying assumptions associated with multiple types of heterogeneities affect the learning methods.

We encourage submissions on a variety of topics, including but not limited to:

(1) Novel approaches for modeling a single type of heterogeneity, e.g., task/view/instance/label/oracle heterogeneities.

(2) Novel approaches for simultaneously modeling multiple types of heterogeneities, e.g., multi-task multi-view learning to leverage both the task and view heterogeneities.

(3) Novel applications with a single or multiple types of heterogeneities.

(4) Systematic analysis regarding the relationship between the assumptions underlying each type of heterogeneity and the performance of the predictor.

For this workshop, the potential participants and target audience are faculty, students and researchers in related areas, e.g., multi-task learning, multi-view learning, multi-instance learning, multi-label learning, etc. We also encourage people with application backgrounds to actively participate in this workshop.

We believe that advancements on these topics will benefit a variety of application domains.


Paper Submission


Key Dates

 

12/31/2013:     Paper Submission

01/10/2014:     Author Notification

01/20/2014:     Camera Ready Paper Due

 

Paper Submission Instructions

 

Papers submitted to this workshop should be limited to 6 pages formatted using the SIAM SODA macro (http://www.siam.org/proceedings/macros.php). Authors are required to submit their papers electronically in PDF format to sdm14hl@gmail.com by 11:59pm EST, December 31, 2013.


Schedule


Workshop Schedule

Saturday April 26, 2014 [Ballroom E]

08:30-09:30:     Opening ceremony
                 Keynote Speech 1: Convex Methods for Multi-view Representation Learning
                 Dale Schuurmans (University of Alberta)

09:30-10:00:     HLAer: a System for Heterogeneous Log Analysis
                 Xia Ning, Guofei Jiang

10:00-10:30:     Coffee break

10:30-11:30:     Keynote Speech 2: Large-scale Multi-task Learning Algorithms
                 Luke Huan (University of Kansas)

11:30-12:00:     Invited Talk: Inferring Information Trustworthiness from Multiple Sources of Heterogeneous Data
                 Jing Gao (University at Buffalo, State University of New York)

12:00-13:30:     Lunch break

13:30-14:30:     Keynote Speech 3: Computing What, Where and When You Want from Real-time Big Data
                 Wei Fan (Huawei Noah’s Ark Lab)

14:30-15:00:     Ordering the Sequence of Tasks for Efficient Online Multitask Learning
                 Shaoning Pang

15:00-15:30:     Coffee break

15:30-16:00:     Semantic Orientation after Sentiment Classification by Machine Learning
                 Amit Thombre

16:00-16:45:     Panel: Novel Directions in Heterogeneous Learning
                 Moderator: Jingrui He (Stevens Institute of Technology)

16:45-17:00:     Wrap-up


Invited Speaker


Speaker 1

Title: Convex Methods for Multi-view Representation Learning

Dale Schuurmans (University of Alberta)

Abstract:

Generative approaches to representation learning attempt to infer latent representations that allow accurate reconstruction of data. However, data is often obtained from multiple sources rather than a single source (e.g. an object might be viewed by cameras at different angles, or a document might consist of text and images).  The conditional independence of separate sources imposes constraints which, if respected, can improve the quality of any learned representation.  In this talk, I will introduce a general convex formulation of representation learning that accommodates dimensionality reduction, sparse coding, and multi-view learning.  In this approach, an optimal data reconstruction is first recovered by exploiting an implicit form of convex regularizer.  The latent representation and associated reconstruction model can then be recovered, jointly and optimally, via a simple boosting procedure.  A comparison to existing feature discovery methods demonstrates improved generalization and in some cases even improved efficiency.

Bio:

Dale Schuurmans is a Professor of Computing Science at the University of Alberta and a former Canada Research Chair in Machine Learning.  He received his PhD in Computer Science from the University of Toronto, and has been employed at the National Research Council Canada, the University of Pennsylvania, the NEC Research Institute and the University of Waterloo.  He currently serves as an Associate Editor for JAIR and AIJ, and has previously served as an Associate Editor for IEEE TPAMI, JMLR and MLJ, and as a Program Co-chair for NIPS-2008 and ICML-2004.  His research interests include machine learning, optimization, probability models, and search.  He is the author of more than 140 refereed publications in these areas, and has received paper awards at ICML, IJCAI, AAAI, IEEE ICAL and IEEE ADPRL.

Speaker 2

Title: Large-scale Multi-task Learning Algorithms

Luke Huan (University of Kansas)

Abstract:

Multi-task learning (MTL) has been used in many application areas. In this talk we will present our recent progress in designing and implementing large-scale multi-task learning algorithms. The applications of MTL in social network analysis and bioinformatics will be discussed as well.

Bio:

Dr. Jun (Luke) Huan is an Associate Professor in the Department of Electrical Engineering and Computer Science at the University of Kansas. He directs the Bioinformatics and Computational Life Sciences Laboratory at the KU Information and Telecommunication Technology Center (ITTC) and the Cheminformatics core at the KU Specialized Chemistry Center, funded by NIH. He holds courtesy appointments at the KU Bioinformatics Center and the KU Bioengineering Program, and a visiting professorship from GlaxoSmithKline plc. Dr. Huan received his Ph.D. in Computer Science from the University of North Carolina.
 
Dr. Huan works on data science, machine learning, data mining, big data, and interdisciplinary topics including bioinformatics. He has published more than 80 peer-reviewed papers in leading conferences and journals and has graduated more than ten graduate students, including six PhDs. Dr. Huan serves on the editorial boards of several international journals, including the Springer Journal of Big Data and the International Journal of Data Mining and Bioinformatics. He regularly serves on the program committees of top-tier international conferences on machine learning, data mining, big data, and bioinformatics.
 
Dr. Huan's research is recognized nationally and internationally. He was a recipient of the prestigious National Science Foundation Faculty Early Career Development Award in 2009. His group won the Best Student Paper Award at the IEEE International Conference on Data Mining in 2011 and the Best Paper Award (runner-up) at the ACM International Conference on Information and Knowledge Management in 2009. His research was supported by NSF, NIH, DoD, and the University of Kansas.

Speaker 3

Title: Inferring Information Trustworthiness from Multiple Sources of Heterogeneous Data

Jing Gao (University at Buffalo, State University of New York)

Abstract:

Big data leads to big challenges, not only in the volume of data but also in its variety. Multiple descriptions of the same sets of objects or events from different sources will unavoidably lead to data or information inconsistency. Then, among conflicting pieces of data or information, which one is more trustworthy, or represents the true fact? Facing the daunting scale of data, it is unrealistic to expect humans to label or tell which data source is more reliable or which piece of information is correct. In this talk, I will discuss our research on integrating data from multiple sources to detect trustworthy information. We have developed a series of optimization-based methods that can automatically infer the reliability of sources and facts by correlating and comparing multiple data sources. The effectiveness of the proposed methods is demonstrated on real data sets obtained from multi-choice games on smartphones and online hotel review websites.

Bio:

Jing Gao is currently an assistant professor in the Department of Computer Science at the University at Buffalo (UB), State University of New York. She received her PhD from the Computer Science Department, University of Illinois at Urbana-Champaign in 2011, and subsequently joined UB in 2012. She is broadly interested in data and information analysis with a focus on information integration, ensemble methods, mining data streams, transfer learning and anomaly detection. She has published more than 50 papers in refereed journals and conferences and her work has received over 1200 citations. More information about her research can be found at: http://www.cse.buffalo.edu/~jing.

Speaker 4

Title: Computing What, Where and When You Want from Real-time Big Data

Wei Fan (Huawei Noah’s Ark Lab)

Abstract:

With the rapid deployment of mobile devices, 4G LTE and various mobile applications, figuring out users' intent at the right time and place is becoming increasingly important for mobile applications. As an example of the big data challenge, one mobile carrier in a typical first-tier city in China can have 300TB to 400TB or more of data on a daily basis, and the volume is still increasing about 3- to 5-fold per year. With so much data to be processed in real time, computing users' intent involves "real-time research" on algorithms that can collect, combine, and mine multiple sources of information that are typically noisy, incomplete and not properly aligned, as well as systems work to easily program and deploy such applications in real-world settings with a lean, scalable and robust architecture. In this talk, using an actual application, we will discuss the challenges and the solutions we have developed to address these concerns, and point out some avenues of future research.

Bio:

Dr. Wei Fan is the associate director of Huawei Noah's Ark Lab. He received his PhD in Computer Science from Columbia University in 2001. His main research interests and experience are in various areas of data mining and database systems, such as stream computing, high performance computing, extremely skewed distributions, cost-sensitive learning, risk analysis, ensemble methods, easy-to-use nonparametric methods, graph mining, predictive feature discovery, feature selection, sample selection bias, transfer learning, time series analysis, bioinformatics, social network analysis, novel applications and commercial data mining systems. His co-authored paper received the ICDM'2006 Best Application Paper Award, and he led the team that used his Random Decision Tree method to win the 2008 ICDM Data Mining Cup Championship. He received the 2010 IBM Outstanding Technical Achievement Award for his contribution to IBM Infosphere Streams. He is an associate editor of ACM Transactions on Knowledge Discovery from Data (TKDD). Since he joined Huawei in August 2012, he has led his colleagues to develop Huawei StreamSMART, a streaming platform for online and real-time processing, query and mining of very fast streaming data. In addition, he has led his colleagues to develop a real-time processing and analysis platform for Mobile Broad Band (MBB) data.


Panel Discussions


Panel Discussion: Novel Directions in Heterogeneous Learning
Moderator: Jingrui He,  Stevens Institute of Technology

Description: We will discuss novel directions in heterogeneous machine learning, including but not limited to: (1) novel types of heterogeneity to be modeled in heterogeneous learning; (2) novel applications for heterogeneous learning; (3) complex models vs. simple models.


Table of Contents


Full Papers

HLAer: a System for Heterogeneous Log Analysis
Xia Ning, Guofei Jiang

Multi-task Feature Selection based Anomaly Detection
Longqi Yang, Yibing Wang, Zhisong Pan, Guyu Hu

Ordering the Sequence of Tasks for Efficient Online Multitask Learning
Shaoning Pang

Semantic Orientation after Sentiment Classification by Machine Learning
Amit Thombre


Organizers


Organizing Committee

  • Jieping Ye (Arizona State University): jieping.ye@asu.edu

Jieping Ye is an Associate Professor of Computer Science and Engineering at Arizona State University. He is a core faculty member of the Bio-design Institute at ASU. He received his Ph.D. degree in Computer Science from the University of Minnesota, Twin Cities in 2005. His research interests include machine learning, data mining, and biomedical informatics. He has served as Senior Program Committee/Area Chair/Program Committee Vice Chair of many conferences including NIPS, KDD, IJCAI, ICDM, SDM, ACML, and PAKDD. He serves as an Associate Editor of IEEE Transactions on Pattern Analysis and Machine Intelligence. He won the SCI Young Investigator of the Year Award at ASU in 2007, the SCI Researcher of the Year Award at ASU in 2009, and the NSF CAREER Award in 2010. His papers have been selected for the outstanding student paper at the International Conference on Machine Learning in 2004, the KDD best research paper honorable mention in 2010, the KDD best research paper nomination in 2011 and 2012, the SDM best research paper runner up in 2013, and the KDD best research paper runner up in 2013.

  • Yuhong Guo (Temple University): yuhong@temple.edu

Yuhong Guo is an Assistant Professor in the Department of Computer and Information Sciences at Temple University. She has previously been a Research Fellow at the Australian National University and a Postdoctoral Fellow at the University of Alberta. Her research interests include machine learning, natural language processing, computer vision, bioinformatics and data mining. She received the Distinguished Paper Award from the International Joint Conference on Artificial Intelligence in 2005 and the Outstanding Paper Award from the AAAI Conference on Artificial Intelligence in 2012. She has served on the program committees of many conferences, including NIPS, ICML, UAI, AAAI, IJCAI, ACML and SDM.

  • Jingrui He (Stevens Institute of Technology): jingrui.he@stevens.edu

Jingrui He is an Assistant Professor in the Computer Science Department at Stevens Institute of Technology. Before joining Stevens, she was a Research Staff Member at IBM T.J. Watson Research Center. She received her Ph.D. degree from the School of Computer Science, Carnegie Mellon University in 2010. Her research interests include rare category analysis and heterogeneous machine learning with applications in social media analysis, semiconductor manufacturing, traffic prediction, etc. She has served on the organizing/program committees of many conferences, including ICML, NIPS, IJCAI, ICDM, SDM, etc.

 

Publicity Chair

  • Pei Yang (Stevens Institute of Technology)

 

Program Committee

  • Xia Ning (NEC Labs America)
  • Jianhui Chen (GE Global Research)
  • Jiayu Zhou (Arizona State University)
  • Shuiwang Ji (Old Dominion University)
  • Xinhua Zhang (National ICT Australia / NICTA)
  • Dongtao Liu (Google Research)