BIGDATA: Mid-Scale: DA: Collaborative Research: Big Tensor Mining: Theory, Scalable Algorithms and Applications

 
Christos Faloutsos & Tom Mitchell
Department of Computer Science
Carnegie Mellon Univ.
Pittsburgh, PA 15213
Phone: (412)-268.1457
Fax: (412)-268.5576
Email: {christos, tom} at cs.cmu.edu
WWW page: http://www.cs.cmu.edu/~christos

This material is based upon work supported by the National Science Foundation under Grant Nos. IIS-1247489 and IIS-1247632. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

1. GENERAL INFORMATION

1.1. Abstract

Link to NSF abstract

Tensors are multi-dimensional generalizations of matrices, and so can have more than two modes. Extremely large and sparse coupled tensors arise in numerous important applications that require the analysis of large, diverse, and partially related data. The effective analysis of coupled tensors requires the development of algorithms and associated software that can identify the core relations that exist among the different tensor modes, and scale to extremely large datasets. The objective of this project is to develop theory and algorithms for (coupled) sparse and low-rank tensor factorization, and associated scalable software toolkits to make such analysis possible. The research in the project is centered on three major thrusts. The first is designed to make novel theoretical contributions in the area of coupled tensor factorization, by developing multi-way compressed sensing methods for dimensionality reduction with perfect latent model reconstruction. Methods to handle missing values, noisy input, and coupled data will also be developed. The second thrust focuses on algorithms and scalability on modern architectures, which will enable the efficient analysis of coupled tensors with millions and billions of non-zero entries, using the map-reduce paradigm, as well as hybrid multicore architectures. An open-source coupled tensor factorization toolbox (HTF: Hybrid Tensor Factorization) will be developed that will provide robust and high-performance implementations of these algorithms. Finally, the third thrust focuses on evaluating and validating the effectiveness of these coupled factorization algorithms on a NeuroSemantics application whose goal is to understand how human brain activity correlates with text reading and understanding, by analyzing fMRI and MEG brain image datasets obtained while subjects read various text passages.
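To make "coupled" factorization concrete, the following is one common way such a problem is written in the literature. It is an illustrative sketch only: the symbols are assumptions introduced here, not notation from this page. A three-mode tensor $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$ and a matrix $\mathbf{Y} \in \mathbb{R}^{I \times M}$ share their first mode, so both are explained by the same factor matrix $\mathbf{A} = [\mathbf{a}_1, \dots, \mathbf{a}_R]$ with $R$ components:

```latex
\min_{\mathbf{A},\mathbf{B},\mathbf{C},\mathbf{D}}\;
\Bigl\| \mathcal{X} - \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r \Bigr\|_F^2
\;+\;
\bigl\| \mathbf{Y} - \mathbf{A}\mathbf{D}^{\top} \bigr\|_F^2
```

Here $\circ$ denotes the vector outer product, $\mathbf{b}_r$, $\mathbf{c}_r$ are the columns of $\mathbf{B} \in \mathbb{R}^{J \times R}$ and $\mathbf{C} \in \mathbb{R}^{K \times R}$, and $\mathbf{D} \in \mathbb{R}^{M \times R}$. The shared $\mathbf{A}$ is what couples the two datasets; sparsity, non-negativity, or missing-value weights would enter as extra constraints or weighting terms.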
Given triplets of facts (subject-verb-object), like ('Washington' 'is the capital of' 'USA'), can we find patterns, new objects, new verbs, anomalies? Can we correlate these with brain scans of people reading these words, to discover which parts of the brain get activated, say, by tool-like nouns ('hammer'), or action-like verbs ('run')?
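As a minimal sketch of how such triplets become a tensor, the Python snippet below maps each distinct subject, verb, and object string to an integer index and stores only the non-zero entries as coordinates with counts. The function name `build_sparse_tensor` and the second example fact are hypothetical, chosen for illustration; this is not the project's actual code or data format.

```python
from collections import defaultdict


def build_sparse_tensor(triplets):
    """Turn (subject, verb, object) string triplets into a sparse 3-mode count tensor.

    Returns (coords, index_maps): coords maps an (i, j, k) index tuple to the
    number of times that triplet occurred, and index_maps holds one
    string -> integer dictionary per mode (subjects, verbs, objects).
    """
    index_maps = [{}, {}, {}]   # one vocabulary per tensor mode
    coords = defaultdict(int)   # only non-zero entries are ever stored
    for triplet in triplets:
        # setdefault assigns the next free index the first time a string is seen
        idx = tuple(m.setdefault(s, len(m)) for m, s in zip(index_maps, triplet))
        coords[idx] += 1
    return dict(coords), index_maps


# Illustrative input: the first fact is from the text, the second is made up.
facts = [
    ("Washington", "is the capital of", "USA"),
    ("Paris", "is the capital of", "France"),
    ("Washington", "is the capital of", "USA"),
]
coords, index_maps = build_sparse_tensor(facts)
shape = tuple(len(m) for m in index_maps)  # (subjects, verbs, objects)
print(shape, coords)
```

A coordinate dictionary like this keeps memory proportional to the number of observed facts rather than to the full subjects x verbs x objects cube, which is the property that matters at the scales discussed here.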
We propose a unified "coupled tensor" factorization framework to systematically mine such datasets. Unique challenges in these settings include
  1. tera- and peta-byte scaling issues,
  2. distributed fault-tolerant computation,
  3. large proportions of missing data, and
  4. insufficient theory and methods for big sparse tensors.
The Intellectual Merit of this effort is exactly the solution to the above four challenges.
The Broader Impact is the derivation of new scientific hypotheses on how the brain works and how it processes language (from the never-ending language learning (NELL) and NeuroSemantics projects) and the development of scalable open source software for coupled tensor factorization. Our tensor analysis methods can also be used in many other settings, including recommendation systems and computer-network intrusion/anomaly detection.

1.2. Keywords

Data mining, map/reduce, read-the-web, tensors, recommender systems.

1.3. Funding agency

2. PEOPLE INVOLVED

In addition to the 2 co-PIs, the following people are the main contributors of the project.
In addition to the main contributors, the following people are collaborators:

3. RESEARCH

3.1. Project goals

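We propose a unified coupled tensor factorization framework to systematically mine such datasets. Unique challenges in these settings include
  1. tera- and peta-byte scaling issues,
  2. distributed fault-tolerant computation,
  3. large proportions of missing data, and
  4. insufficient theory and methods for big sparse tensors.
We propose to address them via our three research thrusts:
  1. theory of coupled tensor factorization, including multi-way compressed sensing and methods for missing values, noisy input, and coupled data,
  2. algorithms and scalability on modern architectures (map-reduce and hybrid multicore), delivered as the open-source HTF toolbox, and
  3. evaluation and validation on the NeuroSemantics application.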

3.2. Current Results

Our main results so far are as follows:

3.3. Publications

  1. Predicting Code-switching in Multilingual Communication for Immigrant Communities,
    Evangelos E. Papalexakis, Dong Nguyen, Seza Dogruoz,
    Workshop on Computational Approaches to Code Switching, EMNLP 2014, Doha, Qatar
  2. Good-Enough Brain Model: Challenges, Algorithms and Discoveries in Multi-Subject Experiments,
    Evangelos E. Papalexakis, Alona Fyshe, Nicholas Sidiropoulos, Partha Pratim Talukdar, Tom Mitchell, Christos Faloutsos,
    ACM KDD 2014, New York City, USA
  3. MalSpot: Multi^2 Malicious Network Behavior Patterns Analysis,
    Ching-Hao Mao, Chung-Jung Wu, Kuo-Chen Lee, Evangelos E. Papalexakis, Christos Faloutsos,
    PAKDD 2014, Tainan, Taiwan
  4. Com2: Fast Automatic Discovery of Temporal (Comet) Communities,
    Miguel Araujo, Spiros Papadimitriou, Stephan Guennemann, Christos Faloutsos, Prithwish Basu, Ananthram Swami, Evangelos Papalexakis, Danai Koutra,
    PAKDD 2014, Tainan, Taiwan
    Best student paper award (runner-up)
  5. A Parallel Algorithm for Big Tensor Decomposition Using Randomly Compressed Cubes (PARACOMP),
    Nicholas D. Sidiropoulos, Evangelos E. Papalexakis, Christos Faloutsos,
    IEEE ICASSP 2014, Florence, Italy
  6. Turbo-SMT: Accelerating Coupled Sparse Matrix-Tensor Factorizations by 200x,
    Evangelos E. Papalexakis, Tom Mitchell, Nicholas D. Sidiropoulos, Christos Faloutsos, Partha Pratim Talukdar, Brian Murphy,
    SIAM SDM 2014, Philadelphia, USA
    Selected as one of the best papers of SDM'14
  7. FlexiFact: Scalable Flexible Factorization of Coupled Tensors on Hadoop,
    Alex Beutel, Abhimanu Kumar, Evangelos E. Papalexakis, Partha Pratim Talukdar, Christos Faloutsos, Eric P. Xing,
    SIAM SDM 2014, Philadelphia, USA
  8. Spotting Misbehaviors in Location-based Social Networks using Tensors,
    Evangelos E. Papalexakis, Konstantinos Pelechrinis, Christos Faloutsos,
    WWW 2014 Web Science Track, Seoul, Korea
  9. Scoup-SMT: Scalable Coupled Sparse Matrix-Tensor Factorization,
    Evangelos E. Papalexakis, Tom Mitchell, Nicholas D. Sidiropoulos, Christos Faloutsos, Partha Pratim Talukdar, Brian Murphy,
    arXiv preprint
  10. Large Scale Tensor Decompositions: Algorithmic Developments and Applications,
    Evangelos E. Papalexakis, U Kang, Christos Faloutsos, Nicholas D. Sidiropoulos, Abhay Harpale,
    IEEE Data Engineering Bulletin, Special Issue on Social Media
  11. PArallel RAndomly COMPressed Cubes (PARACOMP): A Scalable Distributed Architecture for Big Tensor Decomposition,
    Nicholas D. Sidiropoulos, Evangelos E. Papalexakis, Christos Faloutsos,
    IEEE Signal Processing Magazine, Special Issue on Signal Processing for Big Data
  12. Non-negative Matrix Factorization Revisited: Uniqueness and Algorithm for Symmetric Decomposition,
    Kejun Huang, Nicholas D. Sidiropoulos, Ananthram Swami,
    IEEE Trans. on Signal Processing, to appear
  13. Putting NMF to the Test: A Tutorial Derivation of Pertinent Cramer-Rao Bounds and Performance Benchmarking,
    Kejun Huang, Nicholas D. Sidiropoulos,
    IEEE Signal Processing Magazine, Special Issue on Source Separation and its Applications (submitted for publication)

3.4. Tutorials

  1. Factoring Tensors in the Cloud: A Tutorial on Big Tensor Data Analytics,
    Nicholas D. Sidiropoulos, Evangelos E. Papalexakis, Christos Faloutsos,
    ICASSP 2014
    Tutorial web-page

Last updated: Sept. 9, 2014, by Christos Faloutsos and Vagelis Papalexakis