FaloutsosReport

Tensors are multi-dimensional generalizations of matrices, and so can have non-numeric entries. Extremely large and sparse coupled tensors arise in numerous important applications that require the analysis of large, diverse, and partially related data. The effective analysis of coupled tensors requires the development of algorithms and associated software that can identify the core relations that exist among the different tensor modes, and scale to extremely large datasets. The objective of this project is to develop theory and algorithms for (coupled) sparse and low-rank tensor factorization, and associated scalable software toolkits to make such analysis possible. The research in the project is centered on three major thrusts. The first is designed to make novel theoretical contributions in the area of coupled tensor factorization, by developing multi-way compressed sensing methods for dimensionality reduction with perfect latent model reconstruction. Methods to handle missing values, noisy input, and coupled data will also be developed. The second thrust focuses on algorithms and scalability on modern architectures, which will enable the efficient analysis of coupled tensors with millions and billions of non-zero entries, using the map-reduce paradigm, as well as hybrid multicore architectures. An open-source coupled tensor factorization toolbox (HTF- Hybrid Tensor Factorization) will be developed that will provide robust and high-performance implementations of these algorithms. Finally, the third thrust focuses on evaluating and validating the effectiveness of these coupled factorization algorithms on a NeuroSemantics application whose goal is to understand how human brain activity correlates with text reading & understanding by analyzing fMRI and MEG brain image datasets obtained while reading various text passages.
Given triplets of facts (subject-verb-object), like ('Washington' 'is the capital of' 'USA'), can we find patterns, new objects, new verbs, anomalies? Can we correlate these with brain scans of people reading these words, to discover which parts of the brain get activated, say, by tool-like nouns ('hammer'), or action-like verbs ('run')?
We propose a unified "coupled tensor" factorization framework to systematically mine such datasets. Unique challenges in these settings include

tera- and peta-byte scaling issues,
distributed fault-tolerant computation,
large proportions of missing data, and
insufficient theory and methods for big sparse tensors.

The Intellectual Merit of this effort is exactly the solution to the above four challenges.
The Broader Impact is the derivation of new scientific hypotheses on how the brain works and how it processes language (from the never-ending language learning (NELL) and NeuroSemantics projects) and the development of scalable open source software for coupled tensor factorization. Our tensor analysis methods can also be used in many other settings, including recommendation systems and computer-network intrusion/anomaly detection.

3.2. Current Results

Our main results so far are as follows:

We have developed highly scalable tensor decomposition methods for multiple platforms (e.g. Hadoop and Matlab), and for various models and objective functions. We also derive theoretical guarantees for the correctness and identifiability conditions of our methods.
We have developed tensor based methods that are able to perform robust community dection in time evolving graphs, scalable anomaly detection in intrusion detection systems settings, as well as detecting communities and anomalies in location based social networks
We have developed a novel, simple, and effective analytic model, which is able to capture the functional connectivity of the human brain. We have also proposed a novel algorithmic framework for Sparse System Identification, in order to solve the above model.
We have developed scalable, parallelizable algorithms that are able to accelerate, up to 200x, any existing Coupled Matrix-Tensor Factorization algorithm, while producing up to 65x sparser results.
Using CMTF and the aforementioned tool, we have been able to simultaneously cluster brain regions, nouns and semantic features pertaining to those nouns, into insightful and meaningful groups.
We have delivered a tutorial on scalable methods for big tensors, on the ICASSP 2014 conference, the flagship conference for signal processing.
We propose the 'good enough brain model' (GeBM), a linear system that successfully mimicks brain activity under some simple, word-recognition tasks. The method bridges machine learning with control-theory system-identification.

3.3. Publications

The following papers mention the CMU NSF grant, or the UMN counterpart, or both.

Refereed Conferences

Predicting Code-switching in Multilingual Communication for Immigrant Communities, Evangelos E. Papalexakis, Dong Nguyen, Seza Dogruoz. Workshop on Computational Approaches to Code Switching at EMNLP 2014, Doha, Qatar
Good-Enough Brain Model: Challenges, Algorithms and Discoveries in Multi-Subject Experiments, Evangelos E. Papalexakis, Alona Fyshe, Nicholas Sidiropoulos, Partha Pratim Talukdar, Tom Mitchell, Christos Faloutsos. ACM KDD 2014, New York City, USA
MalSpot: Multi^2 Malicious Network Behavior Patterns Analysis, Ching-Hao Mao, Chung-Jung Wu, Kuo-Chen Lee, Evangelos E. Papalexakis, Christos Faloutsos. PAKDD 2014, Tainan, Taiwan
Com2: Fast Automatic Discovery of Temporal (Comet) Communities, Miguel Araujo, Spiros Papadimitriou, Stephan Guennemann, Christos Faloutsos, Prithwish Basu, Ananthram Swami, Evangelos Papalexakis, Danai Koutra. PAKDD 2014, Tainan, Taiwan (Best student paper award (runner-up))
A Parallel Algorithm for Big Tensor Decomposition Using Randomly Compressed Cubes (PARACOMP), Nicholas D. Sidiropoulos, Evangelos E. Papalexakis, Christos Faloutsos. IEEE ICASSP 2014, Florence, Italy
Turbo-SMT: Accelerating Coupled Sparse Matrix-Tensor Factorizations by 200x,
Evangelos E. Papalexakis, Tom Mitchell, Nicholas D. Sidiropoulos, Christos Faloutsos, Partha Pratim Talukdar, Brian Murphy. SIAM SDM 2014, Philadelphia, USA. Selected as one of the best papers of SDM'14
FlexiFact: Scalable Flexible Factorization of Coupled Tensors on Hadoop,
Alex Beutel, Abhimanu Kumar, Evangelos E. Papalexakis, Partha Pratim Talukdar, Christos Faloutsos, Eric P. Xing. SIAM SDM 2014, Philadelphia, USA
Spotting Misbehaviors in Location-based Social Networks using Tensors,
Evangelos E. Papalexakis, Kostantinos Pelechrinis, Christos Faloutsos. WWW 2014 Web Science Track, Seoul, Korea
Scoup-SMT: Scalable Coupled Sparse Matrix-Tensor Factorization, Evangelos E. Papalexakis, Tom Mitchell, Nicholas D. Sidiropoulos, Christos Faloutsos, Partha Pratim Talukdar, Brian Murphy.
Arxiv preprint.
Yasuko Matsubara, Yasushi Sakurai, Christos Faloutsos “AutoPlait: Automatic Mining of Co-evolving Time Sequences” ACM SIGMOD Conference, pp. 193-204, Snowbird, Utah, June 2014
M. Gardner, K. Huang, E.E. Papalexakis, P.P. Talukdar, N.D. Sidiropoulos, C. Faloutsos, T. Mitchell, “Translation Invariant Word Embeddings”, in Proc. Conference on Empirical Methods in Natural Language Processing (EMNLP), Sep. 17-21, 2015, Lisbon, Portugal.
K. Huang, N.D. Sidiropoulos, C. Faloutsos, E.E. Papalexakis, P.P. Talukdar, and T. Mitchell, “Principled Neuro-Functional Connectivity Discovery”, in Proc. SIAM Conference on Data Mining (SDM), Apr. 30 - May 2, 2015, Vancouver, British Columbia, Canada.
K. Huang, N.D. Sidiropoulos, and A.P. Liavas, “Efficient algorithms for universally constrained matrix and tensor factorization”, in Proc. EUSIPCO 2015, Aug. 31 - Sep. 4, 2015, Nice, France.
S. Smith, N. Ravindran, G. Karypis, and N.D. Sidiropoulos, “SPLATT: Efficient and Parallel Sparse Tensor-Matrix Multiplication”, in Proc. 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), May 25-29, 2015, Hyderabad, INDIA.
A.P. Liavas, and N.D. Sidiropoulos, “Parallel algorithms for large scale constrained tensor decomposition”, in Proc. IEEE ICASSP 2015, April 19-24, 2015, Brisbane, Australia.
N. Ravindran, N.D. Sidiropoulos, S. Smith, and G. Karypis, “Memory-Efficient Parallel Computation of Tensor and Matrix Products for Big Tensor Decomposition”, in Asilomar Conference on Signals, Systems, and Computers, November 2-5, 2014, Pacific Grove, CA.
Inah Jeon, Evangelos E. Papalexakis, U Kang, Christos Faloutsos, HaTen2: Billion-scale Tensor Decompositions IEEE ICDE 2015, Seoul, Korea
Evangelos E. Papalexakis, Christos Faloutsos, Fast Efficient and Scalable Core Consistency Diagnostic for the PARAFAC Decomposition for Big Sparse Tensors IEEE ICASSP 2015, Brisbane, Australia
Rakesh Agrawal, Behzad Golshan, Evangelos E. Papalexakis, Whither Social Networks for Web Search? ACM KDD, Sydney, NSW, Australia
Evangelos E. Papalexakis, Konstantinos Pelechrinis, Christos Faloutsos, Location Based Social Network Analysis Using Tensors and Signal Processing Tools, IEEE CAMSAP 2015, Cancun, Mexico
Bryan Hooi, Hyun-Ah Song, Evangelos E. Papalexakis, Rakesh Agrawal, Christos Faloutsos, Matrices, Compression, Learning Curves: formulation, and the GroupNTeach algorithms, in Proc. of The Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), April 19-22 2016, Auckland, New Zealand
Evangelos E. Papalexakis, Automatic Unsupervised Tensor Mining with Quality Assessment, in Proc. SIAM Conference on Data Mining (SDM), May 5-7 2016, Miami, Florida, USA
Xiao Fu, Kejun Huang, Evangelos E. Papalexakis, Hyun-Ah Song, Partha Pratim Talukdar, Nicholas D.Sidiropoulos, Christos Faloutsos, Tom Mitchell, Efficient and Distributed Algorithms for Large-Scale Generalized Canonical Correlations Analysis, in Proc. of International Conference on Data Mining(ICDM), Dec 12-15 2016, Barcelona, Spain
Xiao Fu, Kejun Huang, Otilia Stretcu, Hyun Ah Song, Evangelos E. Papalexakis, Partha Pratim Talukdar, Tom Mitchell, Nicholas D.Sidiropoulos, Christos Faloutsos, Barnabas Poczos, BRAINZOOM: High Resolution Reconstruction from Multi-modal Brain Signals, in SDM 2017, Houston, Texas, April 2017.
Zongge Liu, Hyun Ah Song, Vladimir Zadorozhny, Christos Faloutsos, and Nikos Sidiropoulos, H-Fuse: Efficient Fusion of Aggregated Historical Data, in SDM 2017, Houston, Texas, April 2017.
Hyun Ah Song, Bryan Hooi, Marko Jereminov, Amritanshu Pandey, Larry Pileggi, Christos Faloutsos, PowerCast: Mining and Forecasting Power Grid Sequences in PKDD 2017, Skopje, Macedonia, September 2017.
Hyun Ah Song, Fan Yang, Zongge Liu, Wilbert van Panhuis, Nicholas Sidiropoulos, Christos Faloutsos, Vladimir Zadorozhny, GB-R: A Fast and Effective Gray-Box Reconstruction of Cascade Time-Series In Data Mining Workshop, 2017, ICDM, New Orleans, LA, USA, Nov. 2017.
Miguel Araujo, Pedro Ribeiro, Christos Faloutsos, TensorCast: Forecasting with Context using Coupled Tensors, ICDM, New Orleans, LA, USA, Nov. 2017.
Fan Yang, Hyun Ah Song, Zongge Liu, Christos Faloutsos, Vladimir Zadorozhny, and Nicholas Sidiropoulos, ARES: Automatic Disaggregation of Historical Data, ICDE 2018, Paris, France, April 16-19, 2018.
Bryan Hooi, Hyun Ah Song, Amritanshu Pandey, Marko Jereminov, Larry Pileggi, and Christos Faloutsos, StreamCast: Fast and Online Mining of Power Grid Time Sequences, SIAM Data Mining (SDM), San Diego, CA, USA, May 3-5, 2018 (to appear)

Refereed Journals

E.E. Papalexakis, A. Fyshe, N.D. Sidiropoulos, P.P. Talukdar, T. Mitchell, C. Faloutsos, “Good-Enough Brain Model: Challenges, Algorithms and Discoveries in Multi-Subject Experiments”, Big Data, Dec. 2014 (DOI: 10.1089/big.2014.0044).
E.E. Papalexakis, T. Mitchell, N.D. Sidiropoulos, C. Faloutsos, P.P. Talukdar, B. Murphy, “Turbo-SMT: Parallel Coupled Sparse Matrix-Tensor Factorizations and Applications”, Statistical Analysis and Data Mining (SAM) journal, ’Best of SDM 2014’ special issue, to appear.
E.E. Papalexakis, C. Faloutsos, and N.D. Sidiropoulos, “ParCube: Sparse Parallelizable Tensor Decompositions”, ACM Transactions on Knowledge Discovery from Data, 2015.
K. Huang, N.D. Sidiropoulos, and A.P. Liavas, “A Flexible and Efficient Algorithmic Framework for Constrained Matrix and Tensor Factorization”, Journal of Machine Learning Research, submitted May 29, 2015.
X. Fu, K. Huang, W.-K. Ma, N.D. Sidiropoulos, and R. Bro, “Joint Tensor Factorization and Outlying Slab Suppression with Applications”, IEEE Trans. on Signal Processing, Year: 2015, Volume: PP, Issue: 99, Pages: 1 - 1, DOI: 10.1109/TSP.2015.2469642
A.P. Liavas, and N.D. Sidiropoulos, “Parallel Algorithms for Constrained Tensor Factorization via the Alternating Direction Method of Multipliers”, IEEE Trans. on Signal Processing, Year: 2015, Volume: 63, Issue: 20, Pages: 5450 - 5463, DOI: 10.1109/TSP.2015.2454476
O. Mehanna, K. Huang, B. Gopalakrishnan, A. Konar, and N.D. Sidiropoulos, “Feasible Point Pursuit and Successive Approximation of Non-convex QCQPs”, IEEE Signal Processing Letters, Year: 2015, Volume: 22, Issue: 7, Pages: 804 - 808, DOI: 10.1109/LSP.2014.2370033
Large Scale Tensor Decompositions: Algorithmic Developments and Applications, Evangelos E. Papalexakis, U Kang, Christos Faloutsos, Nicholas D. Sidiropoulos, Abhay Harpale. In IEEE Data Engineering Bulletin - Special Issue on Social Media.
PArallel RAndomly COMPressed Cubes (PARACOMP): A Scalable Distributed Architecture for Big Tensor Decomposition, Nicholas D. Sidiropoulos, Evangelos E. Papalexakis, Christos Faloutsos. IEEE Signal Processing Magazine - Special Issue on Signal Processing for Big Data
Non-negative Matrix Factorization Revisited: Uniqueness and Algorithm for Symmetric Decomposition. Kejun Huang, Nicholas D. Sidiropoulos, and Ananthram Swami, IEEE Trans. on Signal Processing, to appear.
Putting NMF to the Test: A Tutorial Derivation of Pertinent Cramer-Rao Bounds and Performance Benchmarking. Kejun Huang, and Nicholas D. Sidiropoulos. IEEE Signal Processing Magazine, Special Issue on Source Separation and its Applications [Submitted for publication]

Dissertation

Evangelos E. Papalexakis, Mining Large Multi-Aspect Data: Algorithms and Applications Ph.D. Thesis (available as CMU-CS-16-124) Abstract Runner-up for the ACM-KDD Doctoral dissertation award.

3.4 Tutorials

Factoring Tensors in the Cloud: A Tutorial on Big Tensor Data Analytics Nicholas D. Sidiropoulos, Evangelos E. Papalexakis, Christos Faloutsos, ICASSP 2014.Tutorial web-page

3.5 Code

See here Mostly in matlab; requires tensor toolbox.

Last updated: April 2018, by Christos Faloutsos and Hyun Ah Song.

Christos Faloutsos & Tom Mitchell	Phone: (412)-268.1457
Department of Computer Science	Fax : (412)-268.5576
Carnegie Mellon Univ.	Email: {christos, tom} at cs.cmu.edu
Pittsburgh, PA 15213	WWW page: http://www.cs.cmu.edu/~christos

BIGDATA:Mid-Scale:DA:Collaborative research:Big Tensor Mining: Theory, Scalable Algorithms and Applications.

1. GENERAL INFORMATION

1.1. Abstract

1.2. Keywords

1.3. Funding agency

2. PEOPLE INVOLVED

3. RESEARCH

3.1. Project goals

3.2. Current Results

3.3. Publications

3.4 Tutorials

3.5 Code