Machine Learning Speaking Skills Talk

  • Ph.D. Student
  • Machine Learning Department
  • Carnegie Mellon University

Learning Mixtures of Multi-Output Regression Models By Correlation Clustering for Multi-View Data

Multi-view datasets are increasingly prevalent and make it possible to exploit relationships between sets of variables. It is often of interest to analyze the correlation between two views using multi-view component analysis techniques such as Canonical Correlation Analysis (CCA). However, different parts of the data may have their own patterns of correlation, which CCA cannot reveal. To address this challenge, we propose a method called Canonical Least Squares (CLS) clustering. Somewhat like CCA, a single CLS model can be regarded as a multi-output regression model that finds latent variables connecting inputs and outputs. Unlike CCA, however, CLS clustering also partitions the data so that correlations are strengthened within each partition, which may be useful when different correlation structures appear in different subsets of the data or when nonlinear correlations may be present. Furthermore, we introduce a supervised classification method that relies on CLS clustering. The value of these methods lies in their ability to find interpretable structure in the data that explains their predictions. We demonstrate the potential utility of the proposed approach with an example application in clinical informatics: detecting and characterizing slow bleeding in patients whose vital signs are monitored at the bedside. We show empirically how the proposed method can help discover and analyze many-to-many correlations, which may be nonlinear or vary across the population, while retaining the interpretability of the resulting models.
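To make the idea of partitioning data so that each partition has its own input-output relationship more concrete, the following is a minimal sketch of generic hard-assignment clusterwise linear regression. It is not the CLS clustering algorithm from the talk: it omits the latent variables entirely and simply alternates between fitting one multi-output least-squares model per cluster and reassigning each sample to the model that predicts it best. All function names, the synthetic data, and the parameter choices are assumptions made for illustration.

```python
import numpy as np

def fit_linear(X, Y):
    """Multi-output least-squares fit with an intercept; returns a (d+1, m) weight matrix."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return W

def sq_residuals(X, Y, W):
    """Per-sample squared prediction error, summed over the outputs."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.sum((Y - Xb @ W) ** 2, axis=1)

def clusterwise_regression(X, Y, k=2, n_iter=50, seed=0):
    """Alternate between per-cluster fits and best-model reassignment (illustrative only)."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=X.shape[0])  # random initial partition
    for _ in range(n_iter):
        models = []
        for j in range(k):
            mask = labels == j
            if mask.sum() < X.shape[1] + 1:
                # Too few points to fit this cluster: fall back to a fit on all data.
                models.append(fit_linear(X, Y))
            else:
                models.append(fit_linear(X[mask], Y[mask]))
        # Reassign every sample to the model with the smallest squared error.
        errs = np.column_stack([sq_residuals(X, Y, W) for W in models])
        new_labels = errs.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, models

# Synthetic two-view data (assumed for illustration): two hidden regimes,
# each with its own linear map from the 3 inputs to the 2 outputs.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
W_a, W_b = rng.normal(size=(3, 2)), rng.normal(size=(3, 2))
Y = np.vstack([X[:100] @ W_a, X[100:] @ W_b]) + 0.05 * rng.normal(size=(200, 2))

labels, models = clusterwise_regression(X, Y, k=2)
pooled = sq_residuals(X, Y, fit_linear(X, Y)).sum()
split = sum(sq_residuals(X[labels == j], Y[labels == j], models[j]).sum()
            for j in range(2))
```

Comparing `split` with `pooled` shows how much a per-partition fit reduces the squared error relative to a single pooled model. Like any alternating scheme of this kind, the sketch can settle in a local optimum, so the recovered partition depends on the random initialization.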
