Abstract
We consider the problem of learning structured-output regression, where the output consists of multiple response variables that are related by a graph and correlated response variables are dependent on the input variables in a sparse but synergistic fashion, rather than independently. Such problems can be found in genetic investigation of causal DNA variations that lead to perturbations of multiple related genes in a module, or in engineering problems, where complex responses must be jointly mapped to causal factors to overcome weak statistical power in independent analysis of individual responses. We propose the graph-guided fused lasso methods, which incorporate couplings among response variables on top of the standard lasso penalty for overall sparsity. Our approach represents the dependency structure among the output variables explicitly as a network, and leverages this network to encode structured regularizations in a multivariate regression model, so that the inputs that jointly influence subgroups of highly correlated outputs can be detected with high sensitivity and specificity. Experimental comparisons with standard lasso are performed on both simulated and real data, and our results show that there is a significant advantage in detecting the true relevant inputs when we incorporate the correlation pattern in outputs using our proposed methods.
Venue, Date, and Time
Venue: GHC 6115
Date: Monday, Oct 5, 2009
Time: 12:00 noon