Simultaneously achieving parsimony and good predictive power in high dimensions is a central challenge in statistics. Non-local priors (NLPs) possess appealing properties for high-dimensional model choice, but their use for estimation has not been studied in detail, mostly due to difficulties in characterizing the posterior on the parameter space. We give a general representation of NLPs as mixtures of truncated distributions. This enables simple posterior sampling and flexible definition of NLPs beyond previously proposed families. We develop posterior sampling algorithms and assess performance in p>>n setups, observing low posterior serial correlation and strong high-dimensional estimation performance for linear models. Relative to benchmark and hyper-g priors, SCAD and LASSO, combining NLPs with Bayesian model averaging provided substantially lower estimation error when p>>n. In gene expression data they achieved higher cross-validated R^2 while using an order of magnitude fewer predictors than competing methods. Remarkably, these results were obtained without the need to pre-screen predictors. Our findings contribute to the debate over whether different priors should be used for estimation and model selection, showing that selection priors may actually be desirable for high-dimensional estimation.
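The defining feature of a non-local prior is that its density vanishes at the null value, penalizing models whose coefficients are near zero. As a hedged illustration (not the talk's own code), the sketch below evaluates the product moment (pMOM) prior of Johnson and Rossell, pi(theta) = theta^2/(tau*sigma^2) * N(theta; 0, tau*sigma^2); the default tau = 0.348 used here is an illustrative choice, not a value taken from the abstract:

```python
import math

def normal_pdf(x, var):
    """Density of N(0, var) evaluated at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def pmom_density(theta, tau=0.348, sigma2=1.0):
    """pMOM non-local prior: pi(theta) = theta^2/(tau*sigma2) * N(theta; 0, tau*sigma2).

    The quadratic factor forces the density to zero at theta = 0,
    which is the 'non-local' property penalizing negligible effects.
    """
    v = tau * sigma2
    return (theta * theta / v) * normal_pdf(theta, v)

# The density vanishes exactly at the null value theta = 0.
print(pmom_density(0.0))  # 0.0

# Numerical check: the density integrates to 1 (since E[theta^2] = tau*sigma2
# under the underlying normal), via a Riemann sum on [-10, 10].
step = 0.001
total = sum(pmom_density(-10.0 + i * step) * step for i in range(20001))
print(round(total, 3))
```

The vanishing-at-zero behavior is what separates NLPs from conventional (local) priors such as the normal or hyper-g, and it underlies the mixture-of-truncated-distributions representation the abstract describes.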
Donatello Telesca is an Assistant Professor in the Department of Biostatistics at UCLA. He received his Ph.D. in Statistics from the University of Washington and spent two years at the University of Texas M.D. Anderson Cancer Center as a postdoctoral fellow. His research interests include Bayesian methods in multivariate statistics, functional data analysis, and statistical methods in bio- and nano-informatics.
Note: This seminar is a live broadcast from RAND's Santa Monica office.
Visitors to RAND's Pittsburgh location are welcome to attend and must RSVP at least one day prior to the seminar. To RSVP, please email donnam@rand.org.