Department of Statistics & Data Science Seminar
- Wean Hall
- AADITYA RAMDAS
- Postdoctoral Researcher
- Departments of Statistics and Electrical Engineering and Computer Science
- University of California at Berkeley
Interactive Algorithms for Multiple Hypothesis Testing
Data science is at a crossroads. Each year, thousands of new data scientists enter science and technology after broad training in a variety of fields. Modern data science is often exploratory, with datasets collected and dissected interactively. The classical guarantees that accompany many statistical methods are often invalidated by such non-standard interactive use, so the risk of falsely discovering correlations or patterns is underestimated. Upgrading existing tools, or creating new ones, so that they remain robust with a human in the loop is a pressing challenge.
In this talk, I will describe two new advances that allow some interactivity while testing multiple hypotheses and that control the resulting selection bias. I will first introduce a new framework, STAR, which uses partial masking to divide the available information into two parts: one for selecting a set of potential discoveries, and the other for inference on the selected set. I will then show that it is possible to flip the traditional roles of the algorithm and the scientist, allowing the scientist to make post-hoc decisions after seeing the realization of an algorithm on the data. Both advances are grounded in the theory of martingales: in the first, the user defines the martingale and its associated filtration interactively; in the second, we move from optional stopping to optional spotting by proving uniform concentration bounds on the relevant martingales.
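The partial-masking idea can be illustrated with a minimal Python sketch. It assumes the masking function min(p, 1 - p) used in AdaPT-style procedures: the selection step sees only this masked value, while the hidden bit (whether p falls below 1/2) is reserved for estimating the false discovery proportion of the current candidate set. The helper names (`masked_pvalue`, `estimated_fdp`, `masked_selection`) and the greedy rule standing in for the scientist are hypothetical, so this should not be read as the STAR procedure itself.

```python
def masked_pvalue(p):
    # Assumed masking function: reveals min(p, 1 - p) but hides which side of 1/2 p lies on.
    return min(p, 1.0 - p)


def estimated_fdp(pvals, candidates):
    # Uses the hidden bits: p < 1/2 is a tentative discovery, while the "mirror" count
    # of p >= 1/2 estimates how many tentative discoveries are null (nulls are symmetric).
    n_reject = sum(1 for i in candidates if pvals[i] < 0.5)
    n_mirror = sum(1 for i in candidates if pvals[i] >= 0.5)
    return (1 + n_mirror) / max(n_reject, 1)


def masked_selection(pvals, alpha=0.1):
    """Hypothetical helper: shrink the candidate set until the estimated FDP <= alpha.

    A greedy rule (drop the largest masked p-value) stands in for the scientist,
    who could use any rule that looks only at masked values and side information.
    """
    masked = [masked_pvalue(p) for p in pvals]
    candidates = set(range(len(pvals)))
    while candidates:
        if estimated_fdp(pvals, candidates) <= alpha:
            # Inference step: report candidates whose hidden bit marks them as discoveries.
            return sorted(i for i in candidates if pvals[i] < 0.5)
        # Selection step: only masked values are consulted when deciding what to drop.
        worst = max(candidates, key=lambda i: masked[i])
        candidates.remove(worst)
    return []


pvals = [0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.009, 0.01,
         0.3, 0.6, 0.95, 0.7, 0.4]
print(masked_selection(pvals, alpha=0.1))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

With these p-values the loop peels off the five weak candidates one at a time and stops once the estimate reaches 1/10, returning indices 0 through 9. Keeping the side of 1/2 hidden during selection is what lets the mirror count remain an honest estimate despite the interactive choices.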
This talk will feature joint work with (alphabetically) Rina Barber, Jianbo Chen, Will Fithian, Kevin Jamieson, Michael Jordan, Eugene Katsevich, Lihua Lei, Max Rabinovich, Martin Wainwright, Fanny Yang and Tijana Zrnic.
Aaditya Ramdas is a postdoctoral researcher in Statistics and EECS at UC Berkeley, advised by Michael Jordan and Martin Wainwright. He completed his PhD in Statistics and Machine Learning at CMU, advised by Larry Wasserman and Aarti Singh, and won the Umesh K. Gavaskar Memorial Thesis Award in Statistics. Much of his research focuses on modern aspects of reproducibility in science and technology, involving statistical testing and false discovery rate control in static and dynamic settings.