International Seminar on Selective Inference
A weekly online seminar on selective inference, multiple testing, and post-selection inference.
Gratefully inspired by the Online Causal Inference Seminar
Upcoming Seminar Presentations
All seminars take place Wednesdays at 8:30 am PT / 11:30 am ET / 4:30 pm London / 6:30 pm Tel Aviv. Past seminar presentations are posted here.
Wednesday, April 5, 2023 [link to join]
Speaker: Aleksandr (Sasha) Podkopaev (Carnegie Mellon University)
Title: Independence Testing by Betting
Abstract: Nonparametric independence testing (testing the null hypothesis that the joint distribution of two random variables factorizes into the product of their respective marginals, against the alternative that it does not) is a classical statistical problem that has been extensively studied in the batch setting, where an analyst specifies the sample size before collecting data. Sequential independence testing is a complementary approach which allows an analyst to analyze an incoming stream of data online. Following the principle of nonparametric testing by betting, we develop sequential kernelized independence tests (SKITs). Our tests (a) continuously monitor the data while controlling the false alarm rate, (b) are consistent, meaning that they are guaranteed to stop if the null is false, and (c) provably adapt to the complexity of the problem at hand, meaning that they stop earlier on easy tasks (and later on harder ones), exhibiting an interesting empirical-Bernstein behavior in the exponent of the power. In this talk, I will describe the key ideas that underlie our test, illustrate the theoretical and empirical results, and discuss extensions to settings where batch independence tests fail, such as testing the independence null in non-i.i.d., time-varying setups.
Discussant: Will Hartog (Stanford University)
Links: [Relevant papers: paper #1]
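The betting principle behind SKITs can be seen in miniature on a much simpler null. The sketch below is not the SKIT from the talk; it is a toy sequential test of H0: a coin is fair, in which the analyst's wealth is a nonnegative martingale under the null, so Ville's inequality bounds the false alarm rate by alpha. All names and parameter choices here are illustrative assumptions, not the speakers' code.

```python
import numpy as np

def betting_test(flips, alpha=0.05, lam=0.5):
    """Sequential test of H0: P(heads) = 1/2 by betting.

    Under H0, each bet is fair (E[2x - 1] = 0), so the wealth process
    is a nonnegative martingale and, by Ville's inequality,
    P(wealth ever reaches 1/alpha) <= alpha under the null.
    """
    wealth = 1.0
    for t, x in enumerate(flips, start=1):
        # Stake a fraction lam of wealth on "heads" each round.
        wealth *= 1.0 + lam * (2 * x - 1)
        if wealth >= 1.0 / alpha:
            return t  # stop and reject H0 at time t
    return None  # never rejected

rng = np.random.default_rng(0)
biased = rng.binomial(1, 0.9, size=1000)  # a clearly unfair coin
print(betting_test(biased))               # rejects after a handful of flips
```

The test stops early on this easy alternative and, mirroring property (a) in the abstract, can monitor the stream indefinitely without inflating the false alarm rate.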
Wednesday, April 12, 2023 [link to join]
Speaker: Lucas Janson (Harvard University)
Title: Exact Conditional Independence Testing and Conformal Inference with Adaptively Collected Data
Abstract: Randomization testing is a fundamental method in statistics, enabling inferential tasks such as testing for (conditional) independence of random variables, constructing confidence intervals in semiparametric location models, and constructing (by inverting a permutation test) model-free prediction intervals via conformal inference. Randomization tests are exactly valid for any sample size, but their use is generally confined to exchangeable data. Yet in many applications, data is routinely collected adaptively via, e.g., (contextual) bandit and reinforcement learning algorithms or adaptive experimental designs. In this paper we present a general framework for randomization testing on adaptively collected data (despite its non-exchangeability) that uses a weighted randomization test, for which we also present computationally tractable resampling algorithms for various popular adaptive assignment algorithms, data-generating environments, and types of inferential tasks. Finally, we demonstrate via a range of simulations the efficacy of our framework for both testing and confidence/prediction interval construction. This is joint work with Yash Nair.
Discussant: Jing Lei (Carnegie Mellon University)
Links: [Relevant papers: paper #1]
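For readers unfamiliar with randomization testing, here is the exchangeable-data baseline that the talk generalizes: an ordinary (unweighted) permutation test of independence. This is a generic sketch under my own choice of test statistic, not code from the paper; on adaptively collected data this unweighted version loses its validity, which is the problem the weighted randomization test solves.

```python
import numpy as np

def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Unweighted permutation test of independence between x and y.

    Exactly valid for exchangeable data at any sample size: permuting y
    leaves the joint distribution unchanged under the null.
    """
    rng = np.random.default_rng(seed)
    stat = abs(np.corrcoef(x, y)[0, 1])  # observed |correlation|
    perm_stats = np.array([
        abs(np.corrcoef(x, rng.permutation(y))[0, 1])
        for _ in range(n_perm)
    ])
    # The +1 in numerator and denominator gives exact finite-sample validity.
    return (1 + np.sum(perm_stats >= stat)) / (1 + n_perm)

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2 * x + rng.normal(size=200)   # strongly dependent pair
print(permutation_pvalue(x, y))    # very small p-value
```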
Wednesday, April 19, 2023 [link to join]
Speaker: Sifan Liu (Stanford University)
Title: An Exact Sampler for Inference after Polyhedral Model Selection
Abstract: Inference after model selection can be computationally challenging when dealing with intractable conditional distributions. Markov chain Monte Carlo (MCMC) is a common method for drawing samples from these distributions, but its slow convergence can limit its practicality. In this work, we propose a Monte Carlo sampler specifically designed for Gaussian distributions and polyhedral selection events. The method uses importance sampling from a suitable proposal distribution with the separation-of-variable property, and employs conditional Monte Carlo and randomized quasi-Monte Carlo for further variance reduction. Compared to MCMC, our proposed estimator of p-values achieves much higher accuracy while providing reliable error estimation. We also develop a method for testing and constructing confidence intervals for multiple parameters using a single batch of samples, reducing the need for repeated sampling. This method provides an efficient and practical solution for conducting selective inference after polyhedral model selection.
Links: [Relevant papers:]
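As a flavor of why importance sampling helps with intractable conditional distributions, the toy sketch below (my own simplification, not the proposed sampler) estimates a Gaussian tail probability, the basic ingredient of p-values under a one-sided polyhedral constraint. Naive Monte Carlo almost never lands in a far tail; sampling from a mean-shifted proposal and reweighting by the density ratio puts every draw near the event.

```python
import math
import numpy as np

def tail_prob_is(c, n=100_000, seed=0):
    """Importance-sampling estimate of P(Z > c) for Z ~ N(0, 1).

    Draws from the shifted proposal N(c, 1) and reweights each draw by
    the density ratio phi(z) / phi(z - c) = exp(-c z + c^2 / 2).
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(loc=c, scale=1.0, size=n)  # proposal draws near the tail
    log_w = -c * z + 0.5 * c**2               # log density ratio
    return float(np.mean((z > c) * np.exp(log_w)))

c = 4.0
truth = 0.5 * math.erfc(c / math.sqrt(2))     # exact N(0,1) tail probability
est = tail_prob_is(c)
print(est, truth)  # the estimate matches to a few significant digits
```

With n = 100,000 naive draws from N(0, 1) one would expect only about three samples beyond c = 4, while every tilted draw contributes, which is the variance-reduction effect the abstract pushes much further with conditional and quasi-Monte Carlo methods.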
Wednesday, April 26, 2023 [link to join]
Speaker: Ameer Dharamshi (University of Washington)
Title: Generalized Data Thinning Using Sufficient Statistics
Abstract: Our goal is to develop a general strategy to decompose a random variable X into multiple independent random variables, without sacrificing any information about unknown parameters. A recent paper showed that for some well-known natural exponential families, X can be "thinned" into independent random variables X(1), …, X(K) such that X = X(1) + … + X(K). In this paper, we generalize their procedure by relaxing this summation requirement and simply asking that some known function of the independent random variables exactly reconstruct X. This generalization of the procedure serves two purposes. First, it greatly expands the families of distributions for which thinning can be performed. Second, it unifies sample splitting and data thinning, which on the surface seem to be very different, as applications of the same principle. This shared principle is sufficiency. We use this insight to perform generalized thinning operations for a diverse set of families.
Links: [Relevant papers: paper #1]
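The summation-based thinning that the abstract generalizes has a classical special case, Poisson thinning, which is easy to verify numerically. The sketch below is my own illustration, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, eps, n = 10.0, 0.3, 200_000

# Thin X ~ Poisson(lam) into X1 | X ~ Binomial(X, eps) and X2 = X - X1.
# Classical Poisson thinning: X1 and X2 are independent, with
# X1 ~ Poisson(eps * lam) and X2 ~ Poisson((1 - eps) * lam),
# and the sum X1 + X2 reconstructs X exactly.
x = rng.poisson(lam, size=n)
x1 = rng.binomial(x, eps)
x2 = x - x1

assert np.all(x1 + x2 == x)        # exact reconstruction of X
print(x1.mean())                   # close to eps * lam = 3.0
print(np.corrcoef(x1, x2)[0, 1])   # close to 0, consistent with independence
```

Both halves carry information about lam, yet neither reveals anything about the other, which is what makes thinning useful for selection-then-inference workflows.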
Wednesday, May 3, 2023 [link to join]
Speaker: Rajen Shah (University of Cambridge)
Wednesday, June 7, 2023 [link to join]
Speaker: Jonathan Roth (Brown University)
The seminars are held on Zoom and last 60 minutes:
45 minutes of presentation
15 minutes of discussion, led by an invited discussant
Moderators collect questions using the Q&A feature during the seminar.
How to join
You can attend by clicking the link to join (there is no need to register in advance).
More instructions for attendees can be found here.
Organizers
Rina Barber (University of Chicago)
Will Fithian (UC Berkeley)
Jelle Goeman (Leiden University)
Lihua Lei (Stanford University)
Daniel Yekutieli (Tel Aviv University)
If you have feedback or suggestions or want to propose a speaker, please e-mail us at email@example.com.
What is selective inference?
Broadly construed, selective inference means searching for interesting patterns in data, usually with inferential guarantees that account for the search process. It encompasses:
Multiple testing: testing many hypotheses at once (and paying disproportionate attention to rejections)
Post-selection inference: examining the data to decide what question to ask, or what model to use, then carrying out one or more appropriate inferences
Adaptive / interactive inference: sequentially asking one question after another of the same data set, where each question is informed by the answers to preceding questions
Cheating: cherry-picking, double dipping, data snooping, data dredging, p-hacking, HARKing, and other low-down dirty rotten tricks; basically any of the above, but done wrong!
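As a concrete instance of the multiple-testing item above, here is a minimal sketch of the Benjamini-Hochberg step-up procedure, a standard method for controlling the false discovery rate (the implementation is my own):

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.1):
    """Return indices of hypotheses rejected by BH at FDR level q.

    Sort the m p-values, find the largest k with p_(k) <= k * q / m,
    and reject the k hypotheses with the smallest p-values. For
    independent p-values this controls the false discovery rate at q.
    """
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    thresh = q * np.arange(1, m + 1) / m   # step-up thresholds k * q / m
    below = p[order] <= thresh
    if not below.any():
        return np.array([], dtype=int)
    k = np.max(np.nonzero(below)[0]) + 1   # largest k with p_(k) <= k q / m
    return np.sort(order[:k])

print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.2, 0.9], q=0.1))
# → [0 1 2 3]
```

Note the "disproportionate attention to rejections": 0.041 is rejected here only because smaller p-values pulled the step-up threshold past it, exactly the kind of selection effect the procedure is designed to account for.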