International Seminar on Selective Inference
A weekly online seminar on selective inference, multiple testing, and post-selection inference.
Gratefully inspired by the Online Causal Inference Seminar
A weekly online seminar on selective inference, multiple testing, and post-selection inference.
Gratefully inspired by the Online Causal Inference Seminar
For announcements and Zoom invitations please subscribe to our mailing list.
All seminars take place Tuesdays at 8:30 am PT / 11:30 am ET / 4:30 pm London / 6:30 pm Tel Aviv. Past seminar presentations are posted here.
Tuesday, April 29, 2025 [Link]
Speaker: Yusuf Sale (Ludwig Maximilian University of Munich)
Title: Online Selective Conformal Prediction: Errors and Solutions
Abstract: In online selective conformal inference, data arrives sequentially, and prediction intervals are constructed only when an online selection rule is met. Since online selections may break the exchangeability between the selected test datum and the rest of the data, one must correct for this by suitably selecting the calibration data. We evaluate existing calibration selection strategies and pinpoint some fundamental errors in the associated claims that guarantee selection-conditional coverage and control of the false coverage rate (FCR). To address these shortcomings, we propose novel calibration selection strategies that provably preserve the exchangeability of the calibration data and the selected test datum. Consequently, we demonstrate that online selective conformal inference with these strategies guarantees both selection-conditional coverage and FCR control. Our theoretical findings are supported by experimental evidence examining tradeoffs between valid methods.
Discussant: Yajie Bao (Nankai University)
Links: [Relevant papers: paper #1]
Tuesday, May 6, 2025 [Link]
Speaker: Li Ma (Duke University)
Title: A partial likelihood approach to tree-based density modeling and its application in Bayesian inference
Abstract: Tree-based priors for probability distributions are usually specified using a predetermined, data-independent collection of candidate recursive partitions of the sample space. To characterize an unknown target density in detail over the entire sample space, candidate partitions must have the capacity to expand deeply into all areas of the sample space with potential non-zero sampling probability. Such an expansive system of partitions often incurs prohibitive computational costs and makes inference prone to overfitting, especially in regions with little probability mass. Thus, existing models typically make a compromise and rely on relatively shallow trees. This hampers one of the most desirable features of trees, their ability to characterize local features, and results in reduced statistical efficiency. Traditional wisdom suggests that this compromise is inevitable to ensure coherent likelihood-based reasoning in Bayesian inference, as a data-dependent partition system that allows deeper expansion only in regions with more observations would induce double dipping of the data. We propose a simple strategy to restore coherency while allowing the candidate partitions to be data-dependent, using Cox's partial likelihood. Our partial likelihood approach is broadly applicable to existing likelihood-based methods and, in particular, to Bayesian inference on tree-based models. We give examples in density estimation in which the partial likelihood is endowed with existing priors on tree-based models and compare with the standard, full-likelihood approach. The results show substantial gains in estimation accuracy and computational efficiency from adopting the partial likelihood.
Discussant: Qian Zhao (University of Massachusetts, Amherst)
Links: [Relevant papers: paper #1]
Tuesday, May 13, 2025 [Link]
Speaker: Hongjian Wang (Carnegie Mellon University)
Title: Anytime-valid FDR control with the stopped e-BH procedure
Abstract: The recent e-Benjamini-Hochberg (e-BH) procedure for multiple hypothesis testing is known to control the false discovery rate (FDR) under arbitrary dependence between the input e-values. This paper points out an important subtlety when applying the e-BH procedure with e-processes, which are sequential generalizations of e-values (where the data are observed sequentially). Since adaptively stopped e-processes are e-values, the e-BH procedure can be repeatedly applied at every time step, and one can continuously monitor the e-processes and the rejection sets obtained. One would hope that the "stopped e-BH procedure" (se-BH) has an FDR guarantee for the rejection set obtained at any stopping time. However, while this is true if the data in different streams are independent, it is not true in full generality, because each stopped e-process is an e-value only for stopping times in its own local filtration, but the se-BH procedure employs a stopping time with respect to a global filtration. This can cause information to leak across time, allowing one stream to know its future by knowing past data of another stream. This paper formulates a simple causal condition under which local e-processes are also global e-processes and thus the se-BH procedure does indeed control the FDR. The condition excludes unobserved confounding from the past and is met under most reasonable scenarios including genomics.
Discussant: Yo Joong Choe (University of Chicago)
Links: [Relevant papers: paper #1]
Tuesday, May 20, 2025 [Link]
Speaker: Aytijhya Saha (Indian Statistical Institute)
Title: Post-detection inference for sequential changepoint localization
Abstract: This talk will focus on a fundamental but largely unexplored challenge in sequential changepoint analysis: conducting inference following a detected change. We study the problem of localizing the changepoint using only the data observed up to a data-dependent stopping time at which a sequential detection algorithm $\mathcal A$ declares a change. We first construct confidence sets for the unknown changepoint when pre- and post-change distributions are assumed to be known. We then extend our framework to composite pre- and post-change scenarios. We impose no conditions on the observation space or on $\mathcal A$ --- we only need to be able to run $\mathcal A$ on simulated data sequences. In summary, this work offers both theoretically sound and practically effective tools for sequential changepoint localization.
Discussant:
Links: [Relevant papers: paper #1]
The seminars are held on Zoom and last 60 minutes:
45 minutes of presentation
15 minutes of discussion, led by an invited discussant
Moderators collect questions using the Q&A feature during the seminar.
You can attend by clicking the link to join (there is no need to register in advance).
More instructions for attendees can be found here.
Will Fithian (UC Berkeley)
Jelle Goeman (Leiden University)
Nikos Ignatiadis (University of Chicago)
Lihua Lei (Stanford University)
Zhimei Ren (University of Pennsylvania)
Rina Barber (University of Chicago)
Daniel Yekutieli (Tel Aviv University)
If you have feedback or suggestions or want to propose a speaker, please e-mail us at selectiveinferenceseminar@gmail.com.
Broadly construed, selective inference means searching for interesting patterns in data, usually with inferential guarantees that account for the search process. It encompasses:
Multiple testing: testing many hypotheses at once (and paying disproportionate attention to rejections)
Post-selection inference: examining the data to decide what question to ask, or what model to use, then carrying out one or more appropriate inferences
Adaptive / interactive inference: sequentially asking one question after another of the same data set, where each question is informed by the answers to preceding questions
Cheating: cherry-picking, double dipping, data snooping, data dredging, p-hacking, HARKing, and other low-down dirty rotten tricks; basically any of the above, but done wrong!