International Seminar on Selective Inference
A weekly online seminar on selective inference, multiple testing, and post-selection inference.
Gratefully inspired by the Online Causal Inference Seminar
Mailing List
For announcements and Zoom invitations please subscribe to our mailing list.
Upcoming Seminar Presentations
All seminars take place Tuesdays at 8:30 am PT / 11:30 am ET / 3:30 pm London / 5:30 pm Tel Aviv. (Note that the US is on daylight saving time.) Past seminar presentations are posted here.
Tuesday, March 18, 2025 [Link]
Speaker: Lan Gao (The University of Tennessee Knoxville)
Title: Asymptotic FDR Control with Model-X Knockoffs: Is Moments Matching Sufficient?
Abstract: We propose a unified theoretical framework for studying the robustness of the model-X knockoffs framework by investigating the asymptotic false discovery rate (FDR) control of the practically implemented approximate knockoffs procedure. This procedure deviates from the model-X knockoffs framework by substituting the true covariate distribution with a user-specified distribution that can be learned using in-sample observations. By replacing the distributional exchangeability condition of the model-X knockoff variables with three conditions on the approximate knockoff statistics, we establish that the approximate knockoffs procedure achieves asymptotic FDR control. Using our unified framework, we further prove that arguably the most popular knockoff variable generation method—the Gaussian knockoffs generator based on matching the first two moments—achieves asymptotic FDR control when two-moment-based knockoff statistics are employed in the knockoffs inference procedure. For the first time in the literature, our theoretical results formally justify the effectiveness and robustness of the Gaussian knockoffs generator. Simulation and real data examples are conducted to validate the theoretical findings.
Discussant: Abhinav Chakraborty (University of Pennsylvania)
Links: [Relevant papers: paper #1]
Tuesday, March 25, 2025 [Link]
Speaker: Andreas Petrou-Zeniou (MIT)
Title: Inference on Multiple Winners with Applications to Economic Mobility
Abstract: While policymakers and researchers are often concerned with conducting inference based on a data-dependent selection, a strictly larger class of inference problems arises when considering multiple data-dependent selections, such as when selecting on statistical significance or quantiles. Given this, we study the problem of conducting inference on populations selected according to their ranks, which we dub the inference on multiple winners problem. In this setting, we encounter both selective and simultaneous inference problems, making existing approaches either not applicable or too conservative. Instead, we propose a novel, two-step approach to the inference on multiple winners problem, with the first step modeling a key nuisance parameter driving selection, and the second step using this model to derive critical values on the errors of the winners. In simulations, our two-step approach reduces over-coverage error by up to 96% relative to existing approaches. In a stylized example on job training, we demonstrate that existing approaches partially apply, and that our novel two-step approach is broadly applicable and yields informative confidence sets. In a second application, we apply our two-step approach to revisit the winner's curse in the Creating Moves to Opportunity (CMTO) program. We find that, after correcting for the inference on multiple winners problem, we fail to reject the possibility of null effects in the majority of census tracts selected by the CMTO program.
Discussant: Sarah Moon (MIT)
Links: [Relevant papers: paper #1]
Tuesday, April 1, 2025 [Link]
Speaker: Sida Li (University of Chicago)
Title: Prediction-Powered Adaptive Shrinkage Estimation
Abstract: Prediction-Powered Inference (PPI) is a powerful framework for enhancing statistical estimates by combining limited gold-standard data with machine learning (ML) predictions. While prior work has demonstrated PPI's benefits for individual statistical tasks, modern applications require answering numerous parallel statistical questions. We introduce Prediction-Powered Adaptive Shrinkage (PAS), a method that bridges PPI with empirical Bayes shrinkage to improve the estimation of multiple means. PAS debiases noisy ML predictions within each task and then borrows strength across tasks by using those same predictions as a reference point for shrinkage. The amount of shrinkage is determined by minimizing an unbiased estimate of risk, and we prove that this tuning strategy is asymptotically optimal. Experiments on both synthetic and real-world datasets show that PAS adapts to the reliability of the ML predictions and outperforms traditional and modern baselines in large-scale applications.
Discussant:
Links: [Relevant papers: paper #1]
Tuesday, April 8, 2025 [Link]
Speaker: William Hartog (Stanford University)
Tuesday, April 15, 2025 [Link]
Speaker: Suyash Gupta (LinkedIn)
Tuesday, April 22, 2025 [Link]
Speaker: Ying Jin (Harvard University)
Title: Automated Hypothesis Validation with Agentic Sequential Falsifications
Abstract: Hypotheses are central to information acquisition, decision-making, and discovery. However, many real-world hypotheses are abstract, high-level statements that are difficult to validate directly. This challenge is further intensified by the rise of hypothesis generation from Large Language Models (LLMs), which are prone to hallucination and produce hypotheses in volumes that make manual validation impractical. Here we propose POPPER, an agentic framework for rigorous automated validation of free-form hypotheses. Guided by Karl Popper's principle of falsification, POPPER validates a hypothesis using LLM agents that design and execute falsification experiments targeting its measurable implications. We employ a sequential testing framework to ensure strict Type-I error control while actively gathering evidence from diverse observations, whether drawn from existing data or newly conducted procedures. We demonstrate POPPER on six domains including biology, economics, and sociology. POPPER delivers robust error control, high power, and scalability. Furthermore, compared to human scientists, POPPER achieved comparable performance in validating complex biological hypotheses while reducing time tenfold, providing a scalable, rigorous solution for hypothesis validation.
Discussant:
Links: [Relevant papers: paper #1]
Format
The seminars are held on Zoom and last 60 minutes:
45 minutes of presentation
15 minutes of discussion, led by an invited discussant
Moderators collect questions using the Q&A feature during the seminar.
How to join
You can attend by clicking the link to join (there is no need to register in advance).
More instructions for attendees can be found here.
Organizers
Will Fithian (UC Berkeley)
Jelle Goeman (Leiden University)
Nikos Ignatiadis (University of Chicago)
Lihua Lei (Stanford University)
Zhimei Ren (University of Pennsylvania)
Former organizers
Rina Barber (University of Chicago)
Daniel Yekutieli (Tel Aviv University)
Contact us
If you have feedback or suggestions or want to propose a speaker, please e-mail us at selectiveinferenceseminar@gmail.com.
What is selective inference?
Broadly construed, selective inference means searching for interesting patterns in data, usually with inferential guarantees that account for the search process. It encompasses:
Multiple testing: testing many hypotheses at once (and paying disproportionate attention to rejections)
Post-selection inference: examining the data to decide what question to ask, or what model to use, then carrying out one or more appropriate inferences
Adaptive / interactive inference: sequentially asking one question after another of the same data set, where each question is informed by the answers to preceding questions
Cheating: cherry-picking, double dipping, data snooping, data dredging, p-hacking, HARKing, and other low-down dirty rotten tricks; basically any of the above, but done wrong!
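As a concrete illustration of the multiple-testing setting above, here is a minimal sketch of the Benjamini-Hochberg (BH) procedure, the canonical method for controlling the false discovery rate when testing many hypotheses at once. (This example is illustrative only and is not drawn from any of the talks listed above.)

```python
def benjamini_hochberg(pvals, alpha=0.1):
    """Return the indices of hypotheses rejected by BH at FDR level alpha.

    BH sorts the p-values, finds the largest rank k such that
    p_(k) <= alpha * k / m, and rejects the k hypotheses with the
    smallest p-values.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices sorted by p-value
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:  # step-up threshold for rank
            k = rank                      # remember the largest passing rank
    return sorted(order[:k])              # reject the k smallest p-values

# Example: with four p-values at level alpha = 0.1, the BH thresholds are
# 0.025, 0.05, 0.075, 0.1; only the first two hypotheses are rejected.
rejected = benjamini_hochberg([0.001, 0.02, 0.5, 0.9], alpha=0.1)
```

Under independence (and certain forms of positive dependence), BH guarantees that the expected fraction of false discoveries among the rejections is at most alpha, which is the "inferential guarantee that accounts for the search process" referred to above.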