International Seminar on Selective Inference
A weekly online seminar on selective inference, multiple testing, and post-selection inference.
Gratefully inspired by the Online Causal Inference Seminar
For announcements and Zoom invitations please subscribe to our mailing list. Our seminar (typically) runs on Mondays, at 8:30am PT / 11:30am ET / 4:30pm London / 5:30pm Amsterdam / 6:30pm Tel Aviv.
Wednesday, November 19, 2025 (200th ISSI seminar) [link to join]
Speaker: Emmanuel Candès (Stanford University)
Title: What Statistics and AI Offer Each Other?
Abstract: This talk will discuss how thinking carefully about AI inputs and outputs yields more powerful, safer AI. By examining several vignettes, we shall answer questions such as: How do we train language models under cost constraints? What happens when we’ve exhausted all available data? If I start a clinical trial using the drug AI thinks is best, will it pan out? How can we ensure high-quality products when AI is used in a larger workflow? That is, how do I know whether AI automated a task correctly? AI-powered predictions are beginning to substitute for real data when collection of the latter is difficult, slow, or costly. How then should we leverage machine learning predictions both as a substitute for high-quality data and as a tool for guiding real data collection?
Monday, November 24, 2025 [link to join]
Speaker: Wanrong Zhu (UC Irvine)
Title: Conformal prediction after data-dependent model selection
Abstract: Given a family of pretrained models and a hold-out set, how can we construct a valid conformal prediction set while selecting a model that minimizes the width of the set? If we use the same hold-out data set both to select a model (the model that yields the smallest conformal prediction sets) and then to construct a conformal prediction set based on that selected model, we suffer a loss of coverage due to selection bias. Alternatively, we could further split the data to perform selection and calibration separately, but this comes at a steep cost if the size of the dataset is limited. In this paper, we address the challenge of constructing a valid prediction set after data-dependent model selection -- commonly, selecting the model that minimizes the width of the resulting prediction sets. Our novel methods can be implemented efficiently and admit finite-sample validity guarantees without invoking additional sample splitting. We show that our methods yield prediction sets with asymptotically optimal width under a certain notion of regularity for the model class. The improvement in the width of the prediction sets constructed by our methods is further demonstrated through applications to synthetic datasets in various settings and a real data example.
Discussant: Ran Xie (Stanford University)
Links: [Relevant papers: paper #1]
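For background on the baseline procedure the abstract contrasts with, below is a minimal sketch of standard split conformal prediction, assuming a generic scikit-learn-style model with fit/predict methods; the function name and interface are illustrative assumptions, and this is not the paper's selection-aware method.

import numpy as np

def split_conformal_interval(model, X_train, y_train, X_cal, y_cal, X_test, alpha=0.1):
    """Return (lower, upper) bounds with marginal coverage >= 1 - alpha."""
    model.fit(X_train, y_train)
    # Nonconformity scores: absolute residuals on the held-out calibration set.
    scores = np.abs(y_cal - model.predict(X_cal))
    n = len(scores)
    # Finite-sample-corrected quantile level (clipped at 1 for small n).
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    preds = model.predict(X_test)
    return preds - q, preds + q

Re-running this sketch for each candidate model and reporting the one with the narrowest intervals would reuse the same calibration scores for selection and calibration, which is exactly the selection bias the talk addresses.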
Monday, December 8, 2025 [link to join]
Speaker: Patrick Kline (UC Berkeley)
Title: Branching Fixed Effects: A Proposal for Communicating Uncertainty
Abstract: Economists often rely on estimates of linear fixed effects models developed by other teams of researchers. Assessing the uncertainty in these estimates can be challenging. I propose a form of sample splitting for network data that breaks two-way fixed effects estimates into statistically independent branches, each of which provides an unbiased estimate of the parameters of interest. These branches facilitate uncertainty quantification, moment estimation, and shrinkage. Algorithms are developed for efficiently extracting branches from large datasets. I illustrate these techniques using a benchmark dataset from Veneto, Italy that has been widely used to study firm wage effects.
Discussant:
Links: [Relevant papers: paper #1]
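As a toy illustration of the underlying principle (not Kline's network-splitting algorithm), the sketch below splits an i.i.d. sample into two disjoint halves, producing statistically independent, unbiased estimates of the same slope whose disagreement gauges uncertainty; all names and data here are hypothetical.

import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)  # true slope = 2

# Two disjoint halves act as independent "branches".
idx = rng.permutation(n)
half1, half2 = np.array_split(idx, 2)
# OLS slope fit separately on each branch (np.polyfit returns the slope first).
b1 = np.polyfit(x[half1], y[half1], 1)[0]
b2 = np.polyfit(x[half2], y[half2], 1)[0]

combined = (b1 + b2) / 2
# Half the absolute disagreement is a crude one-degree-of-freedom
# standard-error estimate for the combined slope.
se = abs(b1 - b2) / 2
print(f"branches: {b1:.3f}, {b2:.3f}; combined: {combined:.3f} +/- {se:.3f}")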
The seminars are held on Zoom and last 60 minutes:
45 minutes of presentation
15 minutes of discussion, led by an invited discussant
Moderators collect questions using the Q&A feature during the seminar.
You can attend by clicking the link to join (there is no need to register in advance).
More instructions for attendees can be found here.
Organizers:
Jelle Goeman (Leiden University)
Nikos Ignatiadis (University of Chicago)
Lihua Lei (Stanford University)
Zhimei Ren (University of Pennsylvania)
Will Fithian (UC Berkeley)
Rina Barber (University of Chicago)
Daniel Yekutieli (Tel Aviv University)
If you have feedback or suggestions or want to propose a speaker, please e-mail us at selectiveinferenceseminar@gmail.com.
Broadly construed, selective inference means searching for interesting patterns in data, usually with inferential guarantees that account for the search process. It encompasses:
Multiple testing: testing many hypotheses at once (and paying disproportionate attention to rejections); a worked example follows this list
Post-selection inference: examining the data to decide what question to ask, or what model to use, then carrying out one or more appropriate inferences
Adaptive / interactive inference: sequentially asking one question after another of the same data set, where each question is informed by the answers to preceding questions
Cheating: cherry-picking, double dipping, data snooping, data dredging, p-hacking, HARKing, and other low-down dirty rotten tricks; basically any of the above, but done wrong!
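To make the multiple-testing bullet concrete, here is a minimal sketch of the Benjamini-Hochberg step-up procedure for false discovery rate control, a standard multiple-testing method; the function name and toy data are illustrative.

import numpy as np

def benjamini_hochberg(pvals, q=0.1):
    """Return a boolean mask of hypotheses rejected at FDR level q."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    # Step-up rule: find the largest k with p_(k) <= k * q / m.
    below = p[order] <= q * np.arange(1, m + 1) / m
    k = np.nonzero(below)[0].max() + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True  # reject the k smallest p-values
    return reject

# Toy data: 90 null p-values (uniform) plus 10 strong signals.
rng = np.random.default_rng(1)
pvals = np.concatenate([rng.uniform(size=90), rng.uniform(0, 1e-4, size=10)])
print(benjamini_hochberg(pvals, q=0.1).sum(), "rejections")

Testing each hypothesis at level q without such a correction would let the number of false rejections grow with the number of hypotheses; the step-up rule accounts for the search over all m p-values.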