International Seminar on Selective Inference
A weekly online seminar on selective inference, multiple testing, and post-selection inference.
Gratefully inspired by the Online Causal Inference Seminar
For announcements and Zoom invitations please subscribe to our mailing list. Our seminar (typically) runs on Thursdays, at 8:30am PT / 11:30am ET / 4:30pm London / 5:30pm Amsterdam / 6:30pm Tel Aviv.
Thursday, February 19, 2026 [Link to join]
Speaker: Adam Jaffe (Columbia University)
Title: Constrained Denoising, Empirical Bayes, and Optimal Transport
Abstract: In the statistical problem of denoising, Bayes and empirical Bayes methods can "overshrink" their output relative to the latent variables of interest. This work is focused on constrained denoising problems which mitigate such phenomena. At the oracle level, i.e., when the latent variable distribution is assumed known, we apply tools from the theory of optimal transport to characterize the solution to (i) variance-constrained, (ii) distribution-constrained, and (iii) general-constrained denoising problems. At the empirical level, i.e., when the latent variable distribution is not known, we use empirical Bayes methodology to estimate these oracle denoisers. Our approach is modular, and transforms any suitable (unconstrained) empirical Bayes denoiser into a constrained empirical Bayes denoiser. We prove explicit rates of convergence for our proposed methodologies, which both extend and sharpen existing asymptotic results that have previously considered only variance constraints. We apply our methodology in two applications: one in astronomy concerning the relative chemical abundances in a large catalog of red-clump stars, and one in baseball concerning minor- and major-league batting skill for rookie players.
Discussant: Jake Soloff (University of Michigan)
Links: [Relevant papers: paper #1]
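For readers who want a concrete picture of the "overshrinkage" phenomenon the abstract describes, here is a minimal simulation in the normal means model. All parameter choices are illustrative assumptions, and the variance-matching rescaling is a textbook device, not the speaker's constrained denoisers:

```python
import numpy as np

# Toy illustration of "overshrinkage" in the normal means model (not the
# speaker's method): theta_i ~ N(0, A), X_i | theta_i ~ N(theta_i, 1).
rng = np.random.default_rng(0)
n, A = 100_000, 2.0
theta = rng.normal(0.0, np.sqrt(A), n)      # latent means
x = theta + rng.normal(0.0, 1.0, n)         # noisy observations

# Oracle Bayes (posterior mean) denoiser: shrinks x toward 0.
bayes = (A / (A + 1.0)) * x

# The posterior means have variance A^2/(A+1) < A, so they are
# under-dispersed relative to the latent theta ("overshrinkage").
print(f"var(theta)   = {theta.var():.3f}")
print(f"var(Bayes)   = {bayes.var():.3f}")

# A simple variance-constrained denoiser rescales the Bayes rule so that
# its output variance matches the latent variance A.
constrained = bayes * np.sqrt(A / bayes.var())
print(f"var(constr.) = {constrained.var():.3f}")

# Price of the constraint: slightly larger mean squared error than Bayes.
print(f"MSE Bayes       = {np.mean((bayes - theta) ** 2):.3f}")
print(f"MSE constrained = {np.mean((constrained - theta) ** 2):.3f}")
```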
Thursday, February 26, 2026 [Link to join]
Speaker: Yash Nair (Stanford University)
Title: Diversifying Conformal Selections
Abstract: When selecting from a list of potential candidates, it is important to ensure not only that the selected items are of high quality, but also that they are sufficiently dissimilar so as to both avoid redundancy and to capture a broader range of desirable properties. In drug discovery, scientists aim to select potent drugs from a library of unsynthesized candidates, but recognize that it is wasteful to repeatedly synthesize highly similar compounds. In job hiring, recruiters may wish to hire candidates who will perform well on the job, while also considering factors such as socioeconomic background, prior work experience, gender, or race. We study the problem of using any prediction model to construct a maximally diverse selection set of candidates while controlling the false discovery rate (FDR) in a model-free fashion. Our method, diversity-aware conformal selection (DACS), achieves this by designing a general optimization procedure to construct a diverse selection set subject to a simple constraint involving conformal e-values which depend on carefully chosen stopping times. The key idea of DACS is to use optimal stopping theory to adaptively choose the set of e-values which (approximately) maximizes the expected diversity measure. We give an example diversity metric for which our procedure can be run exactly and efficiently. We also develop a number of computational heuristics which greatly improve its running time for generic diversity metrics. We demonstrate the empirical performance of our method both in simulation and on job hiring and drug discovery datasets.
Discussant: Ulysse Gazin (Laboratoire de Probabilités, Statistique et Modélisation)
Links: [Relevant papers: paper #1]
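As background, the sketch below implements plain conformal selection (conformal p-values followed by Benjamini-Hochberg), the non-diverse baseline that diversity-aware procedures extend; it is not DACS itself, and the data-generating process and plug-in score are purely illustrative assumptions:

```python
import numpy as np

# Minimal sketch of plain conformal selection (conformal p-values + BH).
# Goal: select test points whose unobserved outcome exceeds a threshold c,
# with false discovery rate control under exchangeability.
rng = np.random.default_rng(1)

def make_data(m):
    x = rng.uniform(-2, 2, size=(m, 1))
    y = x[:, 0] + rng.normal(0, 1, m)        # outcome; "interesting" if y > c
    return x, y

c = 0.0
x_cal, y_cal = make_data(500)                # labeled calibration data
x_test, y_test = make_data(500)              # outcomes hidden at selection time

# Any predictive model can supply the (monotone) score; here a trivial one.
def score(x):                                # hypothetical plug-in predictor
    return x[:, 0]

v_cal, v_test = score(x_cal), score(x_test)

# Conformal p-value for test point j: compare its score against the scores
# of calibration points whose outcome is below the threshold (the "nulls").
null = y_cal <= c
p = np.array([(1 + np.sum(null & (v_cal >= vj))) / (len(y_cal) + 1)
              for vj in v_test])

# Benjamini-Hochberg at level alpha on the conformal p-values.
alpha = 0.1
order = np.argsort(p)
thresh = alpha * np.arange(1, len(p) + 1) / len(p)
below = np.nonzero(p[order] <= thresh)[0]
selected = order[: below.max() + 1] if below.size else np.array([], dtype=int)

fdp = np.mean(y_test[selected] <= c) if selected.size else 0.0
print(f"selected {selected.size} points, realized FDP = {fdp:.3f}")
```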
Thursday, March 5, 2026 [Link to join]
Speaker: Yanjun Han (New York University)
Title: Two Roads to Empirical Bayes: Mean-Field Approximation and Universal Priors
Abstract: In high-dimensional compound decision problems, empirical Bayes seeks to approximate the Bayes decision rule under an unknown prior governing many parameters. This perspective suggests two principled approximation strategies: either approximate the unknown prior by an i.i.d. surrogate and estimate it from the data, or replace it with a prescribed dependent surrogate and approximate its Bayes rule through pretraining.
Under the first approach, we show quantitatively that high-dimensional conditional expectations under a random permutation prior admit a sharp mean-field approximation. Applied to the classical problem of distribution estimation, this analysis yields an estimator that achieves optimal instance-wise risk in a competitive framework and ultimately bests the classical Good--Turing estimator in both theory and practice.
Under the second approach, we formalize recent empirical evidence that transformers pretrained on synthetic data perform strongly on empirical Bayes tasks. Focusing on the Poisson model, we establish the existence of universal priors under which a pretrained estimator achieves near-optimal regret uniformly over arbitrary test distributions. Our analysis interprets the pretrained estimator as performing hierarchical Bayesian inference: adaptation to unknown test priors arises through posterior contraction, and length generalization (when the test sequence exceeds the training length) corresponds to inference under a fractional posterior. Numerical experiments with pretrained transformers support these theoretical predictions.
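For orientation, the sketch below is the classical Robbins estimator, the textbook empirical Bayes rule for the Poisson compound decision problem discussed in the abstract; the prior used in the simulation is an illustrative assumption, and the talk's transformer-based estimators are of course not reproduced here:

```python
import numpy as np

# Classical Robbins estimator for the Poisson empirical Bayes problem.
# Model: theta_i ~ G (unknown prior), X_i | theta_i ~ Poisson(theta_i).
# Bayes rule: E[theta | X = x] = (x + 1) * f(x + 1) / f(x), where f is the
# marginal pmf of X; Robbins plugs in empirical frequencies for f.
rng = np.random.default_rng(2)
n = 50_000
theta = rng.gamma(shape=2.0, scale=1.5, size=n)   # illustrative prior G
x = rng.poisson(theta)

counts = np.bincount(x, minlength=x.max() + 2).astype(float)
robbins = (x + 1) * counts[x + 1] / np.maximum(counts[x], 1.0)

mse_robbins = np.mean((robbins - theta) ** 2)
mse_naive = np.mean((x - theta) ** 2)             # naive estimate theta_hat = X
print(f"MSE (X itself) = {mse_naive:.3f},  MSE (Robbins) = {mse_robbins:.3f}")
```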
Thursday, March 12, 2026 [Link to join]
Speaker: Peter Hoff (Duke University)
Title: Selective and marginal selective inference for exceptional groups
Abstract: Statistical analyses of multipopulation studies often use the data to select a particular population as the target of inference. For example, a confidence interval may be constructed for a population only in the event that its sample mean is larger than that of the other populations. We show that for the normal means model, confidence interval procedures that maintain strict coverage control conditional on such a selection event will have infinite expected width. For applications where such selective coverage control is of interest, this result motivates the development of procedures with finite expected width and approximate selective coverage control over a range of plausible parameter values. To this end, we develop selection-adjusted empirical Bayes confidence procedures that use information from the data to approximate an oracle confidence procedure that has exact selective coverage control and finite expected width. In numerical comparisons of the oracle and empirical Bayes procedures to procedures that only guarantee selective coverage control marginally over selection events, we find that improved selective coverage control comes at the cost of increased expected interval width.
Discussant:
Links: [Relevant papers: paper #1]
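A toy simulation of the selection problem the talk addresses (not the speaker's selection-adjusted empirical Bayes procedure): report a naive 95% interval only for the population with the largest sample mean, and check how often it covers that population's true mean. The configuration, ten populations with equal means and unit standard errors, is an illustrative assumption:

```python
import numpy as np

# K population means, a standard-normal sample mean for each; we report a
# naive 95% interval only for the population with the largest sample mean.
rng = np.random.default_rng(3)
K, reps, z = 10, 20_000, 1.96
mu = np.zeros(K)                       # illustrative: all populations equal

covered = 0
for _ in range(reps):
    x = mu + rng.normal(size=K)        # sample means, unit standard error
    j = np.argmax(x)                   # data-driven selection of the "winner"
    covered += (x[j] - z <= mu[j] <= x[j] + z)

# Marginal coverage of the naive interval is 95%, but conditional on being
# selected as the maximum it can fall well below the nominal level.
print(f"coverage of naive CI for the selected group: {covered / reps:.3f}")
```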
Thursday, March 19, 2026 [Link to join]
Speaker: Yuval Benjamini (Hebrew University)
Thursday, March 26, 2026 [Link to join]
Speaker: Jiadong Liang (University of Pennsylvania)
Thursday, April 2, 2026 [Link to join]
Speaker: Aureo de Paula (University College London, CeMMAP and Institute for Fiscal Studies)
Title: Prediction Sets and Conformal Inference with Interval Outcomes
Abstract: Given data on a scalar random variable 𝑌, a prediction set for 𝑌 with miscoverage level 𝛼 is a set of values for 𝑌 that contains a randomly drawn 𝑌 with probability 1 − 𝛼, where 𝛼 ∈ (0, 1). Among all prediction sets that satisfy this coverage property, the oracle prediction set is the one with the smallest volume. This paper provides estimation methods for such prediction sets given observed conditioning covariates when 𝑌 is censored or measured in intervals. We first characterise the oracle prediction set under interval censoring and develop consistent estimators for the oracle prediction intervals and prediction sets consisting of multiple disjoint intervals. We use conformal inference to construct a prediction set that achieves finite-sample validity under censoring and maintains consistency as the sample size increases, using a conformity score function designed for interval data. The procedure accommodates the prediction uncertainty that is irreducible (due to the stochastic nature of outcomes), the modelling uncertainty due to partial identification, and the sampling uncertainty that shrinks as the sample grows. We conduct a set of Monte Carlo simulations and an application to data from the Current Population Survey. The results highlight the robustness and efficiency of the proposed methods.
Discussant:
Links: [Relevant papers: paper #1]
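For context, here is a minimal split-conformal sketch for fully observed (point) outcomes; the interval-data conformity score developed in the paper is not reproduced, and the data-generating process and linear predictor are illustrative assumptions:

```python
import numpy as np

# Minimal split-conformal prediction for fully observed (point) outcomes.
# Target: a set containing a new Y with probability at least 1 - alpha.
rng = np.random.default_rng(4)

def make_data(m):
    x = rng.uniform(0, 1, m)
    y = 2 * x + rng.normal(0, 0.3, m)
    return x, y

x_tr, y_tr = make_data(500)            # fit the predictive model
x_cal, y_cal = make_data(500)          # calibrate the conformity scores
x_new, y_new = make_data(2_000)        # evaluate coverage

coef = np.polyfit(x_tr, y_tr, deg=1)   # any point predictor would do
def predict(x):
    return np.polyval(coef, x)

# Conformity score: absolute residual on the calibration set.
scores = np.abs(y_cal - predict(x_cal))
alpha = 0.1
k = int(np.ceil((1 - alpha) * (len(scores) + 1)))    # finite-sample quantile
q = np.sort(scores)[k - 1]

# Prediction interval [mu_hat(x) - q, mu_hat(x) + q] and its empirical coverage.
cover = np.abs(y_new - predict(x_new)) <= q
print(f"empirical coverage: {cover.mean():.3f} (target {1 - alpha})")
```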
The seminars are held on Zoom and last 60 minutes:
45 minutes of presentation
15 minutes of discussion, led by an invited discussant
Moderators collect questions using the Q&A feature during the seminar.
You can attend by clicking the link to join (there is no need to register in advance).
More instructions for attendees can be found here.
Organizers:
Jelle Goeman (Leiden University)
Nikos Ignatiadis (University of Chicago)
Lihua Lei (Stanford University)
Zhimei Ren (University of Pennsylvania)
Will Fithian (UC Berkeley)
Rina Barber (University of Chicago)
Daniel Yekutieli (Tel Aviv University)
If you have feedback or suggestions or want to propose a speaker, please e-mail us at selectiveinferenceseminar@gmail.com.
Broadly construed, selective inference means searching for interesting patterns in data, usually with inferential guarantees that account for the search process. It encompasses:
Multiple testing: testing many hypotheses at once (and paying disproportionate attention to rejections)
Post-selection inference: examining the data to decide what question to ask, or what model to use, then carrying out one or more appropriate inferences
Adaptive / interactive inference: sequentially asking one question after another of the same data set, where each question is informed by the answers to preceding questions
Cheating: cherry-picking, double dipping, data snooping, data dredging, p-hacking, HARKing, and other low-down dirty rotten tricks; basically any of the above, but done wrong!
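As a concrete illustration of the first item above, here is a minimal sketch of the Benjamini-Hochberg step-up procedure for false discovery rate control; the simulated mix of null and signal p-values is an illustrative assumption:

```python
import numpy as np

# Minimal sketch of the Benjamini-Hochberg step-up procedure.
def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask of rejected hypotheses at FDR level alpha."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Largest k with p_(k) <= alpha * k / m; reject the k smallest p-values.
    passed = np.nonzero(p[order] <= alpha * np.arange(1, m + 1) / m)[0]
    reject = np.zeros(m, dtype=bool)
    if passed.size:
        reject[order[: passed.max() + 1]] = True
    return reject

# Example: 900 true nulls (uniform p-values) and 100 signals (small p-values).
rng = np.random.default_rng(5)
p = np.concatenate([rng.uniform(size=900), rng.beta(0.1, 1.0, size=100)])
rej = benjamini_hochberg(p, alpha=0.1)
fdp = rej[:900].sum() / max(rej.sum(), 1)    # false discovery proportion
print(f"{rej.sum()} rejections, realized FDP = {fdp:.3f}")
```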