International Seminar on Selective Inference
A weekly online seminar on selective inference, multiple testing, and post-selection inference.
Gratefully inspired by the Online Causal Inference Seminar
Mailing List
For announcements and Zoom invitations please subscribe to our mailing list.
Upcoming Seminar Presentations
All seminars take place Wednesdays at 8:30 am PT / 11:30 am ET / 4:30 pm London / 6:30 pm Tel Aviv. Past seminar presentations are posted here.
The seminar is on summer hiatus and will resume in Fall 2023.
Wednesday, October 11, 2023 [Recording]
Speaker: Richard Samworth (University of Cambridge)
Title: Isotonic subgroup selection
Abstract: Given a sample of covariate-response pairs, we consider the subgroup selection problem of identifying a subset of the covariate domain where the regression function exceeds a pre-determined threshold. We introduce a computationally feasible approach for subgroup selection in the context of multivariate isotonic regression, based on martingale tests and multiple testing procedures for logically structured hypotheses. Our proposed procedure satisfies a non-asymptotic, uniform Type I error rate guarantee with power that attains the minimax optimal rate up to poly-logarithmic factors. Extensions cover classification, isotonic quantile regression and heterogeneous treatment effect settings. Numerical studies on both simulated and real data confirm the practical effectiveness of our proposal, which is implemented in the R package ISS.
Discussant: Xinzhou Guo (The Hong Kong University of Science and Technology)
Links: [Relevant papers: paper #1]
Wednesday, October 18, 2023 [Recording]
Speaker: Peter Grünwald (Centrum Wiskunde & Informatica and Leiden University)
Title: Beyond Neyman-Pearson: testing and confidence without setting alpha in advance
Abstract: A standard practice in statistical hypothesis testing is to mention the p-value alongside the accept/reject decision. We show a major advantage of mentioning an e-value instead. With p-values, we cannot easily use an extreme observation (e.g. p << alpha) for getting better frequentist decisions. With e-values we can, since they provide Type-I risk control in a generalized Neyman-Pearson setting with the decision task (a general loss function) determined post-hoc, after observation of the data --- thereby providing a handle on the age-old "roving alpha" problem in statistics: we obtain robust "Type-I risk bounds" which hold independently of any preset alpha or loss function. The reasoning can be extended to confidence intervals. When Type-II risks are taken into consideration, the only admissible decision rules in the post-hoc setting turn out to be e-value-based. Similarly, if the loss incurred when specifying a faulty confidence interval is not fixed in advance, standard confidence intervals and distributions may fail whereas e-confidence sets and e-posteriors still provide valid risk guarantees.
Discussant: Will Hartog (Stanford University)
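The post-hoc guarantee described in the abstract rests on a simple fact: an e-value has expectation at most 1 under the null, so by Markov's inequality it exceeds 1/alpha with probability at most alpha, simultaneously for every alpha. The Python sketch below is not from the talk; the likelihood-ratio e-value and the Monte Carlo check are illustrative choices to make the idea concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

def lr_evalue(x):
    """Likelihood-ratio e-value for H0: X_i ~ N(0,1) vs the alternative N(1,1).

    The ratio of densities multiplies out to exp(sum(x) - n/2), which has
    expectation exactly 1 under H0, so it is a valid e-value.
    """
    return np.exp(np.sum(x) - len(x) / 2)

# Monte Carlo check of the Type-I guarantee under the null: by Markov's
# inequality, P(e >= 1/alpha) <= alpha for EVERY alpha at once, so the
# level can be chosen after seeing the data.
n_sim, n = 100_000, 10
evals = np.array([lr_evalue(rng.normal(0.0, 1.0, n)) for _ in range(n_sim)])
for alpha in (0.1, 0.05, 0.01):
    print(f"alpha={alpha}: rejection rate {np.mean(evals >= 1 / alpha):.4f}")
```

Each printed rejection rate stays below the corresponding alpha, even though no alpha was fixed before generating the data.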
Wednesday, October 25, 2023 [Recording]
Speaker: Aaditya Ramdas (Carnegie Mellon University)
Title: Recent advances in multiple testing: negative dependence and randomization
Abstract: The multiple testing literature has primarily dealt with three types of dependence assumptions between p-values: independence, positive regression dependence, and arbitrary dependence. In the first half, I will summarize the first theoretical results under various notions of negative dependence. These include the Simes global null test and the Benjamini-Hochberg procedure, which are known experimentally to be anti-conservative under negative dependence. We prove that the anti-conservativeness of these procedures is bounded by factors smaller than that under arbitrary dependence (in particular, by factors independent of the number of hypotheses tested). In the second half, I will show that the famous Benjamini-Yekutieli procedure for FDR control under arbitrary dependence can be improved (usually strictly) via a simple external randomization. Along the way, we will improve other procedures as well, such as the e-BH procedure for FDR control with e-values.
Discussant: Sanat K. Sarkar (Temple University)
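For readers unfamiliar with the Benjamini-Hochberg step-up procedure discussed in the abstract, here is a minimal Python sketch (an illustrative implementation, not code from the talk): reject the k smallest p-values, where k is the largest i with p_(i) <= i*q/m.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.1):
    """Step-up BH procedure controlling FDR at level q (under independence
    or positive regression dependence). For arbitrary dependence, the
    Benjamini-Yekutieli variant replaces q by q / (1 + 1/2 + ... + 1/m)."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m   # i*q/m for sorted p-values
    below = p[order] <= thresholds
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True                 # reject the k smallest p-values
    return rejected

# Toy example: 8 null p-values (uniform) plus 2 strong signals.
rng = np.random.default_rng(1)
pvals = np.concatenate([rng.uniform(size=8), [0.0001, 0.0005]])
print(benjamini_hochberg(pvals, q=0.1))
```

The two tiny p-values fall below their rank-based thresholds (0.01 and 0.02 here) and are always rejected, while the uniform nulls are rejected only if they happen to clear the step-up boundary.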
Wednesday, November 8, 2023 [Recording]
Speaker: Emmanuel Candès (Stanford University)
Wednesday, November 15, 2023 [Recording]
Speaker: Chiara Sabatti (Stanford University)
Wednesday, November 29, 2023 [Recording]
Speaker: Trambak Banerjee (University of Kansas)
Wednesday, December 6, 2023 [Recording]
Speaker: Pierre Neuvial (Institut de Mathématiques de Toulouse (IMT))
Wednesday, December 13, 2023 [Recording]
Speaker: Lucas Janson (Harvard University)
Wednesday, December 20, 2023 [Recording]
Speaker: Vladimir Vovk (Royal Holloway, University of London)
Format
The seminars are held on Zoom and last 60 minutes:
45 minutes of presentation
15 minutes of discussion, led by an invited discussant
Moderators collect questions using the Q&A feature during the seminar.
How to join
You can attend by clicking the link to join (there is no need to register in advance).
More instructions for attendees can be found here.
Organizers
Rina Barber (University of Chicago)
Will Fithian (UC Berkeley)
Jelle Goeman (Leiden University)
Lihua Lei (Stanford University)
Daniel Yekutieli (Tel Aviv University)
Contact us
If you have feedback or suggestions or want to propose a speaker, please e-mail us at selectiveinferenceseminar@gmail.com.
What is selective inference?
Broadly construed, selective inference means searching for interesting patterns in data, usually with inferential guarantees that account for the search process. It encompasses:
Multiple testing: testing many hypotheses at once (and paying disproportionate attention to rejections)
Post-selection inference: examining the data to decide what question to ask, or what model to use, then carrying out one or more appropriate inferences
Adaptive / interactive inference: sequentially asking one question after another of the same data set, where each question is informed by the answers to preceding questions
Cheating: cherry-picking, double dipping, data snooping, data dredging, p-hacking, HARKing, and other low-down dirty rotten tricks; basically any of the above, but done wrong!
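A small simulation makes the stakes concrete (an illustrative sketch, not part of the definitions above): if you test 100 true null hypotheses and cherry-pick the smallest p-value without accounting for the search, you will declare a "discovery" almost every time, whereas a multiplicity correction (Bonferroni, for simplicity) restores the advertised error rate.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_sim, alpha = 100, 2000, 0.05

any_naive = any_bonf = 0
for _ in range(n_sim):
    p = rng.uniform(size=m)              # p-values from m TRUE null hypotheses
    any_naive += p.min() <= alpha        # cherry-pick the best p-value
    any_bonf += p.min() <= alpha / m     # Bonferroni-adjusted threshold

print("P(false discovery), naive:     ", any_naive / n_sim)  # near 1 - 0.95**100
print("P(false discovery), Bonferroni:", any_bonf / n_sim)   # near 0.05
```

The naive rate is about 0.99; the corrected rate stays near the nominal 5%, which is exactly the gap that selective inference methods are designed to close.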