International Seminar on Selective Inference

A weekly online seminar on selective inference, multiple testing, and post-selection inference.

Gratefully inspired by the Online Causal Inference Seminar

Mailing List

For announcements and Zoom invitations please subscribe to our mailing list.

Upcoming Seminar Presentations

All seminars take place Thursdays at 8:30 am PT / 11:30 am ET / 4:30 pm London / 6:30 pm Tel Aviv. Past seminar presentations are posted here.

  • Thursday, October 21, 2021 [Link to join]

    • Speaker: Yao Zhang (University of Cambridge)

    • Title: Multiple conditional randomization tests

    • Abstract: We propose a general framework for (multiple) conditional randomization tests that incorporate several important ideas in the recent literature. We establish a general sufficient condition on the construction of multiple conditional randomization tests under which their p-values are "independent", in the sense that their joint distribution stochastically dominates the product of uniform distributions under the null. Conceptually, we argue that randomization should be understood as the mode of inference precisely based on randomization. We show that under a change of perspective, many existing statistical methods, including permutation tests for (conditional) independence and conformal prediction, are special cases of the general conditional randomization test. The versatility of our framework is further illustrated with an example concerning lagged treatment effects in stepped-wedge randomized trials.

    • Discussant: Panos Toulis (University of Chicago)

    • Links: [Relevant papers: paper #1]

  • Thursday, October 28, 2021 [Link to join]

    • Speaker: Chiara Sabatti (Stanford University)

    • Title: Searching for consistent associations with a multi-environment knockoff filter

    • Abstract: This paper develops a method based on model-X knockoffs to find conditional associations that are consistent across diverse environments, controlling the false discovery rate. The motivation for this problem is that large data sets may contain numerous associations that are statistically significant and yet misleading, as they are induced by confounders or sampling imperfections. However, associations consistently replicated under different conditions may be more interesting. In fact, consistency sometimes provably leads to valid causal inferences even if conditional associations do not. While the proposed method is flexible and can be deployed in a wide range of applications, this paper highlights its relevance to genome-wide association studies, in which consistency across populations with diverse ancestries mitigates confounding due to unmeasured variants. The effectiveness of this approach is demonstrated by simulations and applications to the UK Biobank data.

    • Discussant: Niklas Pfister (University of Copenhagen)

    • Links: [Relevant papers: paper #1]

  • Thursday, November 4, 2021 [Link to join]

    • Speaker: Kai Zhang (The University of North Carolina at Chapel Hill)

    • Title: BEAUTY Powered BEAST

    • Abstract: We study nonparametric dependence detection with the proposed binary expansion approximation of uniformity (BEAUTY) approach, which generalizes the celebrated Euler's formula, and approximates the characteristic function of any copula with a linear combination of expectations of binary interactions from marginal binary expansions. This novel theory enables a unification of many important tests through approximations from some quadratic forms of symmetry statistics, where the deterministic weight matrix characterizes the power properties of each test. To achieve robust power, we study test statistics with data-adaptive weights, referred to as the binary expansion adaptive symmetry test (BEAST). By utilizing the properties of the binary expansion filtration, we show that the Neyman-Pearson test of uniformity can be approximated by an oracle weighted sum of symmetry statistics. The BEAST with this oracle provides a benchmark of feasible power against any alternative by leading all existing tests by a substantial margin. To approach this oracle power, we develop the BEAST through a regularized resampling approximation of the oracle test. The BEAST improves the empirical power of many existing tests against a wide spectrum of common alternatives and provides clear interpretation of the form of dependency when significant.

    • Discussant: Bhaswar Bhattacharya (University of Pennsylvania)

    • Links: [Relevant papers: paper #1]

  • Thursday, November 11, 2021 [Link to join]

    • Speaker: Shuangning Li (Stanford University)

    • Title: Deploying the Conditional Randomization Test in High Multiplicity Problems

    • Abstract: This paper introduces the sequential CRT, which is a variable selection procedure that combines the conditional randomization test (CRT) and Selective SeqStep+. Valid p-values are constructed via the flexible CRT, which are then ordered and passed through the Selective SeqStep+ filter to produce a list of discoveries. We develop theory guaranteeing control of the false discovery rate (FDR) even though the p-values are not independent. We show in simulations that our novel procedure indeed controls the FDR and is competitive with -- and sometimes outperforms -- state-of-the-art alternatives in terms of power. Finally, we apply our methodology to a breast cancer dataset with the goal of identifying biomarkers associated with cancer stage.

    • Discussant: Jingyi Jessica Li (UCLA)

    • Links: [Relevant papers: paper #1]


The seminars are held on Zoom and last 60 minutes:

  • 45 minutes of presentation

  • 15 minutes of discussion, led by an invited discussant

Moderators collect questions using the Q&A feature during the seminar.

How to join

You can attend by clicking the link to join (there is no need to register in advance).

More instructions for attendees can be found here.


Contact us

If you have feedback or suggestions or want to propose a speaker, please e-mail us at

What is selective inference?

Broadly construed, selective inference means searching for interesting patterns in data, usually with inferential guarantees that account for the search process. It encompasses:

  • Multiple testing: testing many hypotheses at once (and paying disproportionate attention to rejections)

  • Post-selection inference: examining the data to decide what question to ask, or what model to use, then carrying out one or more appropriate inferences

  • Adaptive / interactive inference: sequentially asking one question after another of the same data set, where each question is informed by the answers to preceding questions

  • Cheating: cherry-picking, double dipping, data snooping, data dredging, p-hacking, HARKing, and other low-down dirty rotten tricks; basically any of the above, but done wrong!
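
As a small illustration of the multiple testing item above, here is a minimal Python sketch of the Benjamini-Hochberg step-up procedure, one standard method that controls the false discovery rate when the p-values are independent or positively dependent. The function name, the level q, and the example p-values are illustrative assumptions, not something taken from the seminar or its speakers.

```python
def benjamini_hochberg(p_values, q=0.1):
    """Return the indices rejected by the Benjamini-Hochberg procedure at level q.

    Illustrative helper (hypothetical name); assumes p_values is a list of
    numbers in [0, 1], one per hypothesis.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Order hypothesis indices by p-value, smallest first.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k (1-based) with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    return sorted(order[:k_max])


# Made-up p-values for ten hypotheses.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

# Ignoring the search: reject everything below 0.05.
naive = [i for i, value in enumerate(p) if value <= 0.05]

# Accounting for the search: Benjamini-Hochberg at level q = 0.05.
adjusted = benjamini_hochberg(p, q=0.05)

print(naive)     # [0, 1, 2, 3, 4]
print(adjusted)  # [0, 1]
```

In this toy example, thresholding each p-value at 0.05 reports five discoveries, while the adjusted procedure keeps only the two that survive after accounting for the fact that ten hypotheses were examined.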