International Seminar on Selective Inference

A weekly online seminar on selective inference, multiple testing, and post-selection inference.

Gratefully inspired by the Online Causal Inference Seminar

Mailing List

For announcements and Zoom invitations please subscribe to our mailing list.

Upcoming Seminar Presentations

All seminars take place on Thursdays at 8:30 am PT / 11:30 am ET / 4:30 pm London / 6:30 pm Tel Aviv. Past seminar presentations are posted here.


  • Thursday, January 21, 2021 [Link to join]

    • Speaker: Etienne Roquain (Sorbonne Université)

    • Title: Structured multiple testing: can one mimic the oracle?

    • Abstract: Knowing the model structure can significantly help in performing multiple testing inference. Hence, a general aim is to build a procedure that mimics the performance of the oracle, that is, of a benchmark procedure that knows (and uses) this structure. As a case in point, classical structures are derived from the famous two-group model or its extensions by placing particular assumptions on the corresponding parameters, such as the null/alternative distributions or the false/null occurrence process. We will discuss the issue of mimicking the oracle for the three following structures and various multiple testing error rates:
      (1) structure = Gaussian null distribution family, error rate = FDR (see https://arxiv.org/abs/1912.03109, joint work with Nicolas Verzelen, and https://arxiv.org/abs/1809.08330, joint work with Alexandra Carpentier, Sylvain Delattre and Nicolas Verzelen)
      (2) structure = stochastic block model for the false/null occurrence process, error rate = FDR (see https://arxiv.org/abs/1907.10176, joint work with Tabea Rebafka and Fanny Villers)
      (3) structure = hidden Markov model for the false/null occurrence process, error rate = FDP confidence post hoc bound (preprint to come, joint work with Marie Perrot-Dockès, Gilles Blanchard and Pierre Neuvial)
      We will emphasize work (1) above and show that building a confidence region for the structure parameter can be fruitful for determining whether mimicking the oracle is possible, and for mimicking it when it is. (A toy numerical illustration of the two-group setting follows this listing.)

    • Links: [Relevant papers: paper #1, paper #2, paper #3]
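    • Illustration: To make the two-group setting concrete, here is a small self-contained sketch (ours, not from the talk; all parameters are made up) comparing an oracle that knows the alternative distribution, and thresholds the local false discovery rate, against the structure-agnostic Benjamini-Hochberg procedure:

      ```python
      # Toy two-group model: each of m z-scores is null N(0,1) with prob. 1 - pi1,
      # alternative N(mu,1) with prob. pi1. The oracle knows pi1 and mu.
      import numpy as np
      from scipy.stats import norm

      rng = np.random.default_rng(0)
      m, pi1, mu = 10_000, 0.1, 3.0
      is_alt = rng.random(m) < pi1
      z = rng.normal(loc=mu * is_alt)

      # Oracle: local fdr = P(null | z); reject where it is small (0.2 is arbitrary).
      lfdr = (1 - pi1) * norm.pdf(z) / ((1 - pi1) * norm.pdf(z) + pi1 * norm.pdf(z - mu))
      oracle_reject = lfdr < 0.2

      # Benjamini-Hochberg at level q, using only one-sided p-values.
      p, q = norm.sf(z), 0.1
      order = np.argsort(p)
      passed = np.nonzero(p[order] <= q * np.arange(1, m + 1) / m)[0]
      bh_reject = np.zeros(m, dtype=bool)
      if passed.size:
          bh_reject[order[: passed[-1] + 1]] = True

      for name, rej in [("oracle lfdr", oracle_reject), ("BH", bh_reject)]:
          fdp = np.mean(~is_alt[rej]) if rej.any() else 0.0  # false discovery proportion
          tpp = rej[is_alt].mean()                           # true positive proportion
          print(f"{name}: rejections={rej.sum()}, FDP={fdp:.3f}, TPP={tpp:.3f}")
      ```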


  • Thursday, January 28, 2021 [Link to join]

    • Speaker: Ali Shojaie (University of Washington)

    • Title: Nonparametric Inference for Infinite-Dimensional Parameters via a Generalized Score Test

    • Abstract: Infinite-dimensional parameters that can be defined as the minimizer of a population risk arise naturally in many applications. Classic examples include the conditional mean function and the density function. Though there is extensive literature on constructing consistent estimators for infinite-dimensional risk minimizers, there is limited work on quantifying the uncertainty associated with such estimates via, e.g., hypothesis testing and construction of confidence regions. We propose a general inferential framework for infinite-dimensional risk minimizers as a nonparametric extension of the score test. We illustrate that our framework requires only mild assumptions and is applicable to a variety of estimation problems. In examples, we specialize our proposed methodology to estimation of regression functions with continuous outcomes and also consider a partially additive model as an extension of the more classical partially linear model.
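    • Illustration: As background, here is a minimal sketch of the classical, finite-dimensional score test that the talk generalizes (a toy example of ours, not the speaker's methodology): testing H0: beta = 0 in y = alpha + beta * x + noise, using only quantities fitted under the null.

      ```python
      # Score test for H0: beta = 0 in a Gaussian simple linear model.
      import numpy as np
      from scipy.stats import chi2

      rng = np.random.default_rng(1)
      n = 500
      x = rng.normal(size=n)
      y = 1.0 + 0.2 * x + rng.normal(size=n)   # true beta = 0.2

      resid = y - y.mean()                     # residuals under the null fit
      sigma2 = resid.var()                     # null MLE of the noise variance
      xc = x - x.mean()
      score = xc @ resid / sigma2              # d/d(beta) log-likelihood at beta = 0
      fisher = xc @ xc / sigma2                # Fisher information for beta under H0
      stat = score**2 / fisher                 # ~ chi-squared(1) under H0
      print(f"score statistic = {stat:.2f}, p-value = {chi2.sf(stat, df=1):.4f}")
      ```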


  • Thursday, February 4, 2021 [Link to join]

    • Speaker: Arian Maleki (Columbia University)

    • Title: Comparing Variable Selection Techniques Under a High-Dimensional Asymptotic

    • Abstract: In this talk, we discuss the problem of variable selection for linear models in the high-dimensional asymptotic setting where the number of observations, n, grows at the same rate as the number of predictors, p. We consider two-stage variable selection techniques (TVS), in which the first stage obtains an estimate of the regression coefficients and the second stage simply thresholds this estimate to select the “important” predictors. We evaluate the asymptotic false discovery proportion (AFDP) and asymptotic true positive proportion (ATPP) of these techniques and discuss their optimality.
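    • Illustration: To fix ideas, a hedged sketch of the generic two-stage recipe described above (estimate, then threshold), together with the empirical FDP and TPP on which such procedures are judged; the LASSO tuning and threshold below are arbitrary choices of ours:

      ```python
      import numpy as np
      from sklearn.linear_model import Lasso

      rng = np.random.default_rng(2)
      n, p, k = 500, 500, 25                  # n and p of the same order
      beta = np.zeros(p)
      beta[:k] = 1.0                          # the truly "important" predictors
      X = rng.normal(size=(n, p))
      y = X @ beta + rng.normal(size=n)

      # Stage 1: estimate the coefficients (here with the LASSO).
      beta_hat = Lasso(alpha=0.1, fit_intercept=False).fit(X, y).coef_

      # Stage 2: threshold the estimate to select predictors.
      selected = np.abs(beta_hat) > 0.05

      truth = beta != 0
      fdp = np.mean(~truth[selected]) if selected.any() else 0.0  # false discovery proportion
      tpp = selected[truth].mean()                                # true positive proportion
      print(f"selected {selected.sum()} predictors, FDP = {fdp:.3f}, TPP = {tpp:.3f}")
      ```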



  • Thursday, February 18, 2021 [Link to join]

    • Speaker: Tijana Zrnic (UC Berkeley)

    • Title: Post-Selection Inference via Algorithmic Stability

    • Abstract: Modern approaches to data analysis make extensive use of data-driven model selection. The resulting dependencies between the selected model and the data used for inference invalidate statistical guarantees derived from classical theories. The framework of post-selection inference (PoSI) has formalized this problem and proposed corrections that ensure valid inferences. Yet obtaining general principles that enable computationally efficient, powerful PoSI methodology with formal guarantees remains a challenge. With this goal in mind, we revisit the PoSI problem through the lens of algorithmic stability. Under an appropriate formulation of stability, one that captures closure under post-processing and compositionality properties, we show that the stability parameters of a selection method alone suffice to provide non-trivial corrections to classical z-test and t-test intervals. Then, for several popular model selection methods, including the LASSO, we show how stability can be achieved through simple, computationally efficient randomization schemes. Our algorithms offer provable unconditional simultaneous coverage and are computationally efficient; in particular, they do not rely on MCMC sampling. Importantly, our proposal explicitly relates the magnitude of randomization to the resulting confidence interval width, allowing the analyst to trade off interval width against the loss in utility due to randomizing selection. This is joint work with Michael I. Jordan.
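    • Illustration: A cartoon of the randomize-then-infer idea (emphatically not the authors' algorithm; the noise scale gamma and interval correction delta below are illustrative placeholders): selection sees only a noise-perturbed response, and the classical intervals computed on the original data are widened to account for the selection.

      ```python
      import numpy as np
      from scipy.stats import norm
      from sklearn.linear_model import Lasso

      rng = np.random.default_rng(3)
      n, p, sigma = 200, 50, 1.0                # noise level sigma assumed known
      X = rng.normal(size=(n, p))
      beta = np.zeros(p); beta[:3] = 0.5
      y = X @ beta + sigma * rng.normal(size=n)

      gamma = 1.0                               # randomization scale (placeholder)
      y_noisy = y + gamma * rng.normal(size=n)  # selection sees only the noisy copy
      selected = np.nonzero(Lasso(alpha=0.1).fit(X, y_noisy).coef_ != 0)[0]

      # Classical z-intervals in the selected submodel, inflated by delta.
      Xs = X[:, selected]
      beta_hat = np.linalg.lstsq(Xs, y, rcond=None)[0]
      se = sigma * np.sqrt(np.diag(np.linalg.inv(Xs.T @ Xs)))
      delta = 0.5                               # stability correction (placeholder)
      zcrit = norm.ppf(0.975) + delta
      for j, b, s in zip(selected, beta_hat, se):
          print(f"beta[{j}]: [{b - zcrit * s:.2f}, {b + zcrit * s:.2f}]")
      ```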


Format

The seminars are held on Zoom and last 60 minutes:

  • 45 minutes of presentation

  • 15 minutes of discussion, led by an invited discussant

Moderators collect questions using the Q&A feature during the seminar.

How to join

You can attend by clicking the link to join (there is no need to register in advance).

More instructions for attendees can be found here.

Organizers

Contact us

If you have feedback or suggestions or want to propose a speaker, please e-mail us at selectiveinferenceseminar@gmail.com.

What is selective inference?

Broadly construed, selective inference means searching for interesting patterns in data, usually with inferential guarantees that account for the search process. It encompasses:

  • Multiple testing: testing many hypotheses at once (and paying disproportionate attention to rejections); a small simulation after this list shows why the multiplicity matters

  • Post-selection inference: examining the data to decide what question to ask, or what model to use, then carrying out one or more appropriate inferences

  • Adaptive / interactive inference: sequentially asking one question after another of the same data set, where each question is informed by the answers to preceding questions

  • Cheating: cherry-picking, double dipping, data snooping, data dredging, p-hacking, HARKing, and other low-down dirty rotten tricks; basically any of the above, but done wrong!
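
To see why the "done wrong" versions above are a problem, consider a tiny simulation (ours, with arbitrary numbers): when every hypothesis is null, naive per-test thresholding still makes "discoveries" at a predictable rate, while even the crudest multiplicity correction does not.

```python
# 200 true nulls: p-values are uniform, so p < 0.05 flags about 10 of them.
import numpy as np

rng = np.random.default_rng(4)
m = 200
p = rng.uniform(size=m)
print("naive rejections:     ", np.sum(p < 0.05))      # roughly m * 0.05 = 10
print("Bonferroni rejections:", np.sum(p < 0.05 / m))  # almost always 0
```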