International Seminar on Selective Inference

A weekly online seminar on selective inference, multiple testing, and post-selection inference.

Gratefully inspired by the Online Causal Inference Seminar.

Mailing List

For announcements and Zoom invitations, please subscribe to our mailing list.

Upcoming Seminar Presentations

All seminars take place Thursdays at 8:30 am PT / 11:30 am ET / 4:30 pm London / 6:30 pm Tel Aviv. Past seminar presentations are posted here.


  • Thursday, October 29, 2020 [Link to join]

    • Speaker: Robert Lunde (University of Texas, Austin)

    • Title: Resampling for Network Data

    • Abstract: Network data, which represent complex relationships between different entities, have become increasingly common in fields ranging from neuroscience to social network analysis. To address key scientific questions in these domains, versatile inferential methods for network-valued data are needed. In this talk, I will discuss our recent work on network analogs of the three main resampling methods: subsampling, the jackknife, and the bootstrap. While network data are generally dependent, under the sparse graphon model, we show that these resampling procedures exhibit similar properties to their IID counterparts. I will also discuss related theoretical results, including central limit theorems for eigenvalues and a network Efron-Stein inequality. This is joint work with Purnamrita Sarkar and Qiaohui Lin.

    • Discussant: Liza Levina (University of Michigan)

    • Links: [Relevant papers: paper #1, paper #2, paper #3]


  • Thursday, November 5, 2020 [Link to join]

    • Speaker: Gilles Blanchard (Université Paris Sud)

    • Title: Agnostic post hoc approaches to false positive control

    • Abstract: Classical approaches to multiple testing grant control over the amount of false positives for a specific method prescribing the set of rejected hypotheses. In practice many users tend to deviate from a strictly prescribed multiple testing method and follow ad-hoc rejection rules, tune some parameters by hand, compare several methods and pick from their results the one that suits them best, etc. This will invalidate standard statistical guarantees because of the selection effect. To compensate for any form of such "data snooping", an approach which has garnered significant interest recently is to derive "user-agnostic", or post hoc, bounds on the false positives valid uniformly over all possible rejection sets; this allows arbitrary data snooping from the user. We present two contributions: starting from a common approach to post hoc bounds taking into account the p-value level sets for any candidate rejection set, we analyze how to calibrate the bound under different assumptions concerning the distribution of p-values. We then build towards a general approach to the problem using a family of candidate rejection subsets (call this a reference family) together with associated bounds on the number of false positives they contain, the latter holding uniformly over the family. It is then possible to interpolate from this reference family to find a bound valid for any candidate rejection subset. This general program encompasses in particular the p-value level sets considered earlier; we illustrate its interest in a different context where the reference subsets are fixed and spatially structured. (Joint work with Pierre Neuvial and Etienne Roquain.)

    • Links: [Relevant paper]


Format

The seminars are held on Zoom and last 60 minutes:

  • 45 minutes of presentation

  • 15 minutes of discussion, led by an invited discussant

Moderators collect questions using the Q&A feature during the seminar.

How to join

You can attend by clicking the link to join (there is no need to register in advance).

More instructions for attendees can be found here.

Organizers

Contact us

If you have feedback or suggestions or want to propose a speaker, please e-mail us at selectiveinferenceseminar@gmail.com.

What is selective inference?

Broadly construed, selective inference means searching for interesting patterns in data, usually with inferential guarantees that account for the search process. It encompasses:

  • Multiple testing: testing many hypotheses at once (and paying disproportionate attention to rejections)

  • Post-selection inference: examining the data to decide what question to ask, or what model to use, then carrying out one or more appropriate inferences

  • Adaptive / interactive inference: sequentially asking one question after another of the same data set, where each question is informed by the answers to preceding questions

  • Cheating: cherry-picking, double dipping, data snooping, data dredging, p-hacking, HARKing, and other low-down dirty rotten tricks; basically any of the above, but done wrong!
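
As a small illustration of the multiple testing item above, here is a minimal Python sketch of the Benjamini-Hochberg step-up procedure, one standard way to account for testing many hypotheses at once. It is not taken from this page: the function name, the p-values, and the level alpha = 0.05 are all made up for the example, and the procedure's false discovery rate guarantee relies on assumptions (such as independent or positively dependent p-values) that the sketch does not check.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices rejected by the Benjamini-Hochberg step-up rule.

    Hypothetical helper written for this illustration only.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose k-th smallest p-value is <= alpha * k / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    return sorted(order[:k_max])


# Made-up p-values for ten hypotheses (illustrative only).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

# Naive rule: reject everything below 0.05, ignoring that ten tests were run.
naive = [i for i, p in enumerate(p_values) if p <= 0.05]

# Rule that accounts for the number of tests.
bh = benjamini_hochberg(p_values, alpha=0.05)

print(naive)  # [0, 1, 2, 3, 4]
print(bh)     # [0, 1]
```

In this toy example the uncorrected rule reports five rejections while the corrected rule reports two, which is the kind of gap that accounting for the search process is meant to expose.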