International Seminar on Selective Inference

A weekly online seminar on selective inference, multiple testing, and post-selection inference.

Gratefully inspired by the Online Causal Inference Seminar

Mailing List

For announcements and Zoom invitations please subscribe to our mailing list.

Upcoming Seminar Presentations

Seminars are generally held on Thursdays at 8:30 am PT / 11:30 am ET / 4:30 pm London / 6:30 pm Tel Aviv; exceptions are noted with the individual entries below. Past seminar presentations are posted here.


  • Friday, December 9, 2022 (STAMPS-ISSI joint seminar, 10:30 am PT / 1:30 pm ET / 6:30 pm London / 8:30 pm Tel Aviv; the Zoom link will be shared via the mailing list)

    • Speaker: Rebecca Willett (University of Chicago)

    • Title: Machine Learning for Inverse Problems in Climate Science

    • Abstract: Machine learning has the potential to transform climate research. This fundamental change cannot be realized through the straightforward application of existing off-the-shelf machine learning tools alone. Rather, we need novel methods for incorporating physical models and constraints into learning systems. In this talk, I will discuss inverse problems central to climate science — data assimilation and simulator model fitting — and how machine learning yields methods with high predictive skill and computational efficiency. First, I will describe a machine learning framework for learning dynamical systems in data assimilation. Our auto-differentiable ensemble Kalman filters blend ensemble Kalman filters for state recovery with machine learning tools for learning the dynamics. In doing so, our methods leverage the ability of ensemble Kalman filters to scale to high-dimensional states and the power of automatic differentiation to train high-dimensional surrogate models for the dynamics. Second, I will describe learning emulators of high-dimensional climate forecasting models for parameter estimation with uncertainty quantification. We assume access to a computationally complex climate simulator that inputs a candidate parameter and outputs a corresponding multichannel time series. Our task is to accurately estimate a range of likely values of the underlying parameters that best fit the data. Our framework learns feature embeddings of observed dynamics jointly with an emulator that can replace high-cost simulators for parameter estimation. These methods build upon insights from inverse problems, data assimilation, stochastic filtering, and optimization, highlighting how theory can inform the design of machine learning systems in the natural sciences.
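
For orientation, the first part of the talk builds on the ensemble Kalman filter (EnKF). Below is a minimal textbook sketch of the stochastic EnKF analysis (update) step in NumPy, purely to fix ideas; it is not the speaker's auto-differentiable method, and all names and dimensions are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)

    def enkf_update(X, y, H, R):
        """One stochastic EnKF analysis step.
        X: (d, m) ensemble of m states; y: (k,) observation;
        H: (k, d) observation operator; R: (k, k) noise covariance."""
        m = X.shape[1]
        A = X - X.mean(axis=1, keepdims=True)         # ensemble anomalies
        C = A @ A.T / (m - 1)                         # sample state covariance
        K = C @ H.T @ np.linalg.inv(H @ C @ H.T + R)  # Kalman gain
        # Perturbed observations preserve the analysis-ensemble spread.
        Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, m).T
        return X + K @ (Y - H @ X)

    # Tiny demo: 2-d state, 1-d observation, 50 ensemble members.
    X_post = enkf_update(rng.normal(size=(2, 50)), np.array([0.5]),
                         np.array([[1.0, 0.0]]), np.array([[0.1]]))

In the auto-differentiable variants described in the abstract, an update like this is written so that gradients can flow through it, which is what allows surrogate dynamics models to be trained end-to-end.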


  • Thursday, December 15, 2022 [link to join]

    • Speaker: Stephen Bates (UC Berkeley)

    • Title: Principal-Agent Hypothesis Testing

    • Abstract: Consider the relationship between the FDA (the principal) and a pharmaceutical company (the agent). The pharmaceutical company wishes to sell a product to make a profit, and the FDA wishes to ensure that only efficacious drugs are released to the public. The efficacy of the drug is not known to the FDA, so the pharmaceutical company must run a costly trial to prove efficacy to the FDA. Critically, the statistical protocol used to establish efficacy affects the behavior of a strategic, self-interested pharmaceutical company; a lower standard of statistical evidence incentivizes the pharmaceutical company to run more trials for drugs that are less likely to be effective, since the drug may pass the trial by chance, resulting in large profits. The interaction between the statistical protocol and the incentives of the pharmaceutical company is crucial to understanding this system and designing protocols with high social utility. In this work, we discuss how the principal and agent can enter into a contract with payoffs based on statistical evidence. When there is stronger evidence for the quality of the product, the principal allows the agent to make a larger profit. We show how to design contracts that are robust to an agent's strategic actions, and derive the optimal contract in the presence of strategic behavior.

    • Discussant:

    • Links: [Relevant papers: paper #1]
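
For orientation, here is a toy computation (ours, not the paper's model) of the incentive effect described in the abstract: a drug with no effect passes a level-alpha trial with probability alpha, so a lax evidence standard can make trials on useless drugs profitable. All payoff numbers below are hypothetical.

    # Hypothetical payoffs -- not taken from the paper.
    profit_if_approved = 10.0   # agent's payoff when the trial passes
    trial_cost = 1.0            # cost of running one trial

    def expected_profit(pass_prob):
        """Agent's expected profit from a trial that passes w.p. pass_prob."""
        return pass_prob * profit_if_approved - trial_cost

    for alpha in (0.05, 0.20):
        # A drug with no effect passes by chance with probability alpha.
        print(f"alpha={alpha}: E[profit | useless drug] = "
              f"{expected_profit(alpha):+.2f}")

    # alpha=0.05 -> -0.50: the trial loses money in expectation.
    # alpha=0.20 -> +1.00: the laxer standard makes useless trials
    # profitable, which is the incentive problem that contracts robust
    # to strategic behavior are meant to address.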


  • Wednesday, February 1, 2023 [link to join]

    • Speaker: Ruth Heller (Tel Aviv University)

    • Title: Replicability Across Multiple Studies

    • Abstract: Meta-analysis is routinely performed in many scientific disciplines. This analysis is attractive since discoveries are possible even when all the individual studies are underpowered. However, the meta-analytic discoveries may be entirely driven by a signal in a single study, and thus non-replicable. The lack of replicability of scientific findings has been of great concern following the influential paper of Ioannidis (2005). Although the great majority of meta-analyses carried out to date do not assess the replicability of their findings, it is possible to do so. We provide a selective overview of analyses that can be carried out towards establishing the replicability of scientific findings in the setting where multiple studies each examine multiple features (as in genomics applications). We also discuss some of the current shortcomings and future directions.

    • Discussant:

    • Links: [Relevant papers: paper #1]
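
For orientation, one classical recipe for the replicability question raised in the abstract is partial-conjunction testing (Benjamini & Heller, 2008): for each feature, compute a p-value for "signal present in at least u of the n studies" (u = 2 already rules out single-study discoveries), then screen these p-values across features with Benjamini–Hochberg. The sketch below is our own illustration of that recipe and is not necessarily the method presented in the talk.

    import numpy as np

    def pc_pvalue(pvals, u=2):
        """Bonferroni-style p-value for 'signal in at least u of n studies':
        (n - u + 1) times the u-th smallest per-study p-value, capped at 1."""
        p = np.sort(pvals)
        return min(1.0, (len(p) - u + 1) * p[u - 1])

    def bh_reject(pvals, q=0.05):
        """Benjamini-Hochberg step-up: boolean rejection mask at level q."""
        p = np.asarray(pvals)
        m = len(p)
        order = np.argsort(p)
        passed = p[order] <= q * np.arange(1, m + 1) / m
        k = passed.nonzero()[0].max() + 1 if passed.any() else 0
        mask = np.zeros(m, dtype=bool)
        mask[order[:k]] = True
        return mask

    # Rows = features, columns = studies (toy p-values).
    P = np.array([[1e-5, 2e-4, 0.03, 0.2],   # replicated signal
                  [1e-6, 0.5,  0.7,  0.9],   # driven by one study only
                  [0.4,  0.6,  0.8,  0.9]])  # no signal
    pc = [pc_pvalue(row) for row in P]
    print(pc)             # approx. [6e-4, 1.0, 1.0]
    print(bh_reject(pc))  # only the first (replicated) feature is flagged

Note how the second feature has by far the smallest single-study p-value, so an ordinary meta-analysis might flag it, yet it fails the replicability screen because the signal appears in only one study.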

Format

The seminars are held on Zoom and last 60 minutes:

  • 45 minutes of presentation

  • 15 minutes of discussion, led by an invited discussant

Moderators collect questions using the Q&A feature during the seminar.

How to join

You can attend by clicking the link to join (there is no need to register in advance).

More instructions for attendees can be found here.

Organizers

Contact us

If you have feedback or suggestions or want to propose a speaker, please e-mail us at selectiveinferenceseminar@gmail.com.

What is selective inference?

Broadly construed, selective inference means searching for interesting patterns in data, usually with inferential guarantees that account for the search process. It encompasses:

  • Multiple testing: testing many hypotheses at once (and paying disproportionate attention to rejections; see the toy simulation after this list)

  • Post-selection inference: examining the data to decide what question to ask, or what model to use, then carrying out one or more appropriate inferences

  • Adaptive / interactive inference: sequentially asking one question after another of the same data set, where each question is informed by the answers to preceding questions

  • Cheating: cherry-picking, double dipping, data snooping, data dredging, p-hacking, HARKing, and other low-down dirty rotten tricks; basically any of the above, but done wrong!
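
To see why accounting for the search matters, here is a tiny simulation (ours): generate many test statistics with no true effect, then look only at the most extreme one, as a naive analyst would.

    import numpy as np

    rng = np.random.default_rng(1)
    m, reps = 100, 10_000                 # 100 null hypotheses, 10k repeats
    winners = rng.normal(size=(reps, m)).max(axis=1)
    print(winners.mean())                 # ~2.5, yet every true effect is 0

The selected "winner" averages about 2.5 standard errors even though every true effect is zero, so naive inference on it is badly biased; correcting for exactly this kind of selection is what the methods above are designed to do.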