International Seminar on Selective Inference

A weekly online seminar on selective inference, multiple testing, and post-selection inference.

Gratefully inspired by the Online Causal Inference Seminar

Mailing List

For announcements and Zoom invitations please subscribe to our mailing list.

Upcoming Seminar Presentations

All seminars take place on Thursdays at 8:30 am PT / 11:30 am ET / 4:30 pm London / 6:30 pm Tel Aviv. Past seminar presentations are posted here.

  • Thursday, May 21, 2020 [Link to join]

    • Speaker: Yoav Benjamini (Tel Aviv University)

    • Title: Confidence Intervals for selected parameters

    • Abstract: Practical or scientific considerations may lead to selecting a subset of parameters as ‘important’. Inferences about the selected parameters are often based on the same data used for selection. We present a taxonomy of error-rates for selective confidence intervals, then focus on controlling the probability that one or more intervals for selected parameters fail to cover: the simultaneous over the selected (SoS) error-rate. We use two approaches to construct SoS-controlling confidence intervals for k location parameters out of m, deemed most important because their estimators are the largest. The new intervals improve substantially over Šidák intervals when k ≪ m, and approach the Bonferroni-corrected intervals when k is close to m. (Joint work with Yotam Hechtlinger and Philip Stark.) A formal statement of the SoS error-rate is sketched in the note below.

    • Discussant: Aaditya Ramdas (Carnegie Mellon University)

    • Links: [Relevant paper] [Slides]
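
    • Note: As a quick formal sketch of the error-rate named in the abstract (our paraphrase, not the speakers' notation), write S for the data-dependent set of selected parameters and CI_i for the reported interval for θ_i; SoS control at level α then requires

        \Pr\bigl(\exists\, i \in S :\ \theta_i \notin \mathrm{CI}_i\bigr) \le \alpha,

      where the probability accounts for the randomness of both the selection and the intervals.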

  • Thursday, May 28, 2020 [Link to join]

    • Speaker: Jingshu Wang (University of Chicago)

    • Title: Detecting Multiple Replicating Signals using Adaptive Filtering Procedures

    • Abstract: Replicability is a fundamental quality of scientific discoveries: we are interested in those signals that are detectable across different laboratories, study populations, time periods, etc. Unlike meta-analysis, which accounts for experimental variability but does not guarantee replicability, testing a partial conjunction (PC) null aims specifically to identify the signals that are discovered in multiple studies. In many contemporary applications, e.g. comparing multiple high-throughput genetic experiments, a large number M of PC nulls need to be tested simultaneously, calling for a multiple comparison correction. However, standard multiple testing adjustments on the M PC p-values can be severely conservative, especially when M is large and the signals are sparse. We introduce AdaFilter, a new multiple testing procedure that increases power by adaptively filtering out unlikely candidates for PC nulls. We prove that AdaFilter can control FWER and FDR as long as the data across studies are independent, and that it has much higher power than existing methods. We illustrate the application of AdaFilter with three examples: microarray studies of Duchenne muscular dystrophy, single-cell RNA sequencing of T cells in lung cancer tumors, and GWAS for metabolomics. (A standard statement of the PC null is sketched in the note below.)

    • Discussant: Eugene Katsevich (Carnegie Mellon University)

    • Links: [Relevant paper]
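
    • Note: For reference, the partial conjunction null can be written in standard notation (not necessarily the authors') as follows. If a signal is examined in n studies and replication in at least r of them is required, the PC null is

        H_0^{r/n}:\ \text{fewer than } r \text{ of the } n \text{ individual null hypotheses are false,}

      and one PC p-value is computed per signal; AdaFilter then adjusts the resulting M p-values while filtering out unlikely candidates, as described in the abstract.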

  • Thursday, June 4, 2020 [Link to join]

    • Speaker: Saharon Rosset (Tel Aviv University)

    • Title: Optimal multiple testing procedures for strong control and for the two-group model

    • Abstract: Multiple testing problems are a staple of modern statistics. The fundamental objective is to reject as many false null hypotheses as possible, subject to controlling an overall measure of false discovery, such as the family-wise error rate (FWER) or the false discovery rate (FDR). We formulate multiple testing of simple hypotheses as an infinite-dimensional optimization problem, seeking the most powerful rejection policy that guarantees strong control of the selected measure. We show that for exchangeable hypotheses, for FWER or FDR and relevant notions of power, these problems lead to infinite programs that can provably be solved. We explore maximin rules for complex alternatives, and show they can be found in practice, leading to improved practical procedures compared to existing alternatives. We derive explicit optimal tests for FWER or FDR control for three independent normal means. We find that the power gain over natural competitors is substantial in all settings examined. We apply our optimal maximin rule to subgroup analyses in systematic reviews from the Cochrane library, leading to an increased number of findings compared to existing alternatives. (A schematic form of the optimization problem appears in the note below.)
      As time permits, I will also review our follow-up work on optimal rules for controlling FDR or positive FDR in the two-group model, in high dimension and under arbitrary dependence. Our results show substantial and interesting differences between the standard approach for controlling the mFDR and our new solutions; in particular, we attain substantially increased power (expected number of true rejections).
      Joint work with Ruth Heller, Amichai Painsky, and Udi Aharoni.

    • Discussant: Wenguang Sun (University of Southern California)

    • Links: [Relevant papers: paper #1, paper #2]
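
    • Note: In schematic form (our paraphrase; the talk and papers give the precise formulation), the problem in the first part of the abstract seeks, over all rejection policies D and for a relevant notion of power such as the expected number of true rejections,

        \max_{D}\ \mathbb{E}\bigl[\#\{\text{true rejections made by } D\}\bigr] \quad \text{subject to} \quad \sup \mathrm{FWER}(D) \le \alpha \ \text{ or } \ \sup \mathrm{FDR}(D) \le \alpha,

      where the supremum is over all configurations of true and false nulls (strong control); optimizing over all policies is what makes the program infinite-dimensional.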

  • Thursday, June 11, 2020 [Link to join]

    • Speaker: Dongming Huang (Harvard University)

    • Title: Controlled Variable Selection with More Flexibility

    • Abstract: The recent model-X knockoffs method selects variables with provable and non-asymptotic error control and with no restrictions or assumptions on the dimensionality of the data or the conditional distribution of the response given the covariates. The one requirement for the procedure is that the covariate samples are drawn independently and identically from a precisely known distribution. In this talk, I will show that the exact same guarantees can be made without knowing the covariate distribution fully, but instead knowing it only up to a parametric model with as many as Ω(np) parameters, where p is the dimension and n is the number of covariate samples (including unlabeled samples if available). The key is to treat the covariates as if they are drawn conditionally on the observed value of a sufficient statistic of the model. Although this idea is simple, even in Gaussian models, conditioning on a sufficient statistic leads to a distribution supported on a set of zero Lebesgue measure, requiring techniques from topological measure theory to establish valid algorithms. I will demonstrate how to do this for medium-dimensional Gaussian models, high-dimensional Gaussian graphical models, and discrete graphical models. Simulations show the new approach remains powerful under the weaker assumptions. This talk is based on joint work with Lucas Janson. (A concrete Gaussian instance of this conditioning is sketched in the note below.)

    • Links: [Relevant paper]
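
    • Note: As a concrete illustration of the conditioning idea (a special case consistent with the abstract, not a description of the full method): for n i.i.d. covariate rows from a Gaussian model N(μ, Σ) with both μ and Σ unknown, the statistic

        T(\mathbf{X}) = \Bigl(\bar{X} = \tfrac{1}{n}\textstyle\sum_{i=1}^{n} X_i,\ \ \hat{\Sigma} = \tfrac{1}{n}\textstyle\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})^{\top}\Bigr)

      is sufficient, so the procedure works with the conditional distribution of the covariates given T(X); that conditional law is supported on a set of zero Lebesgue measure, which is the technical difficulty the abstract alludes to.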

  • Thursday, June 18, 2020 [Link to join]
    (Seminar hosted jointly with the CIRM-Luminy meeting on Mathematical Methods of Modern Statistics 2)

    • Speaker: Weijie Su (University of Pennsylvania)

    • Title: Gaussian Differential Privacy

    • Abstract: Privacy-preserving data analysis has been put on a firm mathematical foundation since the introduction of differential privacy (DP) in 2006. This privacy definition, however, has some well-known weaknesses: notably, it does not tightly handle composition. In this talk, we propose a relaxation of DP that we term "f-DP", which has a number of appealing properties and avoids some of the difficulties associated with prior relaxations. First, f-DP preserves the hypothesis testing interpretation of differential privacy, which makes its guarantees easily interpretable. It allows for lossless reasoning about composition and post-processing and, notably, gives a direct way to analyze privacy amplification by subsampling. We define a canonical single-parameter family of definitions within our class, termed "Gaussian Differential Privacy", based on hypothesis testing of two shifted normal distributions. We prove that this family is focal to f-DP by introducing a central limit theorem, which shows that the privacy guarantees of any hypothesis-testing based definition of privacy (including differential privacy) converge to Gaussian differential privacy in the limit under composition. This central limit theorem also gives a tractable analysis tool. We demonstrate the use of the tools we develop by giving an improved analysis of the privacy guarantees of noisy stochastic gradient descent. This is joint work with Jinshuo Dong and Aaron Roth. (The trade-off function underlying Gaussian differential privacy is recalled in the note below.)

    • Discussant: TBD

    • Links: [Relevant paper]
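
    • Note: For concreteness, Gaussian differential privacy is built on the trade-off (type I versus type II error) function for testing N(0,1) against N(μ,1), namely

        G_{\mu}(\alpha) = \Phi\bigl(\Phi^{-1}(1-\alpha) - \mu\bigr), \qquad \alpha \in [0,1],

      where Φ is the standard normal CDF; a mechanism is μ-GDP when its trade-off function is everywhere at least G_μ. This is only a pointer; see the linked paper for the precise definitions.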

  • Thursday, June 25, 2020 [Link to join]

  • Thursday, July 2, 2020 [Link to join]

  • Thursday, July 9, 2020 [Link to join]

  • Thursday, July 23, 2020 [Link to join]

  • Thursday, July 30, 2020 [Link to join]

Format

The seminars are held on Zoom and last 60 minutes:

  • 45 minutes of presentation

  • 15 minutes of discussion, led by an invited discussant

Moderators collect questions using the Q&A feature during the seminar.

How to join

You can attend by clicking the link to join (there is no need to register in advance).

More instructions for attendees can be found here.

Organizers

Contact us

If you have feedback or suggestions or want to propose a speaker, please e-mail us at selectiveinferenceseminar@gmail.com.

What is selective inference?

Broadly construed, selective inference means searching for interesting patterns in data, usually with inferential guarantees that account for the search process. It encompasses:

  • Multiple testing: testing many hypotheses at once (and paying disproportionate attention to rejections)

  • Post-selection inference: examining the data to decide what question to ask, or what model to use, then carrying out one or more appropriate inferences

  • Adaptive / interactive inference: sequentially asking one question after another of the same data set, where each question is informed by the answers to preceding questions

  • Cheating: cherry-picking, double dipping, data snooping, data dredging, p-hacking, HARKing, and other low-down dirty rotten tricks; basically any of the above, but done wrong!