International Seminar on Selective Inference

A weekly online seminar on selective inference, multiple testing, and post-selection inference.

Gratefully inspired by the Online Causal Inference Seminar

Mailing List

For announcements and Zoom invitations please subscribe to our mailing list.

Upcoming Seminar Presentations

All seminars take place Thursdays at 8:30 am PT / 11:30 am ET / 4:30 pm London / 6:30 pm Tel Aviv. Past seminar presentations are posted here.

  • Thursday, June 24, 2021 [Link to join]

    • Speaker: Jason Hsu (The Ohio State University)

    • Title: Confident Directional Selective Inference, from Multiple Comparisons with the Best to Precision Medicine

    • Abstract: MCB (multiple comparisons with the best, 1981, 1984), comparing treatments to the best without knowing which one is the best, can be considered an early example of selective inference. With the thinking that "there is only one true best", the relevance of MCB to this presentation is that it led to the Partitioning Principle, which is essential for deriving confidence sets for stepwise tests. Inference based on confidence sets controls the directional error rate; inference based on tests of equalities may not.

The FDA gave Accelerated Approval to Aduhelm™ (aducanumab) for Alzheimer's Disease (AD) on 8 June 2021, based on its reduction of beta-amyloid plaque (a surrogate biomarker endpoint). When clinical efficacy of a treatment for the overall population is not shown, genome-wide association studies (GWAS) are often used to discover SNPs that might predict efficacy in subgroups. In the process of working on GWAS with real data, we came to the realization that, if one causal SNP makes its zero-null hypothesis false, then all other zero-null hypotheses are statistically false as well. While the majority of no-association null hypotheses might well be true biologically, statistically they are false (if one is false) in GWAS. I will indeed illustrate this with a causal SNP for the ApoE gene, which is involved in the clearance of beta-amyloid plaque in AD. We suggest our confidence interval CE4 approach instead.

Targeted therapies such as OPDIVO and TECENTRIQ naturally have patient subgroups, already defined by the extent to which the drug target is present or absent in them, subgroups that may derive differential efficacy. An additional danger of testing equality nulls in the presence of subgroups is that the illusory logical relationships among efficacy in subgroups and their mixtures created by exact equality nulls lead to too drastic a stepwise multiplicity reduction, resulting in inflated directional error rates, as I will explain. Instead, Partition Tests, which would be called Confident Direction methods in the language of Tukey, might be safer to use.

  • Thursday, July 1, 2021 [Link to join]

    • Speaker: Xiao Li (UC Berkeley)

    • Title: Whiteout: when do fixed-X knockoffs fail?

    • Abstract: A core strength of knockoff methods is their virtually limitless customizability, allowing an analyst to exploit machine learning algorithms and domain knowledge without threatening the method’s robust finite-sample false discovery rate control guarantee. While several previous works have investigated regimes where specific implementations of knockoffs are provably powerful, negative results are more difficult to obtain for such a flexible method. In this work we recast the fixed-X knockoff filter for the Gaussian linear model as a conditional post-selection inference method that adds user-generated Gaussian noise to the ordinary least squares estimator $\hat{\beta}$ to obtain a “whitened” estimator $\tilde{\beta}$ with uncorrelated entries, and performs inference using $\mathrm{sgn}(\tilde{\beta}_j)$ as the test statistic for $H_j : \beta_j = 0$. We prove equivalence between our whitening formulation and the more standard formulation based on negative control predictor variables, showing how the fixed-X knockoffs framework can be used for multiple testing on any problem with (asymptotically) multivariate Gaussian parameter estimates. Relying on this perspective, we obtain the first negative results that universally upper-bound the power of all fixed-X knockoff methods, without regard to choices made by the analyst. Our results show roughly that, if the leading eigenvalues of $\mathrm{Var}(\hat{\beta})$ are large with dense leading eigenvectors, then there is no way to whiten $\hat{\beta}$ without irreparably erasing nearly all of the signal, rendering $\mathrm{sgn}(\tilde{\beta}_j)$ too uninformative for accurate inference. We give conditions under which the true positive rate (TPR) for any fixed-X knockoff method must converge to zero even while the TPR of Bonferroni-corrected multiple testing tends to one, and we explore several examples illustrating this phenomenon.

    • Discussant: Asher Spector (Harvard University)

  • Thursday, July 8, 2021 [Link to join]

    • Speaker: Zheng (Tracy) Ke (Harvard University)

    • Title: Power of FDR Control Methods: The Impact of Ranking Algorithm, Tampered Design, and Symmetric Statistic

    • Abstract: As the power of FDR control methods for high-dimensional variable selections has been mostly evaluated empirically, we focus here on theoretical power analyses of two recent such methods, the knockoff filter and the Gaussian mirror. We adopt the Rare/Weak signal model, popular in multiple testing and variable selection literature, and characterize the rate of convergence of the number of false positives and the number of false negatives of FDR control methods for particular classes of designs. Our analyses lead to several noteworthy discoveries. First, the choice of the symmetric statistic in FDR control methods crucially affects the power. Second, with a proper symmetric statistic, the operation of adding “noise” to achieve FDR control yields almost no loss of power compared with its prototype, at least for some special classes of designs. Third, the knockoff filter and Gaussian mirror have comparable power for orthogonal designs, but they behave differently for non-orthogonal designs. We study the block-wise diagonal designs and show that the knockoff filter has a higher power when the regression coefficient vector is extremely sparse, and the Gaussian mirror has a higher power when the coefficient vector is moderately sparse.

    • Links: [Relevant papers: paper #1]

  • Thursday, August 12, 2021 [Link to join]

    • Speaker: Sanat K. Sarkar (Temple University)

    • Title: Adjusting the Benjamini-Hochberg method for controlling the false discovery rate in knockoff-assisted variable selection

    • Abstract: The knockoff-based multiple testing setup of Barber & Candès (2015) for variable selection in multiple regression where sample size is as large as the number of explanatory variables is considered. The Benjamini-Hochberg method based on ordinary least squares estimates of the regression coefficients is adjusted to the setup, transforming it to a valid p-value based FDR controlling method not relying on any specific correlation structure of the explanatory variables. Simulations and real data applications show that our proposed method that is agnostic to $\pi_0$, the proportion of unimportant explanatory variables, and a data-adaptive version of it that uses an estimate of $\pi_0$ are powerful competitors of the FDR controlling methods in Barber & Candès (2015).

    • Links: [Relevant papers: paper #1]
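
Several of the abstracts above refer to the knockoff filter of Barber & Candès (2015). For readers new to it, here is a minimal Python sketch of its final selection step only (the "knockoff+" threshold), assuming the signed feature statistics W have already been computed. The function name knockoff_plus_select and the example numbers are hypothetical, and the sketch does not implement the whitening formulation, the Gaussian mirror, or the adjusted Benjamini-Hochberg method discussed by the speakers.

```python
import numpy as np

def knockoff_plus_select(W, q):
    """Select features from signed statistics W at target FDR level q.

    A large positive W[j] is evidence for feature j; under the null the
    sign of W[j] is equally likely to be + or -. Negative statistics are
    therefore used to estimate the number of false positives.
    (Illustrative helper, not taken from any particular library.)
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the nonzero magnitudes, smallest first.
    candidates = np.sort(np.unique(np.abs(W[W != 0])))
    for t in candidates:
        n_negative = np.sum(W <= -t)   # proxy for false positives
        n_positive = np.sum(W >= t)    # number of selections at t
        fdp_estimate = (1 + n_negative) / max(1, n_positive)
        if fdp_estimate <= q:
            # The first (smallest) qualifying threshold is the one used.
            return np.flatnonzero(W >= t)
    # No threshold qualifies: select nothing.
    return np.array([], dtype=int)

# Made-up statistics, for illustration only.
W_example = [5.0, 4.0, 3.0, 2.5, 2.0, -1.0, 1.5, 0.5]
print(knockoff_plus_select(W_example, q=0.25).tolist())  # -> [0, 1, 2, 3, 4, 6]
```

The "1 +" in the numerator is what makes this the more conservative knockoff+ variant; it also means that with few large statistics and a small q the filter may select nothing at all.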


The seminars are held on Zoom and last 60 minutes:

  • 45 minutes of presentation

  • 15 minutes of discussion, led by an invited discussant

Moderators collect questions using the Q&A feature during the seminar.

How to join

You can attend by clicking the link to join (there is no need to register in advance).

More instructions for attendees can be found here.


Contact us

If you have feedback or suggestions or want to propose a speaker, please e-mail us at

What is selective inference?

Broadly construed, selective inference means searching for interesting patterns in data, usually with inferential guarantees that account for the search process. It encompasses:

  • Multiple testing: testing many hypotheses at once (and paying disproportionate attention to rejections)

  • Post-selection inference: examining the data to decide what question to ask, or what model to use, then carrying out one or more appropriate inferences

  • Adaptive / interactive inference: sequentially asking one question after another of the same data set, where each question is informed by the answers to preceding questions

  • Cheating: cherry-picking, double dipping, data snooping, data dredging, p-hacking, HARKing, and other low-down dirty rotten tricks; basically any of the above, but done wrong!
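
As one concrete instance of the multiple testing item above, here is a minimal Python sketch of the Benjamini-Hochberg step-up procedure, a standard method for controlling the false discovery rate. The function name benjamini_hochberg and the example p-values are made up for illustration, and the sketch assumes the usual conditions for the procedure (for example, independent p-values).

```python
import numpy as np

def benjamini_hochberg(pvalues, q=0.1):
    """Return the sorted indices of hypotheses rejected at FDR level q.

    (Illustrative helper, not taken from any particular library.)
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                       # indices, smallest p first
    thresholds = q * np.arange(1, m + 1) / m    # q*k/m for k = 1..m
    below = p[order] <= thresholds
    if not below.any():
        return np.array([], dtype=int)
    # Step-up rule: find the LARGEST k with p_(k) <= q*k/m and reject
    # the k smallest p-values, even if some of them miss their own line.
    k = np.max(np.flatnonzero(below))
    return np.sort(order[:k + 1])

# Made-up p-values, for illustration only.
p_example = [0.041, 0.001, 0.216, 0.008, 0.039, 0.042, 0.06, 0.074, 0.205, 0.212]
print(benjamini_hochberg(p_example, q=0.05).tolist())  # -> [1, 3]
```

By contrast, rejecting every hypothesis with p below 0.05 would flag five of these ten without accounting for the search over hypotheses, which is the kind of uncorrected cherry-picking listed under the last item.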