International Seminar on Selective Inference

A weekly online seminar on selective inference, multiple testing, and post-selection inference.

Gratefully inspired by the Online Causal Inference Seminar

Mailing List

For announcements and Zoom invitations please subscribe to our mailing list.

Upcoming Seminar Presentations

All seminars take place Thursdays at 8:30 am PT / 11:30 am ET / 4:30 pm London / 6:30 pm Tel Aviv. Past seminar presentations are posted here.


  • Thursday, June 30, 2022 [Link to join]

    • Speaker: Zhanrui Cai (Carnegie Mellon University)

    • Title: Robust Cross Validation with Confidence

    • Abstract: Cross validation is one of the most popular tools for model selection and tuning parameter selection in the modern statistics and machine learning community. By dividing the sample into $K$ folds, cross validation first trains the models on $K-1$ folds of data and tests the prediction error on the remaining fold. It then chooses the model or tuning parameter with the smallest test error. Recent studies aim to improve the confidence level for the models selected by cross validation (Lei, 2020), but may not be suitable for skewed or heavy-tailed data, or data with outliers. In this paper, we propose a robust cross validation method. Instead of comparing the mean of the prediction error, we propose to compare the quantiles of the test error, because the test error is skewed. We illustrate the necessity of rank-sum comparison through motivating examples, and demonstrate the advantage of the proposed robust cross validation method through extensive simulation and real data analysis. To study the limiting distribution of the evaluation criterion, we develop the Gaussian approximation theory for high-dimensional two-sample U-statistics, which may be of independent interest.

    • Discussant: Morgane Austern (Harvard University)

    • Links: [Relevant papers: ]


  • Thursday, July 14, 2022 (100th ISSI seminar) [Link to join]

    • Speaker: Yoav Benjamini (Tel Aviv University)

    • Title: Trends and challenges in research about selective inference and its practice

    • Abstract: The international seminar on selective inference gives us an opportunity to identify trends in this important research area, discuss common topics of interest and raise some challenges. I’ll try to use this opportunity for these purposes, but obviously the challenges will reflect my own point of view.


  • Thursday, July 21, 2022 [Link to join]

    • Speaker: Dacheng Xiu (University of Chicago)

    • Title: Prediction When Factors are Weak

    • Abstract: Principal component analysis (PCA) has been the most prevalent approach to the recovery of factors. Nevertheless, the theoretical justification of the PCA-based approach often relies on a convenient and critical assumption that factors are pervasive. To incorporate information from weaker factors in the context of prediction, we propose a new procedure based on supervised PCA, which iterates over selection, PCA, and projection. The selection step finds a subset of predictors most correlated with the prediction target, whereas the projection step permits multiple weak factors of distinct strength. We justify our procedure in an asymptotic scheme where both the sample size and the cross-sectional dimension increase at potentially different rates. Our empirical analysis highlights the role of weak factors in predicting inflation.

    • Discussant:

    • Links: [Relevant papers: ]


  • Thursday, July 28, 2022 [Link to join]

    • Speaker: Trambak Banerjee (University of Kansas)

    • Title: Nonparametric Empirical Bayes Estimation On Heterogeneous Data

    • Abstract: The simultaneous estimation of many parameters based on data collected from corresponding studies is a key research problem that has received renewed attention in the high-dimensional setting. Many practical situations involve heterogeneous data where heterogeneity is captured by a nuisance parameter. Effectively pooling information across samples while correctly accounting for heterogeneity presents a significant challenge in large-scale estimation problems. We address this issue by introducing the "Nonparametric Empirical Bayes Structural Tweedie" (NEST) estimator, which efficiently estimates the unknown effect sizes and properly adjusts for heterogeneity via a generalized version of Tweedie's formula. For the normal means problem, NEST simultaneously handles the two main selection biases introduced by heterogeneity: first, selection bias in the mean, and second, selection bias in the variance; the former cannot be effectively corrected without also correcting the latter. Our theoretical results show that NEST has strong asymptotic properties, and in our simulation studies NEST outperforms competing methods, with substantial efficiency gains in many settings. The proposed method is demonstrated on estimating the batting averages of baseball players and Sharpe ratios of mutual fund returns.

    • Discussant:

    • Links: [Relevant papers: paper #1]
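
For readers unfamiliar with the Tweedie's formula mentioned in the NEST abstract above, its classical form for the normal means problem with a known, common variance is sketched below. The notation ($x$, $\theta$, $\sigma^2$, $G$, $f$) is assumed here for illustration and does not come from the abstract; the generalized version used by NEST, which accounts for heterogeneity, is not reproduced here.

```latex
% Classical Tweedie's formula (known, common variance \sigma^2).
% G is the unknown prior on \theta; f is the marginal density of x.
\begin{align*}
  x \mid \theta &\sim N(\theta, \sigma^2), \qquad \theta \sim G, \\
  f(x) &= \int \frac{1}{\sigma}\,\varphi\!\left(\frac{x - \theta}{\sigma}\right) dG(\theta), \\
  \mathbb{E}[\theta \mid x] &= x + \sigma^2 \, \frac{d}{dx} \log f(x).
\end{align*}
```

The posterior mean depends on the prior $G$ only through the marginal density $f$, which can be estimated from the observed data; this is what makes a nonparametric empirical Bayes approach possible.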

Format

The seminars are held on Zoom and last 60 minutes:

  • 45 minutes of presentation

  • 15 minutes of discussion, led by an invited discussant

Moderators collect questions using the Q&A feature during the seminar.

How to join

You can attend by clicking the link to join (there is no need to register in advance).

More instructions for attendees can be found here.

Organizers

Contact us

If you have feedback or suggestions or want to propose a speaker, please e-mail us at selectiveinferenceseminar@gmail.com.

What is selective inference?

Broadly construed, selective inference means searching for interesting patterns in data, usually with inferential guarantees that account for the search process. It encompasses:

  • Multiple testing: testing many hypotheses at once (and paying disproportionate attention to rejections)

  • Post-selection inference: examining the data to decide what question to ask, or what model to use, then carrying out one or more appropriate inferences

  • Adaptive / interactive inference: sequentially asking one question after another of the same data set, where each question is informed by the answers to preceding questions

  • Cheating: cherry-picking, double dipping, data snooping, data dredging, p-hacking, HARKing, and other low-down dirty rotten tricks; basically any of the above, but done wrong!
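
As a concrete illustration of the multiple testing item above, here is a minimal sketch of one classical procedure, the Benjamini-Hochberg step-up rule, which controls the false discovery rate when the p-values are independent. This page does not single out any particular method; the function name, the example p-values, and the level `alpha` are all assumptions made for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Sketch of the Benjamini-Hochberg step-up rule: sort the p-values,
    find the largest rank k with p_(k) <= k * alpha / m, and reject the
    k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank whose p-value falls under its threshold.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from ten tests (made up for this example).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
decisions = benjamini_hochberg(p_values, alpha=0.05)
print(decisions)
```

With these made-up inputs only the two smallest p-values are rejected, even though five of them fall below 0.05 on their own. That gap is the point of the adjustment: the threshold accounts for the fact that ten hypotheses were searched, not one.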