Upcoming Seminar Presentations
All seminars take place Thursdays at 8:30 am PT / 11:30 am ET / 4:30 pm London / 6:30 pm Tel Aviv. Past seminar presentations are posted here.
Thursday, January 21, 2021 [Link to join]
Speaker: Etienne Roquain (Sorbonne Université)
Title: Structured multiple testing: can one mimic the oracle?
Abstract: Knowing the model structure can significantly help when performing multiple testing inference. Hence, a general aim is to build a procedure that mimics the performance of the oracle, that is, of a benchmark procedure that knows (and uses) this structure. As a case in point, classical structures are derived from the famous two-group model or its extensions by specifying particular assumptions on the corresponding parameters, such as the null/alternative distributions or the false/null occurrence process. We will discuss the issue of mimicking the oracle for the following three structures and various multiple testing error rates:
(1) structure = Gaussian null distribution family, error rate = FDR (see https://arxiv.org/abs/1912.03109, joint work with Nicolas Verzelen, and https://arxiv.org/abs/1809.08330, joint work with Alexandra Carpentier, Sylvain Delattre and Nicolas Verzelen)
(2) structure = stochastic block model for the false/null occurrence process, error rate = FDR (see https://arxiv.org/abs/1907.10176, joint work with Tabea Rebafka and Fanny Villers)
(3) structure = hidden Markov model for the false/null occurrence process, error rate = FDP confidence post hoc bound (preprint to come, joint work with Marie Perrot-Dockès, Gilles Blanchard and Pierre Neuvial)
We will emphasize the work (1) above, and show that building a confidence region for the structure parameter can be fruitful to know whether mimicking the oracle is possible and how to mimic it when it is possible.
Thursday, January 28, 2021 [Link to join]
Speaker: Ali Shojaie (University of Washington)
Title: Nonparametric Inference for Infinite-Dimensional Parameters via a Generalized Score Test
Thursday, February 4, 2021 [Link to join]
Speaker: Arian Maleki (Columbia University)
Title: Comparing Variable Selection Techniques Under a High-Dimensional Asymptotic
Abstract: In this talk, we discuss the problem of variable selection for linear models under the high-dimensional asymptotic setting, where the number of observations, n, grows at the same rate as the number of predictors, p. We consider two-stage variable selection techniques (TVS) in which the first stage obtains an estimate of the regression coefficients, and the second stage simply thresholds this estimate to select the “important” predictors. The asymptotic false discovery proportion (AFDP) and true positive proportion (ATPP) of these TVS are evaluated, and their optimality will be discussed.
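For readers unfamiliar with the setup, the two-stage scheme described in the abstract can be sketched as follows. This is only an illustrative sketch, not the speaker's method: the ridge estimator in stage one, the threshold value, the simulated design, and the function names `two_stage_select` and `fdp_tpp` are all assumptions made for the example. It computes finite-sample FDP and TPP for one simulated data set, not their asymptotic limits.

```python
import numpy as np


def two_stage_select(X, y, threshold, ridge=0.0):
    """Two-stage variable selection (illustrative).

    Stage 1: ridge / least-squares estimate of the coefficients.
    Stage 2: keep predictors whose estimate exceeds `threshold` in magnitude.
    """
    p = X.shape[1]
    # Stage 1: solve (X'X + ridge * I) beta = X'y
    beta_hat = np.linalg.solve(X.T @ X + ridge * np.eye(p), X.T @ y)
    # Stage 2: hard-threshold the estimate
    selected = np.abs(beta_hat) > threshold
    return beta_hat, selected


def fdp_tpp(selected, beta_true):
    """False discovery proportion and true positive proportion."""
    signal = beta_true != 0
    false_disc = np.sum(selected & ~signal)
    true_disc = np.sum(selected & signal)
    # Guard against division by zero when nothing is selected / no signals
    fdp = false_disc / max(selected.sum(), 1)
    tpp = true_disc / max(signal.sum(), 1)
    return fdp, tpp


# Simulated example (all values are arbitrary choices for illustration)
rng = np.random.default_rng(0)
n, p, k = 600, 200, 20                      # n and p of comparable size
X = rng.standard_normal((n, p)) / np.sqrt(n)
beta_true = np.zeros(p)
beta_true[:k] = 5.0                         # first k predictors carry signal
y = X @ beta_true + rng.standard_normal(n)

beta_hat, selected = two_stage_select(X, y, threshold=2.5, ridge=0.1)
fdp, tpp = fdp_tpp(selected, beta_true)
print(f"selected={selected.sum()}, FDP={fdp:.2f}, TPP={tpp:.2f}")
```

Varying `threshold` traces out the trade-off between FDP and TPP for a given first-stage estimator, which is the kind of comparison the abstract refers to.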
Thursday, February 18, 2021 [Link to join]
Speaker: Tijana Zrnic (UC Berkeley)
Title: Post-Selection Inference via Algorithmic Stability
Abstract: Modern approaches to data analysis make extensive use of data-driven model selection. The resulting dependencies between the selected model and data used for inference invalidate statistical guarantees derived from classical theories. The framework of post-selection inference (PoSI) has formalized this problem and proposed corrections which ensure valid inferences. Yet, obtaining general principles that enable computationally-efficient, powerful PoSI methodology with formal guarantees remains a challenge. With this goal in mind, we revisit the PoSI problem through the lens of algorithmic stability. Under an appropriate formulation of stability---one that captures closure under post-processing and compositionality properties---we show that stability parameters of a selection method alone suffice to provide non-trivial corrections to classical z-test and t-test intervals. Then, for several popular model selection methods, including the LASSO, we show how stability can be achieved through simple, computationally efficient randomization schemes. Our algorithms offer provable unconditional simultaneous coverage and are computationally efficient; in particular, they do not rely on MCMC sampling. Importantly, our proposal explicitly relates the magnitude of randomization to the resulting confidence interval width, allowing the analyst to tune interval width to the loss in utility due to randomizing selection. This is joint work with Michael I. Jordan.
The seminars are held on Zoom and last 60 minutes:
45 minutes of presentation
15 minutes of discussion, led by an invited discussant
Moderators collect questions using the Q&A feature during the seminar.
How to join
You can attend by clicking the link to join (there is no need to register in advance).
More instructions for attendees can be found here.
What is selective inference?
Broadly construed, selective inference means searching for interesting patterns in data, usually with inferential guarantees that account for the search process. It encompasses:
Multiple testing: testing many hypotheses at once (and paying disproportionate attention to rejections)
Post-selection inference: examining the data to decide what question to ask, or what model to use, then carrying out one or more appropriate inferences
Adaptive / interactive inference: sequentially asking one question after another of the same data set, where each question is informed by the answers to preceding questions
Cheating: cherry-picking, double dipping, data snooping, data dredging, p-hacking, HARKing, and other low-down dirty rotten tricks; basically any of the above, but done wrong!