Upcoming Seminar Presentations
All seminars take place Wednesdays at 8:30 am PT / 11:30 am ET / 4:30 pm London / 6:30 pm Tel Aviv. Past seminar presentations are posted here.
Wednesday, December 6, 2023 [Link to join]
Speaker: Pierre Neuvial (Institut de Mathématiques de Toulouse (IMT))
Title: Selective inference after convex clustering with ℓ1 penalization
Abstract: Classical inference methods notoriously fail when applied to data-driven test hypotheses or inference targets. Instead, dedicated methodologies are required to obtain statistical guarantees for these selective inference problems. Selective inference is particularly relevant post-clustering, typically when testing a difference in mean between two clusters. In this paper, we address convex clustering with ℓ1 penalization by leveraging related selective inference tools for regression, based on Gaussian vectors conditioned on polyhedral sets. In the one-dimensional case, we prove a polyhedral characterization of obtaining given clusters, which enables us to propose a test procedure with statistical guarantees. This characterization also allows us to provide a computationally efficient regularization path algorithm. Then, we extend the above test procedure and guarantees to multi-dimensional clustering with ℓ1 penalization, and also to more general multi-dimensional clusterings that aggregate one-dimensional ones. With various numerical experiments, we validate our statistical guarantees and we demonstrate the power of our methods to detect differences in mean between clusters. Our methods are implemented in the R package poclin.
Discussant: Yiqun Chen (Stanford University)
Links: [Relevant papers: paper #1]
Wednesday, December 13, 2023 [Link to join]
Speaker: Lucas Janson (Harvard University)
Title: Leveraging sparsity in the Gaussian linear model for improved inference
Abstract: We develop novel LASSO-based methods for coefficient testing, confidence interval construction, and variable selection in the Gaussian linear model with n ≥ p that have the same finite-sample guarantees as their ubiquitous ordinary-least-squares-t-test-based analogues, yet have substantially higher power when the true coefficient vector is sparse. Empirically, our method often performs like the 1-sided t-test (despite not being given any information about the sign), and in particular our confidence intervals are typically about 20% shorter than the standard t-test based intervals. Our single coefficient testing framework trivially allows for exact adjustment conditional on LASSO selection for post-selection inference, and subsequently applying standard multiple testing procedures to the resulting post-selection-valid p-values again provides significant power gains over existing methods. None of our methods require resampling or Monte Carlo estimation. We perform a variety of simulations and a real data analysis on an HIV drug resistance data set to demonstrate the benefits of our methods over existing work. In the course of developing these methods, we also derive novel properties of the LASSO in the Gaussian linear model that are of independent interest. Finally, we argue, and in some cases demonstrate, that the principles we develop can be extended beyond Gaussian linear models with n ≥ p. This is joint work with Souhardya Sengupta.
Discussant: Zhimei Ren (University of Pennsylvania)
Links: [Relevant papers:]
Wednesday, December 20, 2023 [Link to join]
Speaker: Vladimir Vovk (Royal Holloway, University of London)
Title: The diachronic Bayesian
Abstract: It is well known that a Bayesian probability forecast for future observations should form a probability measure in order to satisfy a natural condition of coherence. The topic of this paper is the evolution of the Bayesian probability measure over time. We model the process of updating the Bayesian’s beliefs in terms of prediction markets. The resulting picture is adapted to forecasting several steps ahead and making almost optimal decisions.
Discussant: Philip Dawid (University of Cambridge)
Links: [Relevant papers: paper #1]
The seminars are held on Zoom and last 60 minutes:
45 minutes of presentation
15 minutes of discussion, led by an invited discussant
Moderators collect questions using the Q&A feature during the seminar.
How to join
You can attend by clicking the link to join (there is no need to register in advance).
More instructions for attendees can be found here.
What is selective inference?
Broadly construed, selective inference means searching for interesting patterns in data, usually with inferential guarantees that account for the search process. It encompasses:
Multiple testing: testing many hypotheses at once (and paying disproportionate attention to rejections)
Post-selection inference: examining the data to decide what question to ask, or what model to use, then carrying out one or more appropriate inferences
Adaptive / interactive inference: sequentially asking one question after another of the same data set, where each question is informed by the answers to preceding questions
Cheating: cherry-picking, double dipping, data snooping, data dredging, p-hacking, HARKing, and other low-down dirty rotten tricks; basically any of the above, but done wrong!
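The pitfall described in the last bullet can be made concrete with a small simulation (a toy sketch for illustration only, not tied to any of the talks above; all variable names are our own): scan several true-null test statistics, pick the most extreme one, and test it naively as if it had been chosen in advance. The false rejection rate then lands far above the nominal level, which is exactly what selective inference methods are designed to correct.

```python
# Toy illustration of why the search process must be accounted for.
# All m null hypotheses are true, yet naively testing the most extreme
# group mean rejects far more often than the nominal alpha = 0.05.
import math
import numpy as np

rng = np.random.default_rng(0)
m, n, reps = 20, 30, 2000      # hypotheses, samples per group, simulations
alpha = 0.05
false_rejections = 0

for _ in range(reps):
    # Every group truly has mean 0 and standard deviation 1.
    means = rng.normal(0.0, 1.0, size=(m, n)).mean(axis=1)
    k = int(np.argmax(np.abs(means)))     # data-driven selection step
    z = means[k] * math.sqrt(n)           # naive z-statistic for group k
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    false_rejections += p < alpha

rate = false_rejections / reps
print(f"naive false rejection rate: {rate:.2f}")  # well above the nominal 0.05
```

Accounting for the selection (for example, by conditioning on which group was selected, or by a multiple testing correction over all m groups) restores the advertised error rate.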