Past Seminar Presentations


  • Thursday, May 14, 2020

    • Speaker: Malgorzata Bogdan (Uniwersytet Wroclawski)

    • Title: Adaptive Bayesian Version of SLOPE

    • Abstract: Sorted L-One Penalized Estimation (SLOPE) is a convex optimization procedure for identifying predictors in large databases. It extends the popular Least Absolute Shrinkage and Selection Operator (LASSO) by replacing the L1 norm penalty with the Sorted L-One Norm. It provably controls FDR under orthogonal designs and yields asymptotically minimax estimators of regression coefficients in sparse high-dimensional regression. In this talk I will briefly introduce the method and explain the problems with FDR control under correlated designs. We will then discuss a novel adaptive Bayesian version of SLOPE (ABSLOPE), which addresses these issues and allows for simultaneous variable selection and parameter estimation, even in the presence of missing values. We will also discuss a strong screening rule for discarding predictors for SLOPE, which substantially speeds up the SLOPE and ABSLOPE algorithms.

    • Discussant: Cynthia Rush (Columbia University)

    • Links: [Slides] [Relevant papers: paper #1, paper #2, paper #3] [Recording]
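
As a small illustration of the penalty named in the abstract above, the Sorted L-One Norm of a coefficient vector b with a nonincreasing, nonnegative weight sequence lam is the sum of lam[i] times the i-th largest absolute coefficient. The sketch below is not taken from the talk; the function name `sorted_l1_norm` and the example numbers are assumptions chosen for illustration. With all weights equal, the penalty reduces to the usual L1 (LASSO) penalty.

```python
def sorted_l1_norm(b, lam):
    """Sorted L-One norm: sum_i lam[i] * |b|_(i), where |b|_(1) >= |b|_(2) >= ...

    `lam` must be nonnegative and nonincreasing, so the largest
    coefficient (in absolute value) receives the largest weight.
    """
    if len(b) != len(lam):
        raise ValueError("b and lam must have the same length")
    if any(w < 0 for w in lam) or any(x < y for x, y in zip(lam, lam[1:])):
        raise ValueError("lam must be nonnegative and nonincreasing")
    # Sort absolute coefficients from largest to smallest.
    abs_sorted = sorted((abs(x) for x in b), reverse=True)
    return sum(w * a for w, a in zip(lam, abs_sorted))


# Hypothetical example values.
b = [0.5, -3.0, 1.0]
slope_penalty = sorted_l1_norm(b, [3.0, 2.0, 1.0])   # 3*3.0 + 2*1.0 + 1*0.5
lasso_penalty = sorted_l1_norm(b, [2.0, 2.0, 2.0])   # equal weights: 2 * sum(|b|)
```
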


  • Thursday, May 7, 2020

    • Speaker: Aldo Solari (University of Milano-Bicocca)

    • Title: Exploratory Inference for Brain Imaging

    • Abstract: Modern data analysis can be highly exploratory. In brain imaging, for example, researchers often highlight patterns of brain activity suggested by the data, but false discoveries are likely to intrude into this selection. How confident can the researcher be about a pattern that has been found, if that pattern has been selected from so many potential patterns?
      In this talk we present a recent approach, termed 'All-Resolutions Inference' (ARI), that delivers lower confidence bounds on the number of true discoveries in any selected set of voxels. Notably, these bounds are simultaneously valid for all possible selections. This allows a truly interactive approach to post-selection inference that places no limits on how the researcher chooses to perform the selection.

    • Discussant: Genevera Allen (Rice University)

    • Links: [Recording] [Relevant papers: paper #1, paper #2, paper #3] [Slides]


  • Thursday, Apr 30, 2020

    • Speaker: Yingying Fan (University of Southern California)

    • Title: Universal Rank Inference via Residual Subsampling with Application to Large Networks

    • Abstract: Determining the precise rank is an important problem in many large-scale applications with matrix data exploiting low-rank plus noise models. In this paper, we suggest a universal approach to rank inference via residual subsampling (RIRS) for testing and estimating rank in a wide family of models, including many widely used network models such as the degree-corrected mixed membership model as a special case. Our procedure constructs a test statistic by subsampling entries of the residual matrix after extracting the spiked components. The test statistic converges in distribution to the standard normal under the null hypothesis, and diverges to infinity with asymptotic probability one under the alternative hypothesis. The effectiveness of the RIRS procedure is justified theoretically, using the asymptotic expansions of eigenvectors and eigenvalues for large random matrices recently developed in Fan et al. (2019a) and Fan et al. (2019b). The advantages of the newly suggested procedure are demonstrated through several simulation and real data examples. This work is joint with Xiao Han and Qing Yang.

    • Discussant: Yuekai Sun (University of Michigan)

    • Links: [Recording] [Relevant paper] [Slides]


  • Thursday, Apr 23, 2020

    • Speaker: Aaditya Ramdas (Carnegie Mellon University)

    • Title: Ville’s inequality, Robbins’ confidence sequences, and nonparametric supermartingales

    • Abstract: Standard textbook confidence intervals are only valid at fixed sample sizes, but scientific datasets are often collected sequentially and potentially stopped early, thus introducing a critical selection bias. A “confidence sequence” is a sequence of intervals, one for each sample size, that are uniformly valid over all sample sizes, and are thus valid at arbitrary data-dependent sample sizes. One can show that constructing the former at every time step guarantees false coverage rate control, while constructing the latter at each time step guarantees post-hoc familywise error rate control. We show that at a price of about two (doubling of width), pointwise asymptotic confidence intervals can be extended to uniform nonparametric confidence sequences. The crucial role of some beautiful nonnegative supermartingales will be made transparent in enabling “safe anytime-valid inference”.
      This talk will mostly feature joint work with Steven R. Howard (Berkeley, Voleon), Jon McAuliffe (Berkeley, Voleon), Jas Sekhon (Berkeley, Bridgewater) and, more recently, Larry Wasserman (CMU) and Sivaraman Balakrishnan (CMU). I will also cover interesting historical and contemporary contributions to this area.


  • Thursday, Apr 16, 2020

    • Speaker: Emmanuel Candès (Stanford University)

    • Title: Causal Inference in Genetic Trio Studies

    • Abstract: We introduce a method to rigorously draw causal inferences (inferences immune to all possible confounding) from genetic data that include parents and offspring. Causal conclusions are possible with these data because the natural randomness in meiosis can be viewed as a high-dimensional randomized experiment. We make this observation actionable by developing a novel conditional independence test that identifies regions of the genome containing distinct causal variants. The proposed Digital Twin Test compares an observed offspring to carefully constructed synthetic offspring from the same parents to determine statistical significance, and it can leverage any black-box multivariate model and additional non-trio genetic data to increase power. Crucially, our inferences are based only on a well-established mathematical model of recombination and make no assumptions about the relationship between the genotypes and phenotypes.

