International Seminar on Selective Inference

A weekly online seminar on selective inference, multiple testing, and post-selection inference.

Gratefully inspired by the Online Causal Inference Seminar.

NEW ZOOM LINK

The Zoom link for the seminar has changed. Please use the new one here: https://berkeley.zoom.us/j/99278296389

Mailing List

For announcements and Zoom invitations, please subscribe to our mailing list.

Upcoming Seminar Presentations

All seminars take place on Thursdays at 8:30 am PT / 11:30 am ET / 4:30 pm London / 6:30 pm Tel Aviv. Past seminar presentations are posted here.

  • Thursday, July 30, 2020 [Link to join]

    • Speaker: Kathryn Roeder (Carnegie Mellon University)

    • Title: Adaptive approaches for augmenting genetic association studies with multi-omics covariates

    • Abstract: To correct for a large number of hypothesis tests, most researchers rely on simple multiple testing corrections. Yet, new selective inference methodologies could improve power by enabling exploration of test statistics with covariates for informative weights while retaining desired statistical guarantees. We explore one such framework, adaptive p-value thresholding (AdaPT), in the context of genome-wide association studies (GWAS) under two types of regimes: (1) testing individual single nucleotide polymorphisms (SNPs) for schizophrenia (SCZ) and (2) the aggregation of SNPs into gene-based test statistics for autism spectrum disorder (ASD). In both settings, we focus on enriched expression quantitative trait loci (eQTLs) and demonstrate a substantial increase in power using flexible gradient boosted trees to account for covariates constructed with GWAS statistics from genetically-correlated phenotypes, as well as measures capturing association with gene expression and coexpression subnetwork membership. We address the practical challenges of implementing AdaPT in high-dimensional -omics settings, such as approaches for tuning gradient boosted trees without compromising error-rate control as well as handling the subtle issues of working with publicly available summary statistics (e.g., p-values reported to be exactly equal to one). Specifically, because a popular approach for computing gene-level p-values is based on an invalid approximation for the combination of dependent two-sided test statistics, it yields an inflated error rate. Additionally, the resulting improper null distribution violates the mirror-conservative assumption required for masking procedures. We believe our results are critical for researchers wishing to build new methods in this challenging area and emphasize that our pipeline of analysis can be implemented in many different high-throughput settings to ultimately improve power. 
      This is joint work with Ronald Yurko, Max G’Sell, and Bernie Devlin.

    • Discussant: Chiara Sabatti (Stanford University)

    • Links: [Relevant paper] [Slides]


  • Thursday, August 6, 2020 --- no seminar (JSM)

  • Thursday, August 13, 2020 [Link to join]

    • Speaker: Lucy Gao (University of Washington)

    • Title: TBD

  • Thursday, August 20, 2020 [Link to join]


Format

The seminars are held on Zoom and last 60 minutes:

  • 45 minutes of presentation

  • 15 minutes of discussion, led by an invited discussant

Moderators collect questions using the Q&A feature during the seminar.

How to join

You can attend by clicking the link to join (there is no need to register in advance).

More instructions for attendees can be found here.

Organizers

Contact us

If you have feedback or suggestions or want to propose a speaker, please e-mail us at selectiveinferenceseminar@gmail.com.

What is selective inference?

Broadly construed, selective inference means searching for interesting patterns in data, usually with inferential guarantees that account for the search process. It encompasses:

  • Multiple testing: testing many hypotheses at once (and paying disproportionate attention to rejections)

  • Post-selection inference: examining the data to decide what question to ask, or what model to use, then carrying out one or more appropriate inferences

  • Adaptive / interactive inference: sequentially asking one question after another of the same data set, where each question is informed by the answers to preceding questions

  • Cheating: cherry-picking, double dipping, data snooping, data dredging, p-hacking, HARKing, and other low-down dirty rotten tricks; basically any of the above, but done wrong!
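
To make "inferential guarantees that account for the search process" concrete, here is a minimal sketch of one classical multiple testing procedure, the Benjamini-Hochberg step-up rule. It is not taken from this page or from any particular seminar talk; it is included only as an illustration. The function name `benjamini_hochberg` and the default level `alpha=0.1` are assumptions made for this example. Under independent p-values, the rule controls the false discovery rate at level `alpha`.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], alpha: float = 0.1) -> List[bool]:
    """Illustrative Benjamini-Hochberg step-up rule (hypothetical helper).

    Returns one rejection flag per hypothesis, in the input order.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the hypotheses, sorted by p-value (smallest first).
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up search: find the largest 1-based rank k such that the
    # k-th smallest p-value is at most alpha * k / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k_max = rank

    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Made-up p-values, for illustration only.
    example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(example, alpha=0.05))
```

By contrast, reporting only the smallest of many p-values at its nominal level, with no such adjustment, is an instance of the cherry-picking described in the last bullet.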