Past Seminar Presentations


Previously, Candès et al. (2023) introduced a novel method based on conformal prediction (CP) to generate valid and efficient lower predictive bounds on survival times. This paper considers a different problem: generating an upper predictive bound in addition to a lower one. We propose a new CP-based method that generates two-sided or one-sided prediction intervals for survival times. Specifically, the method provides both lower and upper predictive bounds for individuals deemed sufficiently similar to the non-censored population, while returning only a lower bound for others. The prediction intervals offer finite-sample coverage guarantees, requiring no distributional assumptions other than that the sampled data points are independent and identically distributed. The performance of the procedure is assessed using both synthetic and real-world datasets. Joint work with Chris Holmes (Department of Statistics, University of Oxford).
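To make the split-conformal mechanics behind such lower bounds concrete, here is a minimal sketch of a generic split-conformal lower predictive bound. It ignores censoring entirely, and the regression model is an arbitrary placeholder, so it illustrates the calibration step rather than the authors' censoring-aware method.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def split_conformal_lower_bound(X_train, y_train, X_cal, y_cal, X_test, alpha=0.1):
    """Generic split-conformal lower predictive bound (no censoring handling).

    For i.i.d. data, P(y_test >= returned bound) >= 1 - alpha in finite samples.
    """
    model = GradientBoostingRegressor().fit(X_train, y_train)
    # Calibration scores: how far the predictions overshoot the held-out truth.
    scores = model.predict(X_cal) - y_cal
    # Conformal quantile with the finite-sample (n + 1) correction.
    n = len(scores)
    level = min(1.0, np.ceil((1 - alpha) * (n + 1)) / n)
    q = np.quantile(scores, level, method="higher")
    return model.predict(X_test) - q
```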


In this setting, our emphasis is on obtaining false discovery proportion (FDP) confidence bounds that both have non-asymptotic coverage and are asymptotically accurate in a specific sense, as the number m of tested hypotheses grows. Namely, we introduce and study the property (which we call m-consistency) that the confidence bound converges to or below the desired level α when applied to a specific reference α-level false discovery rate (FDR) controlling procedure.

With this perspective in mind, we derive new bounds that improve over existing ones, both theoretically and practically, and are suitable for situations where at least a moderate number of rejections is expected. In particular, the improvement is significant for knockoff p-values, which demonstrates the practical impact of the method. These improvements are illustrated with numerical experiments and real-data examples.
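For context on what a simultaneous FDP confidence bound looks like, here is a sketch in the style of the Katsevich and Ramdas (2020) uniform bound for the k smallest p-values; the constant and validity conditions are quoted from memory, so treat it as illustrative of the object being improved rather than as the new bounds of the talk.

```python
import numpy as np

def uniform_fdp_bound(pvals, alpha=0.05):
    """Uniform-in-k high-probability FDP bound, Katsevich-Ramdas style.

    Assumes independent p-values that are uniform under the null. With
    probability >= 1 - alpha, the FDP among the k smallest p-values is at
    most bound[k - 1], simultaneously for every k.
    """
    p_sorted = np.sort(np.asarray(pvals))
    m = len(p_sorted)
    k = np.arange(1, m + 1)
    c = np.log(1 / alpha) / np.log(1 + np.log(1 / alpha))
    return np.minimum(1.0, c * (1 + m * p_sorted) / k)
```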


In this paper, we study the general problem of efficiently estimating target population risk under various dataset shift conditions, leveraging semiparametric efficiency theory. We consider a general class of dataset shift conditions, which includes three popular conditions---covariate, label and concept shift---as special cases. We allow for partially non-overlapping support between the source and target populations. We develop efficient and multiply robust estimators along with a straightforward specification test of these dataset shift conditions. We also derive efficiency bounds for two other dataset shift conditions, posterior drift and location-scale shift. Simulation studies support the efficiency gains due to leveraging plausible dataset shift conditions. This is joint work with Hongxiang David Qiu and Eric Tchetgen Tchetgen.
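As a point of reference for the simplest of these conditions, covariate shift, the classical importance-weighted risk estimate looks as follows. This is the textbook plug-in baseline, not the efficient, multiply robust estimator of the talk, and the logistic density-ratio model is an assumption of the sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def covariate_shift_risk(X_src, loss_src, X_tgt):
    """Plug-in importance-weighted estimate of target-population risk under
    covariate shift: E_tgt[loss] = E_src[w(X) * loss], w = p_tgt / p_src.
    """
    # Estimate the density ratio with a source-vs-target classifier.
    X = np.vstack([X_src, X_tgt])
    z = np.concatenate([np.zeros(len(X_src)), np.ones(len(X_tgt))])
    clf = LogisticRegression(max_iter=1000).fit(X, z)
    p = clf.predict_proba(X_src)[:, 1]
    w = (p / (1 - p)) * (len(X_src) / len(X_tgt))  # odds, rescaled for sample sizes
    return np.average(loss_src, weights=w)
```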


Our theory and experiments suggest that conformal prediction with noisy labels and commonly used score functions conservatively covers the clean ground truth labels except in adversarial cases.


While tests based on average coverage intervals do not control size in the usual frequentist sense, certain results on false discovery rate (FDR) control of multiple testing procedures continue to hold when applied to such tests. In particular, the Benjamini and Hochberg (1995) step-up procedure still controls FDR in the asymptotic regime with many weakly dependent p-values, and certain adjustments for dependent p-values such as the Benjamini and Yekutieli (2001) procedure continue to yield FDR control in finite samples.
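For reference, a compact sketch of the two step-up procedures named above; this is the standard textbook construction.

```python
import numpy as np

def step_up_rejections(pvals, alpha=0.1, dependence_correction=False):
    """Benjamini-Hochberg step-up procedure; with dependence_correction=True,
    the Benjamini-Yekutieli variant that divides alpha by the harmonic sum.
    """
    p = np.asarray(pvals)
    m = len(p)
    if dependence_correction:
        alpha = alpha / np.sum(1.0 / np.arange(1, m + 1))
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True  # reject the k smallest p-values
    return rejected
```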


Then, I will present a simple, yet powerful, idea: using e-values as unnormalized weights in multiple testing. Most standard weighted multiple testing methods require the weights to deterministically add up to the number of hypotheses being tested (equivalently, the average weight is unity). But this normalization is not required when the weights are e-values obtained from independent data. This could result in a massive increase in power, especially if the non-null hypotheses have e-values much larger than one. More broadly, we study how to combine an e-value and a p-value, and design multiple testing procedures where both e-values and p-values are available for some hypotheses. A case study with RNA-seq and microarray data will demonstrate the practical power benefits.
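As I read the abstract, the core construction is a weighted BH procedure whose weights are e-values and need not be normalized; a hedged sketch of that idea follows (details of the actual procedure may differ).

```python
import numpy as np

def evalue_weighted_bh(pvals, evals, alpha=0.1):
    """BH applied to p_i / e_i, with e-values from independent data used as
    unnormalized weights (no requirement that they average to one).
    Sketch of the idea described above; the exact procedure may differ.
    """
    q = np.asarray(pvals) / np.asarray(evals)
    m = len(q)
    order = np.argsort(q)
    below = q[order] <= alpha * np.arange(1, m + 1) / m
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True
    return rejected
```

A hypothesis with a large e-value can thus be rejected even when its p-value alone would not survive the unweighted BH cutoff.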

These are joint works with Ruodu Wang, Neil Xu and Nikos Ignatiadis.


This is joint work with Will Fithian and Lihua Lei.


This is joint work with Daniel Xiang and Will Fithian.


Our work also has implications for multiple testing in sequential settings, since it applies at stopping times to continuously monitored confidence sequences and multi-armed bandit sampling.


Joint work with Luella Fu, Alessio Saretto, and Wenguang Sun.


This talk is based upon joint work with Peter W. Macdonald and Daniel Kessler.


The FDA gave Accelerated Approval to Aduhelm™ (aducanumab) for Alzheimer's Disease (AD) on 8 June 2021, based on its reduction of beta-amyloid plaque (a surrogate biomarker endpoint). When clinical efficacy of a treatment for the overall population is not shown, genome-wide association studies (GWAS) are often used to discover SNPs that might predict efficacy in subgroups. In the process of working on GWAS with real data, we came to the realization that, if one causal SNP makes its zero-null hypothesis false, then all other zero-null hypotheses are statistically false as well. While the majority of no-association null hypotheses might well be true biologically, statistically they are all false (if one is false) in GWAS. I will illustrate this with a causal SNP for the ApoE gene, which is involved in the clearance of beta-amyloid plaque in AD. We suggest our confidence interval CE4 approach instead.

Targeted therapies such as OPDIVO and TECENTRIQ naturally have patient subgroups, already defined by the extent to which the drug target is present or absent in them, subgroups that may derive differential efficacy. An additional danger of testing equality nulls in the presence of subgroups is that the illusory logical relationships among efficacy in subgroups and their mixtures, created by exact equality nulls, lead to too drastic a stepwise multiplicity reduction, resulting in inflated directional error rates, as I will explain. Instead, Partition Tests, which would be called Confident Direction methods in the language of Tukey, might be safer to use.


(a) The stationary points of the objective are automatically sparse (i.e. they perform selection) -- no explicit ℓ1 penalization is needed.

(b) All stationary points of the objective exclude noise variables with high probability.

(c) Guaranteed recovery of all signal variables without needing to reach the objective's global maxima or special stationary points.

The second and third properties mean that all our theoretical results apply in the practical case where one uses gradient ascent to maximize the metric learning objective. While not all metric learning objectives enjoy good statistical power, we design an objective based on ℓ1 kernels that does exhibit favorable power: it recovers (i) main effects with n ∼ log p samples, (ii) hierarchical interactions with n ∼ log p samples and (iii) order-s pure interactions with n ∼ p^{2(s−1)} log p samples.
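To convey the flavor, here is a toy objective in this spirit: an ℓ1 (Laplace) kernel with nonnegative per-feature weights, fit by projected gradient ascent on a simple kernel-alignment criterion. The criterion is an illustrative stand-in; the talk's objective and its sparsity and power guarantees are specific to its own construction.

```python
import numpy as np

def fit_feature_weights(X, y, steps=200, lr=0.1):
    """Projected gradient ascent on a toy l1-kernel alignment objective:
    maximize sum_ij (y_i - ybar)(y_j - ybar) * exp(-sum_k beta_k |x_ik - x_jk|)
    over beta >= 0. A large beta_k flags feature k as selected.
    """
    n, p = X.shape
    yc = y - y.mean()
    yy = np.outer(yc, yc)
    diffs = np.abs(X[:, None, :] - X[None, :, :])  # (n, n, p) pairwise l1 gaps
    beta = np.full(p, 1.0 / p)
    for _ in range(steps):
        K = np.exp(-np.tensordot(diffs, beta, axes=(2, 0)))
        grad = -np.einsum("ij,ij,ijk->k", yy, K, diffs)  # d objective / d beta
        beta = np.maximum(0.0, beta + lr * grad / n**2)  # ascent step + projection
    return beta
```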


Standard textbook confidence intervals are only valid at a fixed sample size, but scientific datasets are often collected sequentially and potentially stopped early, thus introducing a critical selection bias. A "confidence sequence" is a sequence of intervals, one for each sample size, that are uniformly valid over all sample sizes, and are thus valid at arbitrary data-dependent sample sizes. One can show that constructing the former (fixed-sample intervals) at every time step guarantees false coverage rate control, while constructing the latter (confidence sequences) at each time step guarantees post-hoc familywise error rate control. We show that at a price of about two (a doubling of width), pointwise asymptotic confidence intervals can be extended to uniform nonparametric confidence sequences. The crucial role of some beautiful nonnegative supermartingales in enabling "safe anytime-valid inference" will be made transparent.
This talk will mostly feature joint work with Steven R. Howard (Berkeley, Voleon), Jon McAuliffe (Berkeley, Voleon), Jas Sekhon (Berkeley, Bridgewater) and recently Larry Wasserman (CMU) and Sivaraman Balakrishnan (CMU). I will also cover interesting historical and contemporary contributions to this area.
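As one concrete historical instance of the supermartingale construction, Robbins' normal mixture yields a closed-form confidence sequence for the mean of 1-sub-Gaussian observations. The formula below is my reconstruction of that classical boundary, with rho a tuning parameter; it is not the talk's more general nonparametric extension.

```python
import numpy as np

def normal_mixture_cs(x, alpha=0.05, rho=1.0):
    """Confidence sequence for the mean of 1-sub-Gaussian observations, via
    Robbins' normal mixture supermartingale and Ville's inequality.

    Uniformly over time: P(exists t with mu outside interval_t) <= alpha.
    rho > 0 tunes where (in t) the sequence is tightest.
    """
    x = np.asarray(x, dtype=float)
    t = np.arange(1, len(x) + 1)
    s = np.cumsum(x)
    radius = np.sqrt((t + rho) * np.log((t + rho) / (rho * alpha**2)))
    return (s - radius) / t, (s + radius) / t  # lower and upper sequences
```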


We introduce a method to rigorously draw causal inferences, immune to all possible confounding, from genetic data that include parents and offspring. Causal conclusions are possible with these data because the natural randomness in meiosis can be viewed as a high-dimensional randomized experiment. We make this observation actionable by developing a novel conditional independence test that identifies regions of the genome containing distinct causal variants. The proposed Digital Twin Test compares an observed offspring to carefully constructed synthetic offspring from the same parents to determine statistical significance, and it can leverage any black-box multivariate model and additional non-trio genetic data to increase power. Crucially, our inferences are based only on a well-established mathematical model of recombination and make no assumptions about the relationship between the genotypes and phenotypes.
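Schematically, the resampling logic reads like a conditional randomization test in which meiosis supplies the known randomization distribution. In the sketch below, `sample_offspring_genotype` (a simulator of recombination given the parents) and `statistic` are hypothetical placeholders, and the real Digital Twin Test conditions on more structure than shown.

```python
import numpy as np

def digital_twin_style_pvalue(parents, geno_obs, y_obs, statistic,
                              sample_offspring_genotype, n_twins=999, rng=None):
    """Randomization p-value comparing the observed offspring's statistic to
    synthetic 'twins' drawn from the recombination model (placeholder here).
    Schematic only; the actual test conditions on additional structure.
    """
    rng = rng if rng is not None else np.random.default_rng()
    t_obs = statistic(geno_obs, y_obs)
    t_null = np.array([
        statistic(sample_offspring_genotype(parents, rng), y_obs)
        for _ in range(n_twins)
    ])
    # Finite-sample-valid randomization p-value (add-one correction).
    return (1 + np.sum(t_null >= t_obs)) / (1 + n_twins)
```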