Upcoming Seminar Presentations
All seminars take place Thursdays at 8:30 am PT / 11:30 am ET / 4:30 pm London / 6:30 pm Tel Aviv. Past seminar presentations are posted here.
Thursday, January 20, 2022 [Link to join]
Speaker: Richard Samworth (University of Cambridge)
Title: Optimal subgroup selection
Abstract: In clinical trials and other applications, we often see regions of the feature space that appear to exhibit interesting behaviour, but it is unclear whether these observed phenomena are reflected at the population level. Focusing on a regression setting, we consider the subgroup selection challenge of identifying a region of the feature space on which the regression function exceeds a pre-determined threshold. We formulate the problem as one of constrained optimisation, where we seek a low-complexity, data-dependent selection set on which, with a guaranteed probability, the regression function is uniformly at least as large as the threshold; subject to this constraint, we would like the region to contain as much mass under the marginal feature distribution as possible. This leads to a natural notion of regret, and our main contribution is to determine the minimax optimal rate for this regret in both the sample size and the Type I error probability. The rate involves a delicate interplay between parameters that control the smoothness of the regression function, as well as exponents that quantify the extent to which the optimal selection set at the population level can be approximated by families of well-behaved subsets. Finally, we expand the scope of our previous results by illustrating how they may be generalised to a treatment and control setting, where interest lies in the heterogeneous treatment effect.
Discussant: Charles Doss (University of Minnesota)
Links: [Relevant papers:]
Thursday, January 27, 2022 [Link to join]
Speaker: Richard Berk (University of Pennsylvania)
Title: Improving Fairness in Criminal Justice Algorithmic Risk Assessments Using Optimal Transport and Conformal Prediction Sets
Abstract: In the United States and elsewhere, risk assessment algorithms are being used to help inform criminal justice decision-makers. A common intent is to forecast an offender's ``future dangerousness.'' Such algorithms have been correctly criticized for potential unfairness, and there is an active cottage industry trying to make repairs. In this paper, we use counterfactual reasoning to consider the prospects for improved fairness when members of a less privileged group are treated by a risk algorithm as if they are members of a more privileged group. We combine a machine learning classifier trained in a novel manner with an optimal transport adjustment for the relevant joint probability distributions, which together provide a constructive response to claims of bias-in-bias-out. A key distinction is between fairness claims that are empirically testable and fairness claims that are not. We then use confusion tables and conformal prediction sets to evaluate achieved fairness for projected risk. Our data are a random sample of 300,000 offenders at their arraignments for a large metropolitan area in the United States during which decisions to release or detain are made. We show that substantial improvement in fairness can be achieved consistent with a Pareto improvement for protected groups.
Discussant: Emmanuel Candès (Stanford University)
Links: [Relevant papers:]
The seminars are held on Zoom and last 60 minutes:
45 minutes of presentation
15 minutes of discussion, led by an invited discussant
Moderators collect questions using the Q&A feature during the seminar.
How to join
You can attend by clicking the link to join (there is no need to register in advance).
More instructions for attendees can be found here.
What is selective inference?
Broadly construed, selective inference means searching for interesting patterns in data, usually with inferential guarantees that account for the search process. It encompasses:
Multiple testing: testing many hypotheses at once (and paying disproportionate attention to rejections)
Post-selection inference: examining the data to decide what question to ask, or what model to use, then carrying out one or more appropriate inferences
Adaptive / interactive inference: sequentially asking one question after another of the same data set, where each question is informed by the answers to preceding questions
Cheating: cherry-picking, double dipping, data snooping, data dredging, p-hacking, HARKing, and other low-down dirty rotten tricks; basically any of the above, but done wrong!