Causal Machine Learning for Biomarker Subgroup Discovery in Randomised Trials

Exploring three causal machine learning methods for responder subgroup detection

Decreasing costs of high-throughput ‘omics, as well as new technologies such as the Olink proteomics platform, have driven wider application of these technologies in clinical trials, for example to inform precision medicine strategies. However, data-driven characterisation of patient subgroups with enhanced (or weaker) treatment effect remains a challenging problem, particularly when searching over high-dimensional biomarkers. With growing recognition that traditional approaches (e.g. exhaustive biomarker-treatment interaction testing) are sub-optimal, several promising methods have recently emerged that combine machine learning tools with concepts from causal inference. In principle, they offer greater power through less conservative multiplicity control, and the ability to capture complex multivariate signatures which may be missed during one-at-a-time testing. The speaker will describe three causal machine learning methods for responder subgroup detection: the “Modified covariate Lasso” [1], “Causal Forests” [2], and the “X-Learner” [3]. He will compare and assess their performance in a modest simulation study motivated by real biomarker trial datasets being generated within GlaxoSmithKline. He will then share some early (gene-anonymised) results from an ongoing application of these methods to detect and predict responder subgroups from transcriptomic data measured in two Phase 3 lupus trials. The speaker will close with a discussion of the benefits and limitations that he found with existing methods in this space.


[1] Tian L, Alizadeh AA, Gentles AJ, Tibshirani R. A Simple Method for Estimating Interactions Between a Treatment and a Large Number of Covariates. Journal of the American Statistical Association 2014; 109(508): 1517–1532

[2] Wager S, Athey S. Estimation and Inference of Heterogeneous Treatment Effects using Random Forests. Journal of the American Statistical Association 2018; 113(523): 1228–1242

[3] Künzel SR, Sekhon JS, Bickel PJ, Yu B. Metalearners for estimating heterogeneous treatment effects using machine learning. PNAS 2019; 116(10): 4156–4165
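
For readers unfamiliar with the first of these methods, the following is a minimal sketch of the modified covariate idea from Tian et al. It is not the speaker's implementation. It assumes a continuous outcome, 1:1 randomisation with treatment coded as -1/+1, and simulated data with illustrative effect sizes. The small coordinate-descent lasso solver (`lasso_cd`) and the penalty value are hypothetical choices made only to keep the example self-contained. Each biomarker is multiplied by T/2, so the fitted coefficients describe how the treatment effect varies with the biomarkers, and all biomarkers are searched jointly rather than one at a time.

```python
import numpy as np


def soft_threshold(z, lam):
    """Soft-thresholding operator used in the lasso coordinate update."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def lasso_cd(W, y, penalties, n_iter=200):
    """Minimise (1/2n)*||y - W @ g||^2 + sum_j penalties[j]*|g[j]|
    by cyclic coordinate descent (hypothetical helper for this sketch)."""
    n, p = W.shape
    g = np.zeros(p)
    col_sq = (W ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += W[:, j] * g[j]      # put back feature j's current fit
            rho = W[:, j] @ resid / n
            g[j] = soft_threshold(rho, penalties[j]) / col_sq[j]
            resid -= W[:, j] * g[j]      # subtract the updated fit
    return g


# Simulated 1:1 randomised trial; all numbers are illustrative assumptions.
rng = np.random.default_rng(0)
n, p = 400, 50
X = rng.standard_normal((n, p))          # standardised biomarkers
T = rng.choice([-1.0, 1.0], size=n)      # treatment coded -1 / +1
true_effect = 1.0 + 2.0 * X[:, 0]        # only biomarker 0 is predictive
y = 0.5 * X[:, 1] + 0.5 * T * true_effect + rng.standard_normal(n)

# Modified covariates: intercept and biomarkers, each multiplied by T/2.
Z = np.column_stack([np.ones(n), X])
W = Z * (T / 2.0)[:, None]

# Penalise biomarker terms; leave the overall treatment effect unpenalised.
penalties = np.full(p + 1, 0.1)
penalties[0] = 0.0
gamma = lasso_cd(W, y - y.mean(), penalties)

selected = np.flatnonzero(gamma[1:])     # biomarkers with non-zero weight
estimated_effect = Z @ gamma             # per-patient treatment effect score
print("selected biomarker indices:", selected)
print("overall effect estimate:", round(float(gamma[0]), 2))
```

Patients with a high `estimated_effect` score would be the candidate responder subgroup. In practice the penalty would be tuned, for example by cross-validation, and the lasso shrinkage means the non-zero coefficients understate the true interaction.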


Paul Newcombe, GlaxoSmithKline 

Paul completed his PhD in statistical genetics at LSHTM in 2009, after which he spent several years in the GSK statistical genetics group. He then returned to academia to spend eight years at the MRC Biostatistics Unit, University of Cambridge, developing (mainly) Bayesian methodology for quantitative ‘omics problems. Paul moved back to industry in 2020, originally for a role within a precision medicine department at AstraZeneca, and more recently to join a statistical innovations group at GSK. Throughout this work, Paul has maintained a strong interest in drug development, precision medicine, and the role therein of statistics and modern biomarkers.

Event notices

  • You can join this event in person or remotely.


Free and open to all. No registration required.