Dr Alexandra Lewin

Associate Professor in Epidemiology

United Kingdom

My research lies at the interface between statistics and machine learning, developing new statistical methods and software for large-scale data analysis. Most of my work is on highly-structured, high-dimensional Bayesian models for statistical genomics and genetic epidemiology.

My background is a Maths degree from Cambridge University and a PhD in Cosmology from Imperial College London. I taught myself Bayesian statistics during my PhD and moved into Biostatistics straight after, working briefly on spatial data analysis in the Small Area Health Statistics Unit at Imperial College London. I then spent several years working on new statistical methods for gene expression data and other 'omics' data in the Department of Epidemiology and Biostatistics at Imperial and in the Maths department at Brunel University. I joined LSHTM in 2018 to work with people across the School on methods and applications using 'omics' data in epidemiology.


Department of Medical Statistics
Faculty of Epidemiology and Population Health


Centre for Data and Statistical Science for Health


I lead the Machine Learning module on the MSc in Health Data Science (with Pierre Masselot from the Faculty of Public Health and Policy) and the Bayesian modules on the MSc in Medical Statistics (with Tim Russell from the Centre for Mathematical Modelling).



High-throughput molecular biology

I have worked for several years on Bayesian integrative models in molecular epidemiology. I have been involved in the development of new statistical methodology for several different types of high-throughput molecular biology data, including gene expression microarrays (Lewin et al. 2006; Lewin et al. 2007; Turro et al. 2010), RNA-seq (Turro et al. 2011), proteomics (Kirk et al. 2013), metabolomics (Lewin et al. 2015, Bottolo et al. 2021, Scott et al. 2023) and microbiome (Scott et al. 2023). The emphasis in all of this work is on integrative modelling, using fully Bayesian models to account for the complex correlation structures in the data and propagate uncertainty on model estimates.


Multi-trait analysis in genetics and genomics

Quantitative Trait Loci (QTLs) are genetic variants which are statistically associated with a phenotype of interest. In molecular biology, high-throughput technologies have enabled us to find QTLs for multivariate molecular phenotypes (for example multivariate gene expression (eQTLs), proteomics (pQTLs) and metabolomics (mQTLs)).

Traditional analysis approaches consider each molecular variable separately, despite these data showing extremely high correlations. We have developed Bayesian models for detecting QTLs for multivariate molecular outcomes, and have used these to detect eQTLs and mQTLs. This work includes joint modelling of genomics and metabolomics data for eQTL/mQTL detection (Lewin et al. 2015, Bottolo et al. 2021), joint modelling of microbiome and metabolomics data (Scott et al. 2023) and multi-omics data integration in drug-resistance studies (Zhao et al. 2021, 2023).

An extension of this work into causal modelling is Verena Zuber's paper on Mendelian Randomisation, which introduces multi-response Mendelian randomization (MR2). MR2 is an MR method specifically designed for multiple outcomes: it identifies exposures that cause more than one outcome or, conversely, exposures that exert their effect on distinct responses (Zuber et al. 2023).

Darren Scott recently completed his PhD with me, working on models linking multivariate molecular outcomes with microbiome data. Microbiome data are compositional, meaning that features are expressed as proportions of a whole. Standard supervised learning models cannot be used for compositional data as they treat features as independent. We have developed models for univariate (Scott et al. 2023) and multivariate outcomes (manuscript in progress) using microbiome features as compositional predictors.
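
As a minimal illustration of why compositional features cannot be treated as independent, the Python sketch below simulates counts for four hypothetical taxa, converts them to proportions, and applies a centred log-ratio (CLR) transform, one standard way of handling the sum-to-one constraint. This is not the Bayesian compositional regression model of Scott et al. 2023; the simulated data, the function names `closure` and `clr`, and the pseudocount value are all assumptions made for the example.

```python
import numpy as np


def closure(counts):
    """Convert non-negative counts to proportions that sum to 1 per sample."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)


def clr(proportions, pseudocount=1e-6):
    """Centred log-ratio transform: log of each part minus the row mean of the logs.

    The pseudocount (an arbitrary choice here) avoids log(0) for absent taxa.
    """
    logp = np.log(np.asarray(proportions, dtype=float) + pseudocount)
    return logp - logp.mean(axis=1, keepdims=True)


# Simulated (hypothetical) counts: 200 samples, 4 taxa.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=[50, 30, 15, 5], size=(200, 4))

props = closure(counts)

# Each sample's proportions sum to 1, so the features are linearly dependent:
# every row of their covariance matrix sums to zero, which forces some
# negative covariances whatever the underlying biology.
cov = np.cov(props, rowvar=False)
print("row sums of covariance:", cov.sum(axis=1))

# After the CLR transform each sample is centred in log-ratio space,
# so the transformed values sum to zero within each sample.
z = clr(props)
print("max |row sum| after CLR:", np.abs(z.sum(axis=1)).max())
```

The zero row sums of the covariance matrix show that an increase in one taxon's proportion must be offset by decreases elsewhere, which is the dependence that models for compositional predictors need to account for.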


Machine Learning in Health Data Research

I am currently co-investigator on InflAIM, an NIHR-funded project led by the University of East Anglia to investigate multimorbidity using AI methods. We will be using multi-state models and Bayesian networks to study links and risk factors for multimorbidity.

I recently wrote a "Lessons Learnt" paper for the Centre for Impact Evaluation on machine learning methods used in causal inference for impact evaluation, in particular with respect to investigating heterogeneous treatment effects and mechanisms (Lewin et al. 2023).

I am currently supervising an MSc dissertation on interpretable AI methods. We are investigating the reliability of explanation methods for complex black box models in the context of observational epidemiology. I am also co-supervising a PhD student surveying the use of machine learning methods for large-scale disease surveillance using online social media data.


Causal Inference Methodology

I am supervising two NIHR pre-doctoral Fellows (Lauren Rengger and Jenni Banks) working on causal mediation analysis. We are investigating the causal mechanisms behind the association between eczema and cardiovascular outcomes, using recently developed methods for causal inference.

I also work on Mendelian Randomisation for multivariate outcomes. We have introduced multi-response Mendelian randomization (MR2), an MR method specifically designed for multiple outcomes, which identifies exposures that cause more than one outcome or, conversely, exposures that exert their effect on distinct responses (Zuber et al. 2023).


Bayesian Evidence Synthesis

I am working with Darren Scott (AstraZeneca) on Bayesian models that use historical data to improve the efficiency of randomised trial analysis (Scott and Lewin 2024, arXiv preprint).

I work with Joy Lawn's group in LSHTM MARCH (Centre for Maternal, Adolescent, Reproductive, & Child Health) advising on Bayesian evidence synthesis methods used to produce global and country-specific estimates of disease burden and adverse birth outcomes (Gonçalves et al. 2021, Gonçalves et al. 2022, Ohuma et al. 2023, Okwaraji et al. 2023).

Research Area
Bayesian Analysis

Selected Publications

Multi-response Mendelian randomization: Identification of shared and distinct exposures for multimorbidity and multiple related disease outcomes
Zuber, V; LEWIN, A; Levin, MG; Haglund, A; Ben-Aicha, S; Emanueli, C; Damrauer, S; Burgess, S; Gill, D; Bottolo, L;
American journal of human genetics
Bayesian compositional regression with microbiome features via variational inference.
Scott, DA V; Benavente, E; Libiseller-Egger, J; Fedorov, D; PHELAN, J; Ilina, E; Tikhonova, P; Kudryavstev, A; Galeeva, J; CLARK, T; LEWIN, A;
BMC bioinformatics
Group B streptococcus infection during pregnancy and infancy: estimates of regional and global burden.
Gonçalves, BP; PROCTER, SR; PAUL, P; CHANDNA, J; LEWIN, A; Seedat, F; Koukounari, A; Dangor, Z; Leahy, S; Santhanam, S; John, HB; Bramugy, J; Bardají, A; Abubakar, A; Nasambu, C; Libster, R; Sánchez Yanotti, C; Horváth-Puhó, E; Sørensen, HT; Van de Beek, D; Bijlsma, MW; Gardner, WM; Kassebaum, N; Trotter, C; Bassat, Q; ... CHAMPS team,
The Lancet. Global health
Exploring the causal effect of maternal pregnancy adiposity on offspring adiposity: Mendelian randomisation using polygenic risk scores.
Bond, TA; Richmond, RC; Karhunen, V; Cuellar-Partida, G; Borges, MC; Zuber, V; Couto Alves, A; Mason, D; Yang, TC; Gunter, MJ; Dehghan, A; Tzoulaki, I; Sebert, S; Evans, DM; LEWIN, AM; O'Reilly, PF; Lawlor, DA; Järvelin, M-R;
BMC medicine
Estimation of country-level incidence of early-onset invasive Group B Streptococcus disease in infants using Bayesian methods.
Gonçalves, BP; PROCTER, SR; CLIFFORD, S; Koukounari, A; PAUL, P; LEWIN, A; JIT, M; LAWN, J;
PLoS computational biology
A computationally efficient Bayesian seemingly unrelated regressions model for high-dimensional quantitative trait loci discovery.
Bottolo, L; Banterle, M; Richardson, S; Ala-Korpela, M; Järvelin, M-R; LEWIN, A;
Journal of the Royal Statistical Society: Series C (Applied Statistics)
BayesSUR: An R Package for High-Dimensional Multivariate Bayesian Variable and Covariance Selection in Linear Regression
Zhao, Z; Banterle, M; Bottolo, L; Richardson, S; LEWIN, A; Zucknick, M;
Journal of Statistical Software
The association between partner bereavement and melanoma: cohort studies in the U.K. and Denmark.
WONG, AY S; Frøslev, T; Dearing, L; FORBES, HJ; MULICK, A; MANSFIELD, KE; Silverwood, RJ; Kjaersgaard, A; Sørensen, HT; SMEETH, L; LEWIN, A; Schmidt, SA J; LANGAN, SM;
British Journal of Dermatology
GWAS on longitudinal growth traits reveals different genetic factors influencing infant, child, and adult BMI.
Couto Alves, A; De Silva, NM G; Karhunen, V; Sovio, U; Das, S; Taal, HR; Warrington, NM; LEWIN, AM; Kaakinen, M; Cousminer, DL; Thiering, E; Timpson, NJ; Bond, TA; Lowry, E; Brown, CD; Estivill, X; Lindi, V; Bradfield, JP; Geller, F; Speed, D; Coin, LJ M; Loh, M; Barton, SJ; Beilin, LJ; Bisgaard, H; ... Early Growth Genetics (EGG) Consortium,
Science Advances
Bayesian Methods for Gene Expression Analysis
LEWIN, A; Bottolo, L; Richardson, S;