PhD Studentships in Pharmacoepidemiology

The London School of Hygiene & Tropical Medicine is pleased to invite applications for two PhD studentships in pharmacoepidemiology, funded for 3.5 years by GlaxoSmithKline, starting in September 2023.

The award will cover a tax-free stipend of GBP 31,516 per year and tuition fees (at the home or international rate).

The studentships will be based in the Faculty of Epidemiology and Population Health. The Faculty is multi-disciplinary and encompasses epidemiologists, medical statisticians, medical demographers, nutritionists, social scientists and public health practitioners. The successful candidates will also spend time at the offices of GlaxoSmithKline in West London.

There is a choice of possible topic areas, described below. The exact focus of the PhD will be developed with the successful candidate and will depend on their interests and prior expertise. Applicants are asked to contact the supervisor of the project they are most interested in for an informal discussion prior to applying.

Project list

Overarching theme: Optimising methodology to minimise bias in real world studies, increasing the acceptability of real world evidence for decision making.

1) Study designs for evaluating novel treatment decisions using longitudinal observational data



Pharmacoepidemiological studies increasingly seek to answer questions about dynamic treatment patterns; key areas are deprescribing, treatment switching, and adding treatments. Deprescribing, for example, can be framed as a contrast between a patient on Drug A who is deprescribed and a patient who continues on Drug A. Sequential trial designs are increasingly popular for addressing questions about treatment initiation, but these designs could be extended to accommodate other questions, such as deprescribing. Sequential trial designs offer the possibility of alleviating computational burden compared to existing frameworks, e.g. prevalent new user designs.


The student would apply sequential trial designs and prevalent new user designs to answer novel questions around dynamic treatment use in the setting of Cystic Fibrosis (CF), where patients use many treatments in different combinations over their lifetimes. The introduction of new disease-modifying treatments (CFTR modulators) has resulted in new questions about deprescribing older treatments, which have been highlighted as a key research priority. This project would investigate methods for studying the impacts of treatment switching and deprescribing in CF using longitudinal data from the UK CF registry, including investigating differences in patient subgroups underrepresented in the trials. Outputs would include applied work, guidance for investigating these types of questions more generally, and software/analytical pipelines. Depending on the student's interests, there could also be focused methodological work on the implications of these designs for missing data and quantitative bias analysis.


Completed Medical Statistics PhDs using Cystic Fibrosis data (supervised by Ruth Keogh):

2) Missing data in causal analyses of EHR data



The issue of missing data inevitably arises in any study using routinely collected health data, including electronic health records, which raises questions about the validity of causal inferences drawn from the data. A complicating factor is that, in many instances, observations are made only when a patient comes into contact with the healthcare system. This process is non-random and reflects the patient's (potentially transient) state of health at that time, so the mechanisms of missing data can be complex and health-dependent. This aspect is rarely taken into consideration in analyses of electronic health record data; the extent to which this could induce bias remains unknown.

While a huge body of theoretical work exists around handling missing data, a number of issues have not yet been resolved. First, how missing data impacts the selection of confounders in a causal analysis involving a large number of variables, such as a high-dimensional propensity score analysis. Second, what analytic techniques can optimally handle this missingness. Third, how the data-dependent sampling impacts the recorded values of confounders, and whether this can introduce bias into causal estimates. Finally, whether any bias introduced by data-dependent sampling can be removed or reduced by advanced statistical methods such as joint modelling of the missing data process and the outcome of interest.

In this project, the student would take a pharmacoepidemiological study undertaken in electronic health record data and create a set of simulations based around those data. They would then use the simulated data to explore impacts of missingness on confounder selection and compare different ways of handling the missingness, including novel imputation techniques incorporating high-dimensional propensity score covariates. They would impose different missingness mechanisms, reflecting the patient's underlying health status, and explore the ability of different techniques to appropriately correct for this. They would then apply their methods to a real analysis. Finally, they would develop re-usable software to implement any new techniques developed so that GSK researchers could use these methods on subsequent studies.

3) Machine learning and AI in pharmacoepidemiology



Machine learning and AI have shown excellent performance for prediction. However, these techniques are not often used in pharmacoepidemiology. This project would explore the use of various machine learning and AI techniques in electronic health record data in two contexts: (1) causal analyses with a large number of confounders and (2) development of risk prediction models.

In causal analyses, the high-dimensional propensity score has become a popular way to define and select a large number of confounders to adjust for. However, other approaches could be used and have not been thoroughly explored. For example, natural language processing techniques could be used, treating codes in EHR data as “words”, to define confounders. Artificial neural networks could be implemented to “predict” outcomes under both treatment conditions to obtain the treatment effect. Ensemble methods, such as the super-learner, could be used for adjustment. Several different machine learning approaches could be used for the variable selection aspect. This project would involve a comprehensive comparison of the many available approaches, to begin to understand which methods work best and when.

Similarly, these techniques could all be applied to the development of risk prediction models within electronic health record data. Depending on the student's interests, and those of GSK, the project could focus on this context instead of (or as well as) causal analyses, to explore which methods work best.

4) A comparison of observational study designs for estimating the safety of multi-dose vaccines



Electronic health records are often used to identify potential associations between vaccines and adverse events, as randomized controlled trials may not have enough power to reliably study rare outcomes. However, people who are vaccinated differ from those who are not, and such 'healthy vaccinee' effects can introduce serious bias in observational studies of vaccine efficacy and safety. Self-controlled case series (SCCS) were developed in part to address this, as they automatically correct for confounding factors which do not vary over the observation time. However, SCCS are also subject to strong assumptions which may be violated in many multi-dose vaccine settings, where an adverse event might delay or contraindicate future vaccine doses. Extensions of the SCCS based on estimating equations have been developed to handle these scenarios.

There have been relatively few studies into how available design and analytical options for identifying vaccine safety signals compare, and this PhD would address this by comparing different methods for studying the safety of multi-dose vaccines. We anticipate that the methods to be compared would be a matched cohort study designed and implemented using a “target trial” framework, and different self-controlled designs. It is anticipated that there would be at least two empirical investigations, including one of COVID-19 vaccines, as well as a simulation study.

Eligibility requirements

Applicants must hold, or expect to obtain before the start of the PhD, a relevant MSc awarded with good grades, or have a combination of relevant qualifications and experience which demonstrates equivalent ability and attainment.

The PhD programme

Students will be mentored by their supervisory team, made up of 2-3 academics, and an Advisory Committee consisting of at least two other people, who can be from outside the School. Each Advisory Committee will also have at least one epidemiologist or statistician from GlaxoSmithKline. Students are expected to take part in the academic life of their department and can also be members of other Academic Centres, e.g. the Centre for the Mathematical Modelling of Infectious Diseases, the Clinical Trials Unit, the Malaria Centre, the TB Centre, the MARCH centre for maternal and child health, and the Centre for Statistical Methodology. All research seminars and journal clubs are open to PhD students from across the School. Students are able to take up to four Master's level Study Modules per academic year, subject to approval from their supervisor.

Support for research students' future career development is provided through the supervision process, the Transferable Skills Programme (in the School and the Bloomsbury Postgraduate Skills Network) and the School's Careers Service. Also important for career development are the opportunities for students to network and establish professional contacts. The School also facilitates national and international conference attendance by students in the later stages of their PhD, which provides networking opportunities.

How to apply

Information about the MPhil/PhD programme structure at LSHTM, as well as application guidance and a link to the portal, can be found on the School's Research Degrees and Doctoral College pages.

To apply for this studentship, applicants should submit an application for research degree study via the LSHTM application portal. Please write 'PhD Studentships in Pharmacoepidemiology' in the Funding Section on the application form. The research proposal should identify a specific research question or hypothesis, expanding on one of the topics listed on the website, summarise the relevant background information (with no more than 5 key references), and outline an appropriate research methodology by which the question can be addressed.

Applications for this project will only be reviewed and processed after the deadline. All complete applications that are submitted before the deadline will be considered equally, regardless of submission date.

Only applications in the correct format will be considered.

The deadline for applications is 12 April 2023 at 10:00am BST.