Overview - Statistical analysis with missing data using multiple imputation

The course runs from 23 to 25 June 2026.

This short course is taught online by statisticians from LSHTM and is part of the School’s Centre for Data and Statistical Science for Health.

Missing data occur frequently in both observational and experimental research. They lead to a loss of statistical power and, more importantly, may introduce bias into the analysis. In this course, we adopt a principled approach to handling missing data, in which the first step is careful consideration of suitable assumptions about the missing data in a given study. On this basis, appropriate statistical methods can be identified that are valid under the chosen assumptions.

The overall aim of this course is for participants to learn how multiple imputation can be used to handle missing data in statistical analyses and to understand the assumptions under which it is valid. In addition to introducing the method in standard settings, we will explore its use in a range of more advanced situations, including analyses with non-linearities and interactions, propensity score analysis, prognostic model development, and sensitivity analyses.

Who should apply?

Epidemiologists, biostatisticians, and other health researchers who have strong quantitative skills and experience in statistical analysis. In particular, participants are expected to be familiar with regression models, such as linear and logistic regression, and the interpretation of their results. Computer practicals will use the statistical software package R, so participants should be familiar with using R to perform statistical analyses. Full R code solutions will be provided. Practical materials for Stata will also be made available to participants.

Intended learning outcomes

Understand the impact of missing data on statistical inference, and the assumptions about missingness mechanisms, including missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR).
Understand the assumptions under which multiple imputation can be used to provide valid inferences from a partially observed dataset, and be able to apply it appropriately using modern statistical software.
Understand how multiple imputation can be applied in various advanced settings, including non-linearities and interactions, missing data sensitivity analysis, propensity score analysis, and prognostic model development.

Teaching format

The course is delivered online across 3 days. Each morning and afternoon session consists of a 1-hour lecture followed by a 1.5-hour computer practical.