“Building the ship while it’s sailing”: the challenge of evaluating programmes that change over time

The Centre for Evaluation recently held a lunchtime seminar at which we heard from five speakers undertaking evaluations of interventions that have changed over time. In this blog, we highlight the key challenges that change poses to traditional evaluation approaches.

There is increasing recognition that a flexible approach to intervention design and implementation is needed to address complex public health issues. Both DFID and USAID are investing in adaptive management, in which an intervention and its implementation strategy are expected to evolve over time. There is also increasing use of human-centred design (HCD), or design thinking, a flexible and iterative approach, during the development and implementation of programmes. Programmes that change over time pose a challenge to traditional evaluation approaches, which assume a stable, well-defined intervention. This seminar brought together five speakers, each undertaking an evaluation of an intervention with both intended and unintended changes. The five interventions and the associated changes were:

  • Adolescent 360 aims to increase uptake of modern contraception among girls aged 15-19 in three countries. The programme is using principles of design thinking to develop interventions adapted to the local context; as such, different approaches are anticipated in each country. The evaluation is being designed, and baseline data collection conducted, in parallel with intervention design.
  • The Tanzanian national sanitation campaign is an ongoing multi-component intervention that aims to encourage individuals to upgrade their toilet. The intervention is continually evolving and includes mass media campaigns and television shows. The evaluation, led by the EHG group at LSHTM, has just started.
  • The Expanded Quality Management Using Information Power (EQUIP) initiative involved groups of quality improvement teams testing new implementation strategies to increase the coverage of selected essential interventions for maternal and newborn care. At the start of implementation, a defined set of essential interventions was selected. Ultimately, not all of these interventions were included; the change happened after the evaluation had started, in response to the local context.
  • The Safe Care Saving Lives (SCSL) initiative supported the implementation of 20 evidence-based maternal and newborn care practices. The intervention was intended to be rolled out in three phases, with the second and third phases planned to be randomised to allow for a strong evaluation design. However, the implementation strategy was adapted in several steps, and operational needs made the original randomisation unfeasible. Further, the intervention received insufficient support from policy makers and health facilities. As a result, the second phase was never fully implemented and the third phase was cancelled. The evaluation now seeks instead to understand why implementation proved so difficult in this context.
  • The Gombe partnership for maternal and newborn health (MNH) aims to strengthen health services through a package of interventions delivered by four NGOs. The intervention has started to be scaled up, including to comparison facilities, while the IDEAS-led evaluation is ongoing. In addition, components of the intervention have changed in response to emerging learning and changes in the national strategy for MNH.

Key challenges identified in the seminar to undertaking an impact evaluation in the context of change are: (1) defining and re-defining the intervention; (2) identifying the target population and an appropriate counterfactual; (3) determining outcome measures; and (4) deciding the purpose and timing of the evaluation.

Challenge 1: defining and re-defining the intervention

Describing the intervention was often challenging, especially where evaluation decisions were being made before the intervention and its implementation strategy had been finalised. In some cases, the evaluation team was the first to articulate a theory of change, mapping the different components of the intervention and its intended outcomes and impacts.

Both planned and unplanned changes to the intervention over the course of the evaluation were often poorly documented, and the task of capturing them fell to the evaluation team, which frequently had to piece together retrospectively what had happened. In all cases, a close working relationship with the implementer and the use of mixed methods were seen as essential to articulating a theory of change and capturing changes in implementation. The Adolescent 360 study includes a process evaluation, and the evaluators have found direct observation of team meetings and programming activities to be the most important method for capturing changes, as the design process is iterative and has not been well documented.

During the discussion, the question was raised of whether a detailed understanding of the intervention was needed. The response: ‘it depends’! The need to understand the intervention was considered to depend on: whether the desired change is the same in each setting and population sub-group; whether outcomes are anticipated to change over time; and the level of detail needed for the analysis.

Challenge 2: identifying the target population and an appropriate counterfactual

Changes to the implementation strategy had a major impact on the ability to select comparison sites and, ultimately, on the study design.

Changes to the implementation strategy of SCSL meant that implementers did not adhere to the implementation sites selected through randomisation. The evaluation considered a ‘dose-response’ analysis (measuring the strength of the intervention delivered, to explore the association between implementation strength and outcomes). However, it proved challenging to define the ‘dose’ because the intervention approach was not fully defined at the outset and evolved during implementation. Adolescent 360 plans to strengthen its pre-post analysis with a dose-response analysis, a method that those present felt warranted further exploration.
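To make the idea concrete, the core of a dose-response analysis can be sketched in a few lines. The facility records and figures below are entirely hypothetical (they are not from SCSL or Adolescent 360), and a real analysis would also need to adjust for confounders, clustering and how ‘dose’ is defined:

```python
# Minimal dose-response sketch with hypothetical data.
# Each record is a facility, with an implementation-strength score
# ("dose", e.g. number of intervention components delivered) and an
# outcome (e.g. % of births attended by a skilled provider).
facilities = [
    {"dose": 0, "outcome": 41.0},
    {"dose": 1, "outcome": 44.0},
    {"dose": 2, "outcome": 48.0},
    {"dose": 3, "outcome": 47.0},
    {"dose": 4, "outcome": 55.0},
    {"dose": 5, "outcome": 58.0},
]

def ols_slope(xs, ys):
    """Ordinary least-squares slope of y on x (simple linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

doses = [f["dose"] for f in facilities]
outcomes = [f["outcome"] for f in facilities]
slope = ols_slope(doses, outcomes)
print(f"Estimated change in outcome per unit of dose: {slope:.2f}")
# → Estimated change in outcome per unit of dose: 3.34
```

A positive, roughly linear slope would be consistent with implementation strength driving the outcome; but as the SCSL experience shows, the estimate is only as good as the definition and measurement of the dose itself.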

For a number of the evaluations, it was not clear upfront where the intervention would be implemented. For example, the evaluators of Adolescent 360 considered approaches such as stepped-wedge and regression discontinuity designs, but insufficient detail on the implementation sites and the target population at the time the evaluation was being designed meant that a pre-post cross-sectional analysis was the most appropriate approach.

In the case of the Tanzanian sanitation programme evaluation, the intervention is highly diffuse: different populations are anticipated to be exposed to different components, and new components are continually being added. The evaluation team hopes to follow individuals over time, asking them to self-report whether, and to which, components of the intervention they have been exposed, and to correlate that exposure with modifications to their sanitation facilities at home.

A further challenge, identified in the majority of settings, was that countries have weak routine monitoring systems, which makes the selection of retrospective controls difficult. Could a country-wide evaluation platform - as proposed by Victora et al. in 2011, Heidkamp in 2017, Keita et al. in 2019 and others - in which different existing databases are integrated in a continuous manner, be a potential solution?

Challenge 3: selecting the right outcome measures

As evaluators we seek to understand whether an intervention had the intended outcomes. It is standard practice to publish an evaluation protocol, which specifies the key outcomes and the analysis plans at a study’s outset, ideally before the intervention has been implemented.

Where the evaluation is being designed in parallel to the intervention it can be challenging to identify the relevant outcomes. In the case of the Adolescent 360 project, the evaluation team had to grapple with designing an impact evaluation and undertaking baseline data collection before the intervention was designed. As a result, the baseline survey had to include a more comprehensive set of questions in an attempt to capture multiple potential intervention outcomes.

The EQUIP study worked with the implementation partners, as the intervention was being finalised, to identify four key outcomes. The study found an impact for only one of the four. This was explained, in part, by the fact that the interventions relating to two of the outcomes were never implemented. The experience raised a methodological challenge: should the analysis be based on what was intended at the study outset, or should it be changed to reflect what the implementation team actually did? Ultimately, the evaluation stuck to an intention-to-treat analysis and drew cautious conclusions.

It might be desirable to allow outcome measures to be added or dropped over time. However, a constant set of indicators is vital for understanding whether a programme achieved its intended goals. In his blog, Julien Barr suggests identifying ‘bedrock’ outcomes, which do not change over the course of the programme, as well as including a basket of output indicators that can.

Challenge 4: the right evaluation at the right time?

Are we trying to support decision makers in making day-to-day decisions, or are we trying to examine the impact and cost-effectiveness of a programme? These were seen as two distinct aims requiring different approaches to evaluation. Evaluations are typically commissioned for the latter but in practice are often needed for the former.

There was discussion during the event about whether evaluations need to do more to support implementation. As a result of implementation challenges, the evaluation of SCSL placed greater emphasis on understanding why the intervention was not taken up coherently and why implementation was diluted. A close working relationship with the implementers and funders was seen as pivotal to this shift and to ensuring the utility of the evaluation.

This led to the question: are we doing impact evaluations too early? Should we wait for concepts to be designed and piloted before measuring impact? Evaluation could first help implementers dynamically ‘crawl the design space’ by simultaneously testing alternative interventions and adapting the programme sequentially, before scaling up and evaluating. Taking such a stepped approach would slow the process down and increase costs, and it was questioned whether implementers and funders would support it. Alternative options, such as ‘hybrid designs’, seek to blend effectiveness and implementation research to enhance the usefulness and policy relevance of clinical research.

Key messages

  • Evaluators should work with implementing partners to document changes, define the intervention at different stages and refine the theory of change accordingly. Mixed-methods approaches are best suited to capturing change.
  • Clarify the role of the evaluation for implementers and donors. Have honest discussions about the most appropriate timing of the evaluation, and reassess the evaluation regularly as change occurs.
  • Clearer guidelines are needed on the most appropriate analysis in the context of change.

Further reading

Watch relevant seminars online