“Building the ship while it’s sailing”: the challenge of evaluating programmes that change over time
January 7, 2020
The Centre for Evaluation recently held a lunchtime seminar, in which we heard from five speakers who are undertaking evaluations of interventions that have changed. In this blog, we highlight the key challenges that change poses to traditional evaluation approaches.
There is increasing recognition that a flexible approach to intervention design and implementation is needed to address complex public health issues. Both DFID and USAID are investing in adaptive management, in which an intervention and its implementation strategy are anticipated to evolve over time. There is also increasing use of human-centred design (HCD), or design thinking, a flexible and iterative approach, during the development and implementation of programmes. Programmes that change over time pose a challenge to traditional evaluation approaches, which are based on the assumption of a stable, well-defined intervention. This seminar brought together five speakers, each undertaking evaluations of interventions with both intended and unintended changes. The five interventions and the associated changes were:
- Adolescent 360 aims to increase uptake of modern contraception among girls aged 15-19 in three countries. The programme is using principles of design thinking to develop interventions that are adapted to the local context; as such, different approaches are anticipated in each country. The evaluation is being designed, and baseline data collection conducted, in parallel to intervention design.
- The Tanzanian national sanitation campaign is an ongoing multi-component intervention that aims to encourage individuals to upgrade their toilet. The intervention is continually evolving and includes mass media campaigns and television shows. The evaluation, led by the EHG group at LSHTM, has just started.
- The Expanded Quality Management Using Information Power (EQUIP) initiative involved groups of quality improvement teams testing new implementation strategies to increase the coverage of selected essential interventions for maternal and newborn care. At the start of implementation, a defined number of essential interventions were selected. Ultimately, not all of the essential interventions were included; the change happened after the evaluation had started, in response to the local context.
- The Safe Care Saving Lives (SCSL) initiative supported the implementation of 20 evidence-based maternal and newborn care practices. The intervention was intended to be rolled out in three phases, with the second and third phases planned to be randomised to allow for a strong evaluation design. However, the implementation strategy was adapted in several steps, and operational needs made the original randomisation unfeasible. Further, the intervention received insufficient support from policy makers and health facilities. As a result, the second phase was never fully implemented and the third phase was cancelled. The evaluation now seeks instead to understand why implementation was so difficult in this context.
- The Gombe partnership for maternal and newborn health (MNH) aims to strengthen health services through a package of interventions delivered by four NGOs. The intervention has started to be scaled up, including to comparison facilities, while the IDEAS-led evaluation is ongoing. In addition, components of the intervention have changed in response to emerging learning and changes in the national strategy for MNH.
The key challenges to undertaking an impact evaluation in the context of change identified in the seminar are: (1) defining and re-defining the intervention; (2) identifying the target population and an appropriate counterfactual; (3) determining outcome measures; and (4) the purpose and timing of the evaluation.
Challenge 1: defining and re-defining the intervention
Describing the intervention was often challenging, especially where evaluation decisions are being made before an intervention and its implementation strategy have been finalised. In some cases, the evaluation team were the first to articulate a theory of change, mapping the different components of the intervention and its intended outcomes and impacts.
Both planned and unplanned changes to the intervention over the course of the evaluation were often not clearly documented, and the task of capturing them fell to the evaluation team, who frequently had to piece together retrospectively what had happened. In all cases, a close working relationship with the implementer and the use of mixed methods were seen as essential for articulating a theory of change and capturing changes in implementation. The Adolescent 360 study includes a process evaluation, and the evaluators have found direct observation of team meetings and programming activities to be the most important method for capturing changes, as the design process is iterative and has not been well documented.
The question was raised during the discussion whether a detailed understanding of the intervention was needed. The response: ‘it depends’! The need to understand the intervention was considered to depend on: whether the desired change is the same in each setting and population sub-group; whether outcomes were anticipated to change over time; and the level of detail needed for the analysis.
Challenge 2: identifying the target population and an appropriate counterfactual
Changes to the implementation strategy had a big impact on the ability to select comparison sites and ultimately the study design.
Changes to the implementation strategy of SCSL meant implementers did not adhere to the implementation sites that had been selected through randomisation. The evaluation considered a ‘dose-response’ analysis (measuring the strength of the intervention delivered, to explore the association between implementation strength and outcomes). However, it proved challenging to define the ‘dose’ because the intervention approach was not fully defined at the outset and evolved during implementation. Adolescent 360 plans to strengthen its pre-post analysis with a dose-response analysis. This method was considered, by those present, to warrant further consideration.
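As a rough sketch of what a dose-response analysis can look like (all variable names and numbers here are hypothetical, not drawn from SCSL or Adolescent 360): score each site on implementation strength, then regress the outcome on that score.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: an implementation-strength ("dose") score per facility,
# e.g. number of intervention components delivered, and an outcome such as
# facility-level coverage of a target practice.
n_sites = 40
dose = rng.integers(0, 6, size=n_sites)                       # 0-5 components delivered
coverage = 0.30 + 0.06 * dose + rng.normal(0, 0.05, size=n_sites)

# Simple OLS of outcome on dose: the slope estimates the change in coverage
# per additional component delivered.
X = np.column_stack([np.ones(n_sites), dose])
beta, *_ = np.linalg.lstsq(X, coverage, rcond=None)
print(f"Estimated change in coverage per unit of dose: {beta[1]:.3f}")
```

The slope is only interpretable if the ‘dose’ score is defined consistently across sites, which is exactly what proved difficult here when the intervention itself kept evolving.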
For a number of the evaluations it was not clear upfront where the intervention would be implemented. For example, the evaluators of Adolescent 360 considered approaches such as stepped wedge and regression discontinuity designs, but insufficient details on the implementation sites and the target population at the time the evaluation was being designed meant a pre-post cross-sectional analysis was the most appropriate approach.
In the case of the Tanzanian sanitation programme evaluation, the intervention is highly diffuse: different populations are anticipated to be exposed to different components, and new components are continually being added. The evaluation team is hoping to follow individuals over time who self-report whether, and which, components of the intervention they have been exposed to, and correlate that with modifications to their sanitation facilities at home.
A further challenge, identified in the majority of the settings, was weak routine monitoring systems, which make the selection of retrospective controls difficult. Could a country-wide evaluation platform – as proposed by Victora et al. in 2011, Heidkamp 2017, Keita et al. 2019 and others – in which different existing databases are integrated in a continuous manner, be a potential solution?
Challenge 3: selecting the right outcome measures
As evaluators we seek to understand whether an intervention had the intended outcomes. It is standard practice to publish an evaluation protocol, which specifies the key outcomes and the analysis plans at a study’s outset, ideally before the intervention has been implemented.
Where the evaluation is being designed in parallel to the intervention it can be challenging to identify the relevant outcomes. In the case of the Adolescent 360 project, the evaluation team had to grapple with designing an impact evaluation and undertaking baseline data collection before the intervention was designed. As a result, the baseline survey had to include a more comprehensive set of questions in an attempt to capture multiple potential intervention outcomes.
The EQUIP study worked with the implementation partners as the intervention was being finalised to identify four key outcomes. The study found an impact for only one of the four outcomes. This was, in part, explained by the fact that the interventions relating to two of the outcomes were never implemented. This experience raised methodological challenges: should the analysis be based on what was intended at the study outset, or should it be changed to reflect what the implementation team actually ended up doing? Ultimately the evaluation stuck to an intention-to-treat analysis and drew cautious conclusions.
It might be desirable to allow outcome measures to be added or dropped over time. However, a constant set of indicators is vital for understanding whether a programme achieved its intended goals. In his blog, Julian Barr suggests identifying bedrock outcomes, which don’t change over the course of the programme, alongside a basket of output indicators that can.
Challenge 4: the right evaluation at the right time?
Are we trying to support decision makers to make day-to-day decisions or are we trying to examine the impact and cost-effectiveness of a programme? These were seen as two distinct aims requiring different approaches to evaluation. Evaluations are typically commissioned for the latter but in practice are often needed for the former.
There was discussion during the course of the event that evaluations might need to do more to support implementation. As a result of implementation challenges, the evaluation of SCSL placed greater emphasis on understanding why the intervention was not taken up coherently and why implementation was diluted. A close working relationship with the implementers and funders was seen as pivotal to this shift and to ensuring the utility of the evaluation.
This led to questions about whether we are doing impact evaluations too early. Should we wait for concepts to be designed and piloted before measuring impact? Evaluation could first help implementers dynamically ‘crawl the design space’ by simultaneously testing alternative interventions and adapting the programme sequentially, before scaling up and evaluating. Taking a stepped approach to evaluation would slow the process down and increase costs, and it was questioned whether implementers and funders would support such an approach. Alternative options, such as ‘hybrid designs’, seek to blend effectiveness and implementation research to enhance the usefulness and policy relevance of clinical research.
Key recommendations from the seminar:
- Evaluators should work with implementing partners to document changes, define the intervention at different stages and refine the theory of change accordingly. Mixed-method approaches are most suitable for capturing change.
- Clarify the role of the evaluation for the implementers and the donors. Have honest discussions about the most appropriate timing of the evaluation, and reassess the evaluation regularly as change occurs.
- Clearer guidelines are needed on the most appropriate analysis in the context of change.
Read more:
- Challenges and opportunities in evaluating programmes incorporating human-centred design: lessons learnt from the evaluation of Adolescents 360
- Evaluating the impact of an intervention to increase uptake of modern contraceptives among adolescent girls (15-19 years) in Nigeria, Ethiopia and Tanzania: the Adolescents 360 quasi-experimental study protocol
- Effects of the EQUIP quasi-experimental study testing a collaborative quality improvement approach for maternal and newborn health care in Tanzania and Uganda
- Measuring implementation strength: lessons from the evaluation of public health strategies in low- and middle-income settings
- The rise of impact evaluations and challenges which CEDIL is to address
- Timely evaluation in international development
- Evaluation of the Safe Care Saving Lives quality improvement collaborative for neonatal health in Telangana and Andhra Pradesh, India: a study protocol
Watch relevant seminars online:
- Symposium on potential methods to support a more timely approach to evaluation in international development
- In a recent CEDIL talk, Professor Charlotte Watts discussed the challenges of selecting a primary outcome and the methodological advances needed for evaluating complex interventions.
- Liz Allen discusses the potential for statistical process control to be used to support monitoring of output indicators during an evaluation.
- All of the evaluations used, or are using, quasi-experimental approaches. To learn more about quasi-experimental approaches, watch our series on Evaluation Using Observational Data.
Weekly links | Week of 1st July
July 5, 2019
Phew – it’s actually hot outside. If like me you’re not sure how to cope when summer finally arrives, perhaps these links will give you something to read in the shade.
- The hugely ambitious and hotly anticipated PopART trial has been in the news (NPR), with an interesting take on the who, the whys, and the what nexts. The trial found that universal test-and-treat for HIV can reduce population-level incidence of HIV by meaningful levels.
- Alicia McCoy offers some candid thoughts on life as an evaluator in an NGO. She touches on something that seems to come up in discussions in the Centre: what is evaluation for? Evaluations, she suggests, are often seen as being about accountability, but should be used for more than that, including learning how to design better interventions next time. That’s more difficult, however.
- We’re a broad church, in the Centre for Evaluation, but as a rule most of us don’t explicitly use methods from economics and econometrics. Although there are sometimes language barriers, there’s a lot to like and learn from the econometrics field. For a sense of what’s out there, David McKenzie has put together a list of technical topics.
- Twitter is a total waste of time that makes you feel bad about yourself, right? Wrong! A small corner of Twitter is bucking the trend and engaging in nice, supportive, and informative discussions. Yes, you guessed it, it’s the epidemiologists. Search #epitwitter to pull up posts on causal reasoning, coping with PhD stress, paper writing, and dogs. There’s even an #epibookclub, where denizens of Epi Twitter are reading Nancy Krieger’s Epidemiology and the People’s Health together over the next couple of months (it’s only just started). Krieger has even agreed to answer questions about the book at the end of the summer.
Weekly links | Week of the 17th June
June 21, 2019
Like this start-stop-start(?) summer we’re having in London, we’ve not always been able to throw together the Weekly Links blog post. We’ll try as hard as we can to put one out each Friday, but only when we’ve found enough interesting material. Hopefully you’ll agree the five links below were worth the wait:
- We thought that this blog was interesting, on the challenges of evaluating integrated care in the UK. Eilís Keeble refers to multiple issues: data not always capturing the same populations, definitions of indicators changing over time, the challenge of identifying comparison areas in settings where multiple things are going on, etc. Probably sounds familiar to some of you.
- We’ve recently discovered a website called ‘Changeroo’ for developing theories of change. From the animated video on the homepage, and the video of the software being used, it looks promising. Please get in touch if you have used it or are planning to. They also have a ‘ToC-Academy‘ with free tools and resources to develop and refine theories of change.
- From @fp2p on Twitter: “When will we get a report on your findings?”, reflections on researcher accountability from the DRC by Christian Chiza Kashurha. Lots to think about, including this scene: One day, I was passing back through [a research] community when suddenly I came across two of our former respondents. After some greetings, their words grew blunt: “Manake mulikuyaka tu tupondeya muda na kukamata maoni yetu nanjo muka poteya! Ju mpaka sai hatuya onaka mutu ana kuya tuambiya bili ishiaka wapi.” (“So basically, you just came here to waste our time collecting our opinions – and then that’s that: you disappeared! Because since then, we’ve never had anyone come back to tell us the outcome or results of what you were doing here.”) One of the two was very blunt indeed: “Si mulishaka kula zenu, basi muna weza tu kumbuka siye benye tulitumaka muna pata hizo makuta.” (“Now that you’ve gotten your food [i.e. been paid for your research], couldn’t you at least remember those of us who made that possible for you?”)
- Speaking of communicating results, Alexander Coppock has produced a paper on visualisation for randomised controlled trials, using R. He’s even published the code for the paper, here.
- By discussing the example of the Teen Pregnancy Prevention programme in the USA, and the low impact of the interventions, the Straight Talk on Evidence blog touches on a more general issue. They argue that the low impact arose because the method for choosing interventions was not good, and contrast this with a different, more rigorous, approach. How interventions are chosen, and for which places, seems somewhat under-researched, despite its potentially huge influence over the effects observed.
Weekly links | Week of the 20th May
May 24, 2019
Whoops! Missed a week… sorry. Here are five stories of interest from the evaluation world.
- The IFS launched the IFS Deaton Review, with Nobel Laureate Angus Deaton (gettit?). The idea is to bring together people from many disciplines to ‘build a comprehensive understanding of inequalities in the twenty-first century’, as well as ‘to provide solutions’.
- However, not everyone was too impressed. 40 researchers criticised the make-up of the panel, and Faiza Shaheen gave a more personal take on non-white exclusion (and the terms for inclusion) on panels such as these.
- Annette Brown has been looking at the literature on gender bias in grant proposal reviewing — summarising her thoughts here.
- With possible implications for other evaluation research, a new paper from IJE concludes that when estimating non-specific effects of vaccines there can be bias from right or left censoring, which needs to be accounted for.
- A review in JAMA Oncology looked at 143 anticancer drug approvals by the FDA and found that 17% of those approved had ‘suboptimal’ control arms, the implication being that the effectiveness of the drugs was being overestimated. This review emphasises that understanding the control arm is as important as understanding the intervention; evaluations too often neglect to describe control arms in much detail.
As always, please send ideas to email@example.com
Have a good long weekend!
May 15, 2019
In the last few years, a new field of ‘implementation science’ has emerged, which focuses on bridging the gap between efficacy and impact. Prof. James Hargreaves, Former Director of the Centre for Evaluation and Professor of Epidemiology and Evaluation at LSHTM, has been invited on different occasions to give a presentation on implementation science in HIV research.
- Demystifying Implementation Science: this introductory video, produced for the ViiV AIDS 2018 Pre-Conference Workshop in Amsterdam, aims to demystify the term 'Implementation Science' and introduces examples of work that address key questions in this area. Watch the video online.
- What is Implementation Science?: this talk, presented at the inaugural Implementation Science Network meeting, which took place at the 9th International AIDS Society Conference in Paris, describes how implementation science is defined in the HIV literature, how implementation science questions should be framed, and methods that could be used to yield rigorous results. Watch the video online.
- Implementation Science Trials: Do the rules of RCTs apply? This presentation, from the Conference on Retroviruses and Opportunistic Infections (CROI) in Seattle, outlines the aims of implementation science and the rules of Randomised Controlled Trials (RCTs), and identifies four adaptations to the conduct of RCTs that are relevant in the implementation science setting. Watch the video online.
Weekly links | Week of the 6th May
May 10, 2019
It’s been a short week here in the UK, but with four seasons in four days it feels like a while since Monday. Here are five evaluation-related links to start the weekend.
- Canada’s International Development Research Centre (IDRC) have announced a new tool for assessing evidence that gives more weight to research that accounts for context and articulates dimensions of quality. The idea is that this will move away from traditional metrics that favour literature in American and Western European journals, and raise the profile of Southern-only research. They published an article on their approach in Nature.
- Ultra-poor graduation programmes have been causing a lot of discussion for a while in the development sector. There’s been a lot of good evidence produced through rigorous evaluation. To add some depth to the findings, qualitative research is being published, and is described here on the World Bank’s blog.
- Difference-in-differences models are special cases of lagged regression — or are they? See the blog and comments for discussion.
- A (very sad) cautionary tale: warning people not to drink arsenic-contaminated water in Bangladesh may have increased child mortality by 45%; other options were contaminated with human waste. A reminder that interventions have potential to do harm, which should be captured in evaluation.
- Using evidence from 16 studies, researchers have found that people will often object to randomisation to see which of two policies is better, even when there is no reason to pick one policy over another. As they say, ‘This experimentation aversion may be an important barrier to evidence-based practice.’
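For anyone curious about the difference-in-differences discussion linked above, a toy two-period sketch (entirely simulated data, with made-up numbers) shows the two estimators side by side: a simple difference-in-differences calculation, and a regression of the follow-up outcome on treatment plus the lagged (baseline) outcome.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy two-period data: a treated and a comparison group, measured pre and post.
n = 200
treat = rng.integers(0, 2, size=n)
y_pre = 5.0 + 1.0 * treat + rng.normal(size=n)             # groups differ at baseline
y_post = y_pre + 0.5 + 2.0 * treat + rng.normal(size=n)    # simulated effect = 2

# Difference-in-differences: mean change over time, treated minus comparison.
did = (y_post[treat == 1] - y_pre[treat == 1]).mean() \
    - (y_post[treat == 0] - y_pre[treat == 0]).mean()

# Lagged-outcome regression: y_post on treatment, controlling for baseline y_pre.
X = np.column_stack([np.ones(n), treat, y_pre])
beta, *_ = np.linalg.lstsq(X, y_post, rcond=None)

print(f"Difference-in-differences estimate: {did:.2f}")
print(f"Lagged-regression estimate:         {beta[1]:.2f}")
```

Under this particular data-generating process, where assignment is random and change does not depend on the baseline level, both approaches recover the simulated effect; the linked debate is about the settings in which they diverge.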
Weekly links | Week of the 29th April
May 3, 2019
Hi again! Last ‘week’ was just three days long because of LSHTM’s generous Easter break, and there wasn’t quite enough time to get through emails, let alone collect five links to share on the blog. This week, however, there’s a lot to have a look at:
- First, and most important, is the Centre for Evaluation termly newsletter — check it out here, and have a think about sharing your own work with our members in the next newsletter in a few months.
- Taking Twitter to the next level, @statsepi Tweeted a thread that uses simulations to show that when adjusting for covariates in randomised trials it’s not the baseline imbalance that’s important but the degree to which covariates predict the outcome at the end.
- Although perhaps a minority in the Centre for Evaluation, epidemiologists have a big influence on how we conduct research at the School, which is why you might find a collection of think-pieces on the Future of Epidemiology in the American Journal of Epidemiology interesting. There’s one about teaching that considers how we talk about causation in epi, which resonates with the evaluation field.
- Researchers Julian Kolev, Yuly Fuentes-Medel, and Fiona Murray have looked at gender disparities in appraisals of ‘innovative research grant proposals submitted to the Gates Foundation from 2008-2017’. They found that ‘despite blinded review, female applicants receive significantly lower scores, which cannot be explained by reviewer characteristics, proposal topics, or ex-ante measures of applicant quality’. They attribute this to differences in communication styles (and presumably preferences for particular styles on the side of the reviewers).
- Finally: would we be more productive in monasteries? While academia used to be associated with religious orders, now our daily lives are far from the quiet introspection and isolation that used to be practiced. Cal Newport wonders if the lengths taken to concentrate fully on spiritual insights were actually necessary to overcome our natural limitations, and that constant email and open-plan offices might be keeping us from work satisfaction.
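The @statsepi thread above lends itself to a quick simulation (illustrative numbers only, nothing from the original thread): in a randomised trial, adjusting for a covariate that strongly predicts the outcome shrinks the standard error of the treatment effect, whether or not the covariate is imbalanced at baseline.

```python
import numpy as np

rng = np.random.default_rng(1)

def treatment_se(adjust, n=500, reps=200):
    """Average standard error of the treatment effect over simulated trials."""
    ses = []
    for _ in range(reps):
        treat = rng.integers(0, 2, size=n)            # randomised 1:1
        covar = rng.normal(size=n)                    # prognostic covariate
        y = 1.0 * treat + 2.0 * covar + rng.normal(size=n)
        X = np.column_stack([np.ones(n), treat] + ([covar] if adjust else []))
        beta, rss, *_ = np.linalg.lstsq(X, y, rcond=None)
        sigma2 = rss[0] / (n - X.shape[1])            # residual variance
        cov = sigma2 * np.linalg.inv(X.T @ X)
        ses.append(np.sqrt(cov[1, 1]))                # SE of the treat coefficient
    return np.mean(ses)

print(f"Unadjusted SE: {treatment_se(False):.3f}")
print(f"Adjusted SE:   {treatment_se(True):.3f}")    # smaller: covar predicts y
```

Rerunning with the covariate coefficient set to zero makes the two standard errors roughly converge, which is the thread's point: what matters is how well the covariate predicts the outcome, not baseline imbalance.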
Weekly Links | Week of the 8th April
April 12, 2019
This week we have a couple of papers from Epidemiology, an intro/thoughts on coding in qualitative research, an invitation to join the School’s R-users group, and more. Please send all suggestions to firstname.lastname@example.org before 9am on Friday.
- Ever the engaging speaker, David Spiegelhalter has a recorded talk at LSE called Learning from Data: the art of statistics. Definitely worth a listen on a coffee break.
- Eleanor Murray and team at Harvard are proposing guidelines for more informative causal inference in pragmatic trials. They are looking for feedback on draft guidelines so be sure to click on the link and send your thoughts.
- Sonja Swanson has an engagingly-written piece on the threats of bias when using instrumental variables in Epidemiology, and Sam Harper takes a Bayesian approach to evaluating seat-belt policy and its potential to reduce road deaths (he describes the typical frequentist approach to this problem as ’empirically absurd, given what is already known from prior studies’ — intrigued?)
- At the BetterEvaluation blog, Helen Marshall shares some practical insights into coding while doing qualitative research.
- The statistical software/coding language R has many benefits for evaluation, such as beautiful charts, interactive maps, reproducible reports, and oh — it’s free. The School now has a flourishing ‘R Users Group’ with over 200 subscribers to the mailing list. They meet once a month and one or two members share features of R. The group is for advanced users and total novices alike — if you’re at all interested, follow the link above and sign up!
Weekly links | Week of the 1st April
April 5, 2019
Here are five blogs or papers that we’ve found interesting this week. Remember — please send us whatever you’re reading so we can share with the Centre members. Have a great weekend!
- ‘Development interventions have similarities to medical treatments: if you treat superficial symptoms rather than the underlying pathology, or if you give the wrong medicine, you will not cure the illness.’ This, and much more excellent advice/reminders from Marie Gaarder in a new commentary.
- An older paper from Penelope Hawe looked in-depth at how a control group in an Australian trial understood being ‘controlled’, and how this might have biased the results (towards no effect).
- Synthesis Theme Leader, Kathryn Oliver, has just published an article on the nuances of ‘co-production’ in research, asking: do the costs outweigh the benefits?
- More thoughts on the statistical-significance debates. Andrew Gelman wonders why he’s bothering to weigh-in, when this debate has raged on for many decades. He thinks he, and his collaborators, have alternatives to offer.
- On the World Bank Impact blog, Markus Goldstein summarises some of the work economists have been doing to understand how ‘edutainment’ can reduce (attitudes about) intimate-partner violence.
Weekly links | Week of the 25th March
March 29, 2019
Another week, another list of evaluation links. If you have anything you think others might be interested in, please send them over to Calum.Davey@lshtm.ac.uk and I’ll include them in the following week:
- In a post at the World Bank’s Development Impact blog, Berk Özler takes a skeptical look at modern econometric attempts to create ‘synthetic’ control groups instead of randomly allocated ones.
- David Fetterman is interviewed for the BetterEvaluation site, arguing that evaluations should be ‘unboxed’ (skip over the clunky YouTube ‘unboxing videos’ analogy) by empowering the ‘community’. Seems there is a lot going on between the lines here, with reference to a debate going back to 1993!
- As Centre members have argued in the recent past, Robert Crease discusses the care needed to ensure that science and evidence has appropriate authority in the days of climate and vaccine denialism.
- In addition to the crises of p-values, Brexit, and climate change, the Cochrane Collaboration is having its own internal schisms. Where better to read about academics falling out than in a paper on the matter that ‘begins from the philosophical position that reality is multifaceted and multilayered’? To be continued, I’m sure.
- A ‘stakeholder survey’ (that includes you!) is being conducted between 22 March and 05 April (follow this link to take part) regarding the draft update to the MRC guidance on Developing and Evaluating Complex Interventions. The MRC guidance was a big topic of discussion at the retreat last year (read a summary here).
Have a good 47 hour weekend, and enjoy the weather!
Weekly links | Week of the 18th March
March 22, 2019
A few weeks have passed since the last update, apologies. Here are four interesting evaluation-related links to finish the week:
- Big claims from Caroline Heider, Director-General of Evaluation at the World Bank Group, about a ‘Copernican’ moment for evaluation methods in international development as the famous ‘DAC’ criteria are reconsidered.
- Many hundreds of researchers (including some of LSHTM’s leading statisticians) — led by a group including Modern Epidemiology‘s Sander Greenland — have signed a declaration rejecting statistical significance, although not everyone is convinced that this is how science is supposed to work.
- As part of recognising Feminist Issues in Evaluation, the American Evaluation Association published a series of blog posts, including ‘Data is Not Objective: Feminist Data Analysis’ by Heather Krause.
- A new (free) book describes the ‘Qualitative Impact Protocol’, which promises methods to attribute impact without comparison groups.
Reflections on the Biennial Retreat: Places, Spaces, and Contexts
March 18, 2019
The Centre for Evaluation (CfE) held its biennial retreat on Thursday, December 6th, focusing on the theme of “Places, Spaces and Contexts.” Held “off campus” in the Hatton Garden area, the retreat brought CfE members together for a day of discussions, networking, and musings on any and all issues regarding evaluation theory and practice at LSHTM.
We started the day with a set of 8 “speed talks”, through which CfE members presented ongoing evaluation work and described how they accounted for, or were challenged by, context. For example, interventions developed through “human centred design” will by definition vary in each location where they are implemented, yet a standardised evaluation is expected across programme sites. The presenters described similar experiences of needing to “drill down” into what happens at the local level through participatory and process-oriented methods. Yet methods themselves need to be carefully matched to the social environment in which they are meant to capture key variables. This was illustrated by a study in which girls’ school attendance was an important outcome measure: while researchers initially thought girls might over-report attendance at school, in reality some girls hid from the researchers and marked themselves absent if they were worried they hadn’t completed study-related tasks, potentially leading to under-estimates of their attendance. In all the cases presented, strong formative work and mixed-method process evaluations were highlighted as ways to track the realities of intervention delivery on the ground.
In the next session, three LSHTM researchers gave insights into different “place”-related influences on research practice. First, Chris Grundy’s talk, “Why maps matter”, grounded health evaluation in physical geography. The diseases and social phenomena we try to measure have spatial distributions that can deepen understanding of how they work, and rapidly developing technology provides new opportunities (but also new ethical dilemmas) for what can be mapped and visualised. Unfortunately, while interest in the use of GIS, including open-source data and electronic data collection methods, increases, funding for ensuring a GIS specialist is involved in health evaluation projects does not reflect this.
Next, Catherine Pitt gave an overview of “Economic Evaluations of Geographically Targeted Interventions,” highlighting the importance of good costing data to help prioritise use of scarce resources. Yet economic costs are highly context-specific, making it difficult to transfer findings from an economic evaluation in one place to targeted programmes elsewhere. She gave examples of how carefully designed cost modelling could be used to tackle the “transferability challenge” by showing the relative merits of different configurations of health packages.
Finally, Kathryn Oliver talked about the gap between evaluation evidence and its uptake and use by policymakers. In her talk, entitled “What Makes Evidence Credible?”, she highlighted the way that researchers and policymakers speak different languages and value different types of evidence and styles of persuasion. She illustrated how researchers can sometimes become ‘Rapunzel in the Ivory Tower’, feeling uncomfortable with the idea of making health information more anecdotal, emotional or engaging. Drawing on her analysis of policymakers’ use of evidence in decision making, she urged researchers to become better at accepting political structures and processes, and to make “professional friendships” to bridge the evidence-policy divide.
Following a massive and delicious lunch, Cicely Marston and Chris Bonell ensured we didn’t get too sleepy by engaging in a lively debate about the need for structured process evaluations. Although both admitted they agreed more than they disagreed, they encouraged discussion within the group by pitting the use of pre-defined process evaluation frameworks (such as those developed by the MRC and realist evaluations) against a less structured, less positivist and more participatory approach. Drawing on examples from their own work, Chris and Cicely talked about the merits of defining and identifying constructs such as context, mechanism, and outcome versus working with the communities most affected by interventions to shape the research questions, and highlighted how poorly designed process evaluations risk collecting too much data that never gets analysed properly, or conducting qualitative research superficially without the requisite skills for meaningful analysis. This session prompted discussion among all retreat participants about the timing, design, and use of good process evaluations.
In our final session, we hosted guest speaker Rachel Glennerster, Chief Economist at the Department for International Development. Rachel presented her work on the “Generalizability Puzzle”: considering how data on successful interventions from one setting might usefully inform implementation elsewhere, without the need to conduct expensive and time-consuming randomised controlled trials in every possible context. Based on published work, Dr Glennerster emphasised that rigorous evaluation trials are needed to demonstrate “proof of concept” and to develop theories about human behaviour and the effectiveness of development programmes and policy, but that these theories need not be tested in every local environment. Instead, successful application of theoretically-driven interventions requires a good understanding of local institutions and social organisation, while general patterns and trends gleaned from global knowledge can be trusted as broadly generalisable. Her proposed “generalisability framework” calls for more exploratory and descriptive research to check whether the conditions for any given theory behind a successful intervention are present in a new context, and to gather the data needed to refine and then evaluate the local intervention design. This final session was moderated by Professor Anne Mills and was followed by a social reception for more informal discussions to wrap up the day.
Weekly links | week of the 11th of Feb
February 15, 2019
Seven days have flown by; here are this week’s links:
1. In a short post, Kylie Hutchinson offers some pithy tips on developing a post-evaluation action plan, with links to resources.
2. Taken from the weekly links on the World Bank’s Development Impact blog, Rachel Glennerster reflects on one year as DFID’s chief economist. Echoing Geoffrey Rose, she notes the importance of effect at scale: ‘it is far better to achieve a 10% improvement for 1 million people than a 50% improvement for 1,000’. This idea reflects the original, epidemiological, use of ‘impact’ (as opposed to effect) as a function of both the effect of an exposure and its prevalence, a meaning that has been lost in the modern use of ‘impact evaluation’ to describe the estimation of programme effects.
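Glennerster’s comparison is easy to make concrete. A toy calculation (ours, not from the post; the helper name is invented) multiplies an intervention’s relative effect by the number of people it reaches:

```python
# Toy illustration: the epidemiological notion of "impact" scales an
# intervention's effect by the population it reaches.
def population_impact(relative_improvement, people_reached):
    """People-equivalent benefit: effect size multiplied by coverage."""
    return relative_improvement * people_reached

# Glennerster's example: a modest effect at scale beats a large effect
# delivered to very few people.
at_scale = population_impact(0.10, 1_000_000)  # 10% improvement for 1 million
intensive = population_impact(0.50, 1_000)     # 50% improvement for 1,000
print(at_scale > intensive)
```

The same multiplication underlies the epidemiological distinction between an exposure’s effect and its population impact.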
3. Speaking of effects, Judea Pearl and Dana Mackenzie have a relatively new book out, The Book of Why, reviewed here in the NYT. Pearl and colleagues have spent the last few decades developing a complete language of causal analysis. Writing for computer scientists and mathematicians (and sometimes public health researchers), their published academic work can be quite impenetrable, so it’s exciting to see them produce an introductory book.
4. In a reflective blog post, three evaluation experts discuss how equity can be a leading principle in evaluation. There are helpful links to further reading at the end.
5. Finally: can statistics indict President Trump? Maybe: the authors of this blog post use simulations to show that it is unlikely (given their model) that the payments we know were made to Stormy Daniels didn’t come from the Trump campaign (note the double negative). How is this related to evaluation? Tenuously; but it reminded me of an interesting paper in AIDS, by Marie-Claude Boily et al., that used mathematical models to investigate the plausibility that observed changes in HIV prevalence at antenatal clinics were due to interventions with sex workers in Karnataka, India. Even with ‘optimistic prevention parameters’, their results suggested that the changes couldn’t be entirely due to the sex-worker interventions. I’ve not seen many examples of this kind of plausibility testing with mathematical models.
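The logic of that plausibility test can be sketched in miniature. All numbers below are invented for illustration and bear no relation to the Karnataka study or Boily et al.’s actual model; the question is simply whether an optimistic best case for a subgroup-focused intervention could account for an observed population-level decline:

```python
# Hypothetical plausibility check (invented numbers, not the published model):
# if an intervention works only through a high-transmission subgroup, the most
# it can explain is that subgroup's share of transmission times the best-case
# effect within the subgroup.
def max_explained_decline(subgroup_share, best_case_effect):
    return subgroup_share * best_case_effect

observed_decline = 0.35   # hypothetical: 35% relative drop in clinic prevalence
subgroup_share = 0.40     # hypothetical share of transmission via the subgroup
best_case_effect = 0.60   # hypothetical 'optimistic prevention parameter'

explained = max_explained_decline(subgroup_share, best_case_effect)
# If even the optimistic upper bound falls short of the observed decline,
# the intervention alone cannot plausibly account for the change.
print(explained < observed_decline)
```

This crude upper-bound argument is far simpler than a full transmission model, but it captures the shape of the reasoning: compare the observed change against the most the intervention could plausibly deliver.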
Have a good weekend!
Weekly links | week of the 4th Feb 2019
February 8, 2019
We’re trying something new: five links to interesting articles, blogs, videos, and other resources about evaluation, every Friday.
We hope you’ll learn something; if you come across anything that you would like to share, please send a message to email@example.com with the title ‘links’ — thanks! Here’s this week’s list:
- Authors from Cardiff University, with our own Chris Bonell, rethink evaluation with complex systems in mind. With lots of examples, they make practical recommendations, including arguing ‘that … acknowledgment of complexity does not mean that evaluations must be complex, or investigate all facets of complexity’.
- Along similar lines, the team at Better Evaluation have a short blog on ‘demystifying systemic thinking’. They cite Professor Thomas Schwandt as saying that our evaluations are happening in ‘post-normal’ times, which are less about straightforward problem-solving and more about embracing complexity, plurality, democracy, and context responsiveness. The blog refers to a new resource, Inclusive Systemic Evaluation for Gender Equality, Environments and Marginalized voices (ISE4GEMs), that offers ‘an alternative way of thinking and planning about evaluation practice and its application to complex (messy/wicked) problems.’
- In the journal Epidemiology, authors David Rehkopf and Sanjay Basu explain the synthetic control method for quantitative case-study impact evaluation. The method has been used by CfE member Aurélia Lepine and co-authors to estimate the impact of removing user charges for health care in Zambia.
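For readers new to the method, here is a minimal, self-contained sketch of the synthetic control idea on simulated data (nothing here reflects the Zambia analysis): non-negative weights summing to one are chosen over untreated ‘donor’ units so that the weighted combination tracks the treated unit before the intervention; the post-intervention gap between the treated unit and its synthetic counterpart then estimates the impact.

```python
# Illustrative synthetic control on toy data. The treated series is built
# as a known mix of donors plus a +3.0 shift after the intervention, so we
# know what the method should recover.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
T_pre, T_post, n_donors = 8, 4, 5
true_w = np.array([0.5, 0.3, 0.2, 0.0, 0.0])

donors = rng.normal(10, 2, size=(T_pre + T_post, n_donors))
treated = donors @ true_w + np.r_[np.zeros(T_pre), 3.0 * np.ones(T_post)]

def sse(w):
    # Pre-period fit: squared error between treated unit and weighted donors.
    return np.sum((treated[:T_pre] - donors[:T_pre] @ w) ** 2)

# Constrain weights to the simplex: non-negative, summing to one.
constraints = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)
bounds = [(0.0, 1.0)] * n_donors
res = minimize(sse, x0=np.full(n_donors, 1.0 / n_donors),
               bounds=bounds, constraints=constraints, method="SLSQP")

synthetic = donors @ res.x
gap = treated[T_pre:] - synthetic[T_pre:]  # post-period treated-vs-synthetic gap
print("estimated effect per post period:", gap.round(2))
```

Because the true effect is built into the simulated data as +3.0 per post period, the printed gaps should sit close to that value.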
- On the World Bank Impact blog, David Evans reviews The Goldilocks Challenge by Mary Kay Gugerty and Dean Karlan. The book discusses the balance between traditional monitoring and evaluation for NGOs and more recent trends towards impact evaluation.
- There’s been a fair amount of discussion of Angus Deaton and Nancy Cartwright’s Social Science and Medicine paper on the limitations of randomised controlled trials for informing policy. The journal published a large number of commentaries on the paper, all of which are worth reading, although, counterintuitively, perhaps the best way into the debate is to read the authors’ response to the commentaries first.
That’s it for this week! Please do send anything that you would like to share.
Process Evaluation skills-building workshop, Zambia
January 14, 2019
The Centre for Evaluation is launching a series of skills-building workshops around the world, organised in partnership with our collaborators and held in locations where LSHTM staff are based or regularly visit. In the first instance, the thematic focus is Conducting Process Evaluations. The Centre has developed a short curriculum comprising three PowerPoint presentations, three case studies and related group-work exercises; these can be adapted to fit local contexts and interests, and it is expected that local researchers will also present their work and facilitate discussion. We have two sample agendas to help structure 1- or 2-day workshops. Process evaluation theme lead Bernadette Hensen recently delivered a workshop in collaboration with Zambart in Lusaka, Zambia.
The first Centre for Evaluation supported skills-building workshop on Process Evaluation was held on 6 December 2018 in Lusaka, Zambia. The workshop was run in collaboration with Zambart, a research organisation established in 2004 and LSHTM collaborating partner. To keep costs to a minimum, the workshop capitalised on my being in Lusaka to work with colleagues on a formative research study.
The workshop saw 12 participants from Zambart, the University of Zambia, and other stakeholders and research organisations come together to build new skills and share experiences of process evaluation. Musonda Simwinga and I ran the workshop, with support from Ginny Bond. In the morning, participants discussed what process evaluation is, why process evaluations are useful and when they can be used. We also discussed evaluation frameworks, including logic models, as useful tools to map intervention pathways and guide process evaluation design. Zambart has extensive experience running large cluster-randomised trials, including ZAMSTAR (Zambia and South Africa TB Reduction Study) and the HPTN 071 PopART (Population Effects of Antiretroviral Therapy to Reduce HIV Transmission) trial. In these trials, rich quantitative and qualitative data were collected alongside the data used to evaluate the impact of the interventions on primary outcomes. These data were collected to understand, for example, the effect stigma had on participants’ engagement with the intervention, intervention acceptability, and the important relationships lay counsellors had established with participants, including the influence these have on introducing innovations such as HIV self-testing. We discussed how much of the research conducted by Zambart within trials is implicitly about process, and that making this more explicit would provide a guiding framework for exploring assumptions about how interventions work, valuable information for scaling up successful interventions, and guidance on how to modify interventions where there is no impact. It would also ensure a more genuine trans-disciplinary engagement. Two participants were experienced in realist evaluation: Chama Mulubwa, manager for a series of HIV self-testing case studies called STAR and a research degree student at Umeå University, and Dr. Joseph Zulu, Assistant Dean at the School of Public Health, University of Zambia.
The participants discussed the overlap between realist and process evaluation, and where these two fields were distinct.
After discussing process evaluation more generally, we discussed indicators and tools. In this discussion, process evaluation was seen as heavily quantitative, particularly when measuring intervention implementation and reach. We discussed where along a logic model, and how, qualitative concepts and data collection tools can complement quantitative data collection. In this session, Mwelwa Phiri, a prospective LSHTM research degree student, presented her ideas for a process evaluation to be embedded within a trial of sexual and reproductive health services for adolescents and young people in Lusaka. The workshop ended with a discussion about when and how to analyse data arising from a process evaluation, and how process evaluation could be embedded more explicitly in our work.
Although I have worked with Zambart since 2009, this was the first workshop I’ve organised with them. It was a great opportunity to discuss thinking related to process evaluation and to debate how it aligns with work already ongoing at Zambart, with related concepts such as Monitoring & Evaluation and Realist Evaluation, and with the question of whether qualitative data can be defined as “routine”. The workshop was stimulating and engaging, albeit a bit rushed given it was held over a single day. We also wished that more colleagues from the School of Public Health at the University of Zambia could have attended. We are hopeful that the workshop may lead to a similar seminar or workshop at the University of Zambia, and will expand to include a workshop on quantifying impact using quasi-experimental designs. We also hope to carry out a process evaluation through a joint proposal and grant.
Similar workshops are being planned for Ethiopia, South Africa, Zimbabwe and beyond. These are designed to be low-cost, taking advantage of existing LSHTM presence and travel, and sharing logistical costs with local partners. If you are interested in organising a Process Evaluation workshop (or designing a different skills-building event) please get in touch (firstname.lastname@example.org)!
October 3, 2018
Members of the Centre attended and presented at the Health Systems Conference in Liverpool in October. The focus was on advancing health systems for all in the sustainable development goals (SDGs) era. The Centre produced a conference map to highlight key sessions on quantifying impact, understanding implementation and evidence synthesis. The map includes sessions from different organisations and across different health topics.
We pick out some key publications from the week:
- Alliance for Health Policy and Systems Research’s new methods guide for synthesizing evidence from health policy and systems research (HPSR) to support health policy-making and health systems strengthening.
- Alliance for Health Policy and Systems Research’s health policy analysis reader: The politics of policy change in low- and middle-income countries.
- Perez MC (2018) Comparison of registered and published intervention fidelity assessment in cluster randomised trials of public health interventions in low- and middle-income countries: systematic review.
- Theobald S (2018) Implementation research: new imperatives and opportunities in global health.
- The Lancet Global Health Commission on High Quality Health Systems in the SDG era.
Complex Interventions? Insights through Process Evaluation
March 9, 2018
By: Queena Luu
LSHTM MSc Public Health, Health Service Research, student
Public health deals with a variety of interventions that interact not only with the healthcare system but the broader social, political, and economic contexts. Thus, large scale health interventions can be inherently complex with a range of components, outcomes, and stakeholders involved.
Assessing the effectiveness of interventions directs attention to measurements of outcomes. Process evaluation expands on such findings by answering questions beyond ‘does this intervention work?’ to ‘what components of the intervention lead to the results?’ and ‘how do implementation strategies affect outcome measures?’
Dr. Stefanie Dringus from LSHTM gave a one-day training on operationalizing process evaluation. Attendees were introduced to the key domains of process evaluations: building strategies to address context, implementation, and mechanisms of impact. The training led to discussions about the degree to which interventions are adapted to the local context and how focus groups may reveal what aspects (‘active ingredients’) of multi-component interventions were most impactful for participants.
After reviewing the theory, attendees were given the chance to simulate the development of a process evaluation for a hypothetical community role model-based intervention. The intervention had the goal of reducing risk factors associated with sexual behavior among young people. In small groups, attendees developed logic models and discussed methods to evaluate the process through which the intervention was implemented.
Groups brought up various issues associated with measuring primary and secondary outcomes because the intervention had many seminar-based components. My group also touched upon the need to use mixed methods to understand the mechanisms of implementation, including using focus groups and in-depth interviews to understand how receptive young people were to coach-led discussions that addressed sensitive topics. We also discussed how the coverage of the intervention, and how data are collected, can influence the outcome measures.
After reconvening, various groups presented their logic models. Each group had approached the process evaluation design from a different angle, from constructing detailed tables for each of the research domains to using highlighters to cross-reference different research questions with the process evaluation methods needed to explore them.
At the conclusion of the workshop, there was a rich discussion on how the roll-out and data collection for intervention studies need to be placed in the local context. Complex interventions are embedded into various layers of the community, and it is important to be critical about what questionnaires are administered, how rapport affects responses, and how the process evaluation team, the outcome evaluation team, and the community are communicating with each other.
Process evaluation is one key component in understanding how and why interventions work; this is especially important given the increasingly complex interventions being implemented to address determinants of health.
Blog reports on a student workshop on process evaluation organized by the Centre for Evaluation. Read more about process evaluation here.
Data II Action – Beginning of the road. Putting data to work means saving lives
February 8, 2018
By: Yasmin Hussain Al-Haboubi
LSHTM MSc Global Mental Health Student
Population Services International (PSI) carries out its health interventions much like a Fortune 500 company would go about its business. Whilst this may seem counterintuitive to the compassionate nature of health work, it is an approach that works well.
“Compassion without knowledge is ineffective”: Victor Weisskopf’s 1998 quote still stands today, despite our methods of knowledge appraisal having shifted significantly over the past 10 years.
On 1st February, Christina Luissiana connected via Skype to LSHTM. Her clear, engaging talk took the room through PSI’s approach to the use of routine data to evaluate health interventions. PSI were early adopters of the District Health Information Software 2 (DHIS2), an open-source, multi-platform system created by the University of Oslo that enables governments and NGOs to collect and analyse health intervention data. It is provided without a licensing fee, and 47 countries currently use it as their main health management system. PSI’s ‘data to action’ approach aims to inform Ministries of Health of appropriate policy recommendations based on real-time data. Christina presented a malaria case surveillance case study from the Greater Mekong Subregion, where a visually clear and user-friendly app has been developed to track malaria cases in non-public health facilities.
Once notified of a potential malaria case, the data collector responds (smartphone in hand) to the health centre. The app collects seven items of patient information: age, sex, malaria test result, whether the patient received treatment, travel history, occupation (at-risk occupations are noted) and phone number. Once the data is generated and uploaded to the DHIS2 dashboard, it serves two purposes:
- Mapping malaria cases and helping identify ‘hot spots’. In the Greater Mekong Subregion the highest number of potential cases has been found close to the borders. Such geographic mapping gives governments the possibility to make informed decisions about the supply and distribution of anti-malarial drugs, a potentially cost-effective measure for LMIC health ministries. This has yet to be monitored fully but could be a promising area for future research.
- Generation of information in real time can be used to identify emerging outbreaks.
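To make the data flow concrete, here is a hedged sketch of how a client might package those seven fields as a DHIS2 event. The endpoint shape follows DHIS2’s public Web API (events are POSTed as JSON to /api/events), but every UID, the server URL, the field-to-data-element mapping and the commented-out upload call are hypothetical placeholders, not PSI’s actual configuration:

```python
# Sketch: packaging the seven collected fields as a DHIS2 event payload.
# All UIDs below are invented 11-character placeholders.
import json

def build_event(record, program, org_unit, event_date):
    # Map each collected field to a (placeholder) DHIS2 data element UID.
    element_ids = {
        "age": "deAge000001", "sex": "deSex000001",
        "malaria_test_result": "deRdt000001", "received_treatment": "deTrt000001",
        "travel_history": "deTrv000001", "occupation": "deOcc000001",
        "phone_number": "dePhn000001",
    }
    return {
        "program": program,
        "orgUnit": org_unit,
        "eventDate": event_date,
        "status": "COMPLETED",
        "dataValues": [{"dataElement": element_ids[k], "value": str(v)}
                       for k, v in record.items()],
    }

record = {"age": 34, "sex": "M", "malaria_test_result": "positive",
          "received_treatment": True, "travel_history": "cross-border",
          "occupation": "forest worker", "phone_number": "+95500000000"}
event = build_event(record, program="PrgMal00001", org_unit="OuClinic001",
                    event_date="2018-02-01")
payload = json.dumps({"events": [event]})
# In a connected client, the upload would be a single HTTP call, e.g.:
# requests.post(f"{base_url}/api/events", data=payload, auth=(user, pw),
#               headers={"Content-Type": "application/json"})
print(len(event["dataValues"]))  # the seven items of patient information
```

In a real deployment the mapping between collected fields and server-side data elements is configuration held in DHIS2 metadata, not hard-coded as here.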
This illustrates a key feature of DHIS2: the balance between customisation and standardisation. The interoperability of the app means it can overlay other platforms and software systems, particularly systems that link back to each Ministry of Health.
The rapidity of the data analytics means that there is scope for DHIS2 to be used in humanitarian emergencies and disease epidemics. PSI do not do so themselves; however, Christina pointed to the use of DHIS2 in Madagascar in fighting seasonal epidemics of plague (bubonic, septicaemic and pneumonic). According to the WHO, 95% of those in contact with the plague, as identified through DHIS2, were provided with antibiotics.
Following the presentation there was a lively debate surrounding DHIS2 use ranging from the pragmatic to the ethical.
Logistically, in LMICs the reality remains that remote and rural areas will often have problems with both mobile connectivity and electricity, both of which are paramount to the collection of raw data. To address this, the University of Oslo’s web architects and configurators created a platform wherein the app can be used offline in low-connectivity areas and will upload the information to the platform when reconnected. And if all else fails, pen and paper have served people well for generations.
However, questions were raised by the audience about how PSI works to ensure that the Ministries themselves retain ownership of the data. This proved much harder to answer than the logistical questions.
It was suggested that Ministries of Health may need to be persuaded to want to use data. Public sector workers in LMICs are already overburdened, so they may not want to engage with data visualisations, which add another layer to their already heavy workload. To combat this, and to make health intervention data engaging and interesting, PSI have taken lessons from social media (#majorkey): in the dashboard, people can have conversations about the data, and they can ‘tag’ others. This capacity-building tactic has seen slow but certain progress in data engagement and interest.
PSI are working to push data out of their DHIS2 systems, to national health systems. This brings the country back into focus, rather than PSI itself. The contextual framework of the nation’s health system should always be taken into account. Action looks different in different settings, and evidence-based healthcare should reflect this.
Blog reports on the Centre for Evaluation’s Student Seminar, Delivering evidence-based health interventions. Population Services International’s (PSI) unique approach to development, on 1st February 2018.
Monitoring and evaluation: an insider’s guide to the skills you’ll need
November 2, 2016
In international development, everyone knows that good intentions are simply not enough. It is critical to agree on appropriate aims and then make sure that these can be achieved efficiently.
There are several different ways to achieve development goals. Take malaria, for example: approaches might include investing in vector control (reducing numbers of malaria-carrying mosquitoes); ensuring that people can access bednets; providing education on how to avoid contracting the disease; making chemoprophylaxis (prevention medication) more accessible; or treating malaria cases with better drugs, to name just a few.
We know that some ways of dealing with development challenges, such as malaria, will be more successful than others. Some approaches will have unintended consequences; they will vary in cost, and they will work in certain places but not in others. So how can those designing interventions decide which approaches to choose?
This is where evaluation studies come in: they aim to help development actors make the best choices. Evaluations can be used to improve programmes as they roll out and/or can try to estimate whether and how particular aims were achieved and whether this was better and more cost effective than other courses of action.
In order to design, run or interpret evaluations, budding development professionals need an understanding of the following.
1. Research study design, outcome measurement and statistical methods
Development programmes are often complex, but this does not mean that scientific methods such as experiments and careful analysis can’t aid a better understanding of whether programmes achieve their desired impact.
2. Social science methods
Development interventions depend on the complex interaction of multiple stakeholders and institutions. People’s goals and incentives differ, power is exercised and resisted in myriad ways, and choices are constrained by poverty or gender inequalities. Social science methods are required to make sense of these complexities to enable more effective implementation.
3. Cost-benefit analysis
When deciding how best to allocate limited resources, those designing interventions must be able to estimate the costs as well as the consequences of different programmes to ensure they get value for money. Cost-benefit analysis can also be used to compare programmes across different sectors, for instance, comparing health and education interventions.
4. Evidence-based decision making
Understanding what is already known is essential to avoid duplication. Synthesising evidence means pulling together all that has been said about a subject, judging which pieces of information are most useful, summarising this evidence, and planning new studies that focus on the most important contributions.
Teams of development professionals will need all these skills to varying degrees. For instance, evaluation experts need to be able to design and implement evaluation studies, while programme managers offer the best perspective on what interventions may be feasible and need to know how to commission and interpret evaluations.
But it is not only development workers who need evaluation skills. Evaluation is about accountability, identifying waste and avoiding harmful effects, and so these skills will also be essential to enable civil society, democratic representatives, and government officials to hold NGOs and other development actors to account.
Where can you learn these skills?
Over the past few years, evaluation courses have mushroomed in institutions all over the world, ranging from full degrees to short courses, face-to-face or via distance learning, at various levels of difficulty. Some examples are listed below.
Evaluation skills are also developed and championed within organisations through on-the-job and peer-to-peer learning. It is great to see growing commitment within international development organisations and donor agencies to developing key evaluation skills for their staff. After all, as management consultant Peter Drucker said: “What gets measured gets managed”, and development matters too much to not be properly managed.
Some examples of training courses in impact evaluation – the list is not exhaustive.
- Evaluation for development programmes at London International Development Centre
- Impact evaluation design at Institute of Development Studies
- Impact evaluation for evidence-based policy in development, University of East Anglia
- Planning, monitoring and evaluation for complex development programmes, University of Bologna
- Building skills to evaluate development interventions, International Programme for Development Evaluation Training, Ottawa, Canada
- Impact evaluation collaborative, University of California, Berkeley
- Impact evaluation of interventions addressing social determinants of health, London School of Hygiene & Tropical Medicine
- MSc impact evaluation for international development, University of East Anglia
- Diploma in public policy and programme evaluation, Carleton University
- Graduate certificate in project monitoring and evaluation: course descriptions, American University
Conferences and seminars
- J-Pal workshops
- Annual colloquium, Campbell Collaboration
- Making impact evaluation matter, Asian Development Bank and 3ie
- 3ie monthly seminar series: Delhi, Washington and London
Lively Discussions and Engaging Ideas at the LIDC and The Guardian Debate on Aid
November 8, 2016
On Thursday 27th October, the first debate of the Development Debate Series, organised by the London International Development Centre and The Guardian, took place, discussing the theme of aid and asking: are we getting aid right?