How many times do you have to test a program before you’re confident it will work somewhere else?

I heard this question at an impact evaluation training event a few weeks ago. I’ve heard some variation on it many times. Wouldn’t it be grand if there were a magic number? “5 times. If it works 5 times, it will work anywhere.” Alas, ’tis not so.

But Mary Ann Bates and Rachel Glennerster have a good answer in their new essay in the Stanford Social Innovation Review:

Must an identical program or policy be replicated a specific number of times before it is scaled up? One of the most common questions we get asked is how many times a study needs to be replicated in different contexts before a decision maker can rely on evidence from other contexts. We think this is the wrong way to think about evidence. There are examples of the same program being tested at multiple sites: For example, a coordinated set of seven randomized trials of an intensive graduation program to support the ultra-poor in seven countries found positive impacts in the majority of cases. This type of evidence should be weighted highly in our decision making. But if we only draw on results from studies that have been replicated many times, we throw away a lot of potentially relevant information.

Read the whole essay or my blog post on other aspects of the essay.

Researchers as healers or witches?

“A researcher [mtafiti] is an important person because he indeed is the one who discovers everything [anayegundua kila kitu].” – Mzee Thomas Inyassi

Melissa Graboyes describes how research participants in Tanzania see the medical researchers who come to them for samples and information. On the one hand, “East Africans noted the similarity between researchers and doctors: they both gave out medicine and helped the sick recover.” On the other hand…

As healers and witches are understood to rely on the same skills, once researchers were compared with healers, it was not such a stretch to compare them to witches. … Witch doctors often work at night and want blood. … Researchers also worked at night, collecting blood samples by going door to door or collecting night-biting mosquitos by walking around in the bush. For both witches and researchers, blood was valued above all other substances and its use was shrouded in secrecy.

This, from Graboyes’ intriguing book The Experiment Must Continue: Medical Research and Ethics in East Africa, 1940-2014.

Lest you think this is limited only to medical research, consider the following passage from Kremer, Miguel, and Thornton’s randomized evaluation of a girls’ scholarship program in western Kenya:

There is also a tradition of suspicion of outsiders in Teso, and this has at times led to misunderstandings with NGOs there. A government report noted that indigenous religious beliefs, traditional taboos, and witchcraft practices remain stronger in Teso than in Busia (Were, 1986).

Events that occurred during the study period appear to have interacted in an adverse way with these preexisting factors in Teso district. In June 2001 lightning struck and severely damaged a Teso primary school, killing 7 students and injuring 27 others. Although that school was not in the scholarship program, the NGO had been involved with another assistance program there. Some community members associated the lightning strike with the NGO, and this appears to have led some schools to pull out of the girls’ scholarship program. Of 58 Teso sample schools, 5 pulled out immediately following the lightning strike, as did a school located in Busia with a substantial ethnic Teso population. (Moreover, one girl in Teso who won the ICS scholarship in 2001 later refused the scholarship award, reportedly because of negative views toward the NGO.)

Witches or healers?

One takeaway from this is that researchers need to do more to make sure participants understand what they are participating in.

The Promise of Teacher Coaching and Peril of Going to Scale

This, from Matt Barnum’s review of a recent meta-analysis on teacher coaching, by Kraft, Hogan, and Blazar.

First, what is coaching?

Teacher coaching involves “instructional experts work[ing] with teachers to discuss classroom practice in a way that is” individualized, intensive, sustained, context-specific, and focused.

What’s the finding?

“We find large positive effects of coaching on teachers’ instructional practice,” the authors write… Similarly, the 21 papers that looked at student achievement found notable positive results, on average.

But before you get too excited…

When the research examines large-scale programs (with more than 100 teachers involved), the benefits, relative to small coaching initiatives, are cut roughly in half. 

Read the whole article or the meta-analysis itself.

Cash Transfers and Health: Evidence from Tanzania

My paper, “Cash Transfers and Health: Evidence from Tanzania” (with Holtemeyer & Kosec of IFPRI) has been published in the World Bank Economic Review. The paper is attached. Here is the abstract:

How do cash transfers conditioned on health clinic visits and school attendance impact health-related outcomes? Examining the 2010 randomized introduction of a program in Tanzania, this paper finds nuanced impacts. An initial surge in clinic visits after 1.5 years—due to more visits by those already complying with program health conditions and by non-compliers—disappeared after 2.5 years, largely due to compliers reducing above-minimal visits. The study finds significant increases in take-up of health insurance and the likelihood of seeking treatment when ill. Health improvements were concentrated among children ages 0–5 years rather than the elderly, and took time to materialize; the study finds no improvements after 1.5 years, but 0.76 fewer sick days per month after 2.5 years, suggesting the importance of looking beyond short-term impacts. Reductions in sick days were largest in villages with more baseline health workers per capita, consistent with improvements being sensitive to capacity constraints. These results are robust to adjustments for multiple hypothesis testing.

This is a deep analysis of the health investments and impacts stemming from cash transfers in Tanzania. Here are some other resources from the same experiment:

  1. An open access working paper version of the attached paper is available here (which is substantively the same as the published version), and a summary blog post is here.
  2. A broader analysis of program impacts (beyond health) is available here. A quick summary of those results is available here.
  3. All of the data from the Tanzania community-based conditional cash transfer evaluation are available here.

Why scaling up fails

Scaling up a successful small-scale program involves changes to the program. In some cases, that includes a shift in providers, from individuals employed by a private agency to civil servants. This could have both a quality effect (moving from specialists to generalist civil servants) and a motivation effect — both intrinsic, since specialist agencies might employ people who care more, and extrinsic, since non-government agencies might find it easier to fire people.

Lisa Cameron and Manisha Shah have a new paper that examines the scale-up of a sanitation program in Indonesia. From the abstract:

This paper evaluates the effectiveness of a widely used sanitation intervention, Community-Led Total Sanitation (CLTS), using a randomized controlled trial. The intervention was implemented at scale across rural East Java in Indonesia. CLTS increases toilet construction, reduces roundworm infestations, and decreases community tolerance of open defecation… We also examine the program’s scale up process which included local governments taking over implementation of CLTS from professional resource agencies. The results suggest that all of the sanitation and health benefits accrue from villages where resource agencies implemented the program, while local government implementation produced no discernible benefits. [emphasis added]

Okay, so when the government took over, the program didn’t work. Why? They explore a number of mechanisms. The data suggest that the problem is NOT the quality of the facilitators: 

In the field one hears a lot about the importance of the “quality” of the facilitator. In order to test whether the RA facilitators are “better” than the LG facilitators, we collected information from respondents on their perceptions of how charismatic/persuasive the facilitators were… There is no significant difference in the average reported persuasiveness of the facilitators.

What else?

The intensity of implementation is…greater in RA villages (driven by facilitators making more visits).

More people had heard about the program in RA villages, and “RAs appear to be more effective at reducing tolerance to open defecation.”

This points to motivation rather than quality, in this particular case.

The paper reminds me of Kerwin & Thornton’s work on teacher training in Uganda, in which a “full” version of the program had large impacts on student learning, whereas a lower-cost version used government employees (whose job is to train teachers) and fewer materials:

A cost-effectiveness comparison of the two programs reveals the low-cost version to be slightly more cost-effective than the full-cost one… However, focusing on the “headline” measure of letter name knowledge hides significant drawbacks to the low-cost version of the program: the cost-effectiveness result is reversed when considering the overall reading score index, and the low-cost version of the program causes a small (but statistically-insignificant) decline in students’ English speaking ability… Most concerningly, the low-cost program causes large and statistically-significant reductions in several aspects of writing ability – of about 0.3 SDs – relative to the control group. In contrast, the full-cost version of the program improves writing scores across the board, with the effects on several exam components being statistically significant.

In that case, reduced inputs may also play a role. But it is potentially additional evidence that a shift in implementer can have a major impact. 

Bold et al. find something related when examining the scale-up of a contract teacher intervention in Kenya: 

What constraints arise when translating successful NGO programs to improve public services in developing countries into government policy? We report on a randomized trial embedded within a nationwide reform to teacher hiring in Kenyan government primary schools. New teachers offered a fixed-term contract by an international NGO significantly raised student test scores, while teachers offered identical contracts by the Kenyan government produced zero impact. Observable differences in teacher characteristics explain little of this gap. Instead, data suggests that bureaucratic and political opposition to the contract reform led to implementation delays and a differential interpretation of identical contract terms. [emphasis added]

Last week, on the World Bank’s Development Impact blog, I wrote about an experience with scaling up an education pilot in Kenya in which the pilot was explicitly implemented through existing government programs, with government actors playing roles already included in their job descriptions. The pilot was effective, and results from the scale-up come in later this year. Fingers crossed.

What do researchers owe their participants?

Back in the 1990s, a group of sex workers in Nairobi, Kenya, seemed to be immune to HIV. Scientists have drawn repeatedly on that group in ongoing efforts to develop a vaccine. Today I read a news piece by Melanie Gosling, reporting that, “Nairobi sex workers…want to establish a code of conduct for researchers in an attempt to get some benefit from the decades of studies they have taken part in.”

It highlights the tension in research, which principally benefits people other than those providing data (whether through words or blood samples). One researcher quoted in Gosling’s piece put it bluntly: “Researchers are not there to solve their problems.”

And yet, those providing data don’t always have a clear understanding of that. As historian Melissa Graboyes has said, “People regularly mistake the idea that they are participating in an experiment and it’s designed to benefit them personally, rather than the experiment is designed to generate data that can be used to answer important questions and hopefully get us closer to solving some important problems.”

That’s consistent with a quote from the piece on Kenyan sex workers:

A male sex worker, who wanted to be known as Jonathan, said one of several problems they had with the research was that the consent forms were difficult to understand.
“So you sign contracts you don’t understand. Then maybe they give you drugs that make you drowsy or dizzy, but you didn’t know this,” he said.

Part of the solution to that immediate problem of understanding is simple, clear consent forms. As Rachel Glennerster, Executive Director of J-PAL, tweeted: “I favor clear simple forms, not long complex ones that U.S. health regulations require” (edited slightly to translate from Twitter-ese).

But this points to more than that. Participants in studies provide data. Hopefully that data benefits someone besides the researchers involved. Hopefully it contributes to better policies. But chances are that it won’t benefit the participant directly, at least not much. An obvious solution is to pay participants in some form. But how do you identify the right amount?

Some researchers ask her [one of the sex workers] for help, such as “mobilise five or ten ladies and we will pay you”. After spending time doing this, she is given “a small token, maybe enough to find food to eat that day. But we are making their work easy”.

The alternative that Nairobi’s sex workers are proposing is a novel one: Organize. Then they can collectively negotiate what they see as appropriate compensation for participating in research.

How do you think researchers should benefit their research participants?

How do development economists think about development vs. how other economists think about it?

In a recent EconTalk episode, Russ Roberts interviews Chris Blattman about his experiment with Stefan Dercon on sweatshops in Ethiopia. 

This exchange amused me.

BLATTMAN: Getting a bad shock when you’re poor means–

ROBERTS: Death.

BLATTMAN: –can mean really terrible things. For these guys, not death. If you have a Grade 8 education in Ethiopia and you have a family that can support you, their outside option in the end is living at home and not having anything to do and not being able to contribute to the family, not having any spending money, and maybe having a harder time finding a husband or a wife. Maybe also bad things happen in the household. Maybe you’re contributing to your younger brother going to private school. But these people are not on the margins of death. This isn’t who sweatshops are hiring, at least in this case.

This is NOT to critique the great work that Russ Roberts does on the EconTalk podcast.

But it’s a reminder that many choices in developing countries are not about life or death, but that doesn’t mean they don’t have major implications for human well-being.