If you want School Report Cards to improve learning, try sharing results on the whole local education market

Over at Let’s Talk Development, I give my take on an interesting new study using school report cards.

Better information to improve service delivery: New evidence

Countries around the world have experimented with “school report cards”: providing parents with information about the quality of their school so that they can demand higher quality service for their children. The results have been mixed. Andrabi, Das, and Khwaja bring a significant contribution to that literature in last month’s American Economic Review with their article, “Report Cards: The Impact of Providing School and Child Test Scores on Educational Markets.”

Here’s the abstract: “We study the impact of providing school report cards with test scores on subsequent test scores, prices, and enrollment in markets with multiple public and private providers. A randomly selected half of our sample villages (markets) received report cards. This increased test scores by 0.11 standard deviations, decreased private school fees by 17 percent, and increased primary enrollment by 4.5 percent. Heterogeneity in the treatment impact by initial school test scores is consistent with canonical models of asymmetric information. Information provision facilitates better comparisons across providers, and improves market efficiency and child welfare through higher test scores, higher enrollment, and lower fees.”

Read my take at the original post!


Even if technology improves literacy, is it worth the cost?

Ben Piper reports on insightful work that he and co-authors have done comparing various education technology interventions in Kenya in terms of both effectiveness (do they improve reading ability?) and cost-effectiveness (what's the cost per reading gain?).

I recommend his full post (or the research paper it’s based on). Here are a couple of highlights:

When compared to traditional literacy programs, the more intensive ICT interventions did not produce large enough gains in learning outcomes to justify the cost. This is not to say that each of the ICT interventions did not produce improvements in students’ reading ability…. [But] the cost-effectiveness of all of these programs might still be significantly lower than a clear investment in high quality literacy programs…. In addition to monetary cost, an opportunity cost existed…. Many of the teachers, tutors, and students lacked exposure to technology and the time and energy spent on learning how to use the technology reduced the amount of time for instructional improvement activities.

When costs are considered, there are non-ICT interventions that could have larger impacts on learning outcomes with reduced costs; one such option could include assigning the best teachers to the first grade when children are learning how to read, rather than to the end of primary school as many schools do.
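The cost-effectiveness comparison Piper describes boils down to dividing learning gains by cost. A toy illustration in Python, with entirely hypothetical program names and numbers (not from the study):

```python
# Hypothetical programs: (name, effect in SD of reading scores, cost per pupil in USD).
# All figures are invented for illustration only.
programs = [
    ("base literacy program", 0.30, 5.0),
    ("literacy + tablets", 0.35, 25.0),
    ("literacy + e-readers", 0.33, 15.0),
]

for name, effect_sd, cost in programs:
    # SD gained per $100 spent per pupil: higher means more cost-effective.
    print(f"{name}: {100 * effect_sd / cost:.2f} SD per $100 per pupil")
```

With these made-up numbers, the technology add-ons raise the effect slightly but cost so much more that the plain literacy program wins on cost-effectiveness, which is the shape of Piper's argument.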

Economists will take issue with the standard errors, if I understand the specification right: randomization is at the district level, and I don't believe the authors cluster the standard errors.
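To see why this matters, here is a minimal simulation sketch (simulated data, standard cluster-robust sandwich estimator; nothing specific to this paper) showing that when randomization happens at the district level and outcomes are correlated within districts, conventional standard errors understate the uncertainty:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate district-level randomization: 20 districts, 50 pupils each.
n_districts, n_per = 20, 50
district = np.repeat(np.arange(n_districts), n_per)
treat = np.repeat(rng.integers(0, 2, n_districts), n_per).astype(float)

# Outcome with a district-level random effect, so errors are
# correlated within district (positive intra-cluster correlation).
district_effect = rng.normal(0, 1, n_districts)[district]
y = 0.2 * treat + district_effect + rng.normal(0, 1, district.size)

# OLS of y on a constant and treatment.
X = np.column_stack([np.ones_like(y), treat])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# Conventional (iid) standard error on the treatment coefficient.
sigma2 = resid @ resid / (len(y) - X.shape[1])
se_iid = np.sqrt(sigma2 * XtX_inv[1, 1])

# Cluster-robust (sandwich) standard error, clustered by district.
meat = np.zeros((2, 2))
for g in range(n_districts):
    idx = district == g
    score = X[idx].T @ resid[idx]
    meat += np.outer(score, score)
V = XtX_inv @ meat @ XtX_inv
se_cluster = np.sqrt(V[1, 1])

print(se_iid, se_cluster)  # the clustered SE is noticeably larger
```

With only 20 effective units of randomization instead of 1,000 pupils, the clustered standard error comes out several times larger than the conventional one, which can turn an apparently significant effect insignificant.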

But I don’t think that will change the fundamental message here: Even if there are some gains from education technology, we have to ask when they will be most likely to be worth the cost.

How many times do you have to test a program before you’re confident it will work somewhere else?

I heard this question at an impact evaluation training event a few weeks ago. I’ve heard some variation on it many times. Wouldn’t it be grand if there were a magic number? “5 times. If it works 5 times, it will work anywhere.” Alas, ’tis not so.

But Mary Ann Bates and Rachel Glennerster have a good answer in their new essay in the Stanford Social Innovation Review:

Must an identical program or policy be replicated a specific number of times before it is scaled up? One of the most common questions we get asked is how many times a study needs to be replicated in different contexts before a decision maker can rely on evidence from other contexts. We think this is the wrong way to think about evidence. There are examples of the same program being tested at multiple sites: For example, a coordinated set of seven randomized trials of an intensive graduation program to support the ultra-poor in seven countries found positive impacts in the majority of cases. This type of evidence should be weighted highly in our decision making. But if we only draw on results from studies that have been replicated many times, we throw away a lot of potentially relevant information.

Read the whole essay or my blog post on other aspects of the essay.

Researchers as healers or witches?

“A researcher [mtafiti] is an important person because he indeed is the one who discovers everything [anayegundua kila kitu].” – Mzee Thomas Inyassi

Melissa Graboyes describes how research participants in Tanzania see the medical researchers who come to them for samples and information. On the one hand, “East Africans noted the similarity between researchers and doctors: they both gave out medicine and helped the sick recover.” On the other hand…

As healers and witches are understood to rely on the same skills, once researchers were compared with healers, it was not such a stretch to compare them to witches. … Witch doctors often work at night and want blood. … Researchers also worked at night, collecting blood samples by going door to door or collecting night-biting mosquitos by walking around in the bush. For both witches and researchers, blood was valued above all other substances and its use was shrouded in secrecy.

This, from Graboyes’ intriguing book The Experiment Must Continue: Medical Research and Ethics in East Africa, 1940-2014.

Lest you think this is limited only to medical research, consider the following passage from Kremer, Miguel, and Thornton’s randomized evaluation of a girls’ scholarship program in western Kenya:

There is also a tradition of suspicion of outsiders in Teso, and this has at times led to misunderstandings with NGOs there. A government report noted that indigenous religious beliefs, traditional taboos, and witchcraft practices remain stronger in Teso than in Busia (Were, 1986).

Events that occurred during the study period appear to have interacted in an adverse way with these preexisting factors in Teso district. In June 2001 lightning struck and severely damaged a Teso primary school, killing 7 students and injuring 27 others. Although that school was not in the scholarship program, the NGO had been involved with another assistance program there. Some community members associated the lightning strike with the NGO, and this appears to have led some schools to pull out of the girls’ scholarship program. Of 58 Teso sample schools, 5 pulled out immediately following the lightning strike, as did a school located in Busia with a substantial ethnic Teso population. (Moreover, one girl in Teso who won the ICS scholarship in 2001 later refused the scholarship award, reportedly because of negative views toward the NGO.)

Witches or healers?

One takeaway from this is that researchers need to do more to make sure participants understand what they are participating in.

The Promise of Teacher Coaching and the Peril of Going to Scale

This, from Matt Barnum’s review of a recent meta-analysis on teacher coaching, by Kraft, Hogan, and Blazar.

First, what is coaching?

Teacher coaching involves “instructional experts work[ing] with teachers to discuss classroom practice in a way that is” individualized, intensive, sustained, context-specific, and focused.

What’s the finding?

“We find large positive effects of coaching on teachers’ instructional practice,” the authors write… Similarly, the 21 papers that looked at student achievement found notable positive results, on average.

But before you get too excited…

When the research examines large-scale programs (with more than 100 teachers involved), the benefits, relative to small coaching initiatives, are cut roughly in half. 

Read the whole article or the meta-analysis itself.

Cash Transfers and Health: Evidence from Tanzania

My paper, “Cash Transfers and Health: Evidence from Tanzania” (with Holtemeyer & Kosec of IFPRI), has been published in the World Bank Economic Review. The paper is attached. Here is the abstract:

How do cash transfers conditioned on health clinic visits and school attendance impact health-related outcomes? Examining the 2010 randomized introduction of a program in Tanzania, this paper finds nuanced impacts. An initial surge in clinic visits after 1.5 years—due to more visits by those already complying with program health conditions and by non-compliers—disappeared after 2.5 years, largely due to compliers reducing above-minimal visits. The study finds significant increases in take-up of health insurance and the likelihood of seeking treatment when ill. Health improvements were concentrated among children ages 0–5 years rather than the elderly, and took time to materialize; the study finds no improvements after 1.5 years, but 0.76 fewer sick days per month after 2.5 years, suggesting the importance of looking beyond short-term impacts. Reductions in sick days were largest in villages with more baseline health workers per capita, consistent with improvements being sensitive to capacity constraints. These results are robust to adjustments for multiple hypothesis testing.
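The abstract's mention of “adjustments for multiple hypothesis testing” refers to the problem of testing many outcomes at once: run enough tests at the 5 percent level and some will come up significant by chance. One common adjustment (illustrative only; the paper's exact procedure may differ) is the Benjamini–Hochberg step-up rule, sketched here with hypothetical p-values:

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask of rejections under the Benjamini-Hochberg
    step-up procedure, which controls the false discovery rate at alpha."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    # Sorted p-value i (1-indexed) is compared against alpha * i / m.
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest index meeting its threshold
        reject[order[: k + 1]] = True     # reject all hypotheses up to k
    return reject

# Hypothetical p-values for several health outcomes (not from the paper).
pvals = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
print(benjamini_hochberg(pvals))  # only the two smallest p-values survive
```

Note that 0.039 and 0.041 would each pass an unadjusted 5 percent test but fail after adjustment; results that survive such corrections, as the paper's do, are more credible.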

This is a deep analysis of the health investments and impacts stemming from cash transfers in Tanzania. Here are some other resources from the same experiment:

  1. An open access working paper version of the attached paper (substantively the same as the published version) is available here, and a summary blog post is here.
  2. A broader analysis of program impacts (beyond health) is available here. A quick summary of those results is available here.
  3. All of the data from the Tanzania community-based conditional cash transfer evaluation are available here.

Why scaling up fails

Scaling up a successful small-scale program involves changes to the program. In some cases, that includes a shift in providers, from individuals employed by a private agency to civil servants. This could have both a quality effect (moving from specialists to generalist civil servants) and a motivation effect — both intrinsic, since specialist agencies might employ people who care more, and extrinsic, since non-government agencies might find it easier to fire people.

Lisa Cameron and Manisha Shah have a new paper that examines the scale-up of a sanitation program in Indonesia. From the abstract:

This paper evaluates the effectiveness of a widely used sanitation intervention, Community-Led Total Sanitation (CLTS), using a randomized controlled trial. The intervention was implemented at scale across rural East Java in Indonesia. CLTS increases toilet construction, reduces roundworm infestations, and decreases community tolerance of open defecation… We also examine the program’s scale up process which included local governments taking over implementation of CLTS from professional resource agencies. The results suggest that all of the sanitation and health benefits accrue from villages where resource agencies implemented the program, while local government implementation produced no discernible benefits. [emphasis added]

Okay, so when the government took over, the program didn’t work. Why? They explore a number of mechanisms. The data suggest that the problem is NOT the quality of the facilitators: 

In the field one hears a lot about the importance of the “quality” of the facilitator. In order to test whether the RA facilitators are “better” than the LG facilitators, we collected information from respondents on their perceptions of how charismatic/persuasive the facilitators were… There is no significant difference in the average reported persuasiveness of the facilitators.

What else?

The intensity of implementation is…greater in RA villages (driven by facilitators making more visits).

More people had heard about the program in RA villages, and “RAs appear to be more effective at reducing tolerance to open defecation.”

This points to motivation rather than quality, in this particular case.

The paper reminds me of Kerwin & Thornton’s work on teacher training in Uganda, in which a “full” version of the program had large impacts on student learning, whereas a lower-cost version used government employees (whose job is to train teachers) and fewer materials:

A cost-effectiveness comparison of the two programs reveals the low-cost version to be slightly more cost-effective than the full-cost one… However, focusing on the “headline” measure of letter name knowledge hides significant drawbacks to the low-cost version of the program: the cost-effectiveness result is reversed when considering the overall reading score index, and the low-cost version of the program causes a small (but statistically-insignificant) decline in students’ English speaking ability… Most concerningly, the low-cost program causes large and statistically-significant reductions in several aspects of writing ability – of about 0.3 SDs – relative to the control group. In contrast, the full-cost version of the program improves writing scores across the board, with the effects on several exam components being statistically significant.

In that case, reduced inputs may also play a role. But it is potentially additional evidence that a shift in implementer can have a major impact. 

Bold et al. find something related when examining the scale-up of a contract teacher intervention in Kenya: 

What constraints arise when translating successful NGO programs to improve public services in developing countries into government policy? We report on a randomized trial embedded within a nationwide reform to teacher hiring in Kenyan government primary schools. New teachers offered a fixed-term contract by an international NGO significantly raised student test scores, while teachers offered identical contracts by the Kenyan government produced zero impact. Observable differences in teacher characteristics explain little of this gap. Instead, data suggests that bureaucratic and political opposition to the contract reform led to implementation delays and a differential interpretation of identical contract terms. [emphasis added]

Last week, on the World Bank’s Development Impact blog, I wrote about an experience with scaling up an education pilot in Kenya, where the pilot was explicitly implemented using existing government programs, with government actors playing roles already included in their job descriptions. The pilot was effective, and results on the scale-up come in later this year. Fingers crossed.