Even if technology improves literacy, is it worth the cost?

Ben Piper reports on insightful work that he and co-authors have done comparing various education technology interventions in Kenya in terms of both effectiveness (do they improve reading ability?) and cost-effectiveness (what’s the cost per reading gain?).

I recommend his full post (or the research paper it’s based on). Here are a couple of highlights:

When compared to traditional literacy programs, the more intensive ICT interventions did not produce large enough gains in learning outcomes to justify the cost. This is not to say that each of the ICT interventions did not produce improvements in students’ reading ability…. [But] the cost-effectiveness of all of these programs might still be significantly lower than a clear investment in high quality literacy programs…. In additional to monetary cost, an opportunity cost existed…. Many of the teachers, tutors, and students lacked exposure to technology and the time and energy spent on learning how to use the technology reduced the amount of time for instructional improvement activities. 

When costs are considered, there are non-ICT interventions that could have larger impacts on learning outcomes with reduced costs; one such option could include assigning the best teachers to the first grade when children are learning how to read, rather than to the end of primary school as many schools do.

Economists will disagree with the standard errors if I understand the specification right: Randomization is at the district level and I don’t believe the authors cluster the standard errors. 

But I don’t think that will change the fundamental message here: Even if there are some gains from education technology, we have to ask when they will be most likely to be worth the cost.
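To make the clustering point concrete, here is a minimal simulated sketch. It is not the authors' specification or data: the function names, the 20 districts, the 50 students per district, and the size of the shared district shock are all assumptions chosen for illustration. Treatment is assigned by district with no true effect, and the standard error of the difference in means is computed two ways: treating every student as independent, and collapsing to district means first (a simple stand-in for cluster-robust standard errors).

```python
import random
import statistics


def simulate(n_districts=20, students_per_district=50, seed=0):
    """Simulate scores with treatment assigned by district and no true effect."""
    rng = random.Random(seed)
    rows = []  # each row: (district, treated, score)
    for district in range(n_districts):
        treated = district % 2  # assignment happens at the district level
        district_shock = rng.gauss(0, 1)  # shared by every student in the district
        for _ in range(students_per_district):
            score = district_shock + rng.gauss(0, 1)
            rows.append((district, treated, score))
    return rows


def diff_in_means_se(treated_values, control_values):
    """Standard error of a difference in means, treating values as independent."""
    return (
        statistics.variance(treated_values) / len(treated_values)
        + statistics.variance(control_values) / len(control_values)
    ) ** 0.5


def naive_se(rows):
    """Treat every student as an independent observation (ignores clustering)."""
    treated = [score for _, t, score in rows if t == 1]
    control = [score for _, t, score in rows if t == 0]
    return diff_in_means_se(treated, control)


def district_level_se(rows):
    """Collapse to one mean per district, the unit that was actually randomized."""
    scores_by_district = {}
    for district, t, score in rows:
        scores_by_district.setdefault((district, t), []).append(score)
    treated = [statistics.mean(v) for (_, t), v in scores_by_district.items() if t == 1]
    control = [statistics.mean(v) for (_, t), v in scores_by_district.items() if t == 0]
    return diff_in_means_se(treated, control)


if __name__ == "__main__":
    rows = simulate()
    print("naive SE:", round(naive_se(rows), 3))
    print("district-level SE:", round(district_level_se(rows), 3))
```

When students in the same district share a common shock, the student-level standard error understates the uncertainty, so estimates can look more precise than they are. How much this matters in the Kenya study depends on how correlated outcomes are within districts, which this sketch does not measure.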

Economic Development in 20 minutes to middle schoolers

Earlier this week I had 20 minutes each to speak to four classes of middle schoolers about my career. I talked about economic development. I used a presentation (available in full here). Given that it was the antepenultimate day of school, the students and teachers appeared very engaged.

  1. I showed the students four families from Gapminder’s Dollar Street initiative — one from India, one from Burundi, one from Ukraine, and one from Colombia, and I had students vote (by raised hands) on which family they expected was poorest and which was wealthiest.

slide 2

2. Then I introduced the four families in turn, and I expressed their monthly income in terms of the price of school lunches at the middle school where I was speaking. (The Ukrainian family’s monthly income was the equivalent of 3,100+ school lunches, more than the most insatiable student should ever consume.)

slide 3

3. Having highlighted the massive gaps in income between families, I invited the students to vote (again, by raised hands) on the number of people in the world currently in extreme poverty. Hint: According to the latest estimates from Cruz et al. (2015), we’re at 700 million.

slide 4

4. Then I showed — in two ways — how much poverty has decreased over time. First I showed the figure below from Our World in Data. I also showed the evolving chart on income and life expectancy from Gapminder. (Technical difficulties precluded showing the actual evolution over time, but at least I could show screenshots of the beginning and the end.)

slide 9

5. I then highlighted geographical concentrations of poverty.

slide 12

6. Then I gave two very simple definitions of economics:

  • Macro: Why are some countries rich and others poor?
  • Micro: How can poor families get out of poverty?

7. What does economic growth look like? Here’s some of the variation, where countries on the top left are those that grew the most: low income in 1960, high income in 2014.

slide 13

8. I then invited the students to suggest what makes countries grow. We talked about a few possibilities.

slide 15

9. We then returned to the growth map and differentiated between two high-growth countries: South Korea, which produces goods for trade (I had all the students with Samsung devices raise their hands) versus Equatorial Guinea, which produces a natural resource for trade (oil). We talked about the different implications for inequality.

10. I then talked through the two objectives of the World Bank: to encourage growth and to end extreme poverty. (To be more precise, the “twin goals” are to encourage “shared prosperity” — growth that benefits the bottom 40% of the population — and to end extreme poverty.)

11. Then, since education is an area I work in actively, I highlighted the relationship between learning and economic growth, using data from Hanushek et al. (2008).

slide 18a

12. I then asked how many of the 7th graders could read a sentence: All of them claimed that ability. I then showed data from the Early Grade Reading Barometer on the percentage of 2nd graders in various countries who couldn’t read a single word, which of course predicts future literacy.

13. I talked about what I do specifically, with a few examples (including a few funny stories).

slide 20

14. And finally, I talked them through how I got to my current job and reminded them that it’s not just economists working in international development.

slide 21

Many thanks to all those who gave suggestions. I used several of them and would have enjoyed using others if I’d had more time (either to prepare or with the classes).

The next day, I received a number of thank you notes from students. This one took the cake.

thank you

To improve educational outcomes, help households to smooth consumption throughout the year

A new paper by Paul Christian and Brian Dillon poses this question: “Does a consistently seasonal diet during childhood have long-run effects on human capital formation?” They use Tanzania’s Kagera Health and Development Survey — a 19-year panel survey — to answer the question. As you can see from the figure below, Tanzania has dramatic seasonality: Children have very different access to food in some parts of the year than in others.

histogram of seasonality

Christian and Dillon develop a structural model — which you can read all about in the paper — and use the household data to estimate it.

Here is a taste of the results:

We find a robust, negative relationship between consumption seasonality and human capital formation. Across specifications, the negative relationship between seasonality and human capital is 30-60% of the magnitude of the positive relationship between average consumption and human capital (in the same units). … The effects of seasonality on height is greatest for children in utero and during infancy, during the critical first 1,000 days of life. Effects on education are most pronounced for older children, suggesting that behavioral channels such as dropping out of school to help on the farm are more important in this sample than early life impacts on cognitive performance. When we further allow for heterogeneity by both age and gender, we see that the height effects during infancy are concentrated among girls, while the education effects during adolescence are largely driven by boys.

 

How many times do you have to test a program before you’re confident it will work somewhere else?

I heard this question at an impact evaluation training event a few weeks ago. I’ve heard some variation on it many times. Wouldn’t it be grand if there were a magic number? “5 times. If it works 5 times, it will work anywhere.” Alas, ’tis not so.

But Mary Ann Bates and Rachel Glennerster have a good answer in their new essay in the Stanford Social Innovation Review:

Must an identical program or policy be replicated a specific number of times before it is scaled up? One of the most common questions we get asked is how many times a study needs to be replicated in different contexts before a decision maker can rely on evidence from other contexts. We think this is the wrong way to think about evidence. There are examples of the same program being tested at multiple sites: For example, a coordinated set of seven randomized trials of an intensive graduation program to support the ultra-poor in seven countries found positive impacts in the majority of cases. This type of evidence should be weighted highly in our decision making. But if we only draw on results from studies that have been replicated many times, we throw away a lot of potentially relevant information.

Read the whole essay or my blog post on other aspects of the essay.

Researchers as healers or witches?

“A researcher [mtafiti] is an important person because he indeed is the one who discovers everything [anayegundua kila kitu].” – Mzee Thomas Inyassi

Melissa Graboyes describes how research participants in Tanzania see the medical researchers who come to them for samples and information. On the one hand, “East Africans noted the similarity between researchers and doctors: they both gave out medicine and helped the sick recover.” On the other hand…

As healers and witches are understood to rely on the same skills, once researchers were compared with healers, it was not such a stretch to compare them to witches. … Witch doctors often work at night and want blood. … Researchers also worked at night, collecting blood samples by going door to door or collecting night-biting mosquitos by walking around in the bush. For both witches and researchers, blood was valued above all other substances and its use was shrouded in secrecy.

This, from Graboyes’ intriguing book The Experiment Must Continue: Medical Research and Ethics in East Africa, 1940-2014.

Lest you think this is limited only to medical research, consider the following passage from Kremer, Miguel, and Thornton’s randomized evaluation of a girls’ scholarship program in western Kenya:

There is also a tradition of suspicion of outsiders in Teso, and this has at times led to misunderstandings with NGOs there. A government report noted that indigenous religious beliefs, traditional taboos, and witchcraft practices remain stronger in Teso than in Busia (Were, 1986).

Events that occurred during the study period appear to have interacted in an adverse way with these preexisting factors in Teso district. In June 2001 lightning struck and severely damaged a Teso primary school, killing 7 students and injuring 27 others. Although that school was not in the scholarship program, the NGO had been involved with another assistance program there. Some community members associated the lightning strike with the NGO, and this appears to have led some schools to pull out of the girls’ scholarship program. Of 58 Teso sample schools, 5 pulled out immediately following the lightning strike, as did a school located in Busia with a substantial ethnic Teso population. (Moreover, one girl in Teso who won the ICS scholarship in 2001 later refused the scholarship award, reportedly because of negative views toward the NGO.)

Witches or healers?

One takeaway from this is that researchers need to do more to make sure participants understand what they are participating in.

Nothing new under the sun — Education edition

Some years ago, I was evaluating an education program in The Gambia (read the evaluation here), soon after the government had outlawed corporal punishment in schools. We included a question about it in the evaluation and learned that there was a gap between legislation and practice, which the government then sought to resolve. 

Of course, controversy over corporal punishment in schools isn’t new, but I was surprised to see it debated in Noli me tangere, the 1887 novel by Filipino writer José Rizal. A frustrated schoolteacher recounts that after reading several books, his views changed: 

Lashings, for example, which since time immemorial had been the province of schools and which before I had seen as the only effective way to make children learn (that is how they have accustomed us to believe), began to seem far removed from contributing to a child’s progress, completely useless. I became convinced that when one keeps the switch or the rod in view reasoning is impossible… I began to think that the best thing I could do for these children was to develop confidence, security, and self-esteem.

So he eliminates corporal punishment. 

Little by little I held back the switch. I took the whips home and replaced them with emulation and belief in oneself.

Like any good experimenter, he evaluated short- and medium-run impacts.

In the beginning it seemed as though my method was impractical: a lot of them stopped studying altogether. But I pressed on, and I noticed that little by little their spirits rose. More students attended class, and more often. And when one day one was praised in front of everyone, the following day he learned twice as much.

But the local priest and the parents didn’t buy it and demanded he return to the traditional system.

I had to renounce a system that after a great deal of effort had begun to bear fruit.

Poor guy, but I imagine there are a number of reformers today who can feel his pain. 

The quotes are from Harold Augenbraum’s translation of the book.

The Promise of Teacher Coaching and Peril of Going to Scale

This, from Matt Barnum’s review of a recent meta-analysis on teacher coaching, by Kraft, Hogan, and Blazar.

First, what is coaching?

Teacher coaching involves “instructional experts work[ing] with teachers to discuss classroom practice in a way that is” individualized, intensive, sustained, context-specific, and focused.

What’s the finding?

“We find large positive effects of coaching on teachers’ instructional practice,” the authors write… Similarly, the 21 papers that looked at student achievement found notable positive results, on average.

But before you get too excited:

When the research examines large-scale programs (with more than 100 teachers involved), the benefits, relative to small coaching initiatives, are cut roughly in half. 

Read the whole article or the meta-analysis itself.

Cash Transfers and Health: Evidence from Tanzania

My paper, “Cash Transfers and Health: Evidence from Tanzania” (with Holtemeyer & Kosec of IFPRI) has been published in the World Bank Economic Review. The paper is attached. Here is the abstract:

How do cash transfers conditioned on health clinic visits and school attendance impact health-related outcomes? Examining the 2010 randomized introduction of a program in Tanzania, this paper finds nuanced impacts. An initial surge in clinic visits after 1.5 years—due to more visits by those already complying with program health conditions and by non-compliers—disappeared after 2.5 years, largely due to compliers reducing above-minimal visits. The study finds significant increases in take-up of health insurance and the likelihood of seeking treatment when ill. Health improvements were concentrated among children ages 0–5 years rather than the elderly, and took time to materialize; the study finds no improvements after 1.5 years, but 0.76 fewer sick days per month after 2.5 years, suggesting the importance of looking beyond short-term impacts. Reductions in sick days were largest in villages with more baseline health workers per capita, consistent with improvements being sensitive to capacity constraints. These results are robust to adjustments for multiple hypothesis testing.

This is a deep analysis of the health investments and impacts stemming from cash transfers in Tanzania. Here are some other resources from the same experiment:

  1. An open access working paper version of the attached paper is available here (which is substantively the same as the published version), and a summary blog post is here.
  2. A broader analysis of program impacts (beyond health) is available here. A quick summary of those results is available here.
  3. All of the data from the Tanzania community-based conditional cash transfer evaluation are available here.

Why scaling up fails

Scaling up a successful small-scale program involves changes to the program. In some cases, that includes a shift in providers, from individuals employed by a private agency to civil servants. This could have both a quality effect (moving from specialists to generalist civil servants) and a motivation effect — both intrinsic, since specialist agencies might employ people who care more, and extrinsic, since non-government agencies might find it easier to fire people.

Lisa Cameron and Manisha Shah have a new paper that examines the scale-up of a sanitation program in Indonesia. From the abstract:

This paper evaluates the effectiveness of a widely used sanitation intervention, Community-Led Total Sanitation (CLTS), using a randomized controlled trial. The intervention was implemented at scale across rural East Java in Indonesia. CLTS increases toilet construction, reduces roundworm infestations, and decreases community tolerance of open defecation… We also examine the program’s scale up process which included local governments taking over implementation of CLTS from professional resource agencies. The results suggest that all of the sanitation and health benefits accrue from villages where resource agencies implemented the program, while local government implementation produced no discernible benefits. [emphasis added]

Okay, so when the government took over, the program didn’t work. Why? They explore a number of mechanisms. The data suggest that the problem is NOT the quality of the facilitators: 

In the field one hears a lot about the importance of the “quality” of the facilitator. In order to test whether the RA facilitators are “better” than the LG facilitators, we collected information from respondents on their perceptions of how charismatic/persuasive the facilitators were… There is no significant difference in the average reported persuasiveness of the facilitators.

What else?

The intensity of implementation is…greater in RA villages (driven by facilitators making more visits).

More people had heard about the program in RA villages, and “RAs appear to be more effective at reducing tolerance to open defecation.”

This points to motivation rather than quality, in this particular case.

The paper reminds me of Kerwin & Thornton’s work on teacher training in Uganda, in which a “full” version of the program had large impacts on student learning, whereas a lower cost version used government employees (whose job is to train teachers) as well as fewer materials: 

A cost-effectiveness comparison of the two programs reveals the low-cost version to be slightly more cost-effective than the full-cost one… However, focusing on the “headline” measure of letter name knowledge hides significant drawbacks to the low-cost version of the program: the cost-effectiveness result is reversed when considering the overall reading score index, and the low-cost version of the program causes a small (but statistically-insignificant) decline in students’ English speaking ability… Most concerningly, the low-cost program causes large and statistically-significant reductions in several aspects of writing ability – of about 0.3 SDs – relative to the control group. In contrast, the full-cost version of the program improves writing scores across the board, with the effects on several exam components being statistically significant.

In that case, reduced inputs may also play a role. But it is potentially additional evidence that a shift in implementer can have a major impact. 

Bold et al. find something related when examining the scale-up of a contract teacher intervention in Kenya: 

What constraints arise when translating successful NGO programs to improve public services in developing countries into government policy? We report on a randomized trial embedded within a nationwide reform to teacher hiring in Kenyan government primary schools. New teachers offered a fixed-term contract by an international NGO significantly raised student test scores, while teachers offered identical contracts by the Kenyan government produced zero impact. Observable differences in teacher characteristics explain little of this gap. Instead, data suggests that bureaucratic and political opposition to the contract reform led to implementation delays and a differential interpretation of identical contract terms. [emphasis added]

Last week, on the World Bank’s Development Impact blog, I wrote about an experience with scaling up an education pilot in Kenya. The pilot was explicitly implemented through existing government programs, with government actors playing roles already included in their job descriptions. The pilot was effective, and results on the scale-up come in later this year. Fingers crossed.