Advice for impact evaluations with government: Drop the baseline

This was a case where we did randomization without a baseline. I highly recommend this when you’re working with a government, because the biggest risk is implementation failure. You’ll spend a lot of time doing the baseline – spend time, spend money – and have the intervention not be implemented. So when you’re working with the government, it’s better to get power by doubling the sample of your endline and just randomize with administrative data, so you’ll get the same amount of power but you reduce risk up front.

That is Karthik Muralidharan speaking at the RISE conference today. Of course, he didn’t even need to add that doubling your sample also makes it more likely that randomization will produce balanced intervention and comparison groups, making a baseline even less necessary.
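
As a quick illustration of that balance point, here is a minimal simulation (my own sketch, not from the talk): it randomizes a standard-normal baseline covariate into two arms and counts how often the arms differ by more than 0.1 standard deviations, at two sample sizes.

```python
import numpy as np

rng = np.random.default_rng(42)

def share_imbalanced(n_total, n_sims=10_000, threshold=0.1):
    """Share of randomizations where the two arms differ by more than
    `threshold` standard deviations on a baseline covariate."""
    count = 0
    for _ in range(n_sims):
        x = rng.standard_normal(n_total)               # baseline covariate
        arm = rng.permutation(n_total) < n_total // 2  # random halves
        if abs(x[arm].mean() - x[~arm].mean()) > threshold:
            count += 1
    return count / n_sims

for n in (400, 800):  # e.g., doubling the endline sample
    print(n, share_imbalanced(n))
# The share of noticeably imbalanced draws falls as the sample grows
# (roughly 32% at n=400 versus 16% at n=800 in expectation).
```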

Update: This prompted a very active discussion on Twitter, which you can read in full here.

Ultimately, there are a number of factors to consider: the potential sample size, the probability of implementation failure, and the importance of baseline covariates for your analysis. But where there is serious concern that the program may not be implemented as planned, and especially where decent administrative data exist, dropping the baseline is worth considering.

There are many more comments than I can capture here; you can read the full conversation on Twitter.

Activity for teaching regression discontinuity design

One method for evaluating the impact of a program is regression discontinuity design (RDD). It works when an intervention (to be evaluated) is assigned based on a score of some sort: for example, a welfare program assigned to all households below a certain level of income, or an education program assigned to all students above (or below) a certain test score. In short, the method compares individuals who are just above and just below the cut-off for assignment to the program, since they are very similar (except for 1 or 2 points on a test, or a small amount of income). You then adjust for those small differences statistically, but the intuition is that you’re comparing people who are very similar, except that one group gets the intervention.

When I teach impact evaluation, an activity where students get up and move around can be helpful for at least two reasons: it can make a point visually, and it can keep people from falling asleep. Here’s an activity I came up with to demonstrate the concept behind RDD, and it has worked pretty well.

Tell the students that we are evaluating the impact of an injection that is supposed to increase the height of recipients. Every participant under a certain height will receive the injection. How can we evaluate it?

Have all the students line up in a row by height. (With a big class, use a subset of students.) Pick a couple of students of similar (but not identical) height in the middle and explain that this is the height cut-off. The shorter student on the left will receive the injection, and the taller student on the right will not.

Now, if we were to compare the height of the tallest student (to the right of the group) and the shortest student (to the left of the group), we wouldn’t have a good sense of the impact of the injection, since their heights are already so different. But if we compare those who are just below the qualifying height (getting the injection) to those just above (not getting the injection), then differences we observe are likely to be due to the injection.
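
For anyone who wants to see that logic in numbers, here is a minimal simulation of the injection example (entirely made-up data; the 3 cm effect, the cutoff, and the bandwidth are all illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: 200 participants, heights in cm before the program.
n = 200
height_before = rng.normal(165, 10, n)
cutoff = 165.0

# Everyone below the cutoff gets the injection, which (in this
# hypothetical world) adds 3 cm.
treated = height_before < cutoff
height_after = height_before + 3.0 * treated + rng.normal(0, 1, n)

# Naive comparison of all treated vs. all untreated is badly biased,
# because the treated group was shorter to begin with.
naive = height_after[treated].mean() - height_after[~treated].mean()

# RDD: keep only people within a narrow band of the cutoff and fit a
# line on each side; the jump between the fitted lines at the cutoff
# is the estimated effect.
bandwidth = 5.0  # cm on either side; an arbitrary illustrative choice
below = treated & (height_before >= cutoff - bandwidth)
above = ~treated & (height_before <= cutoff + bandwidth)
slope_b, intercept_b = np.polyfit(height_before[below] - cutoff,
                                  height_after[below], 1)
slope_a, intercept_a = np.polyfit(height_before[above] - cutoff,
                                  height_after[above], 1)
rdd_estimate = intercept_b - intercept_a

print(f"naive estimate: {naive:.1f} cm (large and negative)")
print(f"RDD estimate:   {rdd_estimate:.1f} cm (near the true 3 cm)")
```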

I’ve done this activity with adults in more than one country, and it’s been effective and fun.

Any ideas for how you’d make this activity better? What activities do you use to teach impact evaluation methods?

The image at the top of this post is from Impact Evaluation in Practice (Second Edition).

How does lowering the cost of schooling in early years affect later attainment?

“School Costs, Short-Run Participation, and Long-Run Outcomes: Evidence from Kenya”: My paper with Mũthoni Ngatia is out as a World Bank Policy Research Working Paper. Here’s what we learned.

Even though primary education is “free” in many countries, families face many incidental expenses: uniforms, transport, and materials, among others.

In Kenya, we worked with an NGO that provided free school uniforms to children to reduce the cost of schooling.

I know what you’re going to say: Do we need another study of “giving stuff” for education and how it affects attendance? Aren’t we supposed to be focused on learning and pedagogy?

First, while attending school is no guarantee of learning, it’s a really important part of the process.

Second, we follow these students over 8 years. Few international education studies trace the time path of impact.

A school uniform can increase school participation through multiple channels: families don’t have to pay for the uniform, AND students don’t feel stigmatized by being the only kid without one.

What do we find? In the short run, providing a school uniform does increase school participation.

The impacts are particularly large for the poorest kids: their absenteeism drops by 15 percentage points, eliminating 55 percent of their baseline absenteeism (which implies a baseline rate of roughly 27 percent).

But 8 years later, the children who participated in the program had no better educational outcomes than those who did not.

Some educational interventions have long-lasting impacts: Smaller early-grade classes in the USA have translated into better college performance.

But we can’t assume it. In this case, initial gains in school participation do not translate into more school completion.

And a few last words from the paper: “Take care when interpreting short-term results, taking into account these results and others which demonstrate that long-term impacts may vary – sometimes dramatically – from initial effects.”

“Gathering long-term data is costly, but without it, the trajectory of impacts resulting from the wide range of interventions currently being implemented remains a mystery.”

That’s it! Big short-term impacts for poor kids but disappointing long-term impacts. Check out the paper!

Can microcredit be profitable?

A new study by Burke, Bergquist, and Miguel suggests that it can. Not only that: it delivers positive spillovers. I write about it over at Let’s Talk Development.

Microcredit that helps more than just the borrower

Prices in African agricultural markets fluctuate a lot: grain prices in major markets regularly rise “by 25-40% between the harvest and lean seasons, and often more than 50% in more isolated markets.” To an economist, this looks like a massive missed opportunity: Why don’t farmers just hold onto their harvested grain and sell at a much higher price during the lean season?

According to new work by researchers Burke, Bergquist, and Miguel, farmers in Kenya lack access to credit or savings opportunities, and so they “report selling their grain at low post-harvest prices to meet urgent cash needs (e.g., to pay school fees). To meet consumption needs later in the year, many then end up buying back grain from the market a few months after selling it.” It’s like the grain market is a very expensive source of short-term loans.

Can microcredit help? Offering farmers a loan at harvest led them to sell less at harvest time and to sell more grain later, when prices were higher. “The loan produces a return on investment of 28% over a roughly nine month period.”
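
To put the arbitrage logic in rough numbers, here is a stylized back-of-the-envelope calculation (the storage-loss and interest figures are my own illustrative assumptions, not the paper’s estimates):

```python
# Option A: sell a bag of grain at harvest to meet cash needs.
# Option B: borrow against the bag, store it, sell in the lean season,
# and repay the loan. All numbers below are illustrative assumptions.

harvest_price = 100.0              # value of a bag at harvest
lean_price = harvest_price * 1.30  # a 30% seasonal rise (within the 25-40% range)
storage_loss = 0.05                # assumed share of grain lost in storage
loan_rate = 0.10                   # assumed interest over the storage period

sell_now = harvest_price
sell_later = lean_price * (1 - storage_loss)  # lean-season revenue after losses
repayment = harvest_price * (1 + loan_rate)   # loan principal plus interest

net_gain = sell_later - repayment
print(f"extra revenue from waiting:   {sell_later - sell_now:.1f}")
print(f"net gain after loan interest: {net_gain:.1f} "
      f"({100 * net_gain / harvest_price:.0f}% of the amount borrowed)")
```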

Read more…

Does increasing teacher salaries improve student test scores?

Over at Development Impact, I just posted a piece — What do we learn from increasing teacher salaries in Indonesia? More than the students did — where I discuss recent work by de Ree et al. on an impressive policy experiment, where Indonesia doubled base pay for many civil service teachers.

Here’s the abstract of their paper:

How does a large unconditional increase in salary affect the performance of incumbent employees in the public sector? We present experimental evidence on this question in the context of a policy change in Indonesia that led to a permanent doubling of teacher base salaries. Using a large-scale randomized experiment across a representative sample of Indonesian schools that accelerated this pay increase for teachers in treated schools, we find that the large pay increase significantly improved teachers’ satisfaction with their income, reduced the incidence of teachers holding outside jobs, and reduced self-reported financial stress. Nevertheless, after two and three years, the increase in pay led to no improvement in student learning outcomes. The effects are precisely estimated, and we can rule out even modest positive impacts on test scores. Our results suggest that unconditional pay increases are unlikely to be an effective policy option for improving the effort and productivity of incumbent employees in public-sector settings.

If you want School Report Cards to improve learning, try sharing results on the whole local education market

Over at Let’s Talk Development, I give my take on an interesting new study using school report cards.

Better information to improve service delivery: New evidence

Countries around the world have experimented with “school report cards”: providing parents with information about the quality of their school so that they can demand higher quality service for their children. The results have been mixed. Andrabi, Das, and Khwaja make a significant contribution to that literature in last month’s American Economic Review with their article, “Report Cards: The Impact of Providing School and Child Test Scores on Educational Markets.”

Here’s the abstract: “We study the impact of providing school report cards with test scores on subsequent test scores, prices, and enrollment in markets with multiple public and private providers. A randomly selected half of our sample villages (markets) received report cards. This increased test scores by 0.11 standard deviations, decreased private school fees by 17 percent, and increased primary enrollment by 4.5 percent. Heterogeneity in the treatment impact by initial school test scores is consistent with canonical models of asymmetric information. Information provision facilitates better comparisons across providers, and improves market efficiency and child welfare through higher test scores, higher enrollment, and lower fees.”

Read my take at the original post!

Even if technology improves literacy, is it worth the cost?

Ben Piper reports on insightful work that he and co-authors have done comparing various education technology interventions in Kenya in terms of both effectiveness (do they improve reading ability?) and cost-effectiveness (what’s the cost per reading gain?).

I recommend his full post (or the research paper it’s based on). Here are a couple of highlights:

When compared to traditional literacy programs, the more intensive ICT interventions did not produce large enough gains in learning outcomes to justify the cost. This is not to say that each of the ICT interventions did not produce improvements in students’ reading ability…. [But] the cost-effectiveness of all of these programs might still be significantly lower than a clear investment in high quality literacy programs…. In addition to monetary cost, an opportunity cost existed…. Many of the teachers, tutors, and students lacked exposure to technology and the time and energy spent on learning how to use the technology reduced the amount of time for instructional improvement activities.

When costs are considered, there are non-ICT interventions that could have larger impacts on learning outcomes with reduced costs; one such option could include assigning the best teachers to the first grade when children are learning how to read, rather than to the end of primary school as many schools do.

Economists will take issue with the standard errors, if I understand the specification right: randomization is at the district level, and I don’t believe the authors cluster the standard errors at that level.
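
For readers who want to see why that matters, here is a minimal simulation (my own sketch, not the authors’ data or specification): treatment is assigned at the district level, outcomes are correlated within districts, and clustering the standard errors at the district level widens them considerably.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Hypothetical data mimicking the design: treatment assigned at the
# district level, reading scores measured at the student level.
n_districts, students_per = 20, 50
district = np.repeat(np.arange(n_districts), students_per)
treated_districts = rng.permutation(n_districts) < n_districts // 2
treatment = treated_districts[district].astype(int)

# District-level shocks make outcomes correlated within districts.
district_effect = rng.normal(0, 0.5, n_districts)[district]
score = (0.1 * treatment + district_effect
         + rng.normal(0, 1, n_districts * students_per))

df = pd.DataFrame({"score": score, "treatment": treatment,
                   "district": district})

model = smf.ols("score ~ treatment", data=df)
naive = model.fit()  # ignores within-district correlation
clustered = model.fit(cov_type="cluster",
                      cov_kwds={"groups": df["district"]})

print("naive SE:    ", round(naive.bse["treatment"], 3))
print("clustered SE:", round(clustered.bse["treatment"], 3))
# The clustered standard error is typically several times larger.
```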

But I don’t think that will change the fundamental message here: Even if there are some gains from education technology, we have to ask when they will be most likely to be worth the cost.

How many times do you have to test a program before you’re confident it will work somewhere else?

I heard this question at an impact evaluation training event a few weeks ago. I’ve heard some variation on it many times. Wouldn’t it be grand if there were a magic number? “5 times. If it works 5 times, it will work anywhere.” Alas, ’tis not so.

But Mary Ann Bates and Rachel Glennerster have a good answer in their new essay in the Stanford Social Innovation Review:

Must an identical program or policy be replicated a specific number of times before it is scaled up? One of the most common questions we get asked is how many times a study needs to be replicated in different contexts before a decision maker can rely on evidence from other contexts. We think this is the wrong way to think about evidence. There are examples of the same program being tested at multiple sites: For example, a coordinated set of seven randomized trials of an intensive graduation program to support the ultra-poor in seven countries found positive impacts in the majority of cases. This type of evidence should be weighted highly in our decision making. But if we only draw on results from studies that have been replicated many times, we throw away a lot of potentially relevant information.

Read the whole essay or my blog post on other aspects of the essay.