3 questions to guide every experiment, from Dean Karlan of IPA

I always try to reduce things to three questions—even in simple interventions like passing out nutritional supplements to infants and toddlers. These are theory-driven questions that you should be using whenever you’re justifying any intervention whatsoever. 

And the questions are:

  1. “What’s the market failure? Why isn’t the market and the invisible hand working?”
  2. “How does this intervention specifically solve the market failure?”
  3. “What’s the welfare impact of solving the market failure?”

Karlan makes the point that experiments should absolutely be driven by theory. But theory doesn’t have to mean three pages of math. The theory can be simple, and the questions above sum up what your theory should predict.

from Tim Ogden’s interview with Dean Karlan in Experimental Conversations: Perspectives on Randomized Trials in Development Economics

The poor don’t use cash transfers on alcohol and tobacco. Really.

Two years ago, Anna Popova and I put out a working paper examining whether beneficiaries of cash transfer programs are more likely than others to spend money on alcohol and cigarettes (“temptation goods”). That paper has just been published in the journal Economic Development and Cultural Change.

The findings of the published version do not differ from those of the working paper: Across continents, whether the programs have conditions or not, the result is the same. The poor don’t spend more on temptation goods. But for the published version, we complemented our vote count (where you sum up how many programs find a positive effect and how many find a negative effect) with a formal meta-analysis. You can see the forest plot below. (The meta-analysis results are not substantively different from the vote-count review, which we did in the working paper and maintain in the published version as a complement.)

[Figure: forest plot of the meta-analysis results]

As you can see, while there are only two big negative effects, both from Nicaragua, most of the effects are slightly negative, and none of them are strongly positive. We do various checks to make sure that we’re not just picking up people telling surveyors what they want to hear, and we’re confident that it cannot explain the consistent lack of impact across settings.
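
For readers curious about the mechanics: a forest plot shows each program’s estimated effect with its confidence interval, along with a pooled estimate. Below is a minimal sketch of how a DerSimonian-Laird random-effects pooled estimate is computed. The effect sizes and standard errors are invented for illustration; this is not the code behind the figure above.

```python
import numpy as np

# Illustrative (made-up) study-level effects on temptation-good spending
# and their standard errors -- NOT the estimates from the paper.
effects = np.array([-0.05, -0.02, 0.01, -0.10, -0.03])
ses = np.array([0.03, 0.02, 0.04, 0.05, 0.03])

# Fixed-effect (inverse-variance) weights and pooled mean
w = 1.0 / ses**2
fixed = np.sum(w * effects) / np.sum(w)

# DerSimonian-Laird estimate of between-study variance (tau^2)
k = len(effects)
Q = np.sum(w * (effects - fixed)**2)
tau2 = max(0.0, (Q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

# Random-effects weights, pooled estimate, and its standard error
w_re = 1.0 / (ses**2 + tau2)
pooled = np.sum(w_re * effects) / np.sum(w_re)
se_pooled = np.sqrt(1.0 / np.sum(w_re))

print(f"Pooled effect: {pooled:.3f} "
      f"(95% CI {pooled - 1.96*se_pooled:.3f}, {pooled + 1.96*se_pooled:.3f})")
```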

Why might there be a negative effect? After all, if people like alcohol, we might expect them to spend more on it when they have more money. We can’t say definitively, but even unconditional transfer programs almost always come with strong messaging: Recipients hear, again and again, that this money is for their family, that this money is to make their lives better, and so on and so on. We know from other areas of economics that labeling money has an effect (called the flypaper effect).

So you can be for cash transfers or against cash transfers, but don’t be against them because you think the poor will use the money on temptation goods. They won’t. To quote the last line of our paper, “We do have estimates from Peru that beneficiaries are more likely to purchase a roasted chicken at a restaurant or some chocolates soon after receiving their transfer (Dasso and Fernandez 2013), but hopefully even the most puritanical policy maker would not begrudge the poor a piece of chocolate.”

Has the war over RCTs been won?

Esther Duflo and Abhijit Banerjee of MIT and J-PAL weigh in during an interview in Tim Ogden’s forthcoming book, Experimental Conversations, which I am enjoying thoroughly:

Esther: “I think it’s been completely won in that I think it’s just happening. A lot of people are doing it without us. It’s being used. I think it is now understood to be one of the tools. The argument within the economics profession [over the value of RCTs] had two main consequences, both good. First, it raised the profile. If something was debated, people began to believe it must be significant. Second, it did force us to answer the challenges. There were a lot of valid points that were raised and it forced us to react. We’ve become more intelligent as a result.”

Abhijit: “I am less certain that it has been won. The acid test of whether an idea has come to stay is that it becomes something that no one needs to justify using. … RCTs aren’t there yet: it is true almost everyone is doing them, but many of them are taking the trouble to explain that what they do is better than a ‘mere RCT.’ We need to get to the point where people take RCTs to be the obvious tool to use when possible to answer a particular class of empirical questions.”

Beyond incentives: The economist as plumber

Earlier this month, Esther Duflo of MIT gave a talk at the IMF, and the slides are available here. I found the framing insightful.


She goes on to give three examples from impact evaluations in India that seek to improve “the rules of the game,” creating systems for better governance: (1) “fixing the pipes” — an e-platform for workfare payments, (2) “changing the faucet” — biometric identification for welfare payments, and (3) “replacing the meter” — inspections of pollution compliance.

Michael Kremer on how RCTs lead to innovation

The modern movement for RCTs in development economics…is about innovation, as well as evaluation. It’s a dynamic process of learning about a context through painstaking on-the-ground work, trying out different approaches, collecting good data with good causal identification, finding out that results do not fit pre-conceived theoretical ideas, working on a better theoretical understanding that fits the facts on the ground, and developing new ideas and approaches based on theory and then testing the new approaches.

This is from an insightful interview with Michael Kremer, Harvard economics professor “generally given credit for launching the RCT movement in development economics with two experiments he led in Kenya in the early 1990s,” and my graduate school advisor.

The interview is in Tim Ogden’s book Experimental Conversations: Perspectives on Randomized Trials in Development Economics.

Quick take: “Education Quality and Teaching Practices” in Chile

On this morning’s commute, I caught up on a new NBER Working Paper: “Education Quality and Teaching Practices,” by Marina Bassi, Costas Meghir, and Ana Reynoso.

Here is the abstract: “This paper uses a RCT to estimate the effectiveness of guided instruction methods as implemented in under-performing schools in Chile. The intervention improved performance substantially for the first cohort of students, but not the second. The effect is mainly accounted for by children from relatively higher income backgrounds. Based on the CLASS instrument we document that quality of teacher-student interactions is positively correlated with the performance of low income students; however, the intervention did not affect these interactions. Guided instruction can improve outcomes, but it is a challenge to sustain the impacts and to reach the most deprived children.”

Why no effect for lower-income students? To expand a bit on the abstract: “The most striking result from the table is the association between better student teacher interactions (reflected in a higher CLASS score) and the performance of low income students. In effect, one additional standard deviation in the principal component of CLASS scores is associated with a higher SIMCE test score for low income students of between 15% and 20% of sd units. These results are robust to adjustments in p-values to control for the FWE rate. For higher income students, effects are smaller and in some cases insignificant.”

So if the quality of interactions is particularly important for lower-income students, and the intervention isn’t affecting those interactions, then that could explain the differential effects. It’s an interesting hypothesis, and it points to the ongoing need to better understand what’s happening in the classroom.
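
As an aside, the “adjustments in p-values to control for the FWE rate” mentioned in the quote refer to multiple-testing corrections that control the family-wise error rate. The paper does not necessarily use this exact procedure, but here is a minimal sketch of one standard FWE correction, the Holm-Bonferroni step-down adjustment, with made-up p-values:

```python
import numpy as np

def holm_adjust(pvals):
    """Holm-Bonferroni step-down adjustment of p-values.
    Controls the family-wise error rate; illustrative sketch only."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    adjusted = np.empty(m)
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiply the k-th smallest p-value by (m - k + 1),
        # then enforce monotonicity across the ordered p-values.
        candidate = min(1.0, (m - rank) * p[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Made-up p-values for several test-score outcomes
print(holm_adjust([0.003, 0.020, 0.045, 0.300]))
```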

Here’s a little more on the intervention: “The main intervention of the program was to support teachers through a modified method of instruction by adopting a more prescriptive model. Teachers in treated schools received detailed classroom guides and scripted material to follow in their lectures.”

project placement bias in the evaluation of Christianity

If Christianity is true then it ought to follow (a) That any Christian will be nicer than the same person would be if he were not a Christian. (b) That any man who becomes a Christian will be nicer than he was before. … Christian Miss Bates may have an unkinder tongue than unbelieving Dick Firkin. That, by itself, does not tell us whether Christianity works. The question is what Miss Bates’s tongue would be like if she were not a Christian and what Dick’s would be like if he became one. …

We must, therefore, not be surprised if we find among the Christians some people who are still nasty. There is even, when you come to think it over, a reason why nasty people might be expected to turn to Christ in greater numbers than nice ones. That was what people objected to about Christ during His life on earth: He seemed to attract “such awful people.” (C.S. Lewis, Mere Christianity [available in full on-line], Book 4, Chapter 10)

just dumping computers in schools might not help

In The Use and Misuse of Computers in Education: Evidence from a Randomized Experiment in Colombia, by Felipe Barrera-Osorio and Leigh L. Linden, the authors examine a program that

aims to integrate computers, donated by the private sector, into the teaching of language in public schools. The authors conduct a two-year randomized evaluation of the program using a sample of 97 schools and 5,201 children. Overall, the program seems to have had little effect on students’ test scores and other outcomes. These results are consistent across grade levels, subjects, and gender. The main reason for these results seems to be the failure to incorporate the computers into the educational process. Although the program increased the number of computers in the treatment schools and provided training to the teachers on how to use the computers in their classrooms, surveys of both teachers and students suggest that teachers did not incorporate the computers into their curriculum.

Two thoughts on this:

  1. This reminds us – and I’d say “as if we needed reminding” except that we do – that you cannot just dump inputs into schools and expect changes.  If inputs don’t get used well, they don’t matter.  Even though this seems like a no-brainer, many development programs are very narrow: build a school or give some books or ….  Same problem, I’m afraid.
  2. That said, a quick look at the tables suggests to me that the authors may be confusing a noisy result with a narrowly bounded zero.  In other words, there seem to be differences in outcomes between kids who got computers and those who didn’t, but there is so much variation in both groups that we cannot be sure.  What this really means is that we don’t know whether there is an effect: there might be one (perhaps heterogeneous), or there might not.  (Either way, clearly this program wasn’t a raging success.)  The sketch after this list illustrates the difference between a noisy null and a tight zero.
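
To make the distinction in point 2 concrete, here is a minimal sketch (with simulated data, not the study’s) contrasting a noisy null with a precisely estimated zero: both point estimates sit near zero, but only the second confidence interval is tight enough to rule out effects of meaningful size.

```python
import numpy as np

rng = np.random.default_rng(0)

def diff_in_means_ci(treat, control):
    """Difference in means with a 95% confidence interval (normal approximation)."""
    diff = treat.mean() - control.mean()
    se = np.sqrt(treat.var(ddof=1) / len(treat) + control.var(ddof=1) / len(control))
    return diff, (diff - 1.96 * se, diff + 1.96 * se)

# Noisy null: small samples, lots of variation -> a wide interval that
# cannot rule out large positive or negative effects.
noisy = diff_in_means_ci(rng.normal(0.05, 1.0, 50), rng.normal(0.0, 1.0, 50))

# Precise zero: large samples, less variation -> a narrow interval around zero.
precise = diff_in_means_ci(rng.normal(0.0, 0.3, 5000), rng.normal(0.0, 0.3, 5000))

print(f"noisy null:   {noisy[0]:.3f}, 95% CI ({noisy[1][0]:.3f}, {noisy[1][1]:.3f})")
print(f"precise zero: {precise[0]:.3f}, 95% CI ({precise[1][0]:.3f}, {precise[1][1]:.3f})")
```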

There is also some evidence from India (I haven’t evaluated the quality) that if you just let kids play with the computers, they’ll learn some stuff.  (One question: do they learn things that will help them?)

impact evaluation matters: how long do i have to wait to know if my program works?

King and Behrman have a paper on Timing and Duration of Exposure in Evaluations of Social Programs. The paper gives a useful (if not adrenaline-fueled – this is a review, after all) discussion of a host of issues to consider when deciding when to look for results of social programs. [For example, what if the program ends up rolling out at different times in different places? And so on.] They then give lots of examples of papers that have dealt with these issues (and how they’ve done it). Instructive stuff.

Summary: Impact evaluations aim to measure the outcomes that can be attributed to a specific policy or intervention. Although there have been excellent reviews of the different methods that an evaluator can choose in order to estimate impact, there has not been sufficient attention given to questions related to timing: How long after a program has begun should one wait before evaluating it? How long should treatment groups be exposed to a program before they can be expected to benefit from it? Are there important time patterns in a program’s impact? Many impact evaluations assume that interventions occur at specified launch dates and produce equal and constant changes in conditions among eligible beneficiary groups; but there are many reasons why this generally is not the case. This paper examines the evaluation issues related to timing and discusses the sources of variation in the duration of exposure within programs and their implications for impact estimates. It reviews the evidence from careful evaluations of programs (with a focus on developing countries) on the ways that duration affects impacts.