Topics covered: What is evaluation?
- Needs assessment
- Process evaluation
- Impact evaluation
- Cost-benefit analysis
Instructor: Rachel Glennerster
1: What is Evaluation?
Related Resources
Lecture slides (PDF)
The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.
So for the course, we have four days of lectures. Today we'll try to convince you that it was actually a good idea to come here, why randomized evaluation is such a useful tool and why it's superior to many other kinds of impact evaluation.
Once we've convinced you that it's a good idea to come here, then we'll start going through the nuts and bolts of actually how to run randomized evaluations. Tomorrow we'll go over some of the general design possibilities. The following day we'll go into some more of the technical aspects like sample size and measurement. The last day we'll kind of discuss the fact that even in randomized evaluations, things can go wrong, and how you deal with that.
And throughout the entire course, as you're learning this, you'll be applying what you learn by designing your own randomized evaluation, in step with the lectures, in the groups to which you've been preassigned. So if you check out your name tags, you'll see that you have a group number. Find other people with the same color and number and those will be your group members. We tried to put you together in ways that made sense. So we tried to have people who were interested in agriculture work with other people within agriculture.
And over the course of the four days, you'll be designing your own randomized evaluation. And on the last day, on Saturday, you will be presenting your evaluation to the entire group. So that is pretty much what to expect over the next five days. Have I forgotten anything? Then let me reintroduce Rachel who will start five minutes early.
RACHEL GLENNERSTER: The group work that Mark was talking about is a really integral part of the course. So unfortunately that is not something that you'll be able to get online. But hopefully it's a chance for you, each time we present an idea in the lecture, to go and apply it in the case studies and in the evaluation you're developing in your group. And that's what all the teaching assistants are here for, to help you go through the case studies, but also to develop your own evaluations.
So I'm going to start with the general question of what is an impact evaluation and when should we do one. One of the objectives of this lecture is just to make sure that we're all on the same page when we start using terms like process evaluation and impact evaluation. Because the more time I spend with people who are professional evaluators, the more I realize that economists and professional evaluators use the same terms but to mean different things, which is incredibly confusing.
So I'm sure a lot of this will be familiar to you, but on the other hand, we need to make sure that everyone is at the same level in using the terms in the same way before we head into the nuts and bolts.
But I've also incorporated in this a discussion about when you should do an impact evaluation, which is something that comes up an awful lot when I go and talk to organizations who are trying to think through their evaluation strategy. They've heard that they ought to be doing more impact evaluations. There's lots of focus on this. But they're expensive. So how do they decide which of their programs to evaluate and at what stage to evaluate it? They're getting pressure from donors to do this. But they're not quite sure when it's appropriate.
So we'll try and cover those ideas. We'll start with why it is that we here at J-PAL are focused on impact evaluation. Because there are lots of other things in evaluating our programs that are important, but we only do impact evaluation. We also only do randomized impact evaluation. And that's not to say that's the only thing that's worth doing. We certainly don't think that. That's what we do, and there's a reason we do it, because we think it's important. But it's certainly not the only thing that you should be doing in your organizations.
So we'll step back and look at the objectives of evaluation; a model of change, which is very important in terms of how to think about your evaluation and how to design it; different types of evaluation; how evaluation feeds into cost-benefit analysis; why to do an impact evaluation; and how to put it all together into an evaluation strategy. And then we'll come back to how we learn. How do we make an organization that learns from its evaluation strategy, rather than just doing this as something a funder wants me to do or I have to do to tick a box? How do I develop an organization that learns from its evaluations and makes itself a better organization?
So this is the motivation for what we do. And I think this point is sort of the main point here. If you step back and you think about how much evidence we have in development, to make the decisions that we need to make, it's really quite appalling how little information we have.
If you think about some of the biggest challenges in the world in development about how to prevent the spread of HIV/AIDS in Sub-Saharan Africa, how to improve the productivity of small farmers across the world, it's really amazing how little really rigorous evidence we have to make those decisions. And we may know that this project may work or that project may work, but we very rarely know what is the most cost-effective place to put a dollar that I have.
If I'm choosing in HIV prevention, if I've got to choose between a lot of different seemingly great projects, what is the project that's going to give me the most bang for my buck? And we really don't have that kind of consistent rigorous impact evaluation data in order to make those decisions. And that was really the reason why J-PAL was started, because of the feeling that we could do so much better if we had that kind of data.
And it's also too often the case that decisions about development are based on emotion rather than data. You can see this in proposals that people write and the discussions that people have, very compelling, personal stories, which are important, but aren't really what we should be making all our decisions on. That may be very motivating to get people involved. But when you're talking about trade-offs, you've got to have a lot more rigorous evidence. If we had that kind of evidence, we could be a lot more effective with the money that we have.
I also think it's true, sometimes people say, oh, you're just talking about taking a dollar and spending it in a slightly more marginally effective way, when what we really need is more money going into poverty relief. But arguably, one of the most important ways to get more money to go into poverty relief is to convince people that the money that's going in is actually used effectively. So I don't see these as either/or. Using the money effectively and raising more money, I think, both can come from having more evidence.
So it's also important, I think, in a way, to move from what I think is a very damaging and nonconstructive debate between the aid optimists and the aid pessimists. It's a very kind of polarized debate with Jeff Sachs on one side and Bill Easterly on the other. This is a quote from Jeff Sachs: "I've identified the specific investments that are needed"-- from the previous sentence, you know that this is to end poverty-- "found ways to plan and implement them and show that they can be affordable."
Now if you think we know everything about development already-- we know what's needed, we know how to implement it-- then, kind of, this is the wrong course for you. But I think most of us would agree that that's slightly overstating how much information we have about how to end poverty. There's a lot more questions out there than that suggests. His argument is, but we have to get people motivated. So we have got to say that we know everything. I don't think we have to be quite that optimistic.
On the other hand, I think this is way too pessimistic: "After $2.3 trillion over five decades, why are the desperate needs of the world's poor still so tragically unmet? Isn't it finally time to end the impunity of foreign aid?" So Bill Easterly is kind of saying, oh, it has not worked, so let's throw it all away.
We've got to find a middle ground here. And it's not just about aid. They're talking about aid, and I would argue it's much more about development than aid. Aid is only a small fraction of the money that's spent on reducing poverty in developing countries. And development pessimism is just as bad. We've got to think more strategically, not just about whether all aid is bad or development funding is wasted, but about how we focus the money on the right things.
So that's kind of the motivation for what we're doing on a very grand scale. But thinking about the objectives of evaluation in general, you can think of them as three things. Accountability: did we do what we said we were going to do? And again, this is true of aid agencies, NGOs, government. And did we have a positive impact on people's lives? So those are two different aspects of accountability that evaluation needs to speak to.
Evaluation isn't only about accountability though. I think it's very importantly about lesson learning so we do better in the future. And that's about does a particular program work or not, and what's the most effective route to achieve a certain outcome? Are there similarities? Are there lessons that you can learn across projects? Are there similarities about what we're finding in the evaluation of this project and that project?
For example, are there ways that you're learning about how to change people's behavior in health, and agriculture, and education? Are there similarities, sort of underlying principles, that we're learning about that we can use in different contexts? And ultimately, reducing poverty through more effective programs is the ultimate objective of evaluation.
So using that as a framework, what makes a good evaluation? Well, the key thing is it's got to answer an important question. But it's no good if it answers an important question but it answers it badly. It's got to answer it in an unbiased way. What do I mean by that? I mean that it's got to find the truthful answer to the question.
And really to do that, you need to have a model or a theory of change about how the project is working so that you can test the different steps in the model. And that's the best way to learn the most. If we simply test whether the project worked or didn't work, we learn something. But we learn an awful lot more if we have a specific model of how the project is going to work and we test the different steps along the way.
Sometimes people say-- this is something that drives me mad in the evaluation literature-- you hear people saying, well, randomized evaluations are a black box. They can tell you whether something works or not, but they can't tell you why. I hope in the next few days we're going to show you how to design an evaluation that tells you not just whether it works or not at the end, but why and how, and the steps along the way, and to design it cleverly so that you learn as much as you possibly can from the evaluation about the fundamental question. And that's about getting the questions right at the beginning. And it's about doing your model correctly and thinking of indicators along the way that are going to allow you to measure all those steps and really understand the theory of change that's happening.
The model is going to start with what is it we're trying to do, who are the targets, and what are their needs. So in an evaluation in development, this would often be called a needs assessment. But then, what's the program seeking to change? And looking at precise and individual bits of the program, what's the precise program, or part of the program, that's being evaluated? So you're asking very specific questions.
So let's look at an example. And all of this we're going to come back to and do in more detail. How do you do a logframe? Again, maybe some of you have done that before. Maybe you haven't. But hopefully you'll learn more about how we think about doing a logframe.
So let's look at an example, a very simple one: does giving textbooks to children in Kenya improve test scores? That was the evaluation. But what was the need? What was the problem that the program was trying to solve? Well, poor children in Busia District in Kenya had low learning levels. They also had low incomes. They had few books. That meant that they couldn't take the books home, and that, the theory was, made it hard to learn. So it was hard to learn because they didn't have a book in front of them in class, but also because there were so few, they couldn't take them home and read up more and do exercises at home.
So what was the input? The input was that a local NGO bought additional textbooks. In order to get to your long-term goal, you not only need the books, you need to make sure that they're delivered. Because, again, making this chain along the way will help you understand if it doesn't work, where did it go wrong, if the books were bought but they never got there, or they were stuck in the cupboard. How many times have we been to schools and oh, yes, we have lots of books. And we don't want them to get messy when the children are using them. So they're all nicely in their sealed package.
Well you need to be able to distinguish if something doesn't work, is it because it's stuck in the cupboard? Or was it because even when the books are out there they didn't get used or they didn't help? So the books are delivered and used. The children use the books and they're able to study better.
And finally, the impact, which is what we're about here: yes, you got all of those steps done, but did it actually change their lives? Did it actually achieve the impact you were hoping for, which here is higher test scores? The long-term goal would be not just higher test scores but higher income. And that long-term goal may be very difficult to test in the evaluation. And you may use some other work that has linked these in previous studies in the same country to make the assumption that if we got higher test scores, it will have a positive impact. So again, that's a decision you make in your evaluation, how far along this chain you go. If it's a process evaluation you may stop here. If it's an impact evaluation you have to stop here. But you may not have enough money to take it all the way through to the finest, finest level that you would like to. Oh, I didn't do my little red triangles at the right point.
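[To make the chain concrete, here is a minimal sketch, not taken from the lecture slides, of the textbook theory of change written out as a list of steps, each paired with an indicator you could measure. The wording of each step is a paraphrase, purely for illustration.]

```python
# Illustrative sketch of the textbook theory of change as a chain of steps,
# each paired with something you could measure. Wording is paraphrased from
# the lecture, not taken from the slides.
theory_of_change = [
    ("need",    "children in Busia have low test scores and too few books"),
    ("input",   "the NGO buys additional textbooks"),
    ("output",  "textbooks are delivered to schools, not left in the cupboard"),
    ("outcome", "children use the books in class and take them home"),
    ("impact",  "test scores rise"),
    ("goal",    "higher incomes later in life (often assumed from other studies)"),
]

for step, indicator in theory_of_change:
    print(f"{step:>7}: {indicator}")
```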
OK. So I've already, in a sense, introduced some of these concepts. But again, let's review them so we know we're talking about the same thing. There are many different kinds of evaluation. And needs assessment is where you go in and look at a population, see what are the issues. How many of them have bed nets? What are test scores at the moment? How many books are there? What's class size? What are the problems in your target population?
Now, process evaluation: does someone want to tell me what they would see as a process evaluation? We talked a little bit about it. Someone? Yeah.
AUDIENCE: [UNINTELLIGIBLE PHRASE] the chain that you just presented to see how you get from the input to the output from the output to the outcome of this
RACHEL GLENNERSTER: Right.
AUDIENCE: Are we successful in doing that in transforming our input into output?
RACHEL GLENNERSTER: Right. So process evaluation looks at did we buy the textbooks? Were they delivered? Were they used? So moving inputs, outputs, outcomes, but stopping short before we get to the impact. And that's a very useful thing to do, and should be done basically everywhere you do a program, or at least some of those steps need to be measured almost every time you do a program. But it kind of stops short before you get to the impact stage. Have we actually changed people's lives?
We wanted to build a school. Did we build a school? We wanted to build a bridge. Did we build a bridge? We wanted to deliver things. Did we deliver things? But it's stopping before you get to the point of knowing whether this has actually changed people's lives.
So an impact evaluation then goes to the next stage and says, given that we have done what we said we're going to do, has that actually changed things? And this is where there was a big gap in terms of what we know. There's a lot of lesson learning you can do from process. But in terms of knowing what kind of project is going to be successful in reducing poverty, you really need to go this next step.
Now we used to just talk about those three. But increasingly, as I have more contact outside economics and the research side of evaluation, working a lot with DFID and other organizations, foundations, and agencies, I realize a lot of what people outside the academic community call evaluation I would call review. It's very confusing, because we often use different names for the same things. But what I mean by review is, it's sort of an assessment. It's sending a knowledgeable person in to review the program and give their comments on it, which can be extremely helpful if you have a good person going and talking to the people involved, and saying, well, in my experience, it could have been done differently. But it doesn't quite actually do any of these things. It's not just focused on did I build the school. It's asking questions about whether there was enough participation, how well organized the NGO was.
And a lot of this is very subjective. So I'm not saying that this is bad. It's just kind of different. And if you have someone very good doing it, it can be very useful. My concern with it is that it's very subjective, that it depends a lot on the person who's going. Yeah. Logan?
AUDIENCE: I think that you see so many reviews simply because the way-- you just mentioned DFID. USAID, I think, is the same way. It's all retroactive. The way that contracts are awarded and things like that, usually it's because it's a requirement to evaluate a certain number of programs. And it's not until after the program is actually done that they decide they're going to evaluate it. And it's obviously cheaper to send one person over and do the simple review.
I think it would be interesting. We'll probably get to this when we talk about how you can apply some of the randomized controlled trial methodology to something that you're doing retroactively.
RACHEL GLENNERSTER: So just to repeat, the argument is we do a lot of reviews because a lot of evaluation is done retroactively. What you can do at that point is very limited. Yes.
So this is a big distinction between the kinds of evaluations: one that's set up beforehand and one that is after the event. We've got this program. We want to know whether it works. Basically it's really hard to do that. You've already kind of shot yourself in the foot if you haven't set it up beforehand. If we think about what I was saying about how it's crucial to have a theory of change, a model of what we're trying to achieve and how we're going to try and achieve it, and to measure each of those steps, then if you're coming in afterwards, you're making up your theory of change ad hoc. And if you haven't set up systems to measure those steps along the way, it's going to be very hard to do. And that's exactly why you end up with a lot of reviews. You're in this mess. And so you just send someone knowledgeable and hope they can figure it out.
To answer your specific question though, you can't do a randomized evaluation after the event. Because the whole point is you're moving people into treatment and control based on the flip of a coin. And after the event, people have already been allocated to the treatment or not the treatment. It's very difficult to know afterwards whether these people were similar beforehand. It's impossible to distinguish. They may look different now. But you don't know whether they look different now because they were different in the beginning or because they're different because of the program. Yeah?
AUDIENCE: I was really interested to read the first case study because it seemed that you were applying randomized control methodology. But it seemed to be actually done retroactively.
RACHEL GLENNERSTER: No, it wasn't. It might look that way, but it was set up beforehand. The first case study uses a lot of different methodologies and compares them. But they couldn't have used all those methodologies if it hadn't been designed as a randomized study at the beginning. If you've set up a randomized evaluation, you can always do a non-randomized evaluation of it. But if you haven't done it as a randomized evaluation to start with, you can't make it randomized.
Prospective evaluation, setting up the evaluation from the beginning, is very important, I would say, in any methodology you use. It's not completely impossible to do it afterwards, though. There are a couple of examples where people have done a randomized evaluation afterwards, or an evaluation afterwards. And that is because the randomization happened beforehand, but it wasn't done as an evaluation.
So if you look at the case on women's empowerment in India, which you will do later in the week, that was not set up as an evaluation. It was set up as a randomized program. And the rationale was they wanted to be fair. So where there are limited resources, sometimes governments are the ones who randomize in order to be fair to the different participants.
Some places, in this case, would get a women's leader and some wouldn't. In the Colombia project, which you'll find on our website, the Colombian government wanted to provide vouchers to go to private school. But they couldn't afford it for everyone. So they randomized who would get them.
So that's the one case where you can do a randomized evaluation after the event, when somebody else is randomized beforehand, but they weren't actually thinking of it as an evaluation. But even then, it would've been nice to have data beforehand.
So the last thing on this list is cost-benefit analysis, which is something that you can do with the input from all of these other things. As I say, the piece of information that we have so little of is, what's the effect of a dollar here versus a dollar there? And you can only do that if that's one of your ultimate objectives when you're doing these other impact evaluations or these other evaluation methodologies. Because you need to be collecting data about costs. And the benefits will come from your impact evaluation. But you need to get your costs from your process evaluation.
And you can put the two together and do a cost-effectiveness analysis. Then if somebody else has done that in their own study, you can do a cost-effectiveness comparison across studies. Or you can even evaluate a range of different options in your impact evaluation, and that will give you comparative cost-effectiveness across them.
So going into a bit more detail on some of these: needs assessment. We'll look at who's the target population. Is it all children, or are we particularly focused on helping the lowest-achieving in the group? What's the nature of the problem being solved? Many of these communities will have lots of problems. So what are we particularly trying to focus on here? Is it test scores? Is it attendance at school? How will textbooks solve the problem? I was talking about textbooks. Maybe it's because they can take them home. Well, if that's part of your model, your theory of change, you need to be actually measuring that, not just did they arrive.
How does the service fit into the environment? How many times have we sat in an office and designed something that we thought made complete sense, and gone out to the field and thought, what was I thinking? This isn't going to work. How does it feel for the teachers? Do they understand the new books? Do they know how to teach from them? How do the books fit into the curricula?
What are you trying to get out of this? As I say, you want a clear sense of the target population. So then you want to see, are the students responding? If you're particularly worried about low-performing kids, are they responding to the textbooks?
Students who are falling behind, a sense of the needs the program will fill, what are the teachers lacking? How are we going to deliver the textbooks? How many textbooks are we going to deliver? And what are the potential barriers for people learning from the textbooks? Then a clear articulation of the program benefits, and a sense of alternatives.
And as I say, if you want to look at cost-effectiveness of alternative approaches, it's very important to think through not just this program in isolation, but what are the alternatives that we could be doing? And how does this fit with them? Is this one of the most expensive things we're going to try, one of the cheapest, or somewhere in between? So you may be thinking in this context, is this a replicable program that I'm going to be able to do elsewhere? Is this the gold-plated version that I'll do if I get lots of funding? Or is this something that I can replicate in lots of other places?
Process evaluation I've really already talked quite a bit about, so I'm going through it faster. And when you do an impact evaluation, because the impact evaluation is the last thing on that chain, you need to do all the other bits on the chain as well. You can't do an impact evaluation without a process evaluation, or you won't understand what the hell your answer means at the end.
So as we say, a process evaluation is asking, are the services being delivered? Is the money being spent? Are the textbooks reaching the classroom? Are they being used? And it's also, as I say, important to be asking yourself, what are the alternatives? Could you do this in a better way? Just like a company is always thinking, are there ways to reduce costs, you should be thinking, are there ways to do this more cheaply?
Are the services reaching the right populations? Which students are taking them home? Is it the ones that I'm targeting or only the most motivated ones? And also, are the clients satisfied? What's their response to the program?
So an impact evaluation-- am I missing a top bit here? No. OK. Here we go. We're out of order. So an impact evaluation is, as I say, taking it from there. Assuming you've got all the processes working and it's all happening, does it produce an impact?
We take our theory of change seriously and say what we might expect to change if that theory of change is happening. So we've got this theory of change that says, this is how we expect things to change. These are the processes by which we expect it to work, like the kids taking the books home. So we want to design some intermediate indicators and final outcomes that will trace out that model.
So our primary focus is going to be, did the textbooks cause children to learn more? But we might also be interested in some distributional issues. Not just the average: we might also be interested in, was it the high-achieving kids that learned more? Was it the low-achieving kids? Because very often in development, we're just as interested in the distributional implications of a project as in the average. So who is it who learned?
How does impact differ from process? In a process evaluation, we describe what happened. And you can do that from reading documents, interviewing people, and administrative records. For an impact question, we need to compare what happened to the people who got the program with what would have happened without it. This is the fundamental question that Dan is going to hammer on about in his lecture about why we use randomized evaluations. We call this the counterfactual: what would have happened if the program hadn't happened? That's the fundamental question that we're trying to get at. Obviously it's impossible to know exactly what would have happened if the program hadn't happened. But that's what we're trying to get at. Just one second. Yeah?
AUDIENCE: So one thing that would seem to fit in somewhere with the impact thing, but doesn't quite meet the criteria that you've just described, that we use sometimes is this pre-post test. And that isn't necessarily going to say what would have happened. But it will say, well, what were the conditions when you started? And we extrapolate from that, looking at where we are when we ended, what can we say about the impact of the intervention?
RACHEL GLENNERSTER: Right. So that is one way that people often try and do an impact evaluation and measure are they having an impact. And I guess it can give you some sense of whether you're having an impact or flag problems. It's to say, well, what were conditions at the beginning? What are they like now? Then you have this assumption, which is that all the difference between then and now is due to the program. And often that's not a very appropriate assumption. Often things happen.
If we take our example of schools, the kids will know more at the end of the year than they knew at the beginning. Well, would they have known more even if we hadn't given them more textbooks? Probably. So that's kind of the fundamental assumption you're making. And it's a difficult one to make.
It's also the case that we talked to people who were doing a project in Gujarat. And they were tearing their hair out and saying, well, we seem to be doing terribly. Our program is doing terribly. People now are worse off than when we started. This was, well Mark will know the years of the riots and earthquake in Gujarat.
They'd basically taken data when they started. In the meantime, there had been a massive earthquake and massive ethnic riots against Muslims in Gujarat. Of course people were worse off. And that's not because of you. So it can go either way, actually. You can assume that your program is doing much better because other things are coming along and helping people. And you're attributing all the change to your program.
Or it could be the case in this extreme example. There's a massive earthquake and massive religious and ethnic riots and you attribute all the negative to your program. So it's a way that sometimes people use of trying to get an impact. It's not a very accurate way of getting your impact, which is why a randomized evaluation would help.
So, as you say, it doesn't quite fit these criteria. Because it doesn't quite answer the question. It says what happened over the period. It doesn't say what would have happened. It is not a comparison of what would have happened with what actually happened. And that's how you want to get at your impact. So there are various ways to get at it. But some of them are more effective than others.
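[A tiny numerical sketch of this point, with invented test scores: a pre-post comparison attributes the whole change to the program, while a comparison group gives you an estimate of the counterfactual.]

```python
# Hypothetical test scores, invented purely to illustrate the argument.
treated_before, treated_after = 40.0, 55.0   # group that got the textbooks
control_before, control_after = 40.0, 50.0   # comparable group without them

# A pre-post comparison attributes all of the change to the program.
pre_post_estimate = treated_after - treated_before              # 15 points

# Using the control group as the counterfactual, the program's impact is
# only the change over and above what would have happened anyway.
impact_estimate = (treated_after - treated_before) - (control_after - control_before)  # 5 points

print(pre_post_estimate, impact_estimate)
```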
So let's go back to our objectives and see if we can match these different kinds of evaluations to our different objectives for evaluation, and find out which evaluation will answer which question. So accountability: the first question for accountability is just, did we do what we said we were going to do? Now, you can use a process evaluation for that. Did I do what I said I was going to do? I promised to deliver books. Did I actually deliver books? A process evaluation is fine for that level of accountability.
If my accountability is not just did I do what I said, but did what I do help? Ultimately I'm there to help people. Am I actually helping people? That's a deeper level of accountability. And that, you can only answer with an impact evaluation. Did I actually make the change that I wanted to happen?
If we look at lesson learning, the first kind of lesson learning is, does a particular program work or not work. So an impact evaluation can tell you whether a particular program worked. If you look at different impact evaluations of different programs, you can start saying which ones worked, whether they work in different situations, or whether a particular kind of program works in different situations or not.
Now what is the most effective route for achieving a certain outcome is kind of an even deeper level of learning. What kind of thing is the best thing to do in this situation? And there you want to have a cost-benefit analysis comparing several programs based on a number of different impact evaluations.
And then we said an even deeper level is, can I understand how we change behavior? Understand deep parameters of what makes a successful program, of how we change behavior from health to agriculture? What are some similarities and understanding of how people tick, and how we can use that to design better programs? And again, that's linking our results back to theories. You've got to have a deeper theory underlying it, and then test that with different impact evaluations. And you can get some kind of general lessons from looking across impact evaluations.
And then if we want to reduce poverty through more effective programs, which is our ultimate objective in doing evaluations, we've got to ask, did we learn from our impact evaluations? Because if we don't learn from them and change our programs as a result, then we're not going to achieve that.
And I guess the point is that solid, reliable impact evaluations are a building block. You're not going to get everything out of one impact evaluation. But if you build up enough, you can generate the general lessons that you need to do that.
I've said quite a lot of this already. But needs assessments give you the metric for defining the cost-benefit ratio. So when we're looking at cost-benefit analysis, we're looking at what's the most cost-effective way of achieving x. Well, you need a needs assessment to say, what's the x? What's the thing that I should really be trying to solve? Process evaluation gives you the costs for your inputs to do a cost-benefit analysis. And an impact evaluation tells you the benefit. So all of these different inputs are needed to be able to do an effective cost-benefit analysis.
AUDIENCE: Rachel?
RACHEL GLENNERSTER: Yeah?
AUDIENCE: The needs assessment seems to be more of a program design sort of a [UNINTELLIGIBLE], whereas the remaining three are more like the program has already been designed and we are being cautious, we have thought that this is the right program to go with. Please design a process evaluation for this or a program evaluation for this. How is that needs assessment different from the one that feeds into program design?
RACHEL GLENNERSTER: Well, in a sense, there's two different concepts here. You're right. There's a needs assessment for a particular project. We're working with an NGO in India called [UNINTELLIGIBLE] working in rural Rajasthan. And they said, we want to do more on health in our communities. We've done a lot of education and community building. But we want to do a lot more in health.
But before we start, we want to know, what are the health problems in this community? It doesn't make sense to design the project until you know. So we went in and asked, what are the health problems? What's the level of services? Who are they getting their health care from? We did a very comprehensive analysis of the issues.
And that was a needs assessment for that particular NGO in that particular area. But you can kind of think of that in a wider context of saying, what are the key problems in health in India or in developing countries? What are the top priority things that we should be focusing on? Because again-- and I'm going to get on to strategy in a minute-- if you're thinking as an organization, you can't do an impact evaluation of everything. You can't look at comparative cost-effectiveness for every outcome in the world. Or at least you've got to start somewhere. You've got to start with, what do I most want to know? What's the main thing I want to change, so I can see what's the cost of changing that thing?
So is it test scores in schools? Or is it attendance? Am I most concerned about improving attendance? If you look at the Millennium Development Goals, in a sense, that's the world prioritizing. They're saying, these are the things that I most want to change in the world. And there they made the decision, rightly or wrongly, on education, that they wanted to get kids in school. And there isn't anything about actually learning.
And whether your need is getting kids in school or learning, you would design very different projects. But you would also design different impact evaluations, because those are two very different questions. So the needs assessment is telling you, what are the problems? What am I prioritizing for my programs, but also for my impact evaluations? Yeah?
AUDIENCE: Do you need to make a decision early on whether you're interested in actually doing a cost-effectiveness analysis as opposed to a cost-benefit analysis?
RACHEL GLENNERSTER: [INTERPOSING VOICE]
AUDIENCE: [UNINTELLIGIBLE PHRASE] efficiency measure, whereas cost [UNINTELLIGIBLE]--
RACHEL GLENNERSTER: So I'm kind of using those two a bit too interchangeably. I don't think it's so important here. How would you define the difference between them?
AUDIENCE: As I understand it, but [UNINTELLIGIBLE PHRASE] they getting a better answer. But cost-effectiveness is a productivity measure. And it would mean that you would have to, in an evaluation say, OK, I'm going to look at I put one buck into this program and I get how many more days of schooling out of it. Right?
RACHEL GLENNERSTER: Right.
AUDIENCE: Whereas cost-benefit requires that it all be in dollars or some other [UNINTELLIGIBLE].
RACHEL GLENNERSTER: So you've got to change your benefit into dollars. So I'll give you an example of the difference.
AUDIENCE: Like [INAUDIBLE PHRASE].
RACHEL GLENNERSTER: Let's make sure everybody's following this discussion. A cost-effectiveness question would be to say, I want to increase the number of kids in school. How much would it cost to get an additional year of schooling from each of these different programs? And I'm just assuming that getting kids in school is a good thing to do. Right? I want to do it. So I'm asking, what's the cost per additional year of schooling from a conditional cash transfer, from making it cheaper to go to school by giving free school uniforms, or from providing school meals? There are many different things I could do that will encourage children to come to school. But I know I want children to come to school. I'm not questioning that goal. So I just want to know the cost of getting a child in school.
Cost-benefit kind of squishes it all together. And it really asks the question, is it worth getting kids in school? Because then you can say, if I get kids in school, they will earn more and that will generate income. So if I put a dollar in, am I going to get more than a dollar out at the end? I'm not going to flick all the way back to it. But if you remember that chart that went through the process and impact, the final step beyond high test scores was higher income. Ultimately, am I getting more money out of it than I'm putting in?
I think that's sort of a philosophical decision for the organization to make. It's very convincing to be able to say, for every dollar we put in, we get this much out. For the deworming case that you've got, and which you do later in the week, they do both cost-effectiveness and cost-benefit. And the cost-effectiveness says, this is the most cost-effective way to get children in school.
But they also then go further and say, assuming that these studies showing that children who stay in school in Kenya earn higher incomes are correct, then given how much it costs to get an additional year of schooling, and given an assumption about how much extra kids will earn in the future because they went to school, then for every dollar we put in, I think you get $30 back.
So you kind of have to make an awful lot more assumptions. You have to go to that final thing and put everything on income. Now if I was doing the women's empowerment study, then I'm not sure that I would want to reduce women's empowerment to dollars. I might just care about it. I might care that women are more empowered whether or not it actually leads to higher incomes.
So it kind of depends on the argument that you're making. If you want to try and make the case that this is really worth it, that this is a great program not just because it's more effective than another program but because it generates more income than I'm putting in, that's a great motivation. But I wouldn't say you always have to reduce it to dollars. Because you have to make an awful lot of assumptions. And we don't necessarily always want to reduce everything to dollars.
So here it is. We've just been talking about it. So this is cost-effectiveness. This is the cost per additional year of schooling induced. We're not linking the outcome we're measuring back to dollars. We're just assuming that we want kids in school. The Millennium Development Goals have it as a goal. We just think it's a good thing, whether or not it generates income.
What we did is take all the randomized impact evaluations that had getting more children in school as an outcome and calculated the cost per additional year of schooling that resulted. So you see a very wide range of different things.
Now conditional cash transfers turn out to be by far the most expensive way of getting an additional year of schooling. Now that's partly because mainly they're done in Latin America where enrollment rates are already very high. So it's often more expensive to get the last kid in school than the 50th percentile kid in school. And then the other thing, of course, in general, things cost more in Mexico than in Kenya, especially when you're talking about people. Teacher's wages or wages outside of school are more expensive.
But the thing that was amazing was that providing children with deworming tablets was just unbelievably cost-effective. So $3.50 for an additional year of schooling induced. And putting it this way I think really brought out that difference.
The other thing I should say in comparing these is, there were other benefits to these programs. So Progresa actually gave people cash as well. So it wasn't just about getting kids in school. So of course it was expensive, right? And we haven't calculated in those costs. In cost-benefit, if we reduced everything to dollars, it would look very different, because you've got to value all these other benefits.
But again, deworming had other benefits. It had health benefits as well as education benefits. So we're just looking at one measure of outcomes here.
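[As a rough sketch of the two calculations being contrasted here, with invented numbers chosen only so that they land near the $3.50-per-year and 30-to-1 figures mentioned in the talk:]

```python
# All numbers are hypothetical, scaled to echo the figures quoted in the talk.

# Cost-effectiveness: cost per unit of the outcome you care about.
program_cost = 3500.0            # assumed total cost of a deworming round
extra_school_years = 1000.0      # assumed additional years of schooling induced
cost_per_extra_year = program_cost / extra_school_years      # $3.50 per year

# Cost-benefit: convert the outcome into dollars and compare with the cost.
earnings_gain_per_year = 105.0   # assumed extra lifetime earnings per year of schooling
total_benefit = extra_school_years * earnings_gain_per_year
benefit_cost_ratio = total_benefit / program_cost            # ~30 dollars back per dollar

print(cost_per_extra_year, benefit_cost_ratio)
```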
AUDIENCE: Excuse me.
RACHEL GLENNERSTER: Yeah?
AUDIENCE: Are these being adjusted for Purchasing Power Parity, PPP?
RACHEL GLENNERSTER: So this is not PPP. This is absolute. So again, we've sort of debated it backwards and forwards. So if you're a country, you care more about PPP. But if you're a donor and you're wondering whether to send a dollar or a pound to Mexico or Kenya, you don't care about PPP. You care about where your dollar is going to get most kids in school. So there's different ways of thinking about it.
It sort of depends on the question you're asking and who's asking it. For a donor, I think this is the relevant way. If you're a donor who only cares about getting kids in school, this is what you care about. We can also redo this taking out the transfers, because there's this other benefit to the families of getting the money. So this is the cost to a donor. That's one way of presenting it. But you can present it in other ways too.
AUDIENCE: Can you also sometimes do a cost-benefit of the evaluation itself?
RACHEL GLENNERSTER: That's kind of hard to do because the benefits may come ten years later. The way to think about that is to think about who's going to use it, and only do it if you think it's going to actually have some benefits in terms of being used and not just maybe within the organization. But if it's expensive, is it going to be useful for other people? Is it answering a general question that lots of people will find useful?
So often evaluations are expensive in the context of a particular program. But they're answering a question that lots of other people will benefit from. So the Progresa evaluation has spurred not just the expansion of Progresa in Mexico, but it has spurred it in many other countries as well, because it did prove very effective, although it's slightly less cost-effective in these terms. But it led to an awful lot of learning in many, many other countries. So, I think, in that sense, it was an extremely effective program evaluation.
AUDIENCE: Excuse me, I just have a question on that very last item there.
RACHEL GLENNERSTER: OK, so this one is even cheaper, and it's a relatively new result. But it only works in certain circumstances. When people don't know the benefits of staying on in school, i.e., how much higher their wages are going to be if they have a primary education, then telling them that information is very cheap. And both in the Dominican Republic and Madagascar-- so two completely different contexts, different rates of staying on in school, different continents, very different schooling systems-- in both cases it was extremely effective at increasing the number of kids staying on in school. But that only works if people are underestimating the returns to staying on in school.
If they're overestimating them, then it would reduce staying on in school. Or if they know already, then it's not going to be effective. So this is something that I think is a very interesting thing to do, and again, is worth doing. But you need to first go in and test whether people know what the benefits of staying on in school are.
Basically they just told them what's the wage if you complete primary education versus what's the wage if you don't complete primary education. It's very cheap. So if it changes anything, it's incredibly effective.
AUDIENCE: Is the issue of marginal returns a problem? Do you have to say that every program is only relevant to places that are at the same level of enrollment or admission?
RACHEL GLENNERSTER: Well this is a sort of wider question of external validity. When we do a randomized evaluation, we look at what's the impact of a project in that situation. Now at least you know whether it worked in that situation, which is better than not really knowing whether it worked in that situation. Then you've got to make a decision about whether you think that is useful to another situation. A great way of doing that is to test it in a couple of different places.
So again, this was tested in two very different situations, and the deworming had very similar effects. In rural primary schools in Kenya, it works through reducing anemia. Reducing anemia in urban preschools in India had almost identical effects. And getting rid of worms in the south of the United States-- in a non-randomized evaluation, it's true, but a really nicely designed one-- had almost exactly the same effect.
So they got rid of hookworm in the 1900s. And again, it increased school attendance, increased test scores, and actually increased wages, just from getting rid of hookworm. And they reckoned it was a quite substantial percentage. This paper by Hoyt Bleakley at Chicago found that quite a substantial difference in the income of the North and the South of the United States in 1900 was simply due to hookworm. So this has been tested in very different contexts.
So ideally you test something in very different environments. But you also think about whether it makes sense that it replicates. So if I take the findings of the women's empowerment study in India, where it works through local governance bodies that are quite active in India and have quite a lot of power, and tried to replicate it in Bangladesh where there is no equivalent system, I would worry about it. Whereas worms cause anemia around the world. And anemia causes you to be tired. And being tired is likely to affect you going to school. That's something that seems like it would replicate. So you have to think through these things and ideally test them.
If I'm doing microfinance, would I assume it has identical effects in Africa, Asia, and Latin America? No. Because it's very dependent on what the earning opportunities are in those environments. And they're likely to be very different. So I'd want to test it in those different environments.
We're falling a bit behind, so I promised to cover when to do an impact evaluation. You should do one when there are important questions you need to know the answer to. That might be because there's a program that you run in lots of places, and you have no idea whether it works. That would be a reason to do one.
Or you're very uncertain about which strategy to use to solve a problem. Or there are key questions that underlie a lot of your programs, for example, adding beneficiary control, having some participatory element to your program. It might be something that you do in lots of different programs where you don't know what's the best way to do it or whether it's being effective. An opportunity to do it is when you're rolling out a big new program. If you're going to invest an awful lot of money in this program, you want to know whether it works.
This is a tricky one. You're developing a new program and you want to scale it up. At what point in that process should you do the impact evaluation? Well, you don't want to do it once you've scaled it up everywhere. Because then you find out it doesn't work, and you've just spent millions of dollars scaling it up. That's not a good idea.
On the other hand, you don't want to do it on your very first designs. Because often it changes an awful lot in the first couple of years as you're tweaking it, and developing it, and understanding how to make it work on the ground. So you want to wait until you've got the basic kinks ironed out. But you want to do it before you scale it up too far.
We've done a lot of work with this NGO in India called Pratham. And we started doing some work for them. And by the time we finished doing an evaluation, their program had completely changed. So we kind of did another one. So we probably did that one a little bit too early. But on the other hand, now they're scaling up massively. And it would be silly to wait until they'd done the whole of India before we evaluated it.
AUDIENCE: You said it may be more appropriate to do a process evaluation initially to get a program to the point where it can be fully implemented and all the kinks are worked out.
RACHEL GLENNERSTER: Yeah, exactly. If we're going back to our textbook example again, you don't want to be doing it until you've got your delivery system for the textbooks worked out, and you've made sure you've got the right textbook. It's a bit of a waste of money until you've got those things. And exactly, a process evaluation can tell you whether you've got those things working.
The other thing that makes it a good time, or a good program, to do an impact evaluation of is one that's representative and not gold-plated. Take the Millennium Development Villages, at $1 million per village. If we find that that has an impact on people's lives, that's great. But what do we do with that? We can't give $1 million to every village in Africa.
So it's not quite, what's the point? But it's less useful than testing something that you could replicate across the whole of Africa, that you have enough money to replicate at a big scale. So that's interesting because you can use it more. Because if you throw everything at a community, yes, you can probably change things. But what are you learning from it?
So it takes time, and expertise, and money to do it right. So it's very important to think about when you're going to do it and designing the right evaluation to answer the right question that you're going to learn from.
AUDIENCE: If a program hasn't been successful, have you found that the NGOs have abandoned that program?
RACHEL GLENNERSTER: Yes, mainly. We worked with an NGO in Kenya on a program that didn't work. They just moved on to something else. With Pratham, we actually evaluated two things, both of which worked, but one of which was more cost-effective than the other. And they dumped the computer-assisted learning even though it was phenomenally successful. But the other one was even cheaper.
So they really scaled that up. And they haven't really done computer-assisted learning even though it had a very big effect on math test scores. And compared to anybody else doing education, it was very cost-effective. But compared to their other approach, which was even more cost-effective, they were like, OK, we'll do the one that's most cost-effective.
Now there are some organizations that kind of do one thing. And it's much harder for them to stop doing that one thing if you find it doesn't work. They tend to think, well, how can I adapt it? But these organizations that do many things are often very happy to say, OK, that didn't work. We'll go in this direction.
So we want to develop an evaluation strategy to help us prioritize what evaluations to do when. The first thing to do is step back and ask, what are the key questions for your organization? What are the things that I really, really need to know? What are the things that would make me more successful, that I'm spending lots of money on but don't know the answer to, or some of these more fundamental questions, as I say, like how do I get beneficiary control across my different programs?
The other key thing is you're not going to be able to answer all of them with your own impact evaluations. And as I say, it's expensive to do them. So the first thing to do is to go out and see if somebody else has done a really good impact evaluation that's relevant to you and already answers your questions, or half answers them, or at least gives you the hypotheses to look at.
How many can I answer just from improved process? Because if my problems are about logistics, and getting things to people, and getting cooperation from people, then I can get that from a process evaluation. So from that you can select your top priority questions for an impact evaluation and establish a plan for answering them. Then you've got to look for opportunities where you can develop an impact evaluation that will enable you to answer those questions. So am I rolling out a new program in a new area? Then I can do an impact evaluation there.
Or you might even want to say, I want to set up an experimental site. I don't really know whether to go this way or that way. So I'm just going to take a place and try different things. And it's not going to be really part of my general rollout. But I'm going to focus in on the questions. Should I be charging for this or not? How much should I charge? Or how should I present this to people? And you can take a site and try a bunch of different things against each other, figure out your design, really hone it down, and then roll that out. So those are two different options for thinking about how to do it.
And then, when you've got the answers to those key impact questions, you can combine them with process evaluations to get your global impact. What do I mean by that? Let's go back to our textbook example. Say you're giving out textbooks across many states or throughout the country. You've evaluated it carefully in one region, and you find that the impact on test scores is whatever it is. And maybe you've tested it in two different locations in the country and you've got very similar results.
So then you can say, well, I know that every time I give out a textbook, I get this impact on test scores. Then from the process evaluation, you know how many textbooks are getting into the hands of kids. Then you can combine the two: multiply up your impact numbers by the number of textbooks you give out.
Malaria control with bed nets: if I hand out this many bed nets, then I'm saving this many lives. I've established that through a careful impact evaluation. And then all I need to do is count the number of bed nets that are getting to people and I know my overall impact. So that's a way that you can combine the two. You don't have to do an impact evaluation for every single bed net you hand out. Because you've really got the underlying impact model from the evaluation, and you can extrapolate.
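[A minimal sketch of that extrapolation, with made-up numbers: take the per-unit impact from the evaluation and multiply it by the delivery counts from routine process or monitoring data.]

```python
# Hypothetical figures, for illustration only.
lives_saved_per_1000_nets = 5.0    # per-unit impact estimated in the evaluation
nets_delivered = 200_000           # delivery count from process/monitoring data

estimated_total_impact = nets_delivered / 1000 * lives_saved_per_1000_nets
print(estimated_total_impact)      # extrapolated overall impact: 1000.0 lives
```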
AUDIENCE: Rachel?
RACHEL GLENNERSTER: Yeah?
AUDIENCE: Do you think, at the beginning, when you've got a program that you're interested in, that that's the moment to think about the size of the impact that people expect to see? And also, as part of that, what's going to be the ultimate audience that you're trying to reach if you're successful with a scale-up? Those two things, I think, frequently come together, because it's in the scaling-up process that people are going to start to look at those cost-effectiveness and cost-benefit measures.
RACHEL GLENNERSTER: I mean, I would argue that you've always got to be thinking about your ultimate plans for scaling up when you're designing the project. Because you design a project very differently if you're just trying to treat a small area than if you're thinking, if I get this right, I want to do it over a much wider area. If you've always got that in mind, you're thinking a lot about, is this scalable? Am I using a resource, either money or expertise, that is in very short supply? In which case there's no point in designing it this way, because I won't be able to scale it beyond this small study area.
So if that's your ultimate objective, you need to be putting that into the impact evaluation from the start. Because there's no point in doing an impact evaluation of a very resource-intensive project and saying, well, that works, but I can't do it everywhere. Then what have you learned? You want to be testing the thing that ultimately you're going to be able to bring everywhere.
So in a lot of our cases, we're encouraging our partners to scale it back. You won't be able to do this on a big scale, so scale it back to what you would actually be doing if you were trying to cover the whole of India or the whole of this state, because that's what's useful to learn, and that's what you want to be able to sell to someone to finance the scale-up. So I think having those ideas in your mind at the beginning is very important, and, as I say, making it into a strategy: not a project-by-project evaluation, but thinking about where do I want to go as an organization, what's the evidence I need to get there, and then designing the impact evaluations to get you that evidence.
And people often ask me about how do you make sure that people use the evidence from impact evaluations. And I think the main answer to that is ask the right question. Because it's not about browbeating people to make them read studies afterwards. If you find the answer to an interesting question, it'll take off like wildfire. It will be used. But if you answer a stupid question, then nobody is going to want to read your results.
So learning from a single impact evaluation means asking: did the program work in this context, and should we expand it to a similar population? Learning from an accumulation of studies, which is what we want to get to eventually, means asking: did the same program work in a range of different contexts, India, Kenya, the south of the United States? And that's incredibly valuable, because then your learning is much wider and you can take it to many more places.
Did some variation of the same program work differently, i.e., take one program, try different variants of it, and test them against each other so that we know how to design it?
Did this same mechanism seem to be present in different areas? So there's a lot of studies looking at the impact of user fees in education and health. You seem to get some very similar results. And again, that's even more useful. Because then you're not just talking about moving deworming to another country. You're talking about user fees. What have we learned about user fees across a lot of different sectors? There's some common understandings and learnings to take to even a sector that we may not have studied before.
And then, as I say, putting these learnings in place, filling in an overall strategy: what were my gaps in knowledge, and am I slowly filling them in? So, I think that's it.
So I'm sorry the last bit was a little bit rushed. The idea was to kind of motivate why we're doing all of this. Today you're going to be in your groups. The task for your groups today, as well as doing the case, is to decide on a question for an evaluation that you're going to design over the next five days.
So hopefully that's made you think about what's an interesting question. What should we be testing? Because I think an often overlooked element of designing an evaluation is: what's the question that we want to be answering with this evaluation? Is it a useful question? How am I going to use it? What's it going to tell me about making my program, my whole organization, more effective in the future? So any questions?
AUDIENCE: What would you say are some of the main limitations of randomization? I assume one of them is extrapolating to populations that are different? Are there other main ones that you can think of?
RACHEL GLENNERSTER: So it's important to distinguish when we talk about limitations. One is the general question: what are the limitations of, say, extrapolating beyond the study? But the other is to think of it in context: what are the limitations relative to other methods?
Because, for example, extrapolating to other populations is not really a limitation of randomized evaluations compared to any other impact evaluation. Any impact evaluation is done on a particular population, and so there's always a question as to whether it generalizes to another population. And the way to deal with that is to design the evaluation so that you learn as much as you possibly can about the mechanisms, about the routes through which it worked.
And then you can ask yourself when you bring it to another population, do those routes seem like they might be applicable, or is there an obvious gap? This worked through the local organization. But if that organization isn't there, is there another organization that it could work through there or not?
If you think that deworming works through the mechanism of anemia, well, it works through the mechanisms of there being worms and of anemia, then you can go out and test: are there worms in the area? Is the population anemic? Because there may be worms but the population may not be anemic.
So that's a way to design the evaluation to reduce that limitation. But it's not as if the very act of flipping a coin and randomizing causes the problem of not being able to extend it. It's true of any impact evaluation.
I think one limitation, which you will find to your frustration as you try to answer every single question you have and get into the mechanics of sample size, how much sample size do I need, and again, that's not a limitation just of a randomized evaluation but of any quantitative evaluation, is that you can test a limited number of hypotheses. Every hypothesis you want to test needs more sample, and so the number of questions you can answer very rigorously is limited. I think that's the limitation that we often find very binding. But again, any rigorous quantitative evaluation will have that limitation.
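As a rough illustration of how sample requirements grow with the number of hypotheses, here is a small Python sketch using statsmodels' power calculator. The effect size, target power, and the simple Bonferroni-style adjustment are illustrative assumptions, not figures from the lecture.

```python
# Rough sketch: required sample per arm for a two-sample t-test, and how it
# grows if you test more hypotheses and tighten the significance level.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
effect_size = 0.2   # assumed standardized effect (Cohen's d); illustrative
power = 0.8         # conventional 80% power
base_alpha = 0.05

for n_hypotheses in (1, 2, 4, 8):
    # Simple Bonferroni adjustment to keep the family-wise error rate at 5%.
    alpha = base_alpha / n_hypotheses
    n_per_arm = analysis.solve_power(effect_size=effect_size,
                                     alpha=alpha, power=power)
    print(f"{n_hypotheses} hypotheses: roughly {n_per_arm:.0f} per arm")
```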
We'll talk a lot tomorrow about sometimes you just can't randomize. Freedom of the press is not something that you can randomize except by country. And then we'd need every country in the world. It's just not going to happen. So we'll look at a lot of new techniques or different techniques that you can use to bring randomization to areas where you think it would be impossible to bring it to.
Compared to other quantitative evaluations, you sometimes have political constraints about where you can randomize. But as I say, comparing quantitative to qualitative, qualitative work isn't so limited by sample size constraints, and you're not so limited to answering very specific hypotheses.
The flip side is that you don't answer any specific hypothesis in the same rigorous way. But it's much more open. So very often what we do is combine qualitative and quantitative work: spend a lot of time doing qualitative work beforehand to hone our hypotheses, and then use a randomized impact evaluation to test those specific hypotheses.
But if you sit in your office and design your hypotheses without going out into the field at all, you will almost certainly waste your money, because you won't have asked the right question and you won't have designed it well. So you need some element of qualitative work, some needs assessment, some work on the ground, to make sure that you're designing your hypotheses correctly, because you've only got a few shots. Yeah?
AUDIENCE: I was wondering, do you know of some good evaluation or randomized impact evaluation on conservation programs?
RACHEL GLENNERSTER: On conservation programs? I can't think of any, I'm afraid, but eminently doable. But we can talk about that if you can persuade your group to think about designing one. Anyone else think of a conservation program? Yes?
AUDIENCE: I don't have an example. I wish I did. And you've mentioned this. I just need to really underline it for myself. A lot of the programs that my organization does are comprehensive in nature. So they have lots of different elements meant to in the end, collectively, [UNINTELLIGIBLE PHRASE]. What I'm understanding here is that you could do an impact evaluation of all of those collectively. But really it would be more useful to pull them out and look at the different interventions side by side or something. Because that way you'll get a more targeted--
RACHEL GLENNERSTER: It's true. The question was, if you have a big package of programs that does lots of things, you can do an evaluation of the whole package and see whether it works as a package. But in terms of learning about how you should design future programs, you would probably learn more by trying to tease it apart, take one element away, or try them separately. Because there might be elements of the package that are very expensive but are not generating as much benefit as they cost, and you would get more effect by doing a smaller package in more places. You don't know unless you take the package apart.
Now, if you test each one individually, that's a very expensive process, because it needs a lot of sample to test each one. There's also a very interesting hypothesis that comes up in lots of different areas. People often feel that where there are lots of barriers, we have to attack all of them; it only makes sense, you won't get any movement unless you do. There are lots of things stopping kids going to school, stopping, say, girls going to school: they're needed at home, there are attitudes, there's their own health, maybe they're sick a lot. So we have to address all of those if we're going to have an impact.
The answer is, we don't know. And indeed, in that example where we're working with Save the Children in Bangladesh, they had this comprehensive approach: there are all these problems, so let's tackle them all. We convinced them to divide it up a bit, test different things, and see whether some of them worked on their own, or whether you needed to do all of them together before you changed anything, which is a perfectly possible hypothesis, and one that a lot of people have, but it hasn't really been tested. The idea is that you've got to get over a critical threshold, and you've got to build up to it, and only once you're over it do you see any movement.
Well, actually, on girls going to school, it's quite interesting. Most of the evaluations that have looked at just generally improving attendance at school have had their biggest impact on girls. I should say most of those were not done in the toughest environments for girls going to school; they're not in Afghanistan or somewhere where it's particularly difficult. But it is interesting. Just general approaches in Africa and India have had their biggest impacts on girls, which suggests that the idea that you've got to hit every possible barrier is maybe not right. Yeah?
AUDIENCE: [INAUDIBLE PHRASE] and the political constraints [INAUDIBLE PHRASE].
RACHEL GLENNERSTER: Right. So we'll talk actually tomorrow quite a lot about the politics of introducing an evaluation or at least the different ways that you can introduce randomization to make it more politically acceptable. That's slightly different from whether the senior political figures want to know whether the program works or are willing to fund an evaluation.
I've actually been really surprised. Obviously we find that in some places. There are certain partners or people we've started talking with, and you can see the moment the penny drops that they're not going to have any control, because you're going to do a treatment-comparison design, you're going to stand back, and at the end of the day the result is going to be what it is. There's no fiddling with it, which is one of the beauties of the design. But it will be what it will be, which is kind of why it's convincing. And there are certain groups who figure that out, and they run for the exit, because there's potentially going to be an MIT-stamp-of-approval evaluation saying their program doesn't work. That's life. Some people don't want to know.
The best thing I can say in that situation is to test alternatives. It's much less threatening to test alternatives, because there's always some choice between this and that that people don't know the answer to. And then you're not raising the question of does it work at all; you're saying, well, does this work better than that? And that is much less threatening. It doesn't tell you quite as much, but it's much less threatening.
There's a report called When Will We Ever Learn, looking at the politics of why we don't have more impact evaluations, which was very pessimistic. But look at somewhere like the World Bank, which just put out a pot of money for doing randomized impact evaluations that anybody in the Bank could apply for. And people said, why? There's no incentive for them to do it. Program officers have already got a lot on their plate. Why would they add this? They're opening themselves up to all these risks, because maybe their program doesn't work.
It was massively oversubscribed: in the first year there were six times more applicants than there was money. People just came out of the woodwork as soon as there was some money to do it. So I'm not saying every organization is like that; obviously not everybody in the Bank applied. But it was, to me, actually quite surprising how many people were willing to come forward.
Now, we have the luxury of working with the willing, which, if you're working within an organization, you don't necessarily have. You will see as you get into the details of these things that you need absolutely full cooperation and complete dedication on the part of the practitioners who are doing these evaluations alongside the evaluators.
You can't do this with a partner who doesn't want to be evaluated. It just doesn't work. They are so able to throw monkey wrenches into it if they don't want to find out the answer that it's just not worth doing, because it's a partnership. It's not someone coming along afterwards and interviewing; it is the practitioners and the evaluators working hand in hand throughout the whole process. And therefore, if the practitioners don't want to be evaluated, there's not a hope in hell of getting a result.
We should wrap up. A lot of these things we're going to talk about. But I'll take one more. Yeah?
AUDIENCE: How important, or how relevant is it, or how much skepticism can there be about a case where the evaluators and the practitioners work for the same people or are funded by the same people?
RACHEL GLENNERSTER: Yeah, we've even got practitioners as co-authors on our studies. This is another place where I kind of part company from the classic evaluation guidelines, which say that it's very important to be independent. I'd argue that what you want is not independence, it's objectivity, and the methodology of a randomized evaluation can provide you with that objectivity. If your methodology gives you objectivity, you don't have to worry so much about independence.
Now, there's one caveat to that. The beauty of the design is that you set it up, as I say, and then you stand back. Well, you don't stand back entirely; you've got to manage all your threats and things. But you can't fiddle very much with it at the end. The one exception to that is that you can look at subgroups.
So there was a randomized evaluation in the UK of a welfare program, and there was some complaining about it. Because at the end, they went through and looked at every ethnic minority, and, I can't remember whether it worked in general, but they found one subgroup for whom the result was flipped. And that was the thing on the front page of the newspapers, rather than the overall effect.
So there's a way to deal with that, which is increasingly being stressed by people who are looking over our shoulders and making sure that what is done in randomized evaluations is done properly, which is to say that you need to set out in advance, and we'll talk about this a bit later on, what you're going to do.
So if you want to look at a subgroup, like does it affect the lowest-performing kids in the school differently from the highest-performing kids, because maybe I care most about the lowest-performing kids, you need to say you're going to do that before you actually look at the numbers. Because even with a randomized evaluation, you can data mine to some extent: well, if I look at this set of ten kids, does it work for them? If I look at that set of ten kids, does it work for them?
Statistically you will be able to find some subset of your sample for whom it does work. So you can't just keep trying 100 different subgroups. Because eventually it will work for one of them. So on the whole, you need to look at the main average effect. What's the average effect for the whole sample? If you are particularly interested in a special group within the whole sample, you need to say that before you start looking at the data.
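As a back-of-the-envelope illustration of that subgroup data-mining risk, here is a small simulation sketch in Python. The sample sizes, the number of subgroups, and the use of a plain t-test are assumptions chosen for the example, not details from any study mentioned here.

```python
# Illustrative simulation: even with zero true treatment effect, testing
# enough subgroups will eventually turn up a "significant" one by chance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_per_group, n_subgroups, alpha = 50, 100, 0.05

false_positives = 0
for _ in range(n_subgroups):
    treated = rng.normal(0, 1, n_per_group)   # no real effect in either group
    control = rng.normal(0, 1, n_per_group)
    _, p_value = stats.ttest_ind(treated, control)
    false_positives += p_value < alpha

print(f"{false_positives} of {n_subgroups} subgroups look 'significant' "
      "at the 5% level by chance alone")
```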
So that's the only way in which you get to fiddle with the results. Otherwise the methodology provides an enormous amount of objectivity, and therefore you don't have to worry so much about a Chinese wall between the evaluators and the practitioners, which, I think, is incredibly important. Because we couldn't do the work that we do if we had that Chinese wall. Doing your theory of change, finding out how it's working, designing it so it asks the right questions: none of that would be possible if you had a wall between you, so it just wouldn't be anything like as useful. Getting your objectivity from the methodology allows the evaluators and the practitioners to be very integrated.