Remarks
U.S. Department of State Third Annual Conference on Program Evaluation - Democracy and Governance Track
Washington, DC
June 8, 2010


MR. KULCHINSKY: Good morning, everybody. If we could take our places, we are ready to begin our second workshop.

Good morning. My name is Yarro Kulchinsky (phonetic), and I am a performance analyst in the Office of Strategic and Performance Planning. It is my great privilege to introduce the second workshop, Evaluating Good Governance and Rule of Law Programs.

Our presenter is Sophia Sahaf, who is a Senior Program Officer with the Millennium Challenge Corporation's Economic Analysis Division. She is responsible for advising countries and implementers on monitoring and evaluation during proposal development, improving M&E systems and data collection during implementation, and managing impact evaluations.

Prior to joining MCC, Ms. Sahaf performed legal analysis on whistleblower rights in international organizations, assisted Ecuadorian political parties with organizational development and voter outreach, and managed survey implementation and analysis for Bhutanese refugees in Nepal.

Ms. Sahaf received a Master of Arts in Law and Diplomacy from Tufts University's Fletcher School.

Welcome.

[View slide presentation]

MS. SAHAF: Thank you. Hello, everyone. I'm going to go through the presentation, but I'd like to ask you to feel free to ask questions as I go.

I'll talk a lot about the different evaluation methods we've used in our different programs, and I'm not sure how much familiarity you have with experimental, quasi-experimental, and qualitative methods. So particularly on that, if you have any questions, just let me know as I go, and also if you'd like any more detail on the programs. I'm going to keep it pretty brief. I don't want to go into the weeds and bore everyone. So feel free to ask questions on that side, as well.

MR. KULCHINSKY: Sophia, yes. Please, if you do have a question, raise your hand. If you are at the table, just press the button; if you're not, we have a wireless microphone that we will bring over to you.

MS. SAHAF: Okay.

MR. KULCHINSKY: Thank you.

MS. SAHAF: So I think people have some idea of what MCC's all about, but I'll just give you the brief overview.

Most of our work is on reducing poverty through income growth and generating economic growth. A small portion, less than 10 percent of our investments, is devoted to what we call the threshold program, and that is our program that focuses on policy reform, which, you know, has a broad definition, but over 75 percent of the work under the threshold program is in democracy and governance-type activities.

So the threshold program, unlike compacts, has not had as much monitoring and evaluation integrated from the get-go. MCC has been around five or six years, and I'd say for probably the first two or three years of the threshold programs we didn't necessarily integrate baseline data collection, or data collection that would permit us to talk about our outcomes of interest after the program's end. So we've been grappling a lot with how to deal with that retroactively.

MCC, as an institution, has made a commitment to evaluate just about 100 percent of all of our activities in compacts and thresholds, which is a very big deal, I think. It's very intensive in terms of resources and just finding the appropriate methods for all of the different types of programs we do, whether it's under our compacts or thresholds.

For compacts, about 50 percent of our programs are under sort of quantitative impact evaluations. So about 50 percent of our compact programs use, like, statistically rigorous methods to evaluate whether or not we achieve our objectives. In the threshold program, it's 5 to 10 percent. So that just gives you an idea of sort of some of the challenges we're confronting in evaluating our programs.

So on MCC, one of the founding principles that we've put out there a lot is that we focus on measurable results and on tracking them, and so that goes hand in hand with our commitment to evaluate all of our programs. If you have any more questions on MCC and sort of how we operate, you can ask me, but I'll leave it at that.

So I'm going to talk about just a few of our programs where we've been trying to integrate evaluations, either from the beginning -- that will be the Rwanda example, where we've integrated an experimental method, a quasi-experimental method -- or after the fact. For the latter, I'll talk about a couple of programs, Tanzania and Zambia specifically, where we're going in after the fact. That's the ex post example, trying to do the best we can with the data we have there, and sometimes it's literally almost no data, so we're really scraping the bottom of the barrel.

And I think with the ex post evaluation -- so by making this commitment to do 100 percent evaluation, I think some people see it and nod their head and think, what are you doing, you're going to end up wasting resources. But although there definitely are challenges and misses we'll have in terms of what the data tell us, there are also opportunities.

I think Tanzania presents an interesting case, so I'll go into that a little bit more. In terms of why we've made this commitment to do all these evaluations and why we invest so heavily in doing experimental evaluations, for us the issue is we want to know what works. We think those are enduring questions, and we really don't know a lot about what works, and we continue to spend on activities that aren't producing results. We want to change that as much as we can, and we want to know what's cost effective.

So there might be a great program that we spend $50 million on, and there might be a great one that we spend a million on. If we know the difference, why not just devote more to those $1 million-type programs? That's a big thing for us.

QUESTION: Sorry. I don't know the difference between ex post and pre post.

MS. SAHAF: Oh, sure, sure, sure. So ex post is just going in after the fact, when you haven't established baseline, mid-term, and endline data collection, so you're working without much data and you're kind of going in trying to capture the picture of what it looked like when we started the program three years ago.

We're here in 2010 and we started in 2007, but we don't have anything that tells us what it looked like in 2007. Maybe we have some anecdotes here or there, but nothing that's concrete, like survey data or systematic qualitative data collection -- you know, like judicial review indices that an institute might put together that kind of unbundle what the judiciary looks like in a country across, say, 24 criteria.

So it doesn't have to be survey data. It can be sort of this qualitative analysis, but if you don't even have that from three years ago, then, you know, we consider that ex post.

QUESTION: And pre post?

MS. SAHAF: Is when you have the before and after data. So for Zambia, I'll talk about later, we have data on business registration from before the program started to during the program and after.

What that doesn't tell us -- so the difference between experimental and pre-post is that experimental gives you a counterfactual, which is our ideal gold standard.

So if you have before and after data, you know, it gives you some useful information, but it doesn't -- but there are a lot of things happening while we're supporting business registration in Zambia. The economy might be growing. It might be going down. They might be upping the staff in their offices. They might be hiring smarter people. There might be a lot of things going on other than what we do, which might be just to, let's say, install IT systems.

So if we have pre-post data -- before and after data only -- we're going to be maybe taking credit for all of the staffing up they did and the training they did and, you know, the economy going up or the economy going down. Or we might not give ourselves due credit.

So when you have a counterfactual, that's considered an experimental method, and you'd have a counterfactual if, let's say, you compare 30 districts that have business registration offices to another 30 districts that don't. It's kind of a with-and-without comparison.
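
A minimal sketch of that with-and-without logic, using hypothetical numbers rather than anything from an actual program -- a simple difference-in-differences comparison next to the naive pre-post estimate:

```python
# A minimal sketch of the with-and-without logic, using made-up numbers.
# 30 districts get business registration offices (treatment); 30 do not
# (comparison). A naive pre-post estimate absorbs everything else going on
# (economy, staffing); a difference-in-differences estimate nets that out.

treat_before, treat_after = 100.0, 160.0   # avg registrations, treated districts
comp_before, comp_after = 100.0, 130.0     # avg registrations, comparison districts

pre_post_estimate = treat_after - treat_before
did_estimate = (treat_after - treat_before) - (comp_after - comp_before)

print(f"Naive pre-post estimate:         {pre_post_estimate:.0f}")
print(f"Difference-in-differences (DiD): {did_estimate:.0f}")
```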

So this is -- I was basically already telling you about this slide -- why we want to do the experimental evaluation. So MCC puts a lot of emphasis on this, and I think we've gotten credit for it, and we've sometimes gotten criticism that we overly focus on quantitative methods.

I think that isolating the intervention to know what it contributed to increased business registration or reduced corruption or whatever your outcomes of interest are is key for finding out what's cost effective and what works or doesn't work. For us, in order to answer the what question -- like how much impact you had -- that's really the only reliable way to do it. Everything else is going to be secondary and potentially misleading or biased.

So for us, that is the gold standard. I think integrating qualitative methods is a great way to go, so you can find out a little bit more about the why questions, but if you want to know what, then you want to have a counterfactual. And I think in the field of democracy and governance, there's a dearth of evidence. I think that is probably a widely accepted premise.

So this is just one slide that gives you three examples of three different impact evaluations that use counterfactuals, and each of the projects tries to use community participation and engagement to improve its outcome measures.

So in Uganda, they're trying to improve the immunization rates, reduce mortality rates. In India, they're trying to improve education outcomes, like teacher quality, teachers in the classroom, etcetera. In Indonesia, they're trying to reduce corruption at the district level. So they're trying to get district government bureaucrats to basically spend the money where it was supposed to be spent.

All three programs use community participation as their means of getting to the three different goals, and they show conflicting outcomes. So in Indonesia, Ben Olken, the researcher who did the study, found that the increased participation had no effect on corruption. It had an effect on getting attention to worker wages, but it had no effect on corruption and differences in spending. Audits made a difference -- having top-down audits done by the central government, I think it was -- but not the community participation.

In India, they found that community participation didn't have an effect. In Uganda, they found that it did. So I think people often talk about community participation and civic engagement as a panacea, as the way to go, and it might be for a lot of projects. This just shows you, though, that it's not a sure shot. It also tells us that we need to gather more evidence to get a better sense of what are the processes to get to where we want to go.

Yes?

QUESTION: These three are not all MCC's?

MS. SAHAF: None of these are MCC. These are all --

QUESTION: Thank you.

MS. SAHAF: Yes. None of these are MCC. I don't even know if they were done -- I know India and Indonesia, I think, were just government projects that these MIT researchers partnered up with.

So now I'll just talk about one of our programs where we've had the benefit of thinking from the beginning about how to integrate an evaluation method.

So in Rwanda, we spent about $25 million on various programs trying to increase civic participation and engagement, in particular working at the district level with local government staff and citizens and civic groups, to try and get them to work together more, so that they can sort of prioritize how the budget should be spent.

That's the goal, to really focus in on that. The programs are going to last anywhere from two to three years. So we're doing that district-level government and civic engagement work, and then we're also working with police throughout the country to train them on working better with the communities and on improving their internal affairs -- so, like, case management, actually taking action against police officers in cases where they're found to have committed abuses or things of that sort.

So the Rwanda program gives you a really easy opportunity to do an evaluation, because we're working at the local level. Any time you're working on a decentralized project, you have an easier way to construct a counterfactual.

So here we can't support every district government in the first year of the three-year program. We just have to phase it in, practically speaking, you know. The contractors can't get everywhere in the first year. So because of that, we're able to say, all right, let's just randomly select, in clusters, where we go first.

So we split it up, I think, basically into seven clusters, so we're working in about four districts in seven different waves. What that will let us do is, in year one we'll work with one set, in year two another set, in year three a third set -- we're going to do something like two, two, and three, or three, two, and two, in terms of clusters of districts. Then we'll be able to track them over time, and the districts that we don't support in the first year will be our counterfactual for the districts we support in the first year.

And so we'll be able to take the survey data we collect on various outcomes of interest, like, you know, whether or not citizens are engaging on the budgets, whether or not the budgets are reflecting the priorities that are, you know, spoken up about in the town hall meetings and things like that, and so then we'll get an idea and be able to tease out, all right, so what did our interventions contribute to.

Does that make sense? Yeah? All right. So we had to adapt to project realities. In our ideal world, we would randomly select the order of districts one, two, three, all the way up to 30, but we couldn't, because we needed to do it in clusters for cost reasons and just for the way the program was designed. It wasn't only the cost basis; it was also just how they wanted to roll it out. They wanted to pair low-performing districts with high-performing districts, and so we were able to adapt to that.

It takes away a little bit of the statistical validity, but it keeps a lot of the rigor. So I think that's a good example of getting in early on and being able to integrate it.
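
A rough sketch of what that kind of clustered, phased rollout could look like, with hypothetical district names and an assumed 3 + 2 + 2 wave split (the real assignment was also built around pairing low- and high-performing districts, which this does not capture):

```python
# A sketch of a randomized phase-in with hypothetical district names.
# 28 districts are grouped into seven clusters of four, and the clusters are
# randomly ordered into three annual waves (3 + 2 + 2). Districts in later
# waves serve as the comparison group for the earlier waves.
import random

random.seed(0)  # fixed seed so the assignment is reproducible and auditable

districts = [f"district_{i:02d}" for i in range(1, 29)]
random.shuffle(districts)
clusters = [districts[i:i + 4] for i in range(0, len(districts), 4)]
random.shuffle(clusters)

waves = {1: clusters[0:3], 2: clusters[3:5], 3: clusters[5:7]}
for year, assigned in waves.items():
    names = [d for cluster in assigned for d in cluster]
    print(f"Year {year}: {len(names)} districts: {', '.join(names)}")
```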

For the police, they're doing complaints boxes, like 200, all over Rwanda, which is extremely densely populated, and so we're just going to be able to sample people around the complaints boxes and then people further away and sort of do like interesting comparisons for them.
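
A small sketch of that near-versus-far sampling idea around the complaint boxes, with hypothetical box locations, respondent coordinates, and distance cutoff:

```python
# A sketch of near/far sampling around complaint boxes, using hypothetical
# coordinates on a kilometer grid. Respondents are grouped by distance to the
# nearest box; the 2 km cutoff is an assumption for illustration only.
import math

boxes = [(0.0, 0.0), (5.0, 5.0), (9.0, 1.0)]
respondents = [(0.4, 0.3), (1.5, 4.2), (5.2, 5.9), (8.0, 8.5)]
NEAR_CUTOFF_KM = 2.0

for point in respondents:
    distance = min(math.dist(point, box) for box in boxes)
    group = "near" if distance <= NEAR_CUTOFF_KM else "far"
    print(f"respondent at {point}: {distance:.1f} km from nearest box -> {group}")
```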

So now Tanzania is an example of where we went in after the fact and the program didn't collect very much data at all. It was $11 million. It was a two year program, ended about two years ago.

We started evaluating it in late 2009, so about a year and a half after everything had closed up. None of the projects have continued. You know, everything is shut down.

Implementers are gone. The USAID program manager is gone. No one's there. Okay? Actually, the program manager is there; he's just reviewing now. So the program was trying to reduce corruption and improve Rule of Law. We worked with prosecutors and investigators. We worked with media to get them to report on corruption. We also were trying to get civil society to do more procurement expenditure tracking surveys at the local district level. It's kind of similar to some of the work in the audits, but not the same -- it's a parallel.

Then we were doing audits of about 20 central government procuring entities to find out, you know, what errors they're making in their procurements and that kind of thing.

So here we had one piece of data that was useful. The program, they had an indicator that was number of corruption articles. So what they did is they collected all corruption articles in like 10 different main newspapers in Tanzania. So I was just there and they have binders this thick with articles. They have like about 10 or 15 binders with just articles like this.

So you can find some little pieces of treasure. You know, I never would have thought those would have ended up being useful, but they have. I'll tell you a little bit about other data we found, but I didn't mention this before: our goal is to -- so my boss, our chief economist, is always saying that we have to reject the null hypothesis.

So we have to go in assuming we have had no effect and prove that we've had an effect. He's very big on that, and I think it's the right instinct, since people have a tendency to take more credit than is due. So it's probably a good counteraction.

So what we were able to do with the media is a quality analysis of articles by journalists that we trained, and we have the data. We know exactly when they were trained and by whom, in terms of months, and we have articles from people who were trained and articles from people who weren't trained, because they were tracking all corruption articles throughout the country. And just a side note: they were taking credit for all of the articles on corruption, which we shouldn't have done.

So I'm glad we tracked articles, not that they took credit for the articles.

QUESTION: Social norms. I mean, if there are more articles produced in papers, that might have an effect on other journalists, if they read other journalists --

MS. SAHAF: Spillover effect. Yes, there are spillover effects.

So there are ways to isolate those. I think here it's really short periods of time. So I think it's more the environment was more permissive of reporting on corruption. So actually, one thing I know right now, we're still getting the report out. It's going to take another few months, but they don't really see an effect on numbers of articles that were produced.

With one trainer they found an effect on the number of articles, but that's one out of, I don't know, several trainers, and they haven't done a quality analysis. So that's one where I think you're going to see less of a spillover. That will be more direct -- I think it's theoretically much clearer that you can link that more closely to just the training and isolate it to that.

So we'll have the journalists we didn't train and the ones we did train, over time -- it takes a lot of use of the data -- but then we'll be able to come up with some information on quality. On quality, I think there will naturally be questions about how we're going about assessing the quality of articles.
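
A sketch of the kind of trained-versus-untrained, before-versus-after tabulation being described, with a made-up training month and made-up article records:

```python
# A sketch of comparing article counts for trained vs. untrained journalists
# before and after a training date. The training month and records below are
# hypothetical, purely for illustration.
from collections import defaultdict
from datetime import date

TRAINING_MONTH = date(2008, 6, 1)  # assumed for illustration only

articles = [  # (journalist, publication date, journalist was trained)
    ("journalist_A", date(2008, 3, 10), True),
    ("journalist_A", date(2008, 9, 2), True),
    ("journalist_A", date(2008, 11, 15), True),
    ("journalist_B", date(2008, 4, 1), False),
    ("journalist_B", date(2008, 10, 20), False),
]

counts = defaultdict(int)
for _journalist, published, trained in articles:
    period = "after" if published >= TRAINING_MONTH else "before"
    group = "trained" if trained else "untrained"
    counts[(group, period)] += 1

for (group, period), n in sorted(counts.items()):
    print(f"{group:9s} {period:6s}: {n} articles")
```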

So we basically created an assessment tool. IREX has done some work on putting together assessment tools, too. And we have these enumerators that we've trained. It actually takes a lot of work. You really have to train them and have them do a lot of piloting of the instrument and see how they're using it against the articles, and then you have to double up, so everybody has to -- I think they have two or three people reading every single article.
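
With two or three enumerators reading every article, one common check on that kind of double coding -- not necessarily the one used here -- is inter-rater agreement, for example Cohen's kappa, sketched with made-up ratings:

```python
# A sketch of an inter-rater agreement check for double-coded article quality
# ratings, using made-up labels. Cohen's kappa adjusts the raw agreement rate
# for the agreement you would expect by chance alone.
from collections import Counter

rater_1 = ["high", "high", "low", "low", "high", "low", "high", "low"]
rater_2 = ["high", "low", "low", "low", "high", "low", "high", "high"]

n = len(rater_1)
observed = sum(a == b for a, b in zip(rater_1, rater_2)) / n

counts_1, counts_2 = Counter(rater_1), Counter(rater_2)
labels = set(counts_1) | set(counts_2)
expected = sum((counts_1[k] / n) * (counts_2[k] / n) for k in labels)

kappa = (observed - expected) / (1 - expected)
print(f"raw agreement: {observed:.2f}, Cohen's kappa: {kappa:.2f}")
```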

So this one, that was a pretty simple way to go about it. That's just to give you an idea that, going in after the fact, you face challenges, but you can find some of these opportunities.

With the procurements, we had the audits done by our contractor during the program, and then the agency that does procurement regulation in Tanzania also does follow-up audits. They track the same measures as our implementer did. So we're going to compare and contrast what the follow-up audits show has or hasn't been taken care of by the procuring entities.

I think the challenge here is data quality, and so we're still struggling with how to determine -- basically working with people to find out how much faith we should put in the data from the procurement agency. I always worry about data quality. So that's one of the challenges we face, but that's the method we're using there.

With the police and prosecutors, the Anticorruption Commission there does a great job of collecting data. They collect it at the regional level and national level. They track from when a report comes in, to when it's filed, to when they start the investigation, all the way up to sentencing -- any sentences that are handed out to those who are convicted or not convicted. Because of that, we can do sort of what we're doing with the journalists: looking at the time series of when the cases, investigations, prosecutions, etcetera, are happening and how they intersect with our training times, and we can do it by region, as well, because we have the regional data.

Yes?

QUESTION: When you're depending on the implementer or someone else to feed you that data, any nuggets on how you pursue your data quality analysis?

MS. SAHAF: When we're going to get data from our implementer or the Anticorruption Commission?

QUESTION: If it's somebody other than you who is feeding you the data, any lessons learned on how you verify what they're reporting is what you really wanted to know or the data quality that you want?

MS. SAHAF: So I think with -- I'm trying to think -- I feel like it's been clear in a couple of cases when we're not going to have confidence in the data and times when we will.

So in one of our country programs, they're only sharing summary statistics with us. They won't let us go into the database and see how it's collected and the like, and that raises eyebrows. Right there, if you can't go to the source, that's a problem. So I would take it with a grain of salt.

If they show you the data source and allow you to see their mechanism for cleaning the data, their quality checks and the like, that's the best you can do. That's data quality, basically: going in and seeing how they handle it, how they manage it, what the management information system looks like. And with the PCCB -- that's the Anticorruption Commission in Tanzania -- they've been really open with us, and the data doesn't always show great things.

So I understand still having skepticism, but I think that's the closest you can get to it.

QUESTION: I think the people who are collecting the data who aren't us should at least know what factors we want to have.

MS. SAHAF: Yes, go ahead.

QUESTION: And I'm wondering if you --

MS. SAHAF: Well, I do think the first thing is if they don't share with you their tools or instruments or system that's a red flag.

In Zambia, for example -- I'll talk about the baseline survey they did -- I can't give it that much weight. We're barely going to use it in our evaluation report because we don't know much about it, things like that. So I think if you can get information on those basic facts about how they clean it, organize it, manage it, and they let you into it, you can get confidence from that, but you basically have to do a data quality review. It takes time. So that's Tanzania.

So the last program I'll talk about is our Zambia evaluation, and here we've had the most challenges. It's ex post, very little data, a $23 million program that ended a year and a half ago. Again, nothing's continued, nothing's staying on, only one of three program managers is around, lack of access to people, things like that.

We are trying to reduce administrative corruption. We worked on a lot of IT reforms and governance reforms here. We probably provided like IT and networking and the like to four different entities and then we were working on business registration, streamlining the business environment, worked with Customs, as well, and so this just gives you an idea of how all over the place the program is.

We have 10 different ministries, departments, and agencies that we worked with over a two-year program with $23 million, and then we had Chemonics as our main contractor and three subs, and, you know, getting in touch with everyone was a challenge. But just the idea of evaluating all those projects is really hard after the fact if you want to do a good job.

So we narrowed it down. We prioritized. So we basically decided to pursue where we had heavier investment, had a better opportunity to talk reliably about results, and then the last thing was where we thought there actually might be a chance to pick up on the effects versus where we could say pretty easily we don't think there's an effect.

So we weren't trying to cherry-pick where there were effects. We were just trying to say, all right, here we can say we don't think there's an effect, so there's no need to really go that much further; let's focus on the places where there's a question we can answer. So no counterfactual, no baseline -- well, there is one baseline.

So the business registration program was basically to create several regional one-stop shops for business registration. Before the program, they only had one. It was in the capital, Lusaka, and we were going to automate procedures in Lusaka and the new centers we built. We ended up only doing two of the centers, regionally, and Lusaka. And we wanted to improve transparency, but we never really defined "improve transparency," so it was kind of hard to evaluate whether or not we did, other than to say there's information in the management information system.

The findings were that there was a decrease in registration time -- not to the goal of getting it to one day, but a decrease from about 30 days to 15, more or less -- and people are more satisfied with the services.

So the way we went about doing this -- this was the best case out of all the Zambia programs -- we basically just did exit surveys, like small ones, 40 people at the one-stop shop in Lusaka, and just asked them a series of 15 to 30 questions, probably 20 or so. Then there are bulk filers -- firms that basically are the interlocutor; people go to them and say, I want to register my business, and they handle huge amounts and have a relationship with the one-stop shop -- and we did anecdotal collection with them.

At the baseline, Chemonics did a survey of applicants, and they did surveys for various different institutions, but we were never able to see the survey instruments, so we don't know what the questionnaire had.

We saw the final report, but they didn't have the sample or things like that. So we have a baseline, but it's not really good. So we basically relied on recall and the survey. What we found is, you know, basically 75 percent or so said that they were able to do it in less than two weeks -- 80 percent, I guess, compared to the other places -- and these were the people in Lusaka.
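
A sketch of how a small recall-based exit survey like that might be summarized, using made-up responses and a rough normal-approximation interval so the roughly 40-person sample size stays visible:

```python
# A sketch of summarizing a small recall-based exit survey (made-up responses):
# the share of respondents reporting registration in two weeks or less, with a
# rough normal-approximation 95% interval to flag the small sample size.
import math

reported_days = [5, 9, 12, 13, 7, 20, 11, 14, 6, 10] * 4   # 40 hypothetical recalls
n = len(reported_days)

share = sum(d <= 14 for d in reported_days) / n
stderr = math.sqrt(share * (1 - share) / n)
low, high = share - 1.96 * stderr, share + 1.96 * stderr

print(f"{share:.0%} reported two weeks or less "
      f"(roughly {low:.0%} to {high:.0%}, n = {n})")
```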

In Livingstone, we did a survey in one of the regional offices, and we saw, you know, 60 to 75 percent, I guess. So on the earlier slides, I'm more comfortable talking about the results that we found in the surveys and saying that our program led to them, because it was one-stop shops that never existed before. So talking about business registration times using recall, I'm much more comfortable with that than with something like this, where you do a before and after. These two lines just represent the two services the one-stop shops provide, registering companies and registering names.

So you see a pretty dramatic increase in business names registered -- and our program ended in 2008 -- and also in companies registered, but again, here's where, you know, the Zambian economy is booming, perhaps, there's more staff.

I mean, there are too many confounding factors for us to be able to take credit for that, and I know the reports would take credit for that, like the implementer reports, and I just don't think that's fair. So I think that's spurious, and I'm only putting it up there for that reason.

So as I said, we don't have a great baseline, but what it did say was 28 days, and what our small, small survey said was less than two weeks for about 80 percent. So that, like I say, I feel more comfortable with.

Now, our other projects are where I think it was a little bit more challenging, because now we're working with established institutions. We're not creating anything brand new. We're automating procedures and doing these e-governance reforms in the Ministry of Lands, the Revenue Authority, and the Department of Immigration. We're building capacity at the Anticorruption Commission.

We also did an e-governance reform with the Customs people, and I think we're basically finding mixed to negative results in most of them. Business registration, I think, is one of our positive findings, the Revenue Authority more positive, and the rest are mixed to negative. We basically just did some small exit surveys for all of the institutions, and I think that's an easy way to go in and do some after-the-fact evaluation work.

If you have clients of an agency and you're working to help the agency serve their clients better, it's pretty cheap, too, and it's a good way to get some gauge on how things have or haven't improved, but when it comes to asking about corruption, this is a tough nut to crack because nobody wants to talk about corruption or answer your questions about corruption and so here it's a wash.

How would you compare levels of corruption now to before the changes, for immigration? Fifty percent say better, but then the rest say no opinion or worse. I mean, that doesn't tell you anything. But when you ask about customer care, they'll tell you it's better. So we weren't able to find out, on outcome metrics, whether or not we reduced corruption -- and I don't even know if I'd call that a proxy; that's not a proxy -- but we were able to find out other sort of ancillary effects.

The same with Lands here: we didn't really get -- it's a wash on the question about corruption, and then on application processing times, we were able to get a little bit better information.

So at the Revenue Authority, we asked about bribe seeking and they answered here which is surprising. I'm not sure what that's about, but that at least gives me a little bit more confidence in that and that one is a bust.

So I'll just talk a little bit about our lessons learned. The reason why we do our evaluations, like I said: we want to know what's most cost effective, we want to learn lessons, we want to hold ourselves accountable, etcetera.

So we found our programs are a little bit too short. So in the future and in the last year or so, we've been doing three year programs. Before, thresholds were always two years.

Did you want to ask a question?

QUESTION: To what extent do you set targets and how do you measure those targets?

MS. SAHAF: Yes, I think it's really important to set targets.

QUESTION: I mean, for example, you're talking about building capacity. What does that mean?

MS. SAHAF: I agree. I don't think it means much. I think the challenge we face is getting implementers and program managers to focus on identifying what outcome you want and how we are going to measure it, and it doesn't have to be a number. It can be -- like I was mentioning, there are indices, there are qualitative analyses that are done in structured ways, and you can use those. It will be a little bit more complicated, but you can use those.

We force them. I mean, we force it so it happens, and we've been doing that for the last few years, but before that it wasn't being done.

QUESTION: Does that mean you're still having a problem getting implementers to do this or is it generally accepted that MCC is going to require this and you'll do it?

MS. SAHAF: Oh, well, one thing we do is -- so our programs are designed in a slightly different way, probably, than at State or AID -- I'm not sure who else is here -- or a lot of other programs.

So the Government of Zambia will come to us and say here's our proposal for a program. Here are the activities and the results we want to achieve and they might not do a good job or they might do a good job of identifying sort of their targets and whatever they send to us, we work with them to sort of establish here's the activity, here's the goal, and how are we going to measure it and track it.

So we're able to tackle it, I'd say, about half the time in that way, but half the time we're not. I mean, half the time it's just not fully fleshed out enough to know. It wouldn't even be smart, because then you'd create a target you're stuck with that doesn't even make sense once you find out more information.

But I'd say it's always better to have some targets early on and then say, all right, we'll modify them, because if you don't have higher-level outcome targets from the get-go, it's going to be really hard to get anyone to say, I'm going to track anything other than outputs, because it's against my manageable interests. I can't tell you how many times I've heard that phrase: it's against my manageable interests.

So I think even if you're setting up -- in my opinion, I'd rather set up not-so-fantastic targets from the beginning and then modify them once you begin implementation, with better information, better data, and the like. But if you don't do it, you're going to have a really rough battle and you're going to lose some of them.

QUESTION: You're signing the checks. Why is this an issue? I mean, you all were established with a totally different style of mandate.

MS. SAHAF: It was a totally different style of mandate from the other agencies.

QUESTION: I mean, I think that your mandate was to use performance -- to not even begin until there was an acceptable level of performance, and to go from there and build upon it.

Now, I think one of the major mandates that you were constructed with was the idea that there was going to be an ongoing assistance program at the level that existed before you walked into a country -- that you were an additionality -- which has not turned out to be the case, but that's a political debate there's no use getting into here. But, you know, there is a compact process. You don't give away money until a country has a level of performance --

MS. SAHAF: With the compacts, that's true.

QUESTION: (Inaudible), and then you sign on and there's this whole process. So I'm a little -- are you saying that this difficulty in establishing parameters is true at the compact and the threshold level?

MS. SAHAF: No.

QUESTION: So it's only the threshold level?

MS. SAHAF: Yeah. Okay. So is that -- are those the two questions?

So I think that, yes, at MCC one of our core principles is focusing on results that are measurable and that we will then measure. I completely agree with that. I'd be hesitant to say that other agencies don't have that as one of their core mandates, but I don't know that for sure.

I'd say that the threshold program has followed a very different path from compacts. I think the level of detail -- for example, on budgets: for compacts, you have your activity budget, your -- so you have components. Let's say they're working on business registration, and then you'll have the activities they're doing, e-governance and capacity building, etcetera.

You'll have budget items for all of those coming in every quarter, or on a semi-annual basis. For threshold programs, the only budget we get is how much an implementer -- a contractor -- has spent each quarter. So Chemonics did work with all of those 10 different agencies, and we have no idea how much was spent on each agency in terms of dollars.

QUESTION: Can I just jump in a little bit, since I worked -- I'm sorry. I didn't see you.

QUESTION: That experience was a little bit different in Uganda, where I helped implement the threshold country program. In fact, the threshold country plan which produced the program did have component-by-component, activity-by-activity budgets, which then went to you, then to USAID, which then contracted it out through the indefinite quantity contract request for task order proposals.

MS. SAHAF: Beforehand.

QUESTION: That's correct. And then, you know, the implementing organization, my former employer, we were responsible for actually producing both technical and cost proposals that reflected as closely as possible the initial intention as laid out by the Government of Uganda in its threshold country plan.

But I think what's really important to do is just take a step back from the sort of implementation modality to appreciate that, especially with the threshold country programs that are trying to move the World Bank Institute's Control of Corruption indicator, you are dealing with a performance measure that the authors themselves have said is not ready for prime time.

Danny Kaufmann said do not use this for such high-level decision making as whether or not a country now graduates from threshold into compact status.

The other thing to appreciate is that you're dealing with trying to instigate or to catalyze institutional reform in countries where you have the, if you will, shell of a structure of national integrity systems which include a lot of these agencies that were described in your case studies here, Anticorruption Commissions, your supreme audit institutions, your Revenue Authorities, and court systems, national police forces, that are ostensibly all supposed to work together in, you know, harmonized fashion and they don't and that's not because of lack of capacity.

It's because of political interest, and it's because we're dealing with horrendously challenging situations where you have, you know, not-so-democratic governments that are trying to get lots more money. They have an incentive to play ball and at least promote the appearance that they are performing and responding to those incentives, when -- and it's very interesting to me in this presentation -- a year and a half or two years later, when you all show up, there's not much left from what the initial project tried to support and/or build up from scratch. I think that's an important thing to look at.

The last thing I'd say is, again, back to implementation modalities: it's good to see MCC is now more consistently moving from the two-year to the three-year program, but I still -- my general impression, having worked in these projects and now stepping away a bit, is that you are taking what the World Bank would do in their public financial management reform projects, what they would do in five to 10 years with three, four, five times as much money, and you're condensing it into a three-year project for a fraction of the funding. So to me it's not surprising that results have been uneven and that they have in fact been very difficult to measure.

I'm sorry. Last point. From an implementation perspective, you have a right, as long as you're able to work with your USAID partner in country, to access the performance management plan, because --

MS. SAHAF: Oh, we have them all.

QUESTION: Okay. Yes, because that should provide all the details that you're saying -- well, that's on the contractor. That's a shame.

All right. Thank you.

MS. SAHAF: It often doesn't, yes.

QUESTION: Actually, you picked up on most of the points that I was going to make.

There are only two things I would add. One is that, in a lot of cases, the threshold programs were in fact follow-ons to existing AID programs. So even though you might think they're two years and the World Bank would do five or 10, in fact, elements of them had been around for five or 10 years before in previous AID programs. So there was some degree of continuity.

I'd say the key question, however, is what's the purpose of the threshold program. It was an add-on to the original intent of the MCC. It is only 10 percent of the MCC's business. So what you're looking at is really a small fraction, and the purpose of the threshold program was allegedly to get countries eligible for the compacts. The problem with that is that you then have to use the Control of Corruption indicators, but in fact those are much more dominated by relative movements than by any program effectiveness.

So right off the bat, you have sort of a conceptual problem at its core that people didn't really feel comfortable talking about, but I think with some experience the MCC has come forward and been relatively candid: gee, this program didn't have the effect we thought it might have, but it was otherwise maybe not a bad program, maybe not a good program, and you could certainly take all the follow-on questions and say continue it, don't, or do something else.

MS. SAHAF: Could I just respond to one thing because I do think it's important to make a distinction between how MCC operates in compacts versus thresholds because the compacts have a really robust monitoring and evaluation system and it's not just on budget figures. I think that's just an indication of somewhere where it shouldn't be so difficult to get data and we don't.

But they have, you know, hundreds of indicators on the various -- I don't even think lots of indicators is necessarily the point, but just the right types of indicators, so that we can say how many hectares we're working on, or how many kilometers of roads, and how income is related to those, to the populations around those roads. And they have integrated data collection, whether it's survey data to get at the counterfactuals, or monitoring data on indicators.

So I just think that's important, you know, that threshold is the anomaly of MCC in terms of that commitment to tracking.

MR. KULCHINSKY: Last question. Virginia from USAID.

QUESTION: Hi. My name is Virginia Lambert from USAID, and I just wanted to say that I think your presentation underscores the importance of developing strong indicators for especially some of our newer programs.

I come from a health background, where it's relatively easy to measure things, compared to things like governance and so forth, democracy building, and concepts like transparency. So I think we need to unpack some of these concepts and try to get to the measurable components and then build some indicators, and we need to continue to improve people's skills to create those indicators and to come to consensus on them.

Second point is I was just at the presentation next door, and that was on a retrospective of USAID's EGAT, Economic Growth and Trade Programs, and they did a large scale retrospective filtering out from over a couple thousand programs, mainly USAID and some other USG, and it's interesting because they actually liked the data that they found in a lot of the reports.

MS. SAHAF: From the threshold program?

QUESTION: From the USAID programs that they looked at and they were impressed actually, and I thought that was interesting.

MS. SAHAF: Reporting from whom?

QUESTION: USAID's projects.

MS. SAHAF: So I think we're going to find there are going to be conflicts between implementer reports and USAID reports and our evaluations. I think part of it might be the point in time at which you're going in, and I think part of it will be that there's an inherent conflict of interest.

QUESTION: Yeah. I think the point was that she said the monitoring data that comes from quarterly reports, that is vetted by AID's AORs and COTRs, builds up to the annual report, and so they were actually happy with those. Also, AID requires data quality assessments on any indicators sent to Washington.

MS. SAHAF: I always question the quality of all of the indicators, but there is a DQA every three years.

QUESTION: Yes, but also interesting were some of the methods they were using to look at many of them -- because it's difficult to look at a whole bunch of programs at once, but they developed some simple tools to look at those.

MS. SAHAF: Yes. We tried to do a cross-country comparison and came away thinking that we weren't able to dig deep and really get at the granularity that would give us tangible information about what did or didn't change. So we didn't find success in that, but we tried it once. That's not representative.

MR. KULCHINSKY: With that said, we want to thank you very much, Sophia.

And also Jack Molyneaux, the other presenter, called. He had to unexpectedly travel overseas over the weekend. So he apologized that he couldn't present.

So, Sophia, thank you very much.

(Applause.)

MR. KULCHINSKY: And I'd invite you now to break for lunch and the networking break.

Thank you.