Experimental Evaluation Design for Program Improvement. Laura R. Peck

accompany them, academic programs to train people in evaluation methods. Since then, scholars, practitioners, and policymakers have become increasingly aware of the diversity of questions that program evaluation pursues. This awareness has been coupled with a broadening range of evaluation approaches that address not only whether programs work but also what works, for whom, and under what circumstances (e.g., Stern et al., 2012). Program evaluation as a profession is diverse, and scholars and practitioners can be found in a wide array of settings, from small, community-based nonprofits to the largest federal agencies.

      As those program administrators and policymakers seek to establish, implement, and evolve their programs and public policies, measuring the effectiveness of those programs or policies is essential to justifying ongoing funding, enacting policy changes to improve them, or terminating them. In doing so, impact evaluations must isolate a program’s impact from the many other possible explanations for any observed difference in outcomes. Determining how much of the improvement in outcomes (that is, the “impact”) is due to the program requires estimating what would have happened in the program’s absence (the “counterfactual”). As of 2019, we are amid an era of “evidence-based” policy-making, which implies that the results of evaluation research inform what we choose to implement, how we choose to improve, and whether we terminate certain public and nonprofit programs and policies.
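      In standard potential-outcomes notation (introduced here for clarity; it is not drawn from this text), the average impact can be written as

\[
\Delta = E[Y_1] - E[Y_0],
\]

where \(Y_1\) denotes the outcome with the program and \(Y_0\) the outcome without it. The second term is the counterfactual: it cannot be observed for program participants, so it must be estimated, for instance by the mean outcome of a randomized control group.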

      Experimentally designed evaluations—those that randomize units to treatment and control groups—offer a convincing means of establishing a causal connection between a program and its effects. Over roughly the last three decades, experimental evaluations have grown substantially in both the number and the diversity of their applications. For example, Greenberg and Shroder’s 2004 Digest of Social Experiments counted 293 such evaluations since the beginning of their use to study public policy in the 1970s. The Randomized Social Experiments eJournal, which replaced the Digest beginning in 2007, has identified thousands of additional experiments since then.
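      To make these mechanics concrete, the following is a minimal sketch, not drawn from the book, of how random assignment and a difference-in-means impact estimate might be computed; the function names and outcome values are hypothetical.

```python
# A minimal sketch (not from the book) of the core logic of an experimental
# evaluation: units are randomized to treatment and control groups, and the
# impact is estimated as the treatment-control difference in mean outcomes.
# All names and numbers are illustrative assumptions.
import random
import statistics

def randomize(units, seed=42):
    """Randomly split a list of units into treatment and control groups."""
    rng = random.Random(seed)
    shuffled = units[:]
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]

def estimate_impact(treatment_outcomes, control_outcomes):
    """Estimate the average impact as the difference in mean outcomes.

    Because assignment was random, the control group's mean outcome serves
    as the estimate of the counterfactual: what the treatment group's mean
    outcome would have been, on average, absent the program.
    """
    return statistics.mean(treatment_outcomes) - statistics.mean(control_outcomes)

# Illustrative use with hypothetical participants and earnings outcomes:
participants = ["p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"]
treatment_group, control_group = randomize(participants)
treatment_outcomes = [5200, 4800, 6100, 5500]  # hypothetical dollars per quarter
control_outcomes = [4700, 4900, 5300, 5000]
print(estimate_impact(treatment_outcomes, control_outcomes))  # 425.0
```

      With random assignment, the only systematic difference between the two groups is the program itself, which is why the simple difference in means serves as the impact estimate.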

      The past few decades have shown that experimental evaluations are feasible in a wide variety of settings. The field has gotten quite good at executing experiments that aim to answer questions about the average impacts of policies and programs. Over this same period, there has been increased awareness of the broad range of cause-and-effect questions that evaluation research examines, along with corresponding methodological innovation and creativity to meet increased demand from the field. That said, experimental evaluations have been subject to criticism for a variety of reasons (e.g., Bell & Peck, 2015).

      The main criticism that compels this book is that experimental evaluations are not suited to disaggregating program impacts in ways that connect to program implementation or practice. That is, experiments have earned a reputation for being a relatively blunt tool, one in which program implementation details are a “black box.” The complexity, implementation, and nuance of a program itself tend to be overlooked when an evaluation produces a single number (the “impact”) to represent the program’s effectiveness.

      Box 1.1 Definition and Origins of the Term “Black Box” in Program Evaluation

      In the field of program evaluation, “black box” refers to how some impact evaluations are perceived to consider the program and its implementation. It is possible to evaluate the impact of a program without knowing much at all about what the program is. In that circumstance, the program itself is considered a black box, an unknown.

      Perhaps the first published reference to black box appeared in a 1993 Institute for Research on Poverty discussion paper, “Prying the Lid from the Black Box,” by David Greenberg, Robert Meyer, and Michael Wiseman (although two of these authors credit Larry Orr with using the black box term before then). This paper appears to have evolved into, and was published in 1994 as, “Multisite Employment and Training Program Evaluation: A Tale of Three Studies” by the same trio, with follow-up papers in the decade that followed (e.g., Greenberg, Meyer, Michalopoulos, & Wiseman, 2003).

      In the ensuing two decades, the term—as in getting inside the black box—has become associated with the idea of understanding the details of a program’s operations. A special section of the American Journal of Evaluation (volume 36, issue 4), titled ‘Unpacking the “Black Box” of Social Programs and Policies,’ was dedicated to methods for doing so; and three chapters of the 2016 New Directions for Evaluation (issue 152) considered “Inside the Black Box” evaluation designs and analyses.

      Indeed, recent years have seen policymakers and funders—in government, private, and foundation sectors—desiring to learn more from their evaluations of health, education, and social programs. Although the ability to establish a program’s causal impact is an important contribution, it may be insufficient for those who immediately want to know what explains that treatment effect: Was the program effective primarily because of its quality case management? Did its use of technology in interacting with its participants drive impacts? Or are both aspects of the program essential to its effectiveness?

      Answering these types of additional research questions about the key ingredients of an intervention’s success with the same degree of rigor requires a new perspective on the use of experiments in practice. This book considers a range of impact evaluation questions, most importantly those that focus on the impact of specific aspects of a program. It explores how a variety of experimental evaluation design options can provide answers to these questions and suggests opportunities for experiments to be applied in more varied settings and focused on program improvement efforts.

      The State of the Field

      The field of program evaluation is large and diverse. Considering the membership and organizational structure of the U.S.-based American Evaluation Association (AEA)—the field’s main professional organization—the evaluation field covers a wide variety of topical, population-related, theoretical, contextual, and methodological areas. For example, the kinds of topics that AEA members focus on—as defined by the association’s sections, or Topical Interest Groups (TIGs), as they are called—include education, health, human services, crime and justice, emergency management, the environment, and community psychology. As of this writing, there are 59 TIGs in operation. The kinds of population-related interests cover youth; feminist issues; indigenous peoples; lesbian, gay, bisexual and transgendered people; Latinos/as; and multiethnic issues. The foundational, theoretical, or epistemological perspectives that interest AEA members include theories of evaluation, democracy and governance, translational research, research on evaluation, evaluation use, organizational learning, and data visualization. The contexts within which AEA members consider their work involve nonprofits and foundations, international and cross-cultural entities and systems, teaching evaluation, business and management, arts and cultural organizations, government, internal evaluation settings, and independent consultancies. Finally, the methodologies considered among AEA members include collaborative, participatory, and empowerment; qualitative; mixed methods; quantitative; program-theory based; needs assessment; systems change; cost-benefit and effectiveness; cluster, multisite, and multilevel; network analysis; and experimental design and analytic methods, among others. Given this diversity, it is impossible to classify the entire field of program evaluation neatly into just a few boxes. The literature regarding any one of these topics is vast, and the intersections across dimensions of the field imply additional complexity.

      What this book aims to do is focus on one particular methodology: experimental evaluations. Within that area, it focuses further on designs that address the more nuanced question of what about a program drives its impacts. The book describes the basic analytic approach to estimating treatment effects, leaving full analytic methods to other texts that can provide the needed deeper dive.
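      As a rough illustration of that basic analytic approach, and not the book’s own code, the sketch below estimates a treatment effect as the coefficient on a treatment indicator in an ordinary least squares regression; the data and variable names are hypothetical, and the optional baseline covariate is included only to show how precision can be improved.

```python
# A minimal sketch (not from the book) of a regression-based treatment effect
# estimate: regress the outcome on a treatment indicator, optionally adding a
# baseline covariate to improve precision. Data and names are illustrative.
import numpy as np

def treatment_effect(outcome, treated, covariate=None):
    """Estimate the average treatment effect via ordinary least squares.

    The coefficient on the treatment indicator is the impact estimate; with
    random assignment the covariate is not needed for unbiasedness, but it
    can reduce the variance of the estimate.
    """
    n = len(outcome)
    columns = [np.ones(n), np.asarray(treated, dtype=float)]
    if covariate is not None:
        columns.append(np.asarray(covariate, dtype=float))
    X = np.column_stack(columns)
    coefficients, *_ = np.linalg.lstsq(X, np.asarray(outcome, dtype=float), rcond=None)
    return coefficients[1]  # coefficient on the treatment indicator

# Illustrative use with hypothetical earnings data:
outcome = [5200, 4800, 6100, 5500, 4700, 4900, 5300, 5000]
treated = [1, 1, 1, 1, 0, 0, 0, 0]
baseline = [5000, 4600, 5800, 5200, 4600, 4800, 5100, 4900]
print(treatment_effect(outcome, treated, baseline))
```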

      Across the field, alternative taxonomies exist for classifying evaluation approaches. For example, Stern et al. (2012) identify five types of impact evaluations: experimental, statistical, theory based, case based, and participatory. The focus of this book is the first. Within the subset of the
