Experimental Evaluation Design for Program Improvement. Laura R. Peck. Evaluation in Practice Series

kinds of evaluation models, which I classify here as (1) large-scale experiments, (2) nudge or opportunistic experiments, (3) rapid-cycle evaluation, and (4) meta-analysis and systematic reviews.

      Large-Scale Experiments

      Perhaps the most familiar experiments are what I will refer to as “large-scale” impact studies, usually government-funded evaluations of federal or state policies and programs. Many are demonstrations, in which a new program or policy is rolled out and evaluated. For example, beginning in the 1990s, the U.S. Department of Housing and Urban Development’s Moving to Opportunity Fair Housing Demonstration (MTO) tested the effectiveness of a completely new policy: providing people with housing subsidies in the form of vouchers on the condition that they move to a low-poverty neighborhood (existing policy did not impose the neighborhood poverty requirement).

      Alternatively, large-scale federal evaluations can be reforms of existing programs, attempts to improve incrementally upon the status quo. For instance, a slew of welfare reform efforts in the 1980s and 1990s tweaked aspects of existing policy, such as changing the tax rate on earnings and its relationship to cash transfer benefit amounts, or changing the amount in assets (such as a vehicle’s value) that a person could have while maintaining eligibility for assistance. These large-scale experiments usually consider broad and long-term implications of policy change, and, as such, take a fair amount of time to plan, implement, and generate results.

      This slower process of planning and implementing a large-scale study, and affording the time needed to observe results, is also usually commensurate with the importance of the policy decisions: Even small effects of changing the tax rate on earnings for welfare recipients can result in large savings (or costs) nationally. Although we might hope for—or seek out—policy changes that have large impacts, substantial, useful policy learning has come from this class of experimental evaluations (e.g., Gueron & Rolston, 2013; Haskins & Margolis, 2014). For example, the experimentation that focused on reforming the U.S. cash public assistance program was incremental in its influence. Evaluations of that program, known as Aid to Dependent Children (ADC) and later Aid to Families with Dependent Children (AFDC) from 1935 until 1996, and as Temporary Assistance for Needy Families (TANF) since then, amassed evidence that informed many policy changes. That evidence persuaded policymakers to change various aspects of the program’s rules, emphasize a work focus rather than an education one, and end the program’s entitlement.

      Nudge or Opportunistic Experiments

      In recent years, an upsurge of “opportunistic” or “nudge” experiments has occurred. An “opportunistic” experiment is one that takes advantage of an opportunity that presents itself: when a program plans to change—for funding or administrative reasons—the evaluation can build on that plan and configure a way to learn about the effects of the change. A “nudge” experiment tends to focus on behavioral insights or administrative systems changes that can be randomized in order to improve program efficiency. Both opportunistic and nudge experiments tend to involve relatively small changes—such as to communications or program enrollment or compliance processes—but they may apply to large populations, such that even a small change can result in meaningful savings or benefits. For example, in the Fall of 2015, the Obama administration established the White House Social and Behavioral Sciences Team (SBST) to improve administrative efficiency and embed experimentation across the bureaucracy, creating a culture of learning and capitalizing on opportunities to improve government function.

      The SBST 2016 Annual Report highlights 20 completed experiments that illustrate how tweaking programs’ eligibility rules and processes can expand access, enrollment, and related favorable outcomes. For instance, a test of automatic enrollment into retirement savings among military service members boosted enrollment by 8.3 percentage points, from a low of 44% to over 52%, a start at bringing the savings rate closer to the 87% among civilian federal employees. Similarly, waiving the application requirement for some children in the National School Lunch and School Breakfast Programs increased enrollment, thereby enhancing access to food among vulnerable children. Both of these efforts were tested via an experimental evaluation design, which randomized who had access to the new policy so that the difference between the new regime’s outcomes and the status quo’s outcomes could be interpreted as the causal result of the new policy. In both cases, these were relatively small administrative changes that took little effort to implement; because they could be applied across a large system, they carried the potential for meaningful benefits in the aggregate.
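      To make that logic concrete, the minimal sketch below analyzes a simulated randomized comparison. The data and magnitudes only loosely mirror the retirement savings example; the variable names and numbers are illustrative assumptions, not figures drawn from the SBST report.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical illustration: 10,000 service members, roughly half randomly
# assigned to automatic enrollment ("treated"), the rest to the status quo.
n = 10_000
treated = rng.integers(0, 2, size=n).astype(bool)

# Simulated binary outcome: enrolled in the savings plan (True) or not.
# Baseline enrollment ~44%; automatic enrollment adds ~8 percentage points.
p_enroll = np.where(treated, 0.52, 0.44)
enrolled = rng.random(n) < p_enroll

# Because assignment was random, the simple difference in mean outcomes
# estimates the causal effect of the policy change.
impact = enrolled[treated].mean() - enrolled[~treated].mean()
se = np.sqrt(enrolled[treated].var(ddof=1) / treated.sum()
             + enrolled[~treated].var(ddof=1) / (~treated).sum())
print(f"Estimated impact: {impact:.3f} (SE {se:.3f})")
```

      Because assignment is random, the two groups differ systematically only in their exposure to the new policy, which is what licenses reading the difference in means as the policy’s causal effect.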

      Rapid-Cycle Evaluation

      Rapid-cycle evaluation is another relatively recent development within the broader field of program evaluation. In part because of its nascency, it is not yet fully or definitively defined: Some scholars assert that rapid-cycle evaluation must be experimental in nature, whereas others define it as any quick-turnaround evaluation activity that provides feedback to ongoing program development and improvement. Regardless, rapid-cycle evaluations that use an experimental evaluation design are relevant to this book. To deliver results quickly, these evaluations tend to involve questions similar to those asked by nudge or opportunistic experiments, along with outcomes that can be measured in the short term and still be meaningful. Furthermore, the data that inform impact analyses for rapid-cycle evaluations tend to come from administrative sources that already exist and are therefore quicker to collect and analyze than survey or other new primary data.

      Meta-Analysis and Systematic Reviews

      The fourth set of evaluation research relevant to experiments involves meta-analysis, including tiered-evidence reviews. Meta-analysis involves quantitatively aggregating other evaluation results in order to ascertain, across studies, the extent and magnitude of program impacts observed in the existing literature. These analyses tend to prioritize larger and more rigorous studies, down-weighting results that are based on small samples or that use designs that do not meet criteria for establishing a causal connection between a program and change in outcomes. Indeed, some meta-analyses use only evidence that comes from experimentally designed evaluations. Likewise, evidence reviews—such as those provided by the What Works Clearinghouse (WWC) of the U.S. Department of Education—give their highest rating to evidence that comes from experiments. Because of this, I classify meta-analyses as a type of research that is relevant to experimentally designed evaluations.
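      As a rough illustration of what quantitative aggregation and down-weighting can mean in practice, the sketch below pools hypothetical study estimates using fixed-effect inverse-variance weights, one common scheme in which less precise (typically smaller) studies count for less. The effect sizes and standard errors are invented for illustration; actual reviews may use different weighting and pooling methods.

```python
import numpy as np

# Hypothetical effect estimates (e.g., impacts in standardized units) and
# their standard errors from five studies; larger studies have smaller SEs.
effects = np.array([0.05, 0.12, 0.08, -0.02, 0.10])
std_errors = np.array([0.02, 0.06, 0.03, 0.10, 0.04])

# Fixed-effect inverse-variance weighting: precise (typically larger, more
# rigorous) studies receive more weight; imprecise studies are down-weighted.
weights = 1.0 / std_errors**2
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))

print(f"Pooled effect: {pooled:.3f} (SE {pooled_se:.3f})")
```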

      Getting Inside the Black Box

      Across these four main categories of experimental evaluation, substantial effort has gone into moving beyond estimating the average treatment effect to understand how impacts vary along several dimensions. For example, how do treatment effects vary across subgroups of interest? What are the mediators of treatment effects? How do treatment effects vary with program implementation features or with the fidelity of implementation to program theory? Most efforts to move beyond the average treatment effect involve data analytic strategies rather than evaluation design strategies. These analytic strategies have been advanced in order to expose what is inside the “black box.”
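      As one example of such an analytic strategy, the sketch below asks whether a treatment effect differs across a baseline subgroup by including a treatment-by-subgroup interaction in a regression on simulated data. The variable names (treat, subgroup, outcome) and the data are hypothetical, and this is only one of many ways to probe impact variation.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Hypothetical data: a randomized treatment indicator, a baseline subgroup
# flag (e.g., prior program participation), and an outcome measure.
n = 5_000
df = pd.DataFrame({
    "treat": rng.integers(0, 2, size=n),
    "subgroup": rng.integers(0, 2, size=n),
})
df["outcome"] = (
    0.40 + 0.05 * df["treat"] + 0.03 * df["treat"] * df["subgroup"]
    + rng.normal(0, 0.5, size=n)
)

# The treat:subgroup interaction term estimates how much the treatment
# effect differs between the two subgroups -- one common way of looking
# inside the average treatment effect.
model = smf.ols("outcome ~ treat * subgroup", data=df).fit()
print(model.params)
print(model.pvalues)
```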

      As noted in Box 1.1, the black box refers to the program as implemented, which can be somewhat of a mystery in impact evaluations: We know what the impact was, but we have little idea what caused it. In order to expose what is inside the black box, impact evaluations often are paired with implementation evaluations. The latter provide the detail needed to understand the program’s operations. That detail is helpful descriptively: It allows the user of the evaluation to associate the impact with some details of the program from which it arose. The way I have described this is at an aggregate level: The program’s average impact
