PRESS RELEASES
New Directions for Program Evaluation at the U.S. Department of Education


Background

What's the Problem?

The Department spends upwards of $100 million per year on program evaluation and program data collection. Yet key decision makers in the Department, in the Office of Management and Budget, and on Capitol Hill continue to operate without the information they need. At the same time, our evaluation studies are not as helpful as they could be to practitioners at the local level, nor can they answer questions about causation. There must be a better way to conceptualize, organize, and execute program evaluation.

We propose a significant shift in program evaluation, away from a compliance model and towards a system of research and evaluation focused on results and the effectiveness of specific educational interventions.

From Compliance to Performance

The Department has recently developed a business case for OMB for a "Performance-Based Data Management Initiative." This initiative seeks to replace the hundreds of program compliance reports with an integrated electronic system of data collection focused on outcomes. One major goal of the initiative is to reduce data burden on our partners, but we also aim to collect higher-quality data on what matters most: results.

These data will tell us whether the education system and its components are performing well, and they might help us understand which of our programs are having the greatest impact, but they will not tell us why. How can we supplement this data management system with program evaluations that give decision makers the information they need to allocate funds, make policy changes, and consider new directions? How can we build on the knowledge base so that practitioners know "what works" and can spend their federal dollars wisely?

Four Types of Program Evaluation

Generally speaking, there are four types of program evaluation, each with its own key audiences, questions, and methodologies. We need to develop a balanced evaluation portfolio that embraces all four types of studies:

Type of Evaluation | Audience | Key Questions | Timeline | Methodologies
--- | --- | --- | --- | ---
Continuous Improvement | Program Staff | How can we continuously improve our communication and guidance in order to achieve our objectives? | ASAP | Market research methods like fast-response surveys, focus groups, etc.
Performance Data | Appropriators/OMB | Which federal programs are working? Are some programs more effective than others? | Annual | Analysis using the Performance-Based Data Management System
Implementation Studies | Authorizers | How well are programs being implemented? Are the policy changes we made leading to improved outcomes? | 5-7 Years | Passive, descriptive evaluation studies, using methods like self-reported surveys and case studies
Field Trials | Practitioners | What works? What specific educational interventions lead to increased student achievement? | Long-term | Random assignment field trials with longitudinal data

Evaluation Type #1: Continuous Improvement

The No Child Left Behind Act gives the Department a mandate to make sweeping changes in federal policy. We are currently launching a national campaign to transform the culture of education. Especially in areas like research-based reading, we need to know immediately if our message is having an impact. We need good information, right away, to fine-tune our materials and approaches.

We also need to know right away if there are any snags at the state or local levels in the implementation of this complex law. Waiting for an annual performance report or a five-year evaluation study is simply too long.

Methods common in market research would be appropriate for this purpose. Fast-response surveys, even via telephone, might solicit quick feedback from target audiences (such as reading teachers). Focus groups would allow program staff to probe reactions to communications pieces. Because these data would not be shared externally but would instead provide a feedback loop for continuous improvement, speed matters more than rigor.

Evaluation Type #2: Program Performance Data

Every year, Congressional appropriators and the Office of Management and Budget want to know if individual programs are working so that budget decisions can be based on data. The Government Performance and Results Act (GPRA) requires program-by-program performance information for this very reason.

Unfortunately, answering the question "Is this program working?" is surprisingly hard, for four reasons:

  1. Many Department programs, like Title I, Perkins, and Adult Education, are not programs, but funding streams. In other words, they are not specific educational interventions, but rather sources of flexible funds that can be spent in a myriad of ways.
  2. New flexibility provisions encourage a blending of resources. For good reason, federal policy urges states and localities to view federal funds holistically, and new provisions encourage our partners to take funds targeted for one purpose and use them for another. This is good policy (and an acknowledgment of the appropriate federal role in education), but it makes evaluating the impact of individual federal funding streams quite difficult.
  3. Linking interventions to impacts takes time. Even for the (relatively few) Department programs that mandate a specific educational intervention, it is very difficult to provide annual impact data. Ideally, studies to track these impacts would be longitudinal and would use field trials with random assignment to treatment and control groups. These methods tend not to produce annual or nationally representative data.
  4. Isolating the impact of federal policy or funding (versus state or local efforts) is difficult or impossible. Rightly, federal policy increasingly seeks to coordinate federal programs and approaches with state and local efforts. A good example is Title I, which in effect mandates standards-based reform, a strategy already practiced by many states. It is therefore impossible to attribute outcomes to federal policy changes rather than to state efforts.

So with these caveats in mind, how can we provide good information to Congress and OMB on an annual basis? How can we identify program-by-program GPRA indicators that are linked to student outcomes? It depends on the type of program.

Flexible Funding Streams

For large formula-based grant programs, like Title I, we will report:

  • National achievement trends, as appropriate for the program (ideally aligned with the Department's strategic plan indicators)
  • Achievement trends for schools receiving program dollars

The data for these indicators will come from the Performance-Based Data Management System. Already, through the comprehensive database built for us by AIR, we can track achievement trends for every school in America. As the system becomes more sophisticated, we will know which schools receive which pots of federal funds.

In effect, several funding streams will share the same achievement indicators. For example, Title I, Title II, Title III, and Reading First of the ESEA all aim to increase student achievement (through various means), so national achievement trends will be used as indicators for all of them. But the ability to analyze achievement by school will also allow us to find out whether certain pots of money appear to be more effective than others. For example, we will know, over time, whether schools receiving both Title I and Reading First funds do or do not outperform schools receiving only Title I funds.

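There are many methodological concerns with this approach (for example, schools are not randomly assigned to funding streams, so differences between groups of schools cannot be attributed to the funds alone), but it will give appropriators a rough idea of which formula grant programs are having the largest impact.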

Competitive Programs

For the Department's smaller, competitive programs, we will also look at progress in student achievement, but only for the schools or students served by the program. For example, for the American History program we would look at history achievement indicators for schools receiving money under the program.

In many cases, our smaller programs will struggle to find relevant achievement data. We might decide that no performance data is better than bad, compliance-based information.

This approach to annual performance data is not perfect, but is a step in the right direction. To more fully understand the impact of federal programs, or specific educational interventions funded by them, we must use other, longer-term methods.

Evaluation Type #3: Descriptive Studies of Program Implementation

Authorizing committees on Capitol Hill rightly want to know how well various federal education programs and policies are being implemented, and what impact they are having. This information would feed into subsequent program reauthorizations, typically on a five-year cycle.

The bulk of evaluations produced by the Planning and Evaluation Service (PES) are targeted toward this purpose. Their methodologies include a variety of "passive research" designs, usually including:

  • Nationally representative, self-reported surveys of districts, principals, and/or teachers
  • Case studies of representative sites
  • Correlational analysis

PES is very good at producing these types of descriptive studies. They can provide a wealth of information to Congress about what's happened in the real world since the previous reauthorization, and what role federal program and policy changes have played.

For example, descriptive studies of the Title I program have helped us to understand the challenges faced by state and local agencies when implementing the accountability provisions of the law. They also give some insight into the relationship between the law's requirements and the actions taken by Title I schools.

However, these descriptive studies have not provided solid evidence of a causal link between specific policies or interventions and changes in student achievement. Nor can they, since they do not include randomly assigned treatment and control groups. (Plus, as explained above, Title I is not in itself an intervention, which makes a study of its effectiveness difficult.)

Descriptive implementation studies play a crucial role in understanding the impact of policy changes, but they are no substitute for rigorous field trials of specific interventions.

Evaluation Type #4: Rigorous Field Trials of Specific Interventions

Even with high-quality fast-response surveys, annual performance data, and descriptive studies, we still cannot answer the question on the minds of practitioners: "What works?" To be able to make causal links between interventions and outcomes, we need rigorous field trials, complete with random assignment, value-added analysis of longitudinal achievement data, and distinct interventions to study.

This approach might be considered "research" rather than "evaluation." Whatever the name, the Department's evaluation agenda would be incomplete without it. It is a fair use of evaluation dollars because federal program funds are paying for the interventions to be studied.

In some cases, these field trials will be able to answer the question "Is the federal program working?", because a few federal programs are, in and of themselves, specific interventions.

In other cases, these trials will instead answer the question "Does a specific intervention (funded by federal dollars) produce results?" For example, field trials might examine the effectiveness of specific reading interventions, whole-school designs, or professional development regimes. All of these are funded by large formula programs (like Title I) but are not the programs themselves.

Who's Responsible for What?

While coordination in program evaluation is essential, each type of evaluation will be assigned to a distinct part of the Department:

Continuous Improvement Studies will be the responsibility of the Program Assistant Secretaries. For example, the Assistant Secretary for Elementary and Secondary Education will be responsible for developing and implementing market research studies related to Title I and Reading First. These studies are integral pieces of the day-to-day work of the program offices.

Program Performance Data will be the responsibility of the Performance-Based Data Management Team, housed within the Executive Management Team. The Program Assistant Secretaries will play a crucial role, though, in helping to define the performance measures for individual programs.

Descriptive Implementation Studies will continue to be the responsibility of the Planning and Evaluation Service, reconstituted as the "Policy and Program Studies Service." Evaluation questions should be developed in cooperation with the Program Offices, but also with the needs of the authorizing committees in mind. This new office will also serve as an incubator of new policy ideas, and will continue to commission timely policy studies.

Field Trials of Specific Interventions will be the responsibility of a new evaluation unit within the Office of Educational Research and Improvement (OERI). OERI will need to develop the capacity to oversee numerous high-quality evaluation studies, and a regular funding stream should allow it to do so.

Last Modified: 09/15/2004