Evaluation Primer
Observing Behavioral Outcomes and Attributing Changes to the Program
Many schools and community agencies will consider process evaluations to be sufficient for program assessment. They view their primary responsibility as delivering services in an appropriate and efficient manner. They use evaluation to determine whether those goals are being met and to improve service delivery where needed. But other schools, districts, and community agencies will want to find out whether their programs are effective for clients, whether they "make a difference" for recipients of services and others in the community. They will want to know whether a set of interventions changes student behavior or other indicators in order to decide how to proceed with their program. Outcome and impact evaluations identify changes that have occurred and analyze the changes to determine whether they are attributable to the program, that is, whether the changes would have occurred without the program activities.
Demonstrating that changes in behavior occur as a result of a program is not always simple, because behaviors are likely to change over time. Children in programs mature, social norms change, new variables are introduced and others are removed. These conditions and many others can affect participation or behavior rates independent of the effects of any specific activity (also called a "treatment").
The first step in conducting an assessment of outcomes is to decide exactly which changes are most important to measure.
Once the specific outcomes have been identified, evaluation planners must explore how the assessment will relate changes in those outcomes to a service or package of services. There are a number of possible approaches, some of which are relatively simple and inexpensive, while others are ambitious and require considerable resources and planning. Some approaches focus exclusively on participants, while others compare participants with similar persons or groups.
Assessment of treatment group only
There is a range of possible comparisons that examine only the treatment group. The most common method is to collect data from the participant group both before and after the treatment. The responses are then compared for evidence of changes in knowledge and behavior. An even more limited inquiry might focus on one part of the program: for example, a two-week instructional/skill-building component, with pretests and posttests of knowledge or social skills given before and after the instructional unit. This method would help teachers know whether students learned the information and skills.
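The pretest/posttest comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `mean_change` and the scores are hypothetical, and each participant is assumed to have exactly one pretest and one posttest score, listed in the same order.

```python
from statistics import mean


def mean_change(pretest, posttest):
    """Return the average per-participant change (posttest minus pretest)."""
    if len(pretest) != len(posttest):
        raise ValueError("pretest and posttest must cover the same participants")
    if not pretest:
        raise ValueError("at least one participant is required")
    # Pair each participant's two scores and average the differences.
    differences = [post - pre for pre, post in zip(pretest, posttest)]
    return mean(differences)


# Hypothetical knowledge-test scores for five students, in the same order.
pre_scores = [10, 12, 9, 14, 11]
post_scores = [13, 15, 9, 16, 14]
change = mean_change(pre_scores, post_scores)
print(f"Average change: {change:.1f} points")
```

A positive average shows that scores rose over the interval. By itself it does not show that the program caused the rise.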
One-group pretest/posttest approaches are relatively inexpensive and easy to administer, but their main drawback is that it may not be possible to attribute changes in outcomes to the treatment. One way to address this concern might be to compare changes in a treatment group to changes in some generally available local "standard" of change in knowledge or behavior. Of course, if students in a single school or district are not "typical" of students in the state, comparisons with that standard may be inappropriate. Local programs may emphasize topics or behaviors different from those indicated in statewide surveys. National trend data can also provide a point of comparison. Year-to-year changes in knowledge or behavior among students might be compared to changes reported in national surveys.
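As a rough illustration of comparing a local change with a published trend, the sketch below subtracts the change seen in a benchmark survey from the change seen in the treatment group. The function name `change_vs_benchmark` and all figures are hypothetical, and the sketch assumes both sources measure the same behavior over the same interval.

```python
def change_vs_benchmark(local_before, local_after, bench_before, bench_after):
    """Return the local change minus the benchmark change over the same interval."""
    local_change = local_after - local_before
    bench_change = bench_after - bench_before
    return local_change - bench_change


# Hypothetical percentages of students reporting a behavior in two successive years.
gap = change_vs_benchmark(
    local_before=30.0, local_after=24.0,   # treatment group
    bench_before=31.0, bench_after=29.0,   # e.g., a national survey
)
print(f"Local change beyond the benchmark trend: {gap:+.1f} percentage points")
```

A gap near zero suggests that the local change simply tracked the wider trend. A larger gap is suggestive but still not proof of a program effect, for the reasons given above.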
Assessment of treatment and comparison groups
A more rigorous way to determine the effects of a treatment is to compare the performance of those who receive the treatment with similar persons who do not receive it. Such persons form a comparison group. Their knowledge, attitudes, or behavior are measured over the same interval as those of the participants. One of the best ways to ensure that participant and non-participant groups are comparable is to assign people randomly to the treatment group and the comparison (or "control") group. This procedure reduces the possibility that the treatment group differs from the comparison group in a manner that can affect program outcomes. Random assignment is commonly used in testing the efficacy of new medicines, but it is hard to accomplish in education programs. In a school, denying a new or potentially more effective treatment to a group of students is frowned upon, and even if it were not, the treatment and control students might interact, thereby contaminating the comparison group. The process can be adjusted, however, so that classes or school buildings or community centers are randomly assigned to treatment or control status.
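Random assignment of whole classes, rather than individual students, might look like the following sketch. The function name `assign_groups`, the class labels, and the even split are assumptions made for illustration. A real evaluation would also need to document the procedure and may need to balance groups by grade or building.

```python
import random


def assign_groups(units, seed=None):
    """Randomly split units (e.g., classes) into treatment and control groups."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    # With an odd number of units, the control group receives the extra one.
    return {
        "treatment": sorted(shuffled[:half]),
        "control": sorted(shuffled[half:]),
    }


# Hypothetical class labels.
classes = ["Class A", "Class B", "Class C", "Class D", "Class E", "Class F"]
groups = assign_groups(classes, seed=1998)
print(groups)
```

Assigning intact classes keeps treated and untreated students apart during instruction, which reduces the contamination problem noted above.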
An alternative means to create comparison groups is to divide the potential participants into several groups and stagger the treatment, with some groups participating in the first offering of a program and the rest in subsequent offerings. This approach is particularly attractive when a program does not have the resources to provide the services to all likely participants at one time, or when all students are not required to receive the intervention. Those in the first group become the treatment group, and participants in subsequent offerings provide the "comparison" group. This approach only allows for short-term comparisons between groups, however, because eventually everyone receives the treatment.
To ensure that staggered groups are comparable, background information (e.g., gender, race, age, school attendance rates, and academic test scores) should be analyzed. The more similar the groups, the more likely that any post-treatment differences between the groups are the result of the program. Even if the groups are somewhat different, background information can sometimes be used in statistical analyses to adjust for the differences. Alternatively, evaluators can select specific individuals from each group to compare. However, approaches that "match" students artificially are risky and require considerable knowledge of evaluation methods.
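One simple way to check how similar two groups are on a numeric background characteristic is a standardized mean difference. It is offered here only as an illustrative diagnostic, not as a method the primer prescribes. The function name `standardized_difference` and the attendance figures are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardized_difference(group_a, group_b):
    """Return the difference in group means divided by a pooled standard deviation."""
    if len(group_a) < 2 or len(group_b) < 2:
        raise ValueError("each group needs at least two values")
    # Average the two sample variances, then take the square root.
    pooled_sd = sqrt((variance(group_a) + variance(group_b)) / 2)
    if pooled_sd == 0:
        raise ValueError("neither group varies; compare the means directly")
    return (mean(group_a) - mean(group_b)) / pooled_sd


# Hypothetical attendance rates (percent) for students in two staggered offerings.
first_offering = [92, 95, 88, 97, 90]
later_offering = [91, 94, 89, 96, 93]
smd = standardized_difference(first_offering, later_offering)
print(f"Standardized difference in attendance: {smd:.2f}")
```

Values close to zero indicate that the groups resemble each other on that characteristic. How large a difference is acceptable is a judgment call best made with an evaluation specialist.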
Sometimes there is simply no way to compare persons or groups in the same population that do and do not receive the treatment. This situation might occur if staggered treatment is impossible or if all possible subjects must be served together. Even in these cases, comparison groups may still be found. One possible comparison group might be students with similar personal and community characteristics (perhaps students in a neighboring high school or district). Once again, background information (including gender, race, age, school attendance, and test scores) must be used to identify the similarities between the groups at the outset and may be used to aid in statistical analyses of findings.
The greatest problem in creating such matched comparison groups is knowing just what variables ought to be included in the match. If a match misses critical characteristics, the groups cannot be said to be truly comparable. It is easy to think of reasons why participant and comparison groups could be different. One common reason is that participants are chosen for a program in a purposeful manner. For example, participants in an agency-sponsored program may have volunteered to participate in the program. Volunteers are probably more likely to be helped by the program than are other people in the same community, even when they share the same age, race, sex, socioeconomic status, or educational ability. Or participants may have been selected to receive the program because their school or community had specific problems. Such participants would be more likely to be troubled than a population that is similar with respect to age, sex, and race. As a result, they might share academic characteristics that could affect their behavior. The selection of appropriate treatment and comparison groups is an area in which program officials may want to consult with evaluation specialists within or outside of the district or agency.

Last modified -- September 21, 1998, (lyp)