Little is known about the characteristics of state and local Even Start evaluations[11]. This is the first study to describe some of the types of evaluative activities that are being undertaken with Even Start funds at the state and local levels. Research questions in this area include the following:
Local evaluations are conducted for multiple purposes. First and foremost, Even Start projects have to comply with the legislative requirement for a local evaluation. Given this requirement, many local evaluations focus on project outcomes in order to provide evidence that local project directors can use to obtain political and/or financial support for the program from school boards, civic organizations, corporate sponsors, or foundations.
Ideally, Even Start projects would collect and use data as part of an ongoing continuous improvement effort and the local evaluation activity would be conceived and implemented with that in mind. The evidence contained in the reports that we reviewed shows that Even Start projects rarely engage in the systematic use of data to manage and improve their programs. Instead, program improvements/alterations typically are made on the basis of anecdotal evidence obtained through observations and stories gathered from the personal experiences of program implementers. A few reports that we reviewed noted recommendations from the previous year, described whether they had been addressed, and provided additional recommendations for the current year.
One reason for the apparent lack of use of data to improve Even Start projects is the distinction between the work that is done for a local evaluation and the work that is documented in the local evaluation report. Based on discussions with Even Start project directors and evaluators it appears that local evaluations may focus more on program improvement than would appear from reading the reports from those evaluations. When faced with limited time and resources, a local evaluator is likely to prepare a report which documents the gains made by families, rather than the types of programmatic improvements that he/she might have diagnosed and recommended, and the response that was made by the project.
Another reason is a misunderstanding of the purpose of the local evaluation, and who can benefit from the data collected. Perhaps local evaluation reports rarely provide information on continuous improvement efforts, testing for placement, and other local uses of data because Even Start project directors and/or local evaluators do not see these as important aspects of a local evaluation. The emphasis on simply reporting outcomes in local evaluations is understandable since the Even Start legislation and the guidance that has been provided to grantees both refer to studies of effectiveness and outcomes. Ideally, local evaluators would work with local projects to help them choose appropriate assessments and evaluation techniques to measure progress toward program goals. Even more important is the need for local evaluators to help local projects with interpretation of data and use of what is learned to improve services.
Finally, an important gap exists between the data and the conclusions of many local evaluation reports. While local evaluations almost always report glowingly positive conclusions about the effectiveness of Even Start projects, an independent reading of the same reports would rarely come to the same conclusions. These conflicting interpretations of the same data occur for two reasons. First, as described in this report, data gathered in local evaluations are almost always collected only on participants in the program. Those data typically show that children and adults improve over time on relevant outcome measures, and local evaluators generally report that the program helps participants. However, attributing improvements in test scores or other outcomes to Even Start is incorrect without considering alternative explanations. For example, children and adults make gains on measures such as the PPVT, the PSI, the TABE, and the CASAS due to normal development and maturation. Without a control group to assess the size of the normal developmental gain, it is inappropriate to attribute these gains to Even Start.
A second reason for the discrepancy between the conclusions reached by local evaluators and unbiased readers is that local evaluators are rarely independent evaluators. Although local evaluators are not allowed to be Even Start staff members, sometimes they are school district employees (e.g., a curriculum specialist or a school principal), sometimes they are self-employed researchers with small practices, and sometimes they are university-based consultants with varied research training. Local evaluations are rarely done by independent research firms. In any case, continued funding often rests on a local evaluator's ability to draw some positive conclusions from whatever data are collected.
These evaluation issues have implications for using data for continuous improvement. If it is politically unacceptable for an evaluation to point out program weaknesses, to state that a program is not meeting its goals, or to demonstrate that children or parents are not attaining hoped-for literacy skills, then it is difficult to conclude that a program needs improvement.
The great majority of local evaluations are conducted by persons outside the grantee agency. The typical local evaluator is a university-based consultant, though many local evaluations are done by research organizations, and some are done by an outsider in concert with a grantee staff member. Many local evaluators have advanced degrees in a relevant field while others seem to have fewer formal credentials. While we have no direct evidence about the training of local evaluators, there is great diversity in the quality of local studies, just as there is great diversity in every aspect of Even Start.
Some local evaluation reports are clearly conceptualized, carefully conducted, and well written. They often provide information describing Even Start families, the amount and level of participation in Even Start, the gains made by Even Start participants on various measures, and the degree to which parents are satisfied with the program. Some of these studies also provide information about the ways in which local collaborations are or are not working, methods of family recruitment that seem to work well, and recommendations for improving project operations. These well-done studies have the potential to affect positively the development of local projects. If these studies are being done for $5,000 to $10,000 per year, local projects are getting a lot for their money.
Other local evaluations are of quite poor quality: incoherently conceived and presented, poorly written, and of little assistance to a local project. They are evaluations in name only, conducted to satisfy the letter of the law. Anecdotal information from local evaluators suggests that variation in state-level policies about the importance of local evaluation and the amount of funding to be spent on local evaluation contributes to the variation in the quality of local studies; high-quality local evaluations cannot be purchased for under $2,000 per year.
Of the 122 studies in the sample, we judged that 113 (93%) contained data about the implementation of the program being studied (Exhibit 3). This matches reports from the Even Start Information System in which 94% of all Even Start projects reported that they collected data on participants, services, and interagency collaboration, and 95% reported that they assessed program implementation.
Design of implementation studies. Almost all of the local implementation studies (95%) included some sort of description of the project structure and activities undertaken by families served by Even Start. These descriptions ranged from one-paragraph statements about the nature of the project and services provided to detailed, multi-page charts showing the exact length and nature of the planned services in each component, as offered or delivered by several different service providers.
Almost two-thirds (62%) of the local implementation studies provided information about the level of participation in Even Start. Typically, this includes data on the average length of time (months) that families participate; less often it includes data on the average amount of time (hours per month) that families participate. Many studies commented on the difficulty of recruiting and retaining families in Even Start, but none contained a systematic accounting of the number of families contacted, the number that showed up for at least one contact, the number that officially enrolled in Even Start, and the numbers that stayed in the program for different lengths of time.
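The systematic accounting described above (families contacted, families that showed up, families that enrolled, and families retained for different lengths of time) could be tabulated along the following lines. This is only a minimal sketch: the stage names, the retention cut-points, and all of the counts are hypothetical and are not drawn from any reviewed report.

```python
from dataclasses import dataclass


@dataclass
class FunnelStage:
    name: str
    families: int


def funnel_report(stages):
    """Return (name, count, share of first stage, share of previous stage) per stage."""
    first = stages[0].families
    if first <= 0:
        raise ValueError("the first stage must contain at least one family")
    rows = []
    prev = first
    for stage in stages:
        of_first = stage.families / first
        # Guard against an empty preceding stage.
        of_prev = stage.families / prev if prev else 0.0
        rows.append((stage.name, stage.families, of_first, of_prev))
        prev = stage.families
    return rows


# Hypothetical counts for one project year (illustration only).
stages = [
    FunnelStage("contacted", 200),
    FunnelStage("showed up for at least one contact", 120),
    FunnelStage("officially enrolled", 80),
    FunnelStage("stayed 6 months or more", 50),
    FunnelStage("stayed 12 months or more", 30),
]

for name, n, of_first, of_prev in funnel_report(stages):
    print(f"{name:36s} {n:4d}  {of_first:6.1%} of contacted  {of_prev:6.1%} of prior stage")
```

Reporting each stage both as a share of all families contacted and as a share of the preceding stage shows where recruitment or retention losses are concentrated.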
Sample for implementation studies. Data to understand local implementation issues most often came from the local project director (81%), and from studies which intended to measure all Even Start teachers (72%) or all Even Start parents (70%). Rarely did local evaluations call for samples of teachers or parents. This is a reasonable approach since for small studies it generally is easier to collect data from all potential respondents than it is to select a sample and collect data only from the sample; for Even Start projects, the universe of respondents often is small enough to warrant a census approach.
Measurement methods used in implementation studies. Self-report was the most common measurement approach for Even Start local implementation studies. This strategy was used to obtain information from project directors (75%), teachers (72%), and parents (40%). Other commonly used measurement methods included abstraction of information from project records (66%), logs of activities or participation (32%), and observation (23%). Only 15% of the local implementation studies relied on data collected for the national evaluation.
Of the 122 studies in the sample, we judged that 94 (77%) contained some information on program outcomes (Exhibit 4). This is less than the 93% of all Even Start projects which reported through the Even Start Information System that they assessed growth in child and adult literacy, and parenting skills.
Design of outcome studies. Many local evaluations reported on program outcomes. Without the resources to conduct high-quality outcome studies, local evaluators often measured the gains that families in the program made on standardized tests, but were not able to determine whether those gains were larger or smaller than would be expected in the absence of the program.
The most common design for local evaluations was the one-group pre-post study, used in 76% of the outcome studies. In this design, Even Start families are assessed as they enter the program and again at a later point in time, often at the end of a school year, or when they leave the program. No control or comparison group families are measured in this design, meaning that while the local evaluator can calculate the gains made by Even Start families, there is no way of knowing how much the families would have gained if they were not in Even Start.
While the lack of a comparison group makes the one-group pre-post study a weak design for estimating the effectiveness of Even Start, pre-post data can be used to assess the effectiveness of a project at helping families to meet a pre-identified set of literacy standards (e.g., at entry to kindergarten, 80% of the children who participated in Even Start will be able to perform tasks a, b, and c), and to determine whether parents and children achieve above, at the same level, or below the levels of parents and children in national norms groups on tests of literacy skills. Though it would be useful to program staff, this type of analysis is rarely included in local evaluations, in large part due to the difficulty of agreeing on performance standards to be met by program participants. The indicators that states must develop under recent amendments to Even Start have the potential to address this difficulty.
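A check against a pre-identified standard of the kind described above might look like the following sketch. The scores, the cutoff score, and the helper name `share_meeting_standard` are hypothetical; only the 80% target echoes the example in the text.

```python
def share_meeting_standard(scores, cutoff):
    """Fraction of participants whose score is at or above the cutoff."""
    if not scores:
        raise ValueError("at least one score is required")
    return sum(score >= cutoff for score in scores) / len(scores)


# Hypothetical assessment scores for ten children at entry to kindergarten.
scores = [12, 15, 9, 18, 14, 16, 11, 17, 13, 19]
CUTOFF = 12    # hypothetical score taken to mean "can perform tasks a, b, and c"
TARGET = 0.80  # the 80% target used as an example in the text

share = share_meeting_standard(scores, CUTOFF)
meets_target = share >= TARGET
print(f"{share:.0%} of children met the standard; target met: {meets_target}")
```

Because the check uses only scores from a single point in time, it says whether participants reach the standard, not how much they gained or whether the program caused the result.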
In a weaker design, 31% of the outcome studies used a one-group post-only design in which Even Start families are administered a posttest, but not a pretest. This design allows calculation of whether Even Start adults and children achieve at a given level, but it is not possible to determine how much was gained. Again, it would be possible to use data from this design to assess the performance of Even Start participants against a set of literacy standards, but setting those standards is a task which is only now beginning with the state work on indicators of program quality.
Only 10% of the local outcome evaluations used a two-group quasi-experimental design in which the gains of Even Start families were compared to the gains of families in a non-equivalent comparison group, e.g., families in a parallel program, children in Head Start, etc. Local outcome evaluations rarely have the resources to conduct a study in which funds are used to collect data from non-Even Start families.
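The difference between what a one-group pre-post design and a two-group design can report may be illustrated with a small calculation. All scores below are hypothetical, and the comparison group is assumed to be non-equivalent, so even the adjusted figure is a rough indication rather than an estimate of program impact.

```python
from statistics import mean


def mean_gain(pre, post):
    """Average of the individual posttest-minus-pretest differences."""
    if len(pre) != len(post):
        raise ValueError("pre and post must contain the same participants")
    return mean(after - before for before, after in zip(pre, post))


# Hypothetical test scores (illustration only).
even_start_pre = [40, 45, 50, 38]
even_start_post = [48, 52, 59, 45]
comparison_pre = [41, 44, 49, 39]
comparison_post = [46, 49, 55, 44]

# A one-group pre-post study can report only this number.
even_start_gain = mean_gain(even_start_pre, even_start_post)

# A comparison group indicates how much gain occurs without the program,
# for example through normal development and maturation.
comparison_gain = mean_gain(comparison_pre, comparison_post)

# Gain beyond what the comparison group achieved.
gain_difference = even_start_gain - comparison_gain

print(f"Even Start mean gain: {even_start_gain}")
print(f"Comparison mean gain: {comparison_gain}")
print(f"Difference in gains:  {gain_difference}")
```

In this made-up example most of the raw gain also appears in the comparison group, which is the situation a one-group pre-post study cannot detect.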
Finally, no local outcome evaluation used an experimental design in which families eligible for Even Start are randomly assigned to participate in the program or in a control group. While this is the strongest approach for estimating the effectiveness of Even Start, its non-use in local settings is not surprising, given that designing and implementing a randomized study is a costly, time-consuming enterprise that requires considerable expertise. Local evaluations, rarely if ever funded for more than $10,000 per year, cannot be expected to undertake this type of expensive and complicated approach.
Time period covered by outcome studies. Families were followed for one project year in 75% of the local outcome studies, and for more than one year in 11% of the outcome studies. In general, these multi-year studies tracked children into the public schools in an attempt to learn about school-based child performance. The focus on performance over a one-year period is reasonable because most families participate in Even Start for a year or less.
Measurement methods used in outcome studies. Local outcome evaluations used many measurement methods. Children were most often measured by administering a test such as the PreSchool Inventory (PSI), the Peabody Picture Vocabulary Test (PPVT), or the PreSchool Language Scale (PLS) (64%); through parent interviews about the child's behaviors or progress (38%); or through teacher reports (29%). Adults were most often measured by administering the Comprehensive Adult Student Assessment System (CASAS) or the Tests of Adult Basic Education (TABE) (70%); through self-reports (65%); or through teacher reports (27%). Some of these measures (the PPVT, PLS, and TABE) have national norms which can be used as one basis of comparison for gains made by Even Start adults and children. The PSI has its own Even Start "norms" based on developmental data collected in the first national evaluation.
Other forms of measurement included abstraction of data from school or project records (35%), observation of the child/adult (19%), observation of the family/home (11%), and use of data from the national evaluation (9%). Given its high cost, it is not surprising that data collection through observation is seldom used. The same holds for teacher reports; although it is relatively easy for teachers to complete a rating scale for a given child, it requires substantial resources to "track" Even Start children into many different public schools and negotiate to obtain the time of teachers to do "non-school" work.
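| Exhibit 3 Implementation Studies: Description of State and Local Evaluations | |
|---|---|
| VARIABLE | PERCENTAGE (n) (total n = 113) |
| Design | |
| Description of project structure and activities | 95% (107) |
| Level of participation | 62% (70) |
| Sample | |
| Local project director | 81% (91) |
| All Even Start teachers | 72% (81) |
| [label missing] | 2% (2) |
| [label missing] | 1% (1) |
| All Even Start parents | 70% (79) |
| [label missing] | 0% (0) |
| [label missing] | 4% (5) |
| Measurement methods | |
| Self-report: project directors | 75% (85) |
| Self-report: teachers | 72% (81) |
| Abstraction of information from project records | 66% (75) |
| Self-report: parents | 40% (45) |
| Logs of activities or participation | 32% (37) |
| Observation | 23% (26) |
| Data collected for the national evaluation | 15% (17) |
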
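
| Exhibit 4 Outcome Studies: Description of State and Local Evaluations | |
|---|---|
| VARIABLE | PERCENTAGE (n) (total n = 94) |
| Design | |
| [label missing] | 0% (0) |
| Two-group quasi-experimental design | 10% (9) |
| One-group pre-post design | 76% (71) |
| [label missing] | 60% (56) |
| [label missing] | 0% (0) |
| [label missing] | 6% (6) |
| One-group post-only design | 31% (29) |
| [label missing] | 18% (17) |
| [label missing] | 0% (0) |
| [label missing] | 1% (1) |
| [label missing] | 12% (11) |
| [label missing] | 6% (6) |
| [label missing] | 5% (5) |
| Time period covered | |
| One project year | 75% (70) |
| More than one year | 11% (10) |
| Measurement methods | |
| Adults: test (CASAS or TABE) | 70% (66) |
| Adults: self-report | 65% (61) |
| Children: test (PSI, PPVT, or PLS) | 64% (60) |
| [label missing] | 48% (45) |
| [label missing] | 42% (39) |
| Children: parent interview about the child's behaviors or progress | 38% (36) |
| Abstraction of data from school or project records | 35% (33) |
| Children: teacher report | 29% (27) |
| Adults: teacher report | 27% (25) |
| Observation of the child/adult | 19% (18) |
| Observation of the family/home | 11% (10) |
| Data from the national evaluation | 9% (8) |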