Archived Information

National Evaluation of The Even Start Family Literacy Program, 1998


Chapter 7

What Were the Child Development Outcomes?

Two measures of children's school readiness and/or development were used in the second national evaluation: the PreSchool Inventory (PSI) and the Preschool Language Scale-3 (PLS-3).

The PSI was used in the first national evaluation of the Even Start program, and the PLS-3 was chosen for this evaluation to replace the Peabody Picture Vocabulary Test-Revised Edition (PPVT-R), which was used in the first evaluation.94 (For a summary of the content validity of the child development outcome measures as well as the adult outcome measures, please refer to Appendix D.) Each measure is described in more detail below.

Description of the PSI

The PreSchool Inventory (PSI) was developed by Bettye Caldwell as a sixty-four-item inventory of basic concepts important for preschool children to know before entering school (CTB/McGraw-Hill, 1970). A thirty-two-item version has been adapted (Abt Associates Inc., 1991) for use in large-scale evaluations.

The PSI is an individually administered measure that assesses a range of school readiness skills such as identifying shapes and colors and understanding numerical concepts. The PSI requires fifteen minutes to administer and is appropriate for children between the ages of 3 and 5 years. English and Spanish versions of the test are combined on a single form. Each correct item counts as one point, and a total score is computed. The PSI contains no subscales.

The thirty-two-item version of the PSI has been used in numerous large-scale evaluation studies, including: the observation study of Chapter 1 preschool programs (Seppanen et al., 1993); the evaluation of Project Giant Step (Layzer, Goodson, and Layzer, 1990); the National Day Care Study (Bache, 1980); the Head Start Planned Variation study (Walker, Bane, and Bryk, 1973); the National Home Start Evaluation (High/Scope Educational Research Foundation, 1973; 1975); and the Child and Family Resource Program evaluation (Travers et al., 1982).

The PSI was developed to be sensitive to instruction and has shown positive effects of preschool programs in previous research, but since it does not have national norms, we cannot compare the performance of children in Even Start to any norming sample. The psychometric characteristics of the test have been investigated extensively.95 In the Sample Study, the PSI was administered to children between the ages of 3 and 5 years who were expected to participate in early childhood education. The test was administered to children by program staff or staff they designated (e.g., local evaluator, staff from collaborating agency). Project staff were trained to administer the test in the summer of 1994.

Administration rules for the Sample Study were that for Cohort 1 families entering in the fall of 1994, the test was to be given at entry (in the fall of 1994), again in the spring of 1995 (or at the time of exit from Even Start), and once again in the spring of 1996 (or at exit). Similarly, for families entering in the fall of 1995 (Cohort 2), the test was administered in the fall of 1995, again in the spring of 1996 (or at the time of exit from Even Start), and once again in the spring of 1997 (or at exit). Project staff were asked to administer the PSI as a pretest within thirty days of the start of services to serve as a baseline. Staff were asked to administer posttests with a minimum of three months between pretest and posttest dates. These were the same rules of administration as used in the first national evaluation. Staff recorded the PSI raw score, the test date, and the language of administration. We have wave one data for over 1,000 children, wave two data for over 650 children, and three waves of PSI scores for over 150 children.

Description of the PLS-3

The Preschool Language Scales (PLS-3) was selected for this evaluation to replace the Peabody Picture Vocabulary Test-Revised (PPVT-R) used in the earlier evaluation in order to obtain more detailed information about children's language development. The PLS was first developed in 1969 to assess the language development of young children, based on information about language development from the fields of psycholinguistics, human development, and speech-language pathology (Zimmerman, Steiner, and Pond, 1992). The measure can be used with children as young as 2 weeks and as old as 7 years.

The PLS-3 measures both receptive and expressive language skills and provides scores on two subscales (auditory comprehension and expressive communication) in addition to a total score. The auditory comprehension subscale assesses children's ability to process and understand language they hear, including skills in the areas of the meaning of words and concepts (content), the structure of language and syntax (form), and integrative thinking skills. The expressive communication subscale evaluates children's ability to produce language, including skills in vocal development, use of words and concepts (content), syntax (form), and integrative thinking skills.

The version of the test used in the Sample Study was revised in 1992. The test was standardized on a sample of 1,200 children, with equal numbers of males and females within each age range. The nationally representative sample was stratified on the basis of parent education, geographic region, and race/ethnicity.96 The PLS-3 takes approximately thirty to forty minutes to administer and is available in English and Spanish. Raw scores are converted into standard scores based on the age of the child; national norms and age-equivalent scores also are available.

In the Sample Study, the PLS-3 was administered to children between the ages of 2 years, 6 months and 5 years, 6 months at the time of the pretest and who were expected to participate in early childhood education. The test was administered to children by program staff or staff they designated (e.g., local evaluator, staff from collaborating agency). Project staff were trained to administer the test in the summer of 1994. We have wave one data for over 1,000 children, wave two data for over 700 children, and three waves of PLS-3 scores for over 150 children.

Performance on the Child Development Outcome Measures

Over the course of the Sample Study, we had hoped to be able to collect data from families over the span of up to two program years—once at entry into the program, a second time later in the same program year, and a third time in the subsequent program year. Relatively few families (in the Sample Study projects) remained for a long enough time, however, to have participated in all three waves of data collection. As noted above, we have one score for over 1,000 children; we have a second score for over 650 children, and a third score for approximately 150 children (the actual numbers differ somewhat for the PSI and the PLS-3 due to different age eligibility criteria). In our discussions below, we use information obtained from all children with valid test score data to describe the trajectories of Even Start participants' PSI and PLS-3 test scores over time.97 (Refer to Appendix D for more detailed information on pretest scores.)

Analytic Techniques

To investigate changes in children's test scores over time, we fit a series of individual growth models (Diggle, Liang, and Zeger, 1994; Willett, 1988). These models, which can be viewed as a special case of hierarchical linear models (Bryk and Raudenbush, 1992), multilevel models (Goldstein, 1995), and random coefficient models (Longford, 1993), allowed us to examine changes in children's test scores over time on both the Preschool Inventory (PSI) and the Preschool Language Scale-3 (PLS-3) while taking into account the statistical concern that multiple measures on the same children over time are not statistically independent. Another advantage of the growth modeling approach (over traditional repeated measures analysis of variance models) is that models can be fit to data structures, like this one, in which each individual has his or her own unique data collection schedule. The number of waves can vary across individuals and the spacing of the follow-up waves need not be identical either. Given that the number of observations per child varied from one to three and the spacing of the multiple measurements varied as well, individual growth models represented the method of choice for examining how children grow and develop during their participation in Even Start.

Individual growth models can be expressed in at least three different yet equivalent ways: (1) by writing separate within-person and between-person models; (2) by writing separate equations at each level and then substituting to arrive at a single equation; and (3) by writing a single equation that specifies the multiple sources of variation. In what follows, we have chosen the third approach because it highlights a feature of individual growth models that we think is especially important for understanding how these models represent data on individuals over time: each individual's score is expressed as a function of some fixed effects (effects assumed to be identical across people) and some random effects (effects assumed to vary across people).

In fitting these models, we sought answers to three linked sets of questions:

The results for the PSI are presented in Exhibit 7.2 and the results for the PLS-3 are presented in Exhibit 7.3. All models were fit using the procedure PROC MIXED in SAS (Singer, in press). Before turning to the specific substantive results, we first describe our approach to building the statistical models.

Approach to Model Building

For each measure, we began by fitting what is known as the "unconditional means model," a model with no substantive predictors (see, e.g., Bryk and Raudenbush, 1992). In the unconditional means model, child j's score on the outcome Y on occasion i (i.e., $Y_{ij}$) is expressed as the sum of a fixed component and two random components:

$$Y_{ij} = \beta_{00} + u_{0j} + r_{ij}$$

where $u_{0j}$ is assumed to be normally distributed with mean 0 and variance $\tau_{00}$ and $r_{ij}$ is assumed to be normally distributed with mean 0 and variance $\sigma^2$. In this model, $\beta_{00}$ is a fixed effect, which measures the average test score for the average child; $u_{0j}$ is a random effect associated with child j; and $r_{ij}$ is the within-child random effect that reflects the occasion-to-occasion variation in the child's test scores. Shown in each table as "Model 1," the unconditional means model served two purposes. First, it provided baseline estimates of the variance components ($\tau_{00}$ and $\sigma^2$), which we then used to evaluate the fit of subsequent models. Second, it allowed us to estimate the intraclass correlation for each outcome, an indicator of the degree of consistency in individual children's test scores over time.

We then fit a set of "unconditional growth models" in which we expressed each individual child's test score data as a function of time. For example, the second model we fit for each measure (Model 2) was the unconditional linear growth model:

$$Y_{ij} = \beta_{00} + \beta_{10}(\text{AGE}_{ij} - 36) + u_{0j} + r_{ij}$$

where $\beta_{10}$ denotes the fixed linear growth rate per month of age, the $u_{0j}$ are still assumed to be normally distributed with mean 0 and variance $\tau_{00}$, and the $r_{ij}$ are still assumed to be normally distributed with mean 0 and variance $\sigma^2$. Notice that in writing this model, we have not entered the child's age directly, but rather have subtracted 36 from the child's age in months. Because of this subtraction, known as "centering," we are able to interpret $\beta_{00}$ as the average child's test score at 36 months. Had we not centered age at a substantively meaningful value, the parameter $\beta_{00}$ would have been more difficult to interpret, as it would have represented the average child's test score at age 0 months, an obviously meaningless point in time. (For a discussion of centering, and its effects on multilevel models, see Kreft, de Leeuw, and Aiken, 1995.)
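The interpretive point about centering can be checked with simple arithmetic. The sketch below uses the rounded Model 2 fixed effects for the PSI reported in Exhibit 7.2 (intercept 4.50 at 36 months, slope 0.64 per month); the function name is hypothetical and the calculation is only an illustration, not part of the original analysis.

```python
def uncentered_intercept(centered_intercept, slope, center):
    """Intercept of the same straight line when age is NOT centered,
    i.e., the fitted value extrapolated back to age 0 months."""
    return centered_intercept - slope * center

# Rounded Model 2 estimates for the PSI (Exhibit 7.2)
b00 = 4.50   # average score at 36 months (age centered at 36)
b10 = 0.64   # average gain per month of age

at_age_zero = uncentered_intercept(b00, b10, center=36)
print(round(at_age_zero, 2))  # -18.54
```

Without centering, the intercept would be a negative PSI score at birth, which has no substantive meaning; the slope is the same in either parameterization.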

Fitting the series of unconditional growth models served two purposes: (1) it allowed us to select a functional form for modeling growth over time (to decide whether the linear model above was sufficient, or whether a quadratic or even cubic model was needed); and (2) it allowed us to determine whether two random effects in the model were sufficient (the $u_{0j}$ and the $r_{ij}$) or whether we should also allow the growth rates to vary randomly across children by adding another random effect, $u_{1j}(\text{AGE}_{ij} - 36)$. When modeling the PSI, we found that a linear growth model was preferred; when modeling the PLS-3, we found that a quadratic model fit better than a linear model. For both measures, we found no evidence to support allowing the growth rates to vary randomly across children. In all subsequent models we therefore constrained the growth rates to be fixed (as in the above equation).
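For reference, the two extensions considered here can be written in the same single-equation form. This is a sketch under stated assumptions: the symbols $\beta_{10}$, $\beta_{20}$, and $u_{1j}$ are illustrative labels, and the use of the same centered age in the quadratic term is assumed rather than confirmed by the text. The covariance parameters correspond to the T00, T01, and T11 rows of the exhibits.

```latex
% Random-slope extension (the form tested as Model 3)
Y_{ij} = \beta_{00} + \beta_{10}(\text{AGE}_{ij} - 36)
       + u_{0j} + u_{1j}(\text{AGE}_{ij} - 36) + r_{ij},
\qquad
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix}
\sim N\!\left(
  \begin{pmatrix} 0 \\ 0 \end{pmatrix},
  \begin{pmatrix} \tau_{00} & \tau_{01} \\ \tau_{01} & \tau_{11} \end{pmatrix}
\right)

% Quadratic growth with random intercepts and fixed slopes
% (the form adopted for the PLS-3)
Y_{ij} = \beta_{00} + \beta_{10}(\text{AGE}_{ij} - 36)
       + \beta_{20}(\text{AGE}_{ij} - 36)^2 + u_{0j} + r_{ij}
```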

Having decided on an unconditional growth model for each measure—a linear model for the PSI and a quadratic one for the PLS-3, each with random intercepts and fixed slopes—we then fit a series of conditional growth models, in which we investigated the effects of three substantive predictors. Following the advice of many experts in multilevel modeling and individual growth modeling (e.g., Bryk and Raudenbush, 1992; Kreft and de Leeuw, 1998), we restricted attention to a very small number of predictors: (1) the number of waves of measurement (one-wave vs. multi-wave); (2) maternal education; and (3) the need index.99 In investigating the effect of each predictor, we evaluated its main effect—its effect on the child's test score at age 36 months—and its interaction effect—its effect on the growth rate. As we will show, we found that while maternal education had no effect (at least after controlling for need index), there was an effect of both need index and the number of waves of data collection.

Exhibit 7.2: Multi-level Models for Examining Growth Over Time on the PSI

 

| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 |
|---|---|---|---|---|---|---|---|---|
| Fixed Effects | | | | | | | | |
| Intercept | 14.40*** (.24) | 4.50*** (.34) | 4.55*** (.34) | 3.17*** (.44) | 6.10*** (.64) | 4.29*** (1.0) | 7.76*** (.79) | 7.03*** (1.37) |
| Age | | 0.64*** (.02) | 0.64*** (.02) | 0.64*** (.02) | 0.46*** (.03) | 0.46*** (.03) | 0.46*** (.03) | 0.46*** (.03) |
| Single Wave | | | | -2.16*** (.42) | 1.90* (.76) | 1.87* (.76) | 1.97** (.75) | 1.95** (.76) |
| A*W | | | | | -0.26*** (.04) | -0.26*** (.04) | -0.26*** (.04) | -0.26*** (.04) |
| Mhigrd | | | | | | 0.17* (.07) | | 0.05 (ns) (.08) |
| Grdflag | | | | | | 0.55 (ns) (.64) | | |
| Needindx | | | | | | | -0.50*** (.14) | -0.45** (.16) |
| Needflag | | | | | | | 0.87 (ns) (.86) | 0.83 (ns) (1.16) |
| Random Effects | | | | | | | | |
| T00 | 19.46 | 23.28 | 23.28 | 24.06 | 24.76 | 24.55 | 24.17 | 24.23 |
| T01 | | | 0.06 | | | | | |
| T11 | | | -0.00 | | | | | |
| sigma2 | 38.00 | 13.28 | 13.28 | 13.23 | 12.25 | 12.27 | 12.29 | 12.29 |
| Goodness of Fit | | | | | | | | |
| Deviance Statistic | 8944 | 8189 | 8188 | 8162 | 8127 | 8123 | 8113 | 8114 |
| AIC | -4474 | -4096 | -4098 | -4083 | -4065 | -4064 | -4059 | -4059 |

Note: Standard errors for the estimates of fixed effects are shown in parentheses after each estimate.

(ns) p>.05
* p<.05
** p<.01
*** p<.001
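The negative AIC values in the exhibit may look unfamiliar. They appear consistent with a "larger is better" form of AIC, computed as the log-likelihood minus the number of covariance parameters, which we believe older releases of SAS PROC MIXED reported. That convention is an assumption here, not something the report states; the sketch below only checks that the reported PSI values match it up to rounding of the deviance.

```python
def larger_is_better_aic(deviance, n_cov_params):
    """Assumed convention: AIC = log-likelihood - number of covariance parameters,
    where log-likelihood = -deviance / 2."""
    return -deviance / 2.0 - n_cov_params

# Model 1 (tau00, sigma2): two covariance parameters
print(larger_is_better_aic(8944, 2))  # -4474.0
# Model 3 (tau00, tau01, tau11, sigma2): four covariance parameters
print(larger_is_better_aic(8188, 4))  # -4098.0
```

Under this reading, the drop in AIC from Model 2 (-4096) to Model 3 (-4098) reflects the penalty for two extra covariance parameters that bought almost no improvement in deviance.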

Exhibit 7.3: Multi-level Models for Examining Growth Over Time on the PLS

 

| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 |
|---|---|---|---|---|---|---|---|---|
| Fixed Effects | | | | | | | | |
| Intercept | 87.41*** (.46) | 84.68*** (.80) | 84.66*** (.80) | 87.00*** (.98) | 88.05*** (1.08) | 86.24*** (1.19) | 83.77*** (2.00) | 89.64*** (1.60) |
| Age | | 0.15*** (.04) | 0.15*** (.04) | -0.22* (.10) | -0.27** (.10) | -0.12 (ns) (.11) | -0.13 (ns) (.11) | -0.13 (ns) (.11) |
| Age2 | | | | 0.01*** (.002) | 0.01*** (.002) | 0.009*** (.002) | 0.009*** (.002) | 0.009*** (.002) |
| Single Wave | | | | | -2.52* (1.13) | 2.35 (ns) (1.81) | 2.29 (ns) (1.81) | 2.50 (ns) (1.80) |
| A*W | | | | | | -0.32*** (.09) | -0.31*** (.09) | -0.32*** (.09) |
| Mhigrd | | | | | | | 0.26 (ns) (.17) | |
| Grdflag | | | | | | | -0.917 (ns) (1.53) | |
| Needindex | | | | | | | | -0.99** (.32) |
| Needflag | | | | | | | | -1.15 (ns) (2.10) |
| Random Effects | | | | | | | | |
| T00 | 109.35 | 117.78 | 120.83 | 117.99 | 116.34 | 118.89 | 118.66 | 117.54 |
| T01 | | | -1.21 (ns) | | | | | |
| T11 | | | .001 (ns) | | | | | |
| sigma2 | 149.69 | 143.06 | 138.24 | 141.05 | 141.48 | 138.81 | 138.83 | 138.58 |
| Goodness of Fit | | | | | | | | |
| Deviance Statistic | 14204 | 14192 | 14187 | 14186 | 14179 | 14170 | 14166 | 14157 |
| AIC | -7104 | -7098 | -7097 | -7095 | -7091 | -7087 | -7085 | -7080 |

Note: Standard errors for the estimates of fixed effects are shown in parentheses after each estimate.

(ns) p>.05
* p<.05
** p<.01
*** p<.001

Results for the PSI

Model 1 of Exhibit 7.2 presents the unconditional means model for the PSI, in which we find that the average child in the Sample Study had an average score of 14.40. More important than the totally expected finding that this mean is significantly different from 0 (as indicated in the fixed effects portion of the table) are the two estimates of the random effects in the bottom part of the table. The estimated variance component for the means ($\tau_{00}$, also known as the variance component for the intercepts) is 19.46 and the estimated variance component within child ($\sigma^2$) is 38.00. The fact that the variance component within child is approximately twice as large as the variance component between children tells us that there is more within-child variation than there is between-child variation. But this is not to say that there are not consistent differences in PSI scores between children. We can assess this degree of consistency by computing the intraclass correlation, which here is 19.46/(19.46 + 38.00) = .34. This tells us that one third of the variation in children's PSI scores occurs between children.
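The intraclass correlation quoted above follows directly from the two Model 1 variance components. A minimal sketch of the calculation (the function name is illustrative):

```python
def intraclass_correlation(between_var, within_var):
    """Share of total variance that lies between children."""
    return between_var / (between_var + within_var)

# Model 1 variance components for the PSI (Exhibit 7.2)
icc_psi = intraclass_correlation(between_var=19.46, within_var=38.00)
print(round(icc_psi, 2))  # 0.34
```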

All subsequent models are built with the goal of explaining some of the variation in the scores within children and between children. The unconditional growth models (Models 2 and 3 in Exhibit 7.2) attempt to explain the variation in children's PSI scores within children over time. Each includes an additional fixed effect reflecting the child's growth over time. The difference between the models is that while Model 2 only has two random effects (as in Model 1), Model 3 adds two additional random effects: one for the age slopes (the growth rates) and one for the covariance between the intercepts and slopes. Comparing the goodness of fit statistics for these two models reveals a difference in deviance statistics that is so small (approximately 1) for two additional degrees of freedom that there is no evidence that the model with random slopes is to be preferred to the model with fixed slopes (p>.50). Model 2 is therefore preferable to Model 3 because it fits nearly as well and is more parsimonious.
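The deviance comparison can be reproduced by hand. The sketch below refers the difference in deviance to a chi-square distribution with 2 degrees of freedom, whose upper-tail probability has the closed form exp(-x/2). Whether the original analysis used exactly this reference distribution is an assumption; the function name is illustrative.

```python
import math

def deviance_test_p_value_2df(deviance_reduced, deviance_full):
    """Upper-tail chi-square probability with 2 df for a deviance difference.
    For 2 df the survival function is exactly exp(-x / 2)."""
    difference = deviance_reduced - deviance_full
    return math.exp(-difference / 2.0)

# Model 2 (fixed slopes) versus Model 3 (random slopes), PSI, Exhibit 7.2
p_value = deviance_test_p_value_2df(8189, 8188)
print(round(p_value, 2))  # 0.61
```

The result is consistent with the reported p>.50.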

What does the unconditional growth model (Model 2) indicate about the behavior of children's PSI scores over time? The parameter estimate for the fixed effect of the intercept (4.50) tells us that we estimate the average child in Even Start to score 4.50 on the PSI at age 36 months. The parameter estimate for the fixed effect of age (.64) tells us that we estimate that with each additional month, the average child's score is .64 points higher. Comparing this slope coefficient of .64 to its standard error (of only .02) tells us that this growth rate is not only "statistically significant" by all conventional standards (p<.0001), but that it is also estimated quite precisely. Multiplying by 12 to yield a predicted annual gain, we estimate that the PSI score for a randomly selected child in Even Start is 7.68 points higher for each extra year of participation.
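The comparison of the slope with its standard error can be made concrete. The sketch below assumes a large-sample normal reference distribution for the ratio of estimate to standard error; the test actually reported by the software may have used a different reference distribution, so the p-value here is only indicative.

```python
import math

def wald_z_test(estimate, std_error):
    """Return the z ratio and a two-sided p-value under a normal approximation."""
    z = estimate / std_error
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided standard normal tail area
    return z, p

# Model 2 age slope for the PSI (Exhibit 7.2)
z, p = wald_z_test(0.64, 0.02)
annual_gain = 0.64 * 12

print(round(z, 1), p < 0.0001, round(annual_gain, 2))  # 32.0 True 7.68
```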

The random effects in Model 2 provide two interesting windows on the behavior of PSI scores both over time and within children. First, we can compare the estimates for the within-child variance components from Model 2 to Model 1 to see how much of the within-child variation is "explained" by age. The original estimate of $\sigma^2$ (38.00) has declined to 13.28, a decrease of 65.1 percent ((38.00-13.28)/38.00); this tells us that approximately two-thirds of the within-child variation in PSI scores is attributable to age. Second, we can use these new estimates of the variance components to compute the residual intraclass correlation, a measure of how similar children's test scores are after taking into account the within-child predictor, Age. Using the two variance component estimates in Model 2, we estimate the residual intraclass correlation for the PSI to be 23.28/(23.28+13.28)=.64, telling us that after we control for child age, nearly two thirds of the residual variation in PSI scores occurs between children.
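Both quantities in this paragraph are simple functions of the variance components in Exhibit 7.2. A minimal sketch, with illustrative function and variable names:

```python
def proportion_variance_explained(baseline_var, conditional_var):
    """Proportional reduction in a variance component after adding a predictor."""
    return (baseline_var - conditional_var) / baseline_var

# Within-child variance: Model 1 versus Model 2 (PSI)
within_explained = proportion_variance_explained(38.00, 13.28)

# Residual intraclass correlation from the Model 2 variance components
residual_icc = 23.28 / (23.28 + 13.28)

print(round(within_explained, 3), round(residual_icc, 2))  # 0.651 0.64
```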

All remaining models in Exhibit 7.2 investigate the fixed effects of the three potential predictors. Models 4 and 5 add the variable Single Wave, which contrasts children with only one wave of data collection to those who had multiple waves. (We should note that we also tested whether those with two waves were significantly different from those with three waves, and found no effects.) Model 4 investigates the main effect of this predictor; Model 5 investigates whether the effect of the predictor varies over time (which it does, in a way that will be described shortly). In Model 6, we add the main effect of mother's education (using the two predictors, Mhigrd and Grdflag, as described earlier). In Model 7, we add the effect of need index (also using two predictors, Needindex and Needflag, as described earlier), not controlling for mother's education, and in Model 8, we add its effect after controlling for mother's education. In results not shown here, we also tested whether either need index or mother's education interacted statistically with child age, and found no effect. We therefore focus our interpretation of these models on the results for Model 7.

Exhibit 7.4 presents fitted individual growth models for four prototypical children: those with need index scores of 2 (low levels of need) and 5 (high levels of need) with multiple waves of data (the solid lines) and only one wave of data (the dashed lines). To emphasize that the models for the children with multiple waves are longitudinal and therefore really describe growth whereas the models for the children with only one wave are cross-sectional comparisons of children who entered Even Start at different ages, we have graphed the former using solid lines and the latter using dashed lines. A child who remained in Even Start for two or more waves of data collection grows on the PSI by an average of nearly half a point per month (.46 to be precise) for an annual increase of 5.52 points. For every extra point on the need index, the child's score is an average of .50 points lower. The growth trajectories for the prototypical children in the plot, who differ by three points on the need index, are therefore separated by 1.5 PSI points.
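The fitted lines for the four prototypical children can be approximated from the rounded Model 7 fixed effects in Exhibit 7.2. This is a sketch under stated assumptions: Single Wave is taken to be coded 1 for children with one wave of data, and Needflag is assumed to be an indicator that equals 0 for these prototypical children (its definition is not given in this chapter). Predictions from rounded coefficients are approximate.

```python
# Rounded Model 7 fixed effects for the PSI (Exhibit 7.2)
MODEL7 = {
    "intercept": 7.76,
    "age": 0.46,          # per month, age centered at 36 months
    "single_wave": 1.97,  # assumed coding: 1 = only one wave of data
    "age_x_wave": -0.26,
    "need": -0.50,
}

def predict_psi(age_months, need_index, single_wave):
    """Approximate fitted PSI score; assumes Needflag = 0 (an assumption)."""
    centered_age = age_months - 36
    wave = 1 if single_wave else 0
    return (MODEL7["intercept"]
            + MODEL7["age"] * centered_age
            + MODEL7["single_wave"] * wave
            + MODEL7["age_x_wave"] * centered_age * wave
            + MODEL7["need"] * need_index)

# Gap between low-need (2) and high-need (5) children at any age
need_gap = predict_psi(48, 2, False) - predict_psi(48, 5, False)

# Twelve-month change implied for each group
gain_multi = predict_psi(48, 2, False) - predict_psi(36, 2, False)
gain_single = predict_psi(48, 2, True) - predict_psi(36, 2, True)

print(round(need_gap, 2), round(gain_multi, 2), round(gain_single, 2))
```

The need gap of 1.5 points and the multi-wave annual gain of 5.52 points match the figures in the text; the single-wave figure is a cross-sectional age difference, not growth of individual children.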

What does this model tell us about growth in PSI scores over time? Because need index did not interact statistically with Age, we have no evidence that the growth rates for children differ by need. But we do have evidence that the growth rates for children who remain in Even Start longer (and who therefore have additional waves of data collection) are steeper than we would predict, based on the cross-sectional comparisons of children who were in Even Start for only one wave of data collection. We must, however, be very careful with this interpretation. The dashed lines in the graph are not trajectories in that they do not describe the behavior of individual children over time. Instead they simply describe the average PSI scores for Even Start children who were in the program for such a short period of time that they participated in only one wave of testing. We believe that if used with caution, these children may represent a suitable comparison group for evaluating the magnitude of the growth over time for children who remained in Even Start for two or more waves of data collection (the solid lines). With this caution in mind, we see that children who remain in Even Start longer have steeper growth trajectories than we would have predicted based on the cross-sectional testing data. The difference in the growth rates for the two groups (.46 for those with multiple waves of data and (.46-.26)=.20 for those with only one wave of data) is statistically significant at the p<.0001 level (as evaluated by the fixed effect for Age*Wave, which remains stable throughout Models 5 through 8). Thus, with the caveat that the children with only one wave of data cannot describe a growth trajectory, we conclude tentatively that children who remain in Even Start for longer periods of time may grow at a faster rate on the PSI than we would have predicted had they not remained in Even Start.

Exhibit 7.4: Predicted PSI Scores, by Child Age, in Months

Note: The figure above is based upon Model 7 (displayed in Exhibit 7.2).

Findings for the PLS

The general findings for the PLS-3 closely parallel those for the PSI. There is evidence that children with multiple waves of data collection have steeper growth rates than we would have predicted based on cross-sectional comparisons of children at different ages who have only one wave of testing data. So, too, children whose families have higher scores on the need index have lower PLS-3 scores (as they had lower PSI scores).

The fundamental difference between the analyses is that while we were able to model the raw scores on the PSI, because the tests are identical at every occasion of measurement, the same is not true of the PLS. To allow the PLS to be administered validly across a wider range of ages, the test uses somewhat different items depending upon the child's age. We were therefore not able to use the raw scores, and instead we used the standardized scores recommended by the instrument's publisher, The Psychological Corporation.

Because standardized scores theoretically should not change with age at all, the behavior of the measures for the unconditional means model (Model 1 of Exhibit 7.3) and the unconditional linear growth models (Models 2 and 3) differs sharply from that for the PSI. First, in terms of the unconditional means model, we find a higher intraclass correlation across the multiple measures for individual children. Taking the estimated variance components for Model 1 of Exhibit 7.3, we find an intraclass correlation of 109.35/(109.35+149.69)=.42, indicating greater similarity among the multiple measures for each child than we found for the PSI. This is to be expected when using a standardized outcome measure, in that the differences in the scores associated with age (or growth) are expected to be minimal (unless, of course, the children are actually growing on this measure over time, which seems to be happening here for at least some children). Similarly, because the effects of age are theoretically removed by the standardization, adding the fixed linear effect of Age to the model should have little effect on the size of the within-person variance component ($\sigma^2$). Comparing the estimates for this variance component from Model 2 to Model 1, we find a trivial reduction, from 149.69 to 143.06, or 4.4 percent. Contrasted with the 65.1 percent reduction on the inclusion of linear Age in the model for the PSI, we see that taking the child's age at PLS administration into account has very little effect on the residual intraclass correlation, which has increased only slightly to .45.

But it is not as if there is no fixed effect of age in these models. Indeed, not only is the fixed effect of linear age statistically significant (in Model 2), there is also a curvature to this variable's effect (as shown in Model 4).100 This curvature component (the quadratic term, Age2), which remains statistically significant in all subsequent models that include the substantive predictors, tells us that the effect of Age on the PLS is not linear. Coupled with the statistical interaction between the linear component of Age and the dummy variable distinguishing individuals with only one wave of measurement from those with multiple waves, we find (as we will soon show) that children who remained in Even Start for two or more data collection occasions do, on average, also grow on the PLS over time.

Before describing this effect in detail below, we describe the remaining models in Exhibit 7.3, which investigate the fixed effects of the remaining two substantive predictors. Model 7 shows that for the PLS, we find no main effect of mother's education; children's PLS scores, on average, are totally unrelated to their mother's level of education. In results not presented here, we also find no statistical interaction between Age and mother's education, indicating that the growth rates for the PLS also are unrelated to maternal education. Model 8, however, shows an effect of need index that is virtually identical to that found for the PSI. Here we see that for each increment of one point on the need index, children's average PLS scores (at 36 months, the centering value) are .99 points lower. In results not presented here, we find no statistical interaction between Age and need index either.

We therefore focus our interpretation on Model 8, the results of which are graphed in Exhibit 7.5. Because this test can be administered to children at much older ages than the PSI, the fitted trajectories are drawn from age 30 months through age 84 months. Like the equivalent graph for the PSI, we have chosen to plot the results for four prototypical children: those with need index scores of 2 (low levels of need) and 5 (high levels of need) with multiple waves of data (the solid line curves) and only one wave of data (the dashed line curves). Because the effect of Age on the PLS is quadratic, the trajectories are represented as curves, and not as lines. Because there is an interaction between the linear component of these curves and the presence of multiple waves of data collection, the curves for the two groups of children are dramatically different.

Focus first on the cross-sectional curves for the children with only one wave of data. Although we do find that average scores are lower with increasing levels of need—the curve for the children with a score of 5 on the need index is 2.98 points lower than the curve for the children with 2 on the need index—we do not find any evidence of systematic growth over time. If anything, the children who entered Even Start later (at age 50 months, for example) have somewhat lower scores than those who entered earlier. This suggests that children who enter earlier have higher test scores, on average, than children who enter later.

Next focus on the growth trajectories for the children who remained in Even Start for two or more waves of data collection. At the early ages, there is little difference between the children with only one versus multiple waves of data; in fact, the test of the fixed effect of the variable Single Wave in Model 8 is non-significant, indicating that at age 36 months, we observe no difference in average PLS scores between those with only one wave of data and those with multiple waves. So, too, notice that we continue to have an effect of need index; children with higher levels of need have lower scores, on average. But most importantly, notice the way in which the growth curves for these children escalate over time. Regardless of the level of need, those who remained in Even Start long enough to be eligible for two or more waves of data collection actually grow on the standardized scores on the PLS over time. This growth occurs in the face of two factors which would suggest that no growth should occur: one, we are modeling standardized scores, which theoretically should remain constant over time, and two, for the Even Start children with only one wave of data, we see no parallel age differences.101 Coupled with the growth evidence from the PSI, this suggests that children who remain in Even Start for longer periods of time may indeed experience growth in outcome measures tapping into the domain of cognitive achievement.
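The contrast between the two sets of curves can be approximated from the rounded Model 8 fixed effects in Exhibit 7.3. This sketch rests on several assumptions that the chapter does not confirm: that the quadratic term uses age centered at 36 months, that Single Wave is coded 1 for children with one wave of data, and that Needflag equals 0 for the prototypical children. Because the quadratic coefficient is rounded to one significant digit, the predictions are rough.

```python
# Rounded Model 8 fixed effects for the PLS-3 (Exhibit 7.3)
MODEL8 = {
    "intercept": 89.64,
    "age": -0.13,         # linear term, age assumed centered at 36 months
    "age_sq": 0.009,      # quadratic term, same centering assumed
    "single_wave": 2.50,  # assumed coding: 1 = only one wave of data
    "age_x_wave": -0.32,
    "need": -0.99,
}

def predict_pls(age_months, need_index, single_wave):
    """Approximate fitted PLS-3 standard score; assumes Needflag = 0."""
    a = age_months - 36
    wave = 1 if single_wave else 0
    return (MODEL8["intercept"]
            + MODEL8["age"] * a
            + MODEL8["age_sq"] * a * a
            + MODEL8["single_wave"] * wave
            + MODEL8["age_x_wave"] * a * wave
            + MODEL8["need"] * need_index)

# Multi-wave children: the fitted curve rises with age
for age in (36, 60, 84):
    print("multi-wave", age, round(predict_pls(age, 2, False), 1))

# Single-wave children: later entrants have lower fitted scores
for age in (36, 50):
    print("single-wave", age, round(predict_pls(age, 2, True), 1))

need_gap = predict_pls(36, 2, False) - predict_pls(36, 5, False)
```

With the rounded need coefficient the gap between need scores of 2 and 5 is 2.97 points, close to the 2.98 quoted in the text, which presumably reflects unrounded estimates.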

As before, however, we must be very careful with this interpretation. The dashed lines in this graph are not trajectories describing the behavior of individual children over time. Instead they simply describe the average PLS scores for Even Start children who were in the program for such a short period of time that they received only one wave of testing. We believe that if used with caution, these children may represent a suitable comparison group for evaluating the magnitude of the growth over time for children who remained in Even Start for two or more waves of data collection (the solid lines). Thus, with the caveat that the children with only one wave of data cannot describe a growth trajectory, we conclude tentatively that children who remain in Even Start for longer periods of time may grow at a faster rate on the PLS than we would have predicted had they not remained in Even Start.

What do all of these growth curve analyses mean? To summarize, these analyses of children's growth over time on two different measures provide us with credible evidence that children who continue to participate in Even Start make greater gains than one might anticipate based on age or development alone. Our analyses indicate that children progress at the same rate regardless of family need, although children from families with greater needs consistently score lower, on average, than children from families with fewer needs. Further, it is clear that the longer children participate in Even Start, the greater the gain, or the steeper the growth rate. By contrast, the distribution of PLS scores for children who have only one wave of data suggests that the later a child enters Even Start, the lower the score, on average. Our analyses reveal similar patterns for multi-wave children for both measures, the PSI and PLS-3; the fact that we have observed this pattern in the PLS, a standardized measure, provides stronger evidence that participation in Even Start has a demonstrable and positive effect on children.

Exhibit 7.5: Predicted PLS Scores, by Child Age, in Months

Note: The figure above is based upon Model 8 (displayed in Exhibit 7.3). A standardized score of 100 represents an average score for an average child, regardless of age.


Footnotes:

94 The Peabody Picture Vocabulary Test-Revised (PPVT-R) was replaced for two reasons. First, the first national evaluation found few differences between Even Start and comparison children on this test; because the PPVT-R did not appear to be sensitive to participation in Even Start, it seemed to have limited value as a measure of the program's impact. Second, because the test assesses only receptive vocabulary, we sought to replace it with a measure that is also sensitive to children's expressive vocabulary.

95 The reliability of the measure has been assessed in each of the studies cited above, with Cronbach's alpha ranging from .77 to .87. Test-retest reliability ranged from .67 to .77. In the first Even Start evaluation, the reliability of the PSI, as assessed via Cronbach's alpha, was .86.

96 Test-retest reliability coefficients range from .82 to .94, depending on the subscale and the age of the child. The interrater reliability was found to be .89. Reliability coefficients in this range are considered to be quite good.

97 Because the number of children per project varied so widely (from as few as three to as many as 118 in one project), we elected to limit the maximum number of records from an individual project to fifty, in order to minimize extreme leverage of a single project. Consequently, we randomly selected fifty child-level observations from the one site with 118 observations.
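The capping rule described in this footnote can be illustrated with a minimal sketch. The record layout, the field name project_id, the function name, and the random seed are all assumptions made for illustration; the evaluation's actual sampling procedure is not documented here beyond the description above.

```python
import random
from collections import defaultdict


def cap_records_per_project(records, max_per_project=50, seed=0):
    """Keep at most max_per_project randomly chosen records from each project.

    `records` is a list of dicts with a hypothetical "project_id" key.
    """
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    by_project = defaultdict(list)
    for rec in records:
        by_project[rec["project_id"]].append(rec)

    capped = []
    for recs in by_project.values():
        if len(recs) > max_per_project:
            # Simple random sample without replacement
            recs = rng.sample(recs, max_per_project)
        capped.extend(recs)
    return capped


# Hypothetical data: one project with 118 children, one with 3
records = [{"project_id": "A", "child_id": i} for i in range(118)]
records += [{"project_id": "B", "child_id": i} for i in range(3)]
capped = cap_records_per_project(records)
```

In this sketch only the oversized project is subsampled; projects at or below the cap contribute all of their records.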

We encountered some missing data in estimating these models. For example, for two of the three substantive predictors (need index and mother's education), we were missing data for approximately 10 percent of the children. Rather than set these cases aside from analysis, we used Cohen and Cohen's (1982) approach to this problem (see also, Hedeker and Gibbons, 1997). For each child who was missing a predictor value, we imputed the mean value for all non-missing individuals. We then created a missing data flag for the predictor to indicate whether the value for the variable was real or imputed. When we used the predictor in any statistical analysis, we entered both the predictor itself and its missing data indicator. This approach allows the researcher to include all cases in the analysis, while not allowing the imputed value to affect the parameter estimates for the non-missing cases. As illustrated in both of the tables summarizing our model-building, the missing data indicator was never statistically significant, indicating that there were no mean differences between those individuals with missing values of these predictors and those who had valid data.
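The mean-imputation-plus-indicator step described above can be sketched as follows. This is an illustration only: the function name, the use of None to mark a missing value, and the example need index values are assumptions, not the evaluation's actual code or data.

```python
def impute_with_flag(values):
    """Return (imputed_values, missing_flags) for one predictor.

    Missing entries (None) are replaced by the mean of the observed
    entries; the flag is 1 where the value was imputed and 0 otherwise.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to compute a mean from")
    mean = sum(observed) / len(observed)
    imputed = [mean if v is None else v for v in values]
    flags = [1 if v is None else 0 for v in values]
    return imputed, flags


# Hypothetical need index values for five children (None = missing)
need_index = [2, None, 4, 3, None]
need_imputed, need_missing = impute_with_flag(need_index)
```

Both resulting columns (here need_imputed and need_missing) would then be entered together as predictors in the model, as the footnote describes.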

98 Because there are so few fathers in the Sample Study (under 1 percent) with any education history data, we refer to the education level of the parent as "maternal education."

99 Although we were interested in the amount of exposure children had to the Even Start program as a predictor, because early childhood education hours mean different things for children of different ages (e.g., the hours for children at 2 years, 5 months and the hours for children at 4 years represent different activities), the values are not equatable over time. Additionally, the data submitted by different projects were highly variable; in some projects, all participating children had identical quantities of received instruction across multiple years. As a result, we could not include amount of exposure directly, and we used wave as a proxy.

100 As in the analyses for the PSI, we found no evidence that the effect of Age (either linear or quadratic) varies across individuals. Comparing the deviance statistics for models that allowed the slopes (and separately, curvatures) to vary across individuals revealed no statistically significant differences. We therefore conducted the PLS analyses similarly to those of the PSI, with randomly varying intercepts and fixed slopes.
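A comparison of deviance statistics of the kind mentioned in this footnote is commonly carried out as a chi-square test on the difference in deviance between two nested models. The sketch below assumes that convention; the deviance values are hypothetical and are not taken from the evaluation, and the helper names are illustrative. The choice of two extra parameters assumes that letting slopes vary adds a slope variance and an intercept-slope covariance.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (df = 1 or 2 only)."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch supports only df = 1 or 2")


def deviance_test(deviance_reduced, deviance_full, extra_params):
    """Return (deviance difference, p-value) for two nested models."""
    diff = deviance_reduced - deviance_full
    return diff, chi2_sf(max(diff, 0.0), extra_params)


# Hypothetical deviances: fixed-slope model vs. randomly varying slopes
diff, p = deviance_test(5230.4, 5228.9, extra_params=2)
```

A large p-value in such a test would be consistent with retaining the simpler model with fixed slopes.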

101 In fact, when we examine the age equivalent scores solely for those children with two or more waves, and compare those scores to the norming population (in other words, to the scores of those children who comprised the population upon whom the test was normed), we can see that the distance between the norming population and Even Start children is decreasing over time. See Exhibit D.2a in Appendix D.
