The issue persists because of the tension between the research findings and the cost of implementation. A great deal of empirical data have been collected. However, they have so far been less than convincing and not consistent enough to justify the expense of the additional classrooms and teachers that would be required. Targeted remedial programs are generally less costly and easier to deploy. They tend to be adopted for a portion of the school day to address learning problems in one or a small number of subject areas. In contrast, maintaining small classes throughout a grade level or school requires pervasive organizational changes. Of course, proponents would argue that the benefits are also pervasive--being realized throughout the school day and affecting the entire range of school subjects--unlike the band-aid approach of experimenting with one targeted program after another.
Without doubt the most widely cited review is the classic Meta-analysis of research on the relationship of class size and achievement (Glass & Smith, 1978). The authors collected and summarized nearly 80 studies of the relationship of class size with academic performance that yielded over 700 class-size comparisons on data from nearly 900,000 pupils. The two primary conclusions drawn from this material are:
Although the extensiveness of the Glass-Smith meta-analysis was commendable, the selection of studies to include was subject to justifiable criticism. A number of studies were of short duration; many compared normal-sized classes to one-on-one tutoring; other studies did not include "realistic" class sizes as their comparison groups; and at least one study concerned instruction in a non-academic subject (tennis). In spite of these deficiencies, however, the two conclusions drawn by Glass and Smith have endured and have received further support.
A compilation of studies examined by Educational Research Service (Robinson & Wittebols, 1986; Robinson, 1990) is noteworthy because of its extensiveness--more than 100 separate studies were reviewed. Robinson's (1990) conclusions added an important set of qualifications to the findings of Glass and Smith:
[R]esearch does not support the expectation that smaller classes will of themselves result in greater academic gains for students. The effects of class size on student learning varies (sic) by grade level, pupil characteristics, subject areas, teaching methods, and other learning interventions. (p. 90)
In particular, the review concludes that small classes are most beneficial in reading and mathematics in the early primary grades and that "[t]he research rather consistently finds that students who are economically disadvantaged or from some ethnic minorities perform better academically in smaller classes" (p. 85). Unfortunately, the wide-ranging review failed to distinguish even the best-designed studies from those using the poorest methodology, and thus the conclusions must be viewed as tentative.
A third review is noteworthy because of its focus on high-quality research conducted in accordance with accepted scientific standards. Using a procedure termed "best evidence synthesis," Slavin (1989) reviewed only those studies that lasted a minimum of 1 year; involved a substantial reduction in class size, that is, larger classes were compared to classes that were at least 30 percent smaller and had 20 students or fewer; and involved either random assignment of youngsters to class sizes or matching to assure that the groups were initially equivalent.1
Of the research summarized by Glass and Smith (1978) and others, Slavin identified only eight studies that met all three criteria. From these eight studies, Slavin concluded that substantial reductions in class size have a small positive effect on students (the median effect size for the eight studies was only 0.13 standard deviation) and that the effect was not cumulative and even disappeared in later years.2 Slavin's reinterpretation of the Glass-Smith findings is that large effects are not likely to be seen until the class size is reduced to one (i.e., one-on-one tutoring).
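Slavin's three screens can be read as a simple filter followed by a median. The sketch below is illustrative only: the `Study` record, its field names, and all six studies are hypothetical, and the 30 percent rule is encoded as the small class having at most 70 percent of the large class's enrollment. It does not reproduce Slavin's actual set of eight studies.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional


@dataclass
class Study:
    """Hypothetical summary record for one class-size study."""
    name: str
    duration_years: float
    large_class: int             # enrollment of the comparison (larger) class
    small_class: int             # enrollment of the reduced class
    randomized_or_matched: bool  # random assignment, or groups matched at the start
    effect_size: float           # small minus large, in standard deviation units


def meets_slavin_criteria(s: Study) -> bool:
    """True if a study passes all three screens described in the text."""
    lasted_a_year = s.duration_years >= 1
    # "At least 30 percent smaller" means small <= 0.7 * large; integer math avoids rounding.
    substantial_cut = 10 * s.small_class <= 7 * s.large_class and s.small_class <= 20
    return lasted_a_year and substantial_cut and s.randomized_or_matched


def best_evidence_median(studies: List[Study]) -> Optional[float]:
    """Median effect size of the qualifying studies, or None if none qualify."""
    kept = [s.effect_size for s in studies if meets_slavin_criteria(s)]
    return median(kept) if kept else None


# Made-up studies, for illustration only.
studies = [
    Study("A", 1.0, 25, 16, True, 0.08),
    Study("B", 0.5, 25, 15, True, 0.40),   # excluded: shorter than 1 year
    Study("C", 2.0, 25, 22, True, 0.05),   # excluded: cut under 30 percent, more than 20 pupils
    Study("D", 1.0, 30, 18, False, 0.30),  # excluded: neither randomized nor matched
    Study("E", 3.0, 24, 15, True, 0.20),
    Study("F", 2.0, 26, 17, True, 0.12),
]
print(best_evidence_median(studies))
```

Only studies A, E, and F survive all three screens here, so the median is taken over three hypothetical effect sizes.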
Other research syntheses. In a brief overview of research, Finn and Voelkl (1994) identified three approaches to studying the issue of class size: the classroom-focus approach, the cost-related approach, and the ecological approach.
The reviews by Glass and Smith (1978), Robinson and Wittebols (1986), and Slavin (1989) summarize classroom-focus studies; this research examines the number of pupils in each classroom, the interactions between the teacher(s) and the pupils in that classroom, and the outcomes realized by the pupils in that classroom. It provides the most direct and intensive view of the effects of a small-class setting.
The cost-related approach examines the actual or potential costs of implementing small classes and weighs them against the benefits that may accrue. This approach is discussed in considerable detail in the next chapter of this paper.
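As a minimal sketch of the first step in such a cost analysis, the code below counts how many extra classes (each needing a teacher and a classroom) a lower class-size cap requires for one grade cohort. The enrollment of 120 pupils and the per-class cost of 50,000 (arbitrary currency units) are hypothetical placeholders, not estimates from the literature; the caps of 26 and 17 are borrowed from the upper ends of the STAR regular and small ranges. The benefit side is not modeled.

```python
def classes_needed(enrollment: int, max_class_size: int) -> int:
    """Fewest classes that keep every class at or below max_class_size (ceiling division)."""
    return -(-enrollment // max_class_size)


def added_classes(enrollment: int, current_cap: int, target_cap: int) -> int:
    """Extra classes (each needing a teacher and a room) when the cap is lowered."""
    return classes_needed(enrollment, target_cap) - classes_needed(enrollment, current_cap)


def added_cost_per_pupil(enrollment: int, current_cap: int, target_cap: int,
                         cost_per_class: float) -> float:
    """Added annual cost per pupil; cost_per_class is a hypothetical input."""
    return added_classes(enrollment, current_cap, target_cap) * cost_per_class / enrollment


# Hypothetical grade cohort of 120 pupils, moving from a cap of 26 to a cap of 17.
extra = added_classes(120, current_cap=26, target_cap=17)
per_pupil = added_cost_per_pupil(120, 26, 17, cost_per_class=50_000)
print(extra, per_pupil)
```

Because classes come in whole units, the added cost depends on how enrollment divides by the cap, which is part of why school-wide reductions demand pervasive organizational change rather than a marginal adjustment.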
The ecological approach views class size in historical or geopolitical perspectives. For example, Tomlinson (1988, 1989) examined the changes in median class size in the United States over several decades and related them to changes in standardized test scores. The analysis does not show performance benefits for smaller classes, and it ignores a multitude of intervening factors, including population shifts and both cultural and institutional changes over the same time period. Likewise, the comparison of class sizes between countries introduces a number of confounding variables including national differences in educational expenditures, educational goals, teacher preparation, and student characteristics, to name a few. Class sizes also may vary dramatically within a country over time or among schools at one point in time (see Finn & Voelkl, 1994). Thus, ecological associations with pupil performance only obscure the effects of having a smaller or larger number of individuals in a particular class setting.
Class size is not pupil/teacher ratio. The analysis of pupil/teacher ratios is characteristic of the ecological approach and shares some of the same difficulties. Although the number of pupils can be compared to the number of teaching staff in a single school, the ratio obscures the workload faced by a teacher in one classroom, the amount of attention the teacher gives to any one pupil, and the dynamics of a small or large class that may affect pupil participation;3 these interactions may be especially important for students at risk. At the same time, pupil/teacher ratios are often smaller in urban districts (because of Title I programs, special education programs, and remedial teachers), while actual class sizes may be larger. One significant study (Boozer & Rouse, 1995) found that average class size, a more direct measure of classroom organization, was more important to academic achievement than the pupil/teacher ratio. Although several studies discussed in this paper did examine pupil/teacher ratios, the emphasis is on classroom-focus research.
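A small worked example, using a hypothetical school, shows how the two measures can diverge. Specialists such as Title I, special education, and remedial teachers enter the denominator of the pupil/teacher ratio but do not reduce the number of pupils seated in any regular class. The school's size and staffing below are invented for illustration.

```python
def pupil_teacher_ratio(pupils: int, classroom_teachers: int, other_teachers: int) -> float:
    """All pupils divided by all teaching staff, including non-classroom specialists."""
    return pupils / (classroom_teachers + other_teachers)


def average_class_size(class_enrollments: list) -> float:
    """Mean number of pupils actually seated in each class."""
    return sum(class_enrollments) / len(class_enrollments)


# Hypothetical school: 16 classes of 25 pupils, plus 6 specialists
# (e.g., Title I, special education, and remedial teachers) who do not lead a regular class.
classes = [25] * 16
ratio = pupil_teacher_ratio(sum(classes), classroom_teachers=len(classes), other_teachers=6)
size = average_class_size(classes)
print(f"pupil/teacher ratio = {ratio:.1f}, average class size = {size:.1f}")
```

For this hypothetical school the ratio prints as 18.2 while every class still holds 25 pupils.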
The outcomes of PRIME TIME are summarized in numerous publications (e.g., Center for School Assessment, 1986; Chase, Mueller & Walden, 1986; Malloy & Gilman, 1989; McGiverin, Gilman, & Tillitski, 1989; Mueller, Chase, & Walden, 1988). In brief:
More seriously, PRIME TIME did not implement a single, well-defined small-class intervention. Although the target average class size was 18 pupils, actual class sizes ranged from 12 to 31, and classes of 24 pupils with a teacher aide were considered small despite the number of pupils in the classroom. As a result, the evaluations of PRIME TIME cannot be interpreted as confirming or refuting a class-size effect.
Tennessee's Project STAR. Project STAR, the only large-scale, controlled study of the effects of reduced class size, was conducted in 79 elementary schools in the state of Tennessee from 1985 to 1989. The design drew heavily upon previous research findings, namely, that any benefits of small classes are likely to be realized in the primary grades, that there may be different outcomes for students based on race or economic disadvantage, and that only substantial reductions in class size are likely to have noteworthy impact.
Within each participating school, children entering kindergarten were assigned at random to one of three class types: small (S) with an enrollment range of 13 to 17 pupils; regular (R) with an enrollment range of 22 to 26 pupils; or regular with a full-time teacher aide (RA) with 22 to 26 pupils. Teachers also were assigned at random to the class groups. Teachers in the STAR classrooms received no special instructions of any sort, and the duties of teacher aides were not prescribed but were left to the teacher's discretion.4
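The within-school randomization can be sketched as follows. This is an assumption-laden illustration, not STAR's actual procedure: the class plan (one class of each type, sized 15, 24, and 24), the pupil and teacher IDs, and the seed are all hypothetical. Shuffling the pupil list and dealing pupils into a fixed list of seats gives each pupil the same chance of landing in any seat, and teachers are paired with classes the same way.

```python
import random
from collections import Counter

# Enrollment ranges for the three STAR class types, as given in the text.
SIZE_RANGES = {"S": (13, 17), "R": (22, 26), "RA": (22, 26)}


def assign_pupils(pupils, class_plan, rng):
    """Randomly assign one school's entering pupils to its classes.

    class_plan is a list of (class_id, class_type, capacity) tuples; the plan
    itself is a hypothetical input, not STAR's actual allocation rule.
    """
    for class_id, class_type, capacity in class_plan:
        low, high = SIZE_RANGES[class_type]
        if not low <= capacity <= high:
            raise ValueError(f"{class_id}: capacity {capacity} outside {class_type} range")
    seats = [(cid, ctype) for cid, ctype, cap in class_plan for _ in range(cap)]
    if len(pupils) != len(seats):
        raise ValueError("plan must seat exactly the school's entering pupils")
    shuffled = list(pupils)
    rng.shuffle(shuffled)  # random order, so each pupil's class is random
    return dict(zip(shuffled, seats))


def assign_teachers(teachers, class_plan, rng):
    """Randomly pair the school's teachers with its classes (one per class)."""
    if len(teachers) != len(class_plan):
        raise ValueError("need exactly one teacher per class")
    shuffled = list(teachers)
    rng.shuffle(shuffled)
    return {cid: t for (cid, _, _), t in zip(class_plan, shuffled)}


rng = random.Random(1985)  # fixed seed so the example is reproducible
plan = [("A", "S", 15), ("B", "R", 24), ("C", "RA", 24)]
pupils = [f"pupil{i:02d}" for i in range(63)]
pupil_map = assign_pupils(pupils, plan, rng)
teacher_map = assign_teachers(["T1", "T2", "T3"], plan, rng)
print(Counter(ctype for _, ctype in pupil_map.values()))
```

Running the functions once per school, rather than across the pooled population, is what makes this a within-school comparison: every school contributes pupils to all three class types.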
Classes remained the same type (S, R, or RA) for 4 years, until the pupils were in grade 3. A new teacher was assigned at random to the class each year. Standardized achievement tests (Stanford Achievement Tests, or SATs) were administered to all participating students at the end of each school year. Also, curriculum-based tests (Basic Skills First, or BSF) reflecting the state's instructional objectives in reading and mathematics were administered at the end of grades 1, 2, and 3. Finally, a measure of motivation and self-concept intended for young children also was administered to each pupil (Milchus, Farrah, & Reitz, 1968). In all, about 7,500 pupils in more than 300 classrooms participated in the 4-year longitudinal study.
Comments on the design. Before reviewing the outcomes of Project STAR, the particular strengths of this initiative should be underscored. The within-school design was an effective way to control for differences among school settings including, but not limited to, the economic status of the student body, per-pupil expenditures, and the manner in which schools were administered. The value of this type of design cannot be overstated. The random assignment was monitored carefully by state-level evaluators. A large and diverse population of students was tracked longitudinally over the 4-year period, and the data were collected, cleaned, and collated with a high degree of care. Both norm-referenced and criterion-referenced achievement data were collected. The norm-referenced tests, based on item-response theory, permitted comparisons of achievement levels from one grade to the next. The design of STAR, together with its magnitude and the follow-up research conducted after the 4-year period, led Harvard's Frederick Mosteller to term Project STAR "[a] controlled experiment which is one of the most important educational investigations ever carried out" (1995, p. 113).
The primary results. The main analysis of STAR outcomes consisted of four cross-sectional analyses, one at the end of each school year.5 The statistical methods were variations of common confirmatory procedures for evaluating experimental outcomes, for example, analysis of variance, multivariate analysis of variance, and analysis of covariance (see Finn & Achilles, 1990). In addition to tests of significance, "effect size" measures were derived each year for all students and for white and minority students separately. The results were compiled into a Tennessee State Department of Education report (Word et al., 1990).
Four primary results were reported consistently across the 4 years of analysis:
Table 1 presents the results in the form of small-class effect sizes.8 Each effect size is the mean score for small classes minus the average of the regular and teacher-aide class means [S - (R+A)/2], expressed in standard deviation units. Since the achievement differences all favor small classes, the researchers referred to the difference as the "small-class advantage." For the criterion-referenced Basic Skills First (BSF) tests, the difference is computed for the percentage of students exceeding the state's mastery criterion.
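The computation described above can be written out directly. In this sketch the score lists are toy values, not STAR data, and the reference standard deviation is passed in explicitly; the Table 1 note says it was computed for students in regular classes, and the example assumes the SD of the regular-class (R) scores. The BSF variant skips the SD and reports a difference in percent passing.

```python
from statistics import mean, stdev


def small_class_effect_size(small, regular, aide, reference_sd):
    """[S - (R + A)/2] / SD: the Table 1 'small-class advantage' in SD units."""
    return (mean(small) - (mean(regular) + mean(aide)) / 2) / reference_sd


def mastery_difference(small_passed, regular_passed, aide_passed):
    """Percentage-point analogue used for the BSF tests (not divided by an SD)."""
    def pct(flags):
        return 100 * sum(flags) / len(flags)
    return pct(small_passed) - (pct(regular_passed) + pct(aide_passed)) / 2


# Toy scores, far from real STAR data, chosen so the arithmetic is easy to follow.
small, regular, aide = [52, 54, 56], [48, 50, 52], [49, 51, 53]
es = small_class_effect_size(small, regular, aide, reference_sd=stdev(regular))
bsf = mastery_difference([True, True, True, False],
                         [True, False, False, True],
                         [True, True, False, False])
print(es, bsf)
```

With these toy inputs the small-class mean (54) exceeds the regular/aide average (50.5) by 3.5 points, or 1.75 reference SDs; real STAR effects, as Table 1 shows, are an order of magnitude smaller.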
Table 1.
Small-class effect sizes, grades kindergarten (K) through 3, by skills, motivation, and self-concept data

| Scale | Group | Grade K | Grade 1 | Grade 2 | Grade 3 |
|---|---|---|---|---|---|
| Word Study Skills | W | 0.15 | 0.16 | 0.11 | N/A |
| | M | 0.17 | 0.32 | 0.34 | N/A |
| | ALL | 0.15 | 0.22 | 0.20 | N/A |
| Reading | W | 0.15 | 0.16 | 0.11 | 0.16a |
| | M | 0.15 | 0.35 | 0.26 | 0.35a |
| | ALL | 0.18 | 0.22 | 0.19 | 0.25a |
| Total Reading | W | - | 0.17 | 0.13 | 0.17 |
| | M | - | 0.37 | 0.33 | 0.40 |
| | ALL | 0.18 | 0.24 | 0.23 | 0.26 |
| Basic Skills First (BSF) Reading | W | N/A | 4.8% | 1.6% | 4.0% |
| | M | N/A | 17.3% | 12.7% | 9.3% |
| | ALL | N/A | 9.6% | 6.9% | 7.2% |
| Total Mathematics | W | 0.17 | 0.22 | 0.12 | 0.16 |
| | M | 0.08 | 0.31 | 0.35 | 0.30 |
| | ALL | 0.15 | 0.27 | 0.20 | 0.23 |
| Basic Skills First (BSF) Mathematics | W | N/A | 3.1% | 1.2% | 4.4% |
| | M | N/A | 7.0% | 9.9% | 8.3% |
| | ALL | N/A | 5.9% | 4.7% | 6.7% |
| Motivation | W | 0.00 | -0.02 | -0.03 | -0.01 |
| | M | 0.03 | -0.01 | 0.07 | 0.11 |
| | ALL | 0.01 | 0.00 | 0.01 | 0.00 |
| Self-Concept | W | 0.10 | 0.07 | 0.00 | -0.05 |
| | M | 0.10 | 0.05 | 0.03 | 0.04 |
| | ALL | 0.11 | 0.7 | 0.02 | 0.02 |

NOTE: The values for BSF Reading and BSF Mathematics represent differences in the percent passing (no standard deviation). All other values are mean differences, Small - (Regular + Aide)/2, divided by the standard deviation of the scale. Standard deviations were computed for all students in regular classes, and for white (W) and minority (M) students separately.
a Total Language scale in grade 3 (not Reading).
On every achievement measure, small classes outperformed the other class types; effect sizes for the total sample (ALL) range from about 0.15 standard deviation in kindergarten to about 0.25 standard deviation in the first, second, and third grades.9 And, as in the research that preceded STAR, the small-class advantage was generally greater for minority students (most of whom were black) than for whites. In most comparisons, the impact on minorities was about twice as large as it was for white students, which considerably reduced the achievement gap. In reporting this effect, Finn and Achilles (1990) noted that the difference between minorities and whites in mastery rates on the grade 1 reading test was "reduced from 14.3 percent in regular classes to 4.1 percent in small classes" (p. 568).
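The "about twice as large" pattern can be checked against the white (W) and minority (M) rows of Table 1. The sketch below transcribes two scales from the table, prints the M/W ratio for each grade, and then works the Finn and Achilles (1990) gap figures; nothing beyond those published numbers is assumed.

```python
# Small-class effect sizes (SD units) transcribed from Table 1 as (white, minority) pairs.
# Kindergarten Total Reading is omitted because the table reports no W or M values there.
TABLE1 = {
    "Total Reading": {"1": (0.17, 0.37), "2": (0.13, 0.33), "3": (0.17, 0.40)},
    "Total Mathematics": {"K": (0.17, 0.08), "1": (0.22, 0.31),
                          "2": (0.12, 0.35), "3": (0.16, 0.30)},
}


def minority_to_white_ratios(table):
    """Ratio of the minority effect size to the white effect size, per scale and grade."""
    return {
        (scale, grade): minority / white
        for scale, by_grade in table.items()
        for grade, (white, minority) in by_grade.items()
    }


ratios = minority_to_white_ratios(TABLE1)
for (scale, grade), r in ratios.items():
    print(f"{scale}, grade {grade}: M/W = {r:.2f}")

# Grade 1 reading mastery gap reported by Finn and Achilles (1990), in percentage points.
gap_regular, gap_small = 14.3, 4.1
narrowed = gap_regular - gap_small
print(f"gap narrowed by {narrowed:.1f} points ({narrowed / gap_regular:.0%} of the regular-class gap)")
```

The Total Reading ratios fall between about 2.18 and 2.54, while kindergarten mathematics (about 0.47) runs the other way, consistent with the pattern holding in most, not all, comparisons. The grade 1 mastery gap narrows by 10.2 percentage points, roughly 71 percent of the regular-class gap.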
Two additional points should be noted. First, the effect sizes in Table 1 show that small classes confer an advantage of up to 1/4 standard deviation over larger classes in every subject tested.10 Although the researchers did not devise methods for computing the total impact on achievement, the combined impact across subjects is greater than any single difference would indicate. Second, the effect sizes in Table 1 actually underestimate the true small-class advantage. An unavoidable phenomenon during the 4-year project was the "drifting" of some classes out of the target size range as students transferred into or out of a class or school. Preliminary indications are that the effect sizes would be substantially greater if out-of-range classes were removed from the data.11
In sum, due to the magnitude of the Project STAR longitudinal experiment, the design, and the care with which it was executed, the results are clear:
At the same time, the research leaves behind a wealth of data that have only begun to be analyzed for what they can tell us.
The follow-up: the Lasting Benefits Study. After the positive STAR findings, Tennessee authorized a study to see how long the initial benefits of small classes would persist. Although all children were returned to regular-size classes in grade 4, the Lasting Benefits Study (LBS) continued to follow a significant portion of these pupils.12 In the 1995-1996 school year, the majority of STAR students were in grade 10 and were still being tracked.
The grade 4 evaluation included standardized and criterion-referenced achievement tests plus a new measure of student engagement in learning activities, the Student Participation Questionnaire (SPQ) (Finn, Folger, & Cox, 1991). The SPQ is a 28-item scale on which each pupil is rated by his or her teacher. It yields reliable, valid measures of the "effort" the student devotes to learning, "initiative-taking" in the classroom, and "nonparticipatory" behavior (disruptive or inattentive-withdrawn behavior). The grade 4 results (Finn et al., 1989) showed that, even after the small-class intervention was disbanded:
in social studies to 0.16 standard deviation on the criterion-referenced mathematics test; and
Positive achievement results continued to be obtained in later grades. The median small-to-regular difference for the total sample was approximately 0.18 standard deviation in grade 5, 0.16 in grade 6, and 0.14 in grade 7. As in earlier grades, the differences were statistically significant on all norm-referenced and curriculum-based tests.13
The carryover effects are consistent with findings from other early interventions, for example, the Perry Preschool Project (Berrueta-Clement et al., 1984). They raise the possibility that small classes in the early grades have significant long-term consequences for all students and that they may set students at risk of educational failure on a positive trajectory that will increase their chances of school success through the years.
As of this writing, resources are not available to explore these data in any but the most cursory ways. The database continues to grow, however. In grade 8, two teachers rated each student on the SPQ, and each student completed a self-report "Identification with School" scale (Voelkl, 1996). Achievement test scores have been obtained for grades 8 and 9. In sum, STAR and the LBS have laid the groundwork for building an important database for examining educational effects longitudinally. Its potential to address both basic and policy-relevant research issues is elaborated in a later section of this report.
Other STAR-related studies. Based on the positive findings of STAR and the LBS, Tennessee implemented Project Challenge in 17 of the state's poorest school districts, that is, districts with the lowest per capita income and the highest percentages of pupils in the subsidized lunch program. Beginning in 1990, small classes (pupil-to-teacher ratio of 15:1) were introduced in the primary grades of all schools in these counties: grades 2 and 3 in 1990, grades 1 through 3 in 1991, and grades kindergarten through 3 in 1992 and later years. Unlike Project STAR, Project Challenge was not a controlled experiment, but it was a thorough effort to implement small classes in particular targeted districts.
The project was assessed through an analysis of district rankings on statewide achievement tests (Achilles, Nye, & Zaharias, 1995). Since Tennessee has 138 districts, a rank of 69 would be considered average. In terms of the mean rankings of the 17 Challenge districts, the results were:
It is also interesting to note that, because of the staggered introduction of small classes, grade 2 students in 1990 had been in small classes for just 1 year, whereas the grade 2 students in 1991 had been in small classes for 2 years (grades 1 and 2), and the 1992 and 1993 grade 2 students had been in small classes for 3 years (kindergarten through grade 2). That is:
This study adds non-experimental evidence that small classes are beneficial in the primary grades. The data also indicated that in-grade retentions were reduced when small classes were implemented (Achilles, n.d.).
Two smaller studies of class size were conducted in North Carolina following STAR. In 1991, educators, citizens, and the school board in Burke County, North Carolina, began a project to reduce class size to 15 in grade 1, followed by grades 2 and 3 in subsequent years (Achilles, Harman, & Egelson, 1995; Egelson, Harman, & Achilles, 1996). In a related effort, the principal of Oak Hill Elementary School in the Guilford County, North Carolina, system restructured classes in grades kindergarten through 3 into a small-class format (15 students). The initiative was termed Success Starts Small (Achilles et al., 1994; Kiser-Kling, 1995). Oak Hill was fully Chapter 1 eligible, with 78 percent of its students in the subsidized lunch program. Matched comparison groups were used in both studies.
The results of both projects favored small classes in academic achievement; small-class effect sizes were in the range of 0.4 to 0.6 standard deviation (Achilles et al., 1994; Achilles, Harman, & Egelson, 1995). Significantly, Success Starts Small included systematic comparisons of teaching behavior in small and regular classes:
Conclusions. Both Project STAR and the LBS provide compelling evidence that small classes in the primary grades are academically superior to regular-size classes. The findings were confirmed for every school subject tested. Teachers of small classes received no special instructions or training; the outcomes result from class size and from whatever perceptions and advantages accompany having substantially fewer students in a room with one teacher. This is not to say, of course, that the effects could not be accentuated if additional teacher preparation initiatives were provided.
A clear small-class advantage was found for inner-city, urban, suburban, and rural schools; for males and females; and for white and minority students alike. The few significant interactions found each year indicated greater small-class advantages for minority or inner-city students. Targeting small classes in particular schools or districts may provide the greatest benefits at a cost that is contained, although it may also mean denying the benefits to other students or schools.
These studies were based on research suggesting that small-class benefits are most likely to occur in the primary grades. The findings of Project STAR are limited to grades kindergarten through 3--no reasonable extrapolation beyond those grades can be made from these data. At the same time, the LBS results indicate clearly that the effects carry over into later years. The large, diverse database created through STAR, the LBS, and ongoing data collections offers the opportunity to answer a number of significant questions about the long-term effects of small classes on achievement, pupil engagement in school, and student behavior.
2 Slavin also commented that while teachers may change their behavior in small classes, the changes are so slight that they are unlikely to make important differences in student achievement. This issue is discussed more fully in a later section of this paper.
3 Of the studies described in the next section, Project PRIME TIME manipulated pupil/teacher ratios but failed to find a significant impact on academic achievement. In contrast, Project STAR controlled the number of pupils in each classroom; this was accompanied by differences in student performance.
4 There was a training component for some teachers in grade 2. The effects on student achievement were found to be negligible. The results reported here do not include classes taught by that subsample of teachers.
5 Several longitudinal analyses have been completed as well, including a K-1 analysis (Finn & Achilles, 1990) and a K-2 analysis (Finn et al., 1990). Many important longitudinal analyses remain to be conducted.
6 The exceptions did not contradict the finding of a small-class advantage. They indicated that, to some extent, the advantage was greater for students attending inner-city schools.
7 One possible reason for the negative findings may lie in the difficulties in assessing noncognitive characteristics of young children. Of course it is also possible that small classes improved learning but did not affect pupils' motivation or self-concepts.
8 Unpublished table obtained directly from the analyses.
9 Although precise grade equivalents are not available, these differences correspond to an advantage of about 0.1 grade equivalent (about 1 month) by the end of kindergarten, about 0.2 grade equivalent (about 2 months) by the end of first grade, and somewhat more by the end of grade 2.
10 Including several subtests not listed in Table 1.
11 In the range of 0.3 standard deviation and upward (Zaharias et al., 1995).
12 Each year (1990-1994), the number of students tested was between approximately 4,200 and 6,000.
13 Later follow-ups through grade 11 are being conducted by H.P. Bain and J.B. Zaharias of HEROS, Inc. Preliminary results indicate that the positive effects of small classes persisted at least through grade 10.
14 This finding is discussed further in the later section on instructional practice and student behavior.