Assessment of Student Performance (April 1997)
Educators and policy makers across the nation have invested a fair amount of faith in performance assessments as a promising tool of education reform, the goal of which is to enhance students' development of critical thinking skills, writing skills, multidisciplinary understanding, and social competencies. The essential question we now face is: is such faith warranted?
Assessment reform (the shift toward performance-based assessments and away from multiple-choice, norm-referenced tests) is based upon the assumption that performance assessments are more pedagogically valuable and more accurate reflections of student achievement than are multiple-choice tests. Specifically, assessment reform is based upon the assumptions that:
Findings from our study indicate that the efficacy of using performance assessments as a strategy for education reform is not unequivocally demonstrated in terms of enhanced student achievement, but that some positive changes that support student learning have, indeed, occurred in educational structures and processes. Those changes, however significant, are far from spectacular. The diversity of experiences educators at our 16 school sites have had with performance assessments indicates that new approaches to student assessment alone are not sufficient to improve teaching and learning. Rather, the principles and ideas underpinning assessment reform must be clearly defined and understood at all levels of educational organization. Furthermore, if assessment reform is to meet its promise of fostering a better educational system and enhanced student achievement, other systemic reforms also must be mobilized, and several factors, including appropriate reform timelines and public information campaigns, must be purposefully incorporated in any reform plan.
The purpose of our study was to answer three basic questions:
Below, we summarize the study's most salient findings and then go on to outline policy and research implications that emanate from those findings.
The divergence in the paths our sites have taken in developing and implementing performance assessments does not permit us to draw a "national picture" about the status of assessment reform. We chose our sites to represent different stages of development and implementation of performance assessments and different levels of educational organization, but we found that they also represent their own constructions and understandings of assessment reform.
2. The tasks and assignments students must conduct for performance assessments take a variety of forms, including on-demand, open-ended problems, extended projects, oral presentations and demonstrations, and portfolios. In practice, any non-multiple-choice task that takes one of these forms is now referred to as a "performance assessment," whether or not the assessment possesses the other important characteristics reformers insist assessments possess.
For all of its focus on performance assessment, the assessment reform movement is fragmented, since "performance assessment" means different things to different people. In fact, our study indicates that the assignments or tasks students must complete range from on-demand, open-ended, timed problems, to long-term research projects that culminate in oral presentations, to portfolios of student work. The one feature these different types of assessment tasks have in common is the requirement that students actively construct responses to problems or prompts. This same feature distinguishes them from multiple-choice tests. Hence, any non-multiple-choice task or assignment is, in assessment reform parlance, referred to as a "performance assessment." However, not all performance tasks are based upon the principles of assessment reform (i.e., the teaching and assessing of problem-solving and critical thinking skills and multidisciplinary understanding).
3. Scoring rubrics are an important component of performance assessments, as they articulate the criteria against which the quality of student work is evaluated. Clearly articulated generic scoring rubrics function as a powerful tool of assessment reform, as they embody the educational outcomes important to the educational organization. Their use in the classroom helps teachers and students to organize learning and teaching around those outcomes.
Student performance on assessment tasks is generally evaluated using a scoring instrument or method. These scoring instruments or methods range from checklists of the items that must be present in student work to generic scoring rubrics that articulate the general competency and skill outcomes that must be reflected in student work along with the criteria for judging the quality of student work.
Assessment reform to date has been served most powerfully by generic rubrics. In requiring teachers to design and use assessment tasks that elicit the skills and competency outcomes articulated in the generic scoring rubrics, states, including Kentucky and Vermont, have communicated and promoted the educational outcomes valued by the state.
4. Evidence indicates that interrater reliability on scoring performance assessments can be improved over time and with sufficient professional development opportunities. However, content validity, equity, consequential validity of performance assessments, and meaningfulness of assessment tasks to students remain to be adequately addressed and established.
Data about the technical characteristics of performance assessment systems are generally meager, but they are more likely to be available for state- and district-level systems than for school- and national-level systems. The states involved in full-scale implementation of performance assessment systems have instituted measures to investigate and ensure the content validity and interrater reliability of their systems. Such measures have yielded positive results. For example, interrater reliability for Vermont's portfolios improved between 1992-93 and 1993-94. Systematic information with regard to the assessment systems' consequential validity is just beginning to accrue. In some cases, surveys show that teachers have aligned their pedagogical strategies with the performance assessment systems. However, consequential validity in terms of student outcomes has not been clearly demonstrated.
On the other hand, most districts and schools involved in our study had not undertaken any formal technical evaluations of their systems. In part, this finding reflects a lack of resources or, in some cases, the developmental status of the assessment system itself. In the case of some school-level systems, teachers regard their homegrown assessments as reliable and valid for use with their own students, and they perceive no need for an independent evaluation.
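The report does not say which statistic was used to gauge interrater reliability. As an illustration only, the sketch below computes two common measures, simple percent agreement and Cohen's kappa (which discounts agreement expected by chance), for two raters scoring the same set of portfolios. The function names and the scores are hypothetical and are not drawn from the study's data.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of items on which the two raters gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of items.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same score
    # if each assigned scores independently at their own observed rates.
    expected = sum(counts_a[score] * counts_b[score] for score in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical score for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical rubric scores (1-4) from two raters on six portfolios.
    scores_rater_a = [1, 2, 3, 4, 2, 3]
    scores_rater_b = [1, 2, 3, 3, 2, 4]
    print(f"Percent agreement: {percent_agreement(scores_rater_a, scores_rater_b):.2f}")
    print(f"Cohen's kappa:     {cohens_kappa(scores_rater_a, scores_rater_b):.2f}")
```

Tracking a statistic of this kind from one scoring cycle to the next is one way an improvement in rater consistency, such as the one reported for Vermont, could be documented.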
5. Performance assessment systems that are moderately prescribed and cast a wide pedagogical net are more likely to effect intended instructional and curricular changes than those that are either loosely or tightly prescribed and those that cast a measurement net. On the other hand, tightly prescribed assessments that cast a measurement net are more likely to be useful for accountability purposes.
Our findings show that assessment systems can be characterized along two dimensions:
Our data support the positing of a strong hypothesis: if assessment systems are moderately prescribed (that is, if they provide a structure for implementation within a coherent educational framework and involve teachers in developing and implementing the assessments), the purpose of informing and influencing instruction is more likely to be achieved, at least in the short run.
Our findings also indicate that performance assessment systems that cast wide pedagogical nets (that is, systems that involve teachers and students on an on-going basis and utilize different types of performance assessments) also are more likely to achieve the purpose of informing and influencing instruction. Because such systems invite teacher involvement and engagement, teachers are more likely to "appropriate" the assessments, work with them, and integrate them into the classroom. On the other hand, systems that cast narrow pedagogical nets tend not to spark change in classroom instructional practices, again, at least not in the short run. That is, systems that are meant for accountability purposes tend not to affect classroom pedagogy as quickly because they invite minimal teacher involvement: task specification, administration procedures, and scoring conditions are quite highly standardized.
The state-level assessment systems in our study are tightly to moderately prescribed and the national-level and school-level systems are, more often than not, loosely prescribed. (District-level systems reflected the entire spectrum of prescription.) Kentucky's and Vermont's performance assessment systems also cast wide pedagogical nets. Both systems require students to compile language arts and mathematics portfolios based upon the tasks their teachers assign to them, and both are moderately prescribed, with guidelines and scoring methods that provide the framework for teacher and student participation in the assessment system.
6. Assessment reformers identify five purposes performance assessment systems are intended to serve. Most systems are intended to serve multiple purposes. The most frequently cited purposes are to influence instruction and curriculum and to monitor student performance. However, in some cases, failure to recognize points of conflict between the different purposes hampers assessment reform, at least in the short run.
The five purposes of performance assessment systems are:
Most assessment reformers involved with the 16 performance assessment systems included in this study identified multiple purposes of their assessments. However, they did not typically prioritize among purposes. Moreover, various purposes are not necessarily compatible with each other (at least not in all combinations), and emphasis upon one purpose can sometimes result in an abandonment, or neglect, of another. In short, one performance assessment may not be able to serve all purposes equally well.
Perhaps the most important potential point of conflict between assessment purposes emerges when an assessment is intended both to hold schools accountable for students' performance and to improve instructional practices in the classroom. When held accountable for student performance on an assessment, teachers will teach to the test. When teaching to the test means that students are learning the valued skills, there will not necessarily be a conflict among purposes. However, the depth of the impact on teaching practices will depend upon how well teachers understand the pedagogical bases of the assessment and upon their own repertoire of instructional practices. Teachers may teach to the test during a finite period of the school year but may not necessarily modify regular practices, and they may not change their pedagogy.
7. State departments of education face numerous obstacles when introducing performance assessments. These obstacles include developing a technically sound system; coordinating assessment reform with other elements of education reform; communicating effectively with teachers about the purposes and value of the assessment; and selling the assessment to the public.
State departments of education in the process of introducing performance assessments (either under their own volition or in response to legislative action) are undertaking extremely complex endeavors. State-level assessments have both political and pedagogical ramifications and, thus, must pass the muster of two different sets of criteria.
Coordination among elements of education reform (most particularly, coordination among assessment reform, curriculum revisions, and the development of content and performance standards) will most likely be crucial to the long-term success of assessment reform. Furthermore, this coordination is equally important from political and pedagogical perspectives. Teachers who are unaware of the connections among reforms, either because such connections are unclear or because they have not been made, are left in a quandary about how much time and effort they should invest in an evolving system. The growing emphasis on the development of standards-based curricula and assessments would seem to testify to the late recognition of the need to coordinate reforms.
8. Innovative models of professional development and support are beginning to yield results in terms of building teachers' capacity to work with performance assessment techniques.
Some states and districts are attempting to shift the focus of professional development from communication of facts to capacity-building. Kentucky and Vermont, in particular, have expanded the traditional train-the-trainer model to vest more responsibility in individuals at the school level. They also have involved all teachers of the assessed grades in scoring activities, providing training for teachers in how to apply scoring rubrics to student work. These efforts seem to have paid off in Kentucky and Vermont, as teachers (at least those participating in this study) are becoming increasingly comfortable with both states' portfolio assessments.
National-level reform efforts, such as the New Standards Project, the Coalition of Essential Schools, and the College Board's Pacesetter Program, offer professional development opportunities unlike those typically supported by states and districts. However, the value of these conferences and workshops eventually will be measured in the classroom, where teachers must apply what they have learned to the real world of teaching and learning. Teachers participating in this study who have attended the conferences put on by these organizations have confirmed their value, but also suggest that what they learn at conferences must be modified to their own particular classrooms, schools, and districts.
In short, professional development and support activities, specifically those that focus on expanding teachers' capacity to work with performance-based assessment techniques, are crucial to realizing the purposes of assessment reform. Professional development and support is a necessary, but not sufficient, factor in the success of the reform.
9. Several types of resources, monetary and non-monetary, are required for developing and implementing performance assessment systems.
Although we collected information on expenditures related to developing and implementing performance assessments at different levels of educational organization, this information was not always complete. Furthermore, this information is not comparable across sites as fiscal record-keeping is not uniform across the educational organizations in our sample. Nonetheless, our data indicate that developing and implementing performance assessments is a costly venture. It requires different types of resources, not all of which are accounted for in monetary terms.
Aside from money spent on the actual development and implementation of performance assessments, assessment reform activities that require financial investments include gathering and utilizing information about assessment development, organizing and delivering professional development sessions, and disseminating information about assessments to teachers, parents, schools, and others. Yet other cost categories include library resources and storage space for assessment products such as portfolios.
The costs that are frequently not taken into consideration at any level of educational organization are teacher time in developing, administering, and scoring assessments and student time in completing the assessments. Teachers often mentioned that the time they invested in implementing a performance assessment system over the school year, or the time they spent preparing their students for a year-end assessment, resulted in their having to curtail the coverage of some content areas. On the other hand, one benefit teachers saw with some of the assessments was that they had to do in-depth teaching, which resulted in enhanced student achievement in some areas.
10. The primary impetus behind the performance assessment movement (the goal of improving teaching and learning in the classroom) is best served when teachers are provided with sufficient opportunity and resources to "appropriate" the assessment technique. Teachers must use (in original or modified form) and value the assessment if they are going to shape classroom practice to reflect it.
It is self-evident that teachers are more likely to appropriate assessment tools they develop themselves for use in the classroom. Our findings also identify those factors that contribute to teachers' abilities to appropriate assessments they themselves do not develop. Teachers who work with moderately prescribed assessments (assessments that allow them to exercise discretion over particular aspects of the assessment within an established structure) have reacted more favorably to external assessments than have their counterparts working with assessments that do not allow them that discretion. Teachers who are involved in scoring student assessments also are typically more positive about the assessments. Both of these findings illuminate the importance of giving teachers opportunities to grapple with the issues involved in assessing student performance.
11. Where teachers have appropriated performance assessments, they are asking their students to write and to complete research-based assignments more often than before, but the quality of this pedagogical shift is unclear.
In several schools implementing performance assessment systems comprising portfolios, long-term research projects, or exhibitions of student work, teachers say they are asking students to write more and to conduct more research-based assignments than they did in the past. Such an instructional shift is driven by the requirements of the assessments in two ways: teachers must design and assign tasks that enable students to demonstrate their writing capabilities or research and presentation skills, or teachers assign activities throughout the year that help their students develop the skills a demonstration assessment might tap.
However, two related findings point to why it is difficult to judge the quality of this pedagogical shift. The first is that, because teachers are still learning how to incorporate performance assessments into their classrooms, they themselves find it difficult to evaluate any relationship between the pedagogical change and students' learning. The second reason rests in unclear, unarticulated, or variable standards for performance. In the cases of several district- and state-level assessment systems, the content and performance standards associated with the systems are not clear at the local level; therefore, teachers are making a pedagogical shift, but they are uncertain to what end. In contrast, in the cases of many school-level assessment systems or schools participating in national systems, teachers frequently individualize performance requirements for their students, making it similarly difficult to evaluate the extent to which the performance assessment system is challenging all students to meet equally high standards.
12. Both teachers and students report that students are more motivated to learn through research projects and other performance-based assignments than they are with other types of assignments, a finding that supports one of the assumptions underlying assessment reform.
Teachers and students noted that students are more motivated to learn with performance-based tasks and writing assignments than with textbook-generated homework exercises. This effect is due, they say, to the sustained effort and attention students must invest in conducting research and writing projects and in defining some of the parameters of their own work. Teachers also believe that as a result of investing in projects that require research and writing, students are developing good writing and thinking skills. However, clear, independent evidence that such is the case is not yet available.
13. Teachers have transformed scoring rubrics into pedagogical tools, using them for setting students' performance expectations. The power of this transformation has depended upon how well the rubrics are constructed.
That performance assessments can fundamentally transform teaching and learning is most clearly demonstrated through the use of scoring rubrics. Teachers are using scoring rubrics as "scaffolding" to set performance standards for their students, gradually building student performance to higher levels of proficiency. In addition, teachers share scoring rubrics with their students to communicate the criteria they use in judging the quality of students' performances.
Teachers note that sharing scoring rubrics with their students has had a positive effect on students' understanding of the purposes of their assignments. Because of this better understanding, students are better able to become participants in the assessment process itself. Teachers note, too, that students internalize what they learn and develop a common framework for evaluating their own or their peers' work.
However, how well a scoring rubric serves to enhance student learning and understanding depends upon how well it is constructed. Some rubrics teachers shared with their students were simply checklists, while others were more elaborate (and still others were developed specifically for student use). Though the former type of "rubric" has proven useful (according to teachers and students who use them), it is the latter type that seems to have a clear effect on students' understanding of what is expected of them.
14. The use of performance assessments with students with disabilities has yielded mixed results.
The appropriate inclusion of students with disabilities in performance assessment systems, and the appropriate accommodations that should be made to support their participation, remain controversial and unclear at the local level. On the one hand, one justification underlying the movement toward performance assessment (to provide a forum in which students can demonstrate what they know and can do) is compatible with the goals and methods of serving students through individualized educational programs. On the other hand, our findings suggest that the format and the time and skill demands of some performance assessments have posed problems for the participation of students with disabilities in the assessment system.
On the positive side, some teachers noted that their students with disabilities experienced academic success and enhanced learning by conducting the research and writing assignments that comprise the performance assessments. At the same time, however, other teachers indicated that these students often have difficulty completing such assignments. On-demand performance tasks appear to pose the most problems for these students. Because these tasks tend to have time restrictions for completion, and because they may require higher levels of language arts skills than most multiple-choice tests require, students with disabilities often experience a sense of frustration and failure during the assessment process.
15. Although the objective of all performance assessment systems is to assess students against clearly established standards, in some cases the standards are not clear. This lack of clarity impedes teachers' ability to integrate assessments into instruction.
The lack of clearly defined content and performance standards impedes teachers' ability to use the assessments in the classroom. Teachers who make the effort to incorporate assessments into the classroom despite unclear content and performance standards are unsure of the quality of their instructional changes and of the pedagogical utility of the assessment. Indeed, most teachers who find themselves working with state- or district-level assessments for which standards are unclear simply refrain from investing much time and energy into integrating the assessments with their current teaching practices. This finding calls attention to a significant barrier to education (not just assessment) reform: weak articulation between assessment reform and complementary reforms (particularly the development of content and performance standards) can severely compromise teachers' commitment to and investment in the reform process.
Summary
Assessment reform is occurring at all levels of educational authority: state, district, and school. It also is being spearheaded and supported by national projects and networks such as the New Standards Project and the Coalition of Essential Schools. Regardless of where it is initiated, assessment reform's predominant purpose is to enhance student achievement in terms of critical-thinking, problem-solving, and good writing skills.
Assessment reform, however, cannot be evaluated as a single entity. Current reform efforts encompass different approaches to and stages of performance assessment development and implementation. Some states, districts, and schools have developed performance assessment systems that are congruent with the original tenets of assessment reform. Others' attempts are more piecemeal than not, resulting in difficulties in institutionalizing the reform. Our study indicates that different reform initiatives have experienced varying degrees of success in transforming teaching and learning in American classrooms. It is the range of assessment reform efforts that allows us to identify some findings that illuminate both the features of performance assessment systems that offer promise and the nature of the challenges that face assessment reformers.
These findings, however, remain preliminary, for few performance assessment systems have been in place long enough to allow conclusive evaluation. In the final analysis, the success of assessment reform as a tool to enhance student achievement remains to be rigorously demonstrated.