Archived Information

Assessment of Student Performance April 1997

CHAPTER 8

Part 2

Implications for Policy and Future Research

Several implications for policy and future research emerge from the present study's analysis of performance assessments and assessment systems with respect to both the intermediate purposes of assessment (i.e., monitoring student progress; alignment of curriculum, instruction, and assessment; and accountability) and the larger purpose of improving teaching and learning. Below, we discuss 14 interconnected general policy implications, followed by some implications specific to particular assessment purposes, and 8 recommendations for future research that can build upon our findings.

General Policy Implications

Policy implications emanate from both successes and failures our sites experienced in developing, implementing, and institutionalizing performance assessment systems.

    1. Clearly state the primary purpose of the assessment system.

    The development and implementation of the performance assessment system depend heavily upon the purpose of the assessment system. If the purpose is not clear, the assessment system itself will not be formulated well, and the scores deriving from the use of the assessment system will not be readily interpretable. Thus, for example, if the purpose is to affect instruction in the classroom, the assessment system must be formulated such that it enables teachers to use and to understand how performance assessments might be incorporated in their classrooms; the scores from the use of this system will be interpretable within a pedagogical framework. If the purpose is to monitor school quality and performance, the assessment system must be designed such that it yields high quality, reliable, and valid data, especially if sanctions and rewards are to be imposed upon schools based on the assessment results. If the assessment system is intended to meet multiple purposes, prioritize among the purposes, as one assessment system may not meet all purposes equally well.

    2. Match the format of the assessment system with the purpose of the assessment system.

    The format of the performance assessment system must be tailored to the purpose of the assessment system. For example, if the purpose of the assessment system is school accountability, the assessment system must include a battery of assessments that are comparable across schools and can be scored using the same scoring criteria. The format of the assessment system, then, has to be fairly tightly prescribed, and its reliability and validity have to be well established. If, however, the primary purpose of the assessment system is to influence pedagogy in a particular direction, the system must be moderately prescribed, allowing teachers to formulate, implement, and score the assessment tasks on an on-going basis. A moderately prescribed format may well result in non-comparable assessments and non-standard scoring procedures in the short term, but it will allow teachers, in the long run, to become familiar with the underpinnings and goals of the reform effort.

    3. Coordinate assessment reform with other elements of education reform and with other testing requirements.

    Coordinating assessment reform with other elements of education reform (in particular, the development of curriculum frameworks and content and performance standards) fosters both assessment reform specifically and education reform generally. Without such coordination, two dangers emerge. First, time and effort spent developing individual reforms are wasted when coordination is imposed late in the assessment reform process. (This effect is equally true for states, districts, and schools developing performance assessments.) Second, teachers may hesitate to invest their own scarce time to work with a new assessment technique when its connection to other planned reforms is not clear.

    Coordinating the introduction of performance assessments with other testing requirements (emanating from all levels of authority - state, district, and school) also is important, because such coordination will simultaneously minimize teachers' sense of assessment "overload" and further their sense that the new assessment represents "value added" to the entire system of assessments, not just an "add on."

    4. Articulate in clear and simple terms the content and performance standards the assessment system is intended to measure.

    The content and performance standards that the assessment system is to be based upon must be clearly and simply stated. Insofar as possible, state these standards in measurable, concrete, and content-based terms. (Standards that are stated as general outcomes and those that are perceived as having little connection to disciplinary areas create a sense of anxiety for teachers and provoke opposition on the parts of parents and school board members.) This approach will not only help teachers understand and adopt the assessment criteria, but it also will facilitate communication about the purposes of the reform with school board members, parents, and the general public.

    5. Institute procedures to ensure the technical quality and fairness of the assessment system.

    To ensure the technical quality of a performance assessment system, instituting procedures to monitor and to confirm the continuing validity and reliability of the system is essential. Procedures to ensure validity include developing assessments based upon established and accepted content standards (such as those developed by professional associations) and incorporating reviews of the assessments conducted by content area experts and classroom teachers. Procedures to ensure that scoring methods are reliable include developing scoring rubrics that state the scoring criteria in clear, simple, and unequivocal language and utilizing techniques that result in reliable scoring, including group-scoring and social-moderation methods. In the absence of such quality assurance procedures, performance assessments will likely be considered inferior to standardized multiple-choice tests, especially if they are used for high stakes accountability purposes.

    Also institute procedures to determine the fairness and consequential validity of the assessment system. Such procedures might entail investigating the performance of different gender and ethnic groups on the assessments, the effects of the use of such assessments on the education of disadvantaged groups, and the educational outcomes of all groups of students on measures other than the performance assessments themselves.

    In addition, conduct small-scale pilot projects to determine the developmental-appropriateness of various types of assessment tasks (particularly those intended to be administered to elementary and middle school students) and their meaningfulness to students.

    6. In order to obtain a comprehensive picture of student learning, design a performance assessment system that contains a mix of different types of performance assessment tasks and scoring procedures.

    Different types of assessment tasks and scoring procedures have different advantages. For example, performance assessment tasks that require students to choose the topic of the assessment reveal individual students' thinking, interests, and strengths. Such tasks, however, may not allow teachers to assess whether students have acquired a particular skill or a particular piece of knowledge that is a part of the curriculum. On the other hand, a task that poses a specific problem may allow teachers to assess whether students have acquired the understanding and skills necessary to solve the problem, but this task may not reveal much about students' interests and strengths. Hence, both types of tasks are essential for gaining a full understanding of students' skills, interests, academic development, and strengths and weaknesses. Assessment systems comprising a mix of assessment tasks and scoring procedures allow teachers, students, and education systems to evaluate more fully educational processes and students' achievements.

    7. Design an assessment system composed of assessments that reinforce each other and are based upon the same learning outcomes.

    In order to give a consistent message to teachers and students, design an assessment system that is composed of assessments that are linked to the same curriculum and learning outcomes. The assessment system may comprise both multiple-choice tests and performance assessments. However, assessments not based upon the same learning outcomes may give rise to the perception that one part of the assessment system counts and the other does not. If all parts of the assessment system are used for accountability purposes, the differences in their curricular bases may give rise to a superficial classroom curriculum.

    8. Tap existing resources when developing performance-based assessments and coordinated reforms.

    Using existing resources to develop performance assessment systems prevents multiple reinventions of the wheel, as it were (50 reinventions at the state level alone, as more and more states move to incorporate performance assessments in their testing systems), and the associated costs of those reinventions. Furthermore, existing resources - including work conducted by organizations such as the National Council of Teachers of Mathematics, the American Association for the Advancement of Science, and the National Council of Teachers of English - often represent the current thinking of the field about the best ways of teaching and assessing in the various disciplines. By using the work of these organizations as a springboard, assessment system developers can be assured that the work they are undertaking is in line with state-of-the-art endeavors.

    9. Plan the timeline of reform keeping in mind the length of time required to institutionalize the change.

    The timeline of reform can too often be artificial with respect to the work to be done and, consequently, serve as a barrier to reform in the long run. (This is particularly true in the case of state-initiated assessment reforms and, even more particularly, when state-level reforms are introduced in response to acts of the legislature: mandated education reform takes place in the context of a desire for long-term change in a short-term world.)

    First and foremost, the timeline for reform must be sufficient to ensure the development of a technically sound system. Performance assessments cannot compete with standardized, machine-scorable tests on certain criteria: they cannot always achieve as high levels of interrater reliability in scoring, and they cannot always achieve as high levels of standardization in administration. When performance-based assessments provoke opposition, it is on these fronts that they are particularly vulnerable. Given these disadvantages, performance assessment systems must be as technically sound as possible. Attempts to put a system in place too quickly can undermine the longevity of the system.

    The timeline of reform must also be sufficient to allow for people - teachers, students, parents, and administrators, as well as the general public - to become accustomed to and develop faith in the value of the assessment. Public support will, in general, be broadened in cases in which the purposes, format, timeline, and consequences of the performance assessment are clearly communicated.

    10. Communicate to the public the purposes of and the theory underlying the assessment.

    A good public relations campaign can ward off negative responses to early problems in the development and implementation of the performance assessment system. Communicating to parents accurately and in sufficient detail the different purposes and the implications of the assessment for their children can serve to prevent perceptions of unfairness in the assessment. Such campaigns can include "portfolio nights" where parents are invited to browse through their children's portfolios and to ask questions about the portfolio system, or sharing the assessment with the public by inviting community leaders and legislators to complete the assessment just as students are asked to do.

    Included in the public relations campaigns must be assurances to parents, school board members, and legislators that content does not have to be sacrificed with the use of performance assessments and that the format of the assessments does not imply that content knowledge cannot be adequately assessed.

    11. Provide and encourage professional development activities that help teachers expand their capacity to work with performance-based assessment techniques.

    For the assessment reform to be successful, teachers must develop common assumptions about teaching and learning and common frames of reference about what constitutes evidence of valued student outcomes. Innovative approaches to professional development can go a long way toward supporting teachers' understanding of assessment reform. State departments of education, whose assessment reform initiatives necessarily extend to large populations, face special challenges in ensuring high quality, useful professional development for all teachers. However, some states have improved on the traditional train-the-trainer model by vesting more responsibility in designated individuals at the school level and by expanding the focus of professional development from communication to capacity building. Hands-on professional development, during which teachers learn to develop assessment tasks, scoring rubrics, and performance standards, increases teachers' capacity to work with performance assessments by guiding them through the issues involved in effective assessment of student growth. Opportunities for teachers to work together to understand performance assessment techniques allow teachers to broaden their thinking about the assessments. Other capacity-building approaches focus upon instructional strategies and other issues in pedagogy which, in turn, allow teachers to examine their pedagogical assumptions and beliefs.

    12. Involve teachers in the design and implementation of the system, and make the system as loosely prescribed as possible within the context of the purposes of the assessment.

    Involving teachers in the process of designing and implementing the assessment system is likely to promote their appropriation of the assessment and, consequently, to effect meaningful changes in their pedagogical practices. Even state departments of education developing performance assessment systems can involve large numbers of teachers when the system calls for teachers to specify tasks within a given structure and to score student efforts using state-developed scoring procedures. Teachers who are involved in developing and implementing systems are more likely to appropriate the assessment technique because they have had time to work through the issues and problems associated with accurate assessment of students' knowledge and achievement.

    Furthermore, designing a loosely to moderately prescribed assessment system - one that allows teachers room to exercise their judgment in developing tasks and in setting scoring criteria and performance standards - enhances teachers' ability to appropriate the assessment system. The principal problem associated with assessment systems that are not tightly prescribed is that they typically lack standardization, in terms of administration procedures, standards of performance, or both. Thus, when introducing assessment systems that are to be used for accountability or certification purposes, states and districts will necessarily develop systems that are more tightly prescribed than systems that are not used for these purposes. However, assessment reformers should be aware of the trade-offs between the standardization that accountability and certification purposes require and the likely effect on teachers' appropriation - and, hence, changes in teaching practices - of the assessment technique. Therefore, depending on the purposes the assessment is intended to achieve, state and district assessment reformers may want to strive to design systems that are moderately prescribed - that is, systems that allow teachers some discretion to design and administer tasks within a specified structure.

    13. Encourage schools to provide teachers with time to develop assessments and to discuss their experiences with assessments with colleagues.

    Teachers who know that their schools, districts, and states value the time they devote to working with newly developed assessments and who, as a result, are provided with regular time to develop and discuss assessment techniques are more likely to use the assessments thoughtfully. Furthermore, they know that the school, district, or state is serious about the reform when they are provided with this time. Regularly provided time can take on several forms - e.g., release time from the classroom, compensated time spent at weekend and summer conferences or doing independent assessment-related work, early release of students on a regular basis to provide more teacher planning time, and additional planning periods for teachers involved in developing and implementing reforms. When states and districts (even more than schools) make provisions for this time, they signal to teachers that the work they are doing is important and valued.

    14. Provide waivers from testing and reporting requirements to schools experimenting with innovative assessment techniques.

    School-level assessment reform efforts can be hampered by state- and district-level testing and reporting requirements that are incompatible with the assessment system being developed and implemented by the school. The provision of waivers from these requirements can free up teachers to experiment in designing assessment systems that make sense to them pedagogically.
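
As an illustration of the reliability monitoring called for in implication 5 above, the following minimal Python sketch computes two common indicators of interrater agreement for a pair of scorers: simple percent agreement and Cohen's kappa, which adjusts for the agreement expected by chance. The function names, the four-point rubric, and the sample scores are hypothetical and are not drawn from the study; the sketch assumes that two raters have independently scored the same set of student responses.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of responses on which the two raters assigned the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of responses.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability that both raters assign the same
    # score category if each scored at random with their own marginals.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical rubric scores (1-4) given by two raters to eight portfolios.
scores_rater_a = [1, 2, 3, 4, 2, 3, 1, 4]
scores_rater_b = [1, 2, 3, 3, 2, 3, 1, 4]

print(f"Percent agreement: {percent_agreement(scores_rater_a, scores_rater_b):.3f}")
print(f"Cohen's kappa:     {cohens_kappa(scores_rater_a, scores_rater_b):.3f}")
```

Percent agreement is easy to explain to teachers and the public, while kappa is the more conservative figure when a rubric has few score levels; which indicator a given assessment system should report, and what threshold counts as acceptable, remains a judgment for the system's designers.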

Specific Policy Implications

While the general policy implications discussed above are applicable to assessment systems intended for almost any purpose, the degree of their importance for any given assessment system is a function of the primary purpose of that assessment system. Our research indicates that the success of assessment reform depends upon aligning the format of the assessments and other aspects of education reform with the major purpose of the new assessment system. (Conversely, the major purpose of the new assessment system must be aligned with the other aspects of education reform.) Below, we discuss the two major functions of performance assessments and the development and implementation issues that must be given priority.

Improve and Inform Instruction and Curriculum

If the major purpose behind the reform is to improve and inform instruction and curriculum at the local level, the following points deserve special attention:

    1. Design a moderately to loosely prescribed assessment system to enable teachers to actually work with the system in their classrooms. Also design assessments that can be easily integrated into different subject areas and into the school day.

    2. Provide clearly written content frameworks, performance standards, and assessment guidelines to teachers and administrators.

    3. Pay more attention to issues of content quality and curriculum and assessment coordination than to attaining interrater reliability.

    4. Provide on-going professional development in the design, use, and scoring of new assessments and also in how to use pedagogical strategies that align with the new assessments, how to choose new curricular materials, and how to use new pedagogies with all students.

    These on-going professional development sessions must be designed, to some extent, to fit the local-level need, whether the "local level" is the district, the school, or the classroom. Also, provide information to teachers about how to explain the assessments to students and their parents.

    5. Since developing and using new assessments and associated curricula requires teacher time, consider changing the structure of the school day. Devise new schedules through block scheduling, team-teaching approaches, extended school days, and other methods that allow teachers more time to learn and to teach their peers. Teachers must have the time to use, reflect upon, and discuss new reforms.

School or District Accountability

If the primary purpose of a state- or district-level assessment system is to hold schools and districts accountable for student performance, assessment reformers should consider the following points:

    1. To ensure the rigor of the assessment system in measuring whether or not standards have been met, design a tightly to moderately prescribed assessment system.

    Accountability must be based upon standards that are applied uniformly across the educational system. Thus, in order to assess whether or not standards have been met, the assessment instrument must be uniform, or comparable, across the accountability unit.

    2. Institutionalize rigorous quality assurance procedures to ensure the content validity, intertask reliability, fairness, and interrater reliability of the assessments.

    Use information from reputable professional organizations and from publicly recognized master teachers for developing assessment tasks and scoring methods. In addition, institutionalize procedures to evaluate the fairness of the assessment systems to schools and to individual students. Such procedures might entail evaluating whether differential performance on these assessments can be attributed to a poorly designed assessment or to opportunity-to-learn factors. The implications emanating from the two explanations of differential performance are quite distinct.

    3. Public information is an important component of the accountability mechanism. The "public" includes parents, school board officers, and legislators, and information must be tailored to fit each group's information needs.

    Parents typically want to know whether or not their children are receiving a good education that will provide them with the knowledge and skills they need for higher education or future employment. Therefore, information designed for parents must explain how implementing the assessment is connected to school quality and how assessment instruments are connected to the curriculum. In addition, parents must be assured that the assessment system is not biased against their schools or their children. Thus, school or district accountability scores must be contextualized within information about the assessment system's relationship to quality education.

    School board members and legislators, in addition to being informed about the issues outlined above, must see the costs of the program in relation to its benefits. Thus, information on expenditures must be contextualized within what teachers and administrators view as the short-term and long-term benefits of the use of the assessment system. Therefore, think through and clearly state the costs and benefits of the assessment system that extend beyond just the short-term accountability function.

    Involve school board members in disseminating information about the assessments and about other policies to parents and legislators. They could translate the meaning of assessment scores to parents and to the general public and explain why standards-based assessments are better than norm-referenced assessments.

    4. If the assessment system is to be used for high stakes accountability, collect information on known correlates of student performance on assessments.

    If high stakes accountability is based only upon students' performance on the assessment system, the accountability system may provoke opposition on the parts of teachers and school administrators, who also may be tempted to corrupt the assessment implementation procedures. If accountability measures take into account known correlates of student performance, then this information could be used to contextualize the assessment scores and to provide help to schools in improving their scores.

Implications for Future Research

The following topics deserve the attention of future researchers.

    1. Continued research into how the technical properties and fairness of performance assessment systems can be improved.

    Over the long run, technical soundness will be, perhaps, the primary determinant of whether or not the movement toward performance-based assessments perseveres. Areas of concern include:

    Features of assessment tasks that are intended to be meaningful to students, especially tasks that are intended to motivate students to engage in the task.

    Further research into how technical soundness and fairness of assessment systems can be maximized is crucial to the future of the assessment reform movement.

    2. Research into the most effective combinations of instructional models and assessments (including multiple choice tests) that result in improved student learning.

    Our findings indicate that although teachers utilize a variety of instructional models in conjunction with performance assessments, they are not always satisfied with the fit between instruction and assessment. Some teachers have expressed the concern that the kind of instructional models that conform to performance assessments may be developmentally inappropriate and may also result in a narrowed classroom curriculum. Research programs that investigate the effectiveness of the different combinations of instructional models and student assessment systems (including different types of performance assessments) are critical for understanding the connections among instruction, assessment, and student outcomes for students at different age-levels and for different subject areas.

    3. Longitudinal research on facilitators and barriers in assessment reform.

    The current study was able to investigate the facilitators and barriers in assessment reform only in the development and early implementation stages of reform efforts. In particular, some of the barriers identified may, over the long run, be broken down or become less significant as systems become established. Thus, the tentative set of barriers and facilitators identified here will be better understood in the light of future research that analyzes their effects in the long run.

    In addition, future research must investigate further those school-level factors that hinder or facilitate the implementation of state- or district-initiated reforms at the school level.

    4. Research into how different types of performance assessments are or are not appropriate for assessing the progress of children with disabilities.

    Little is known about the preparedness of children with disabilities to handle performance assessments, or about how the inclusion (or lack of inclusion) of children with disabilities in large-scale performance assessment systems affects the educational experiences of these children. On the one hand, educators have long turned to "authentic" assessments to use with these children so that the time pressures of traditional methods do not hamper the child's ability to demonstrate what he or she does or does not know. On the other hand, the appropriateness of new performance assessments for use with these children has not yet been demonstrated. Topics for future research include how children with various disabilities handle portfolios and what support and accommodations they need to complete portfolio tasks; how children with disabilities respond to on-demand performance assessments and extended projects that are to be completed within a certain amount of time or require group activities; and how performance standards should or should not be adjusted for these children.

    5. Research into the types of professional development and support activities that best enable teachers to understand and implement different types of performance assessments.

    Our research clearly indicates that a variety of professional development and support activities is key to the successful implementation of assessment reform. However, the effectiveness of the different models of professional development aimed at teachers' understanding of assessment development procedures, validation and scoring procedures, use of assessments, and instructional models that support performance assessments is as yet poorly understood. Therefore, research that identifies professional support models for the different facets of assessment reform would help to utilize scarce time and precious resources in a more beneficial and concrete fashion.

    6. Research into the impact of the use of performance assessments and related teaching strategies on student learning.

    Further research must be more specific than the current study was able to be in evaluating the extent to which particular performance assessment formats promote the acquisition of particular skills and knowledge. Furthermore, it must investigate whether the acquisition of certain skills and knowledge precludes the acquisition of certain other skills and knowledge within a given domain. Research of this nature must be tailored to specific assessment systems that are carefully chosen to represent different formats but the same content areas.

    7. Research into how opportunity-to-learn factors affect disadvantaged students' performance on different types of performance assessments.

    Several educators have raised concerns about the performance of disadvantaged children on performance assessments. However, equity concerns are not likely to be answered without taking into consideration the opportunity-to-learn factors that affect student performance. Thus, educators and policy makers must include in their agendas research regarding the effect of opportunity-to-learn factors on disadvantaged students' performance on the newer forms of assessments. Performance assessments by themselves may not be biased against disadvantaged students.

    8. Research into the long-term benefits of the use of performance assessments as compared with the long-term costs of developing and implementing performance assessments.

    Long-term costs and benefits must be conceptualized on an a priori basis and evaluated using a longitudinal research design. The present fiscal costs of developing and implementing performance assessments are substantial, but so are the projected benefits. Whether or not such is the case must be empirically judged.

