Assessment of Student Performance April 1997
The present report on the three-year research project, Studies of Education Reform: Assessment of Student Performance, begins by examining the impetus behind and issues associated with assessment reform, a phrase that is commonly understood to mean the systematic shift at all levels of education organization toward performance assessments and away from multiple-choice tests for instructional, accountability, or student certification purposes. Developing and implementing non-multiple-choice methods of assessing student performance has become a major, and controversial, part of the education reform movement that is currently sweeping the nation.
This introduction provides an historical overview of the current movement toward the development and use of performance assessments, outlines the relationship of assessment reform to broader issues in education reform, and highlights the technical and measurement questions related to the use of performance assessments. The remaining chapters discuss the objectives and design, findings, and research and policy implications of this study.
The use of performance assessment is not an entirely new strategy in American education. Essays, oral presentations, and other kinds of projects always have been features of elite private education. In many classrooms, private and public, teachers for years have been assessing student progress through assigned papers, reports, and projects that are used as a basis for course grades. At the national level, the Advanced Placement Program of the College Board, from its inception, has assessed students by requiring at least one written essay in addition to responses to multiple-choice questions (as well as laboratory experiments in the sciences and demonstrations in music).
What is new in the current reform movement is the emphasis on the use of performance assessments for systematic, school-wide, instructional and curricular purposes at the school level and, in addition, for accountability purposes at the district and state levels. In some instances, proponents of assessment reform view performance assessments as the lever for systemic curricular and instructional reforms at any level of the educational hierarchy. Theoretical writings, such as articles by Wiggins (1989; 1991), and descriptions of programs, such as Wolf's (1989; 1991) discussions of ARTS PROPEL activities in the Pittsburgh Public Schools, have influenced practitioners to adopt performance assessments and education systems to consider assessment reform.
As discussed in other sections of this chapter, controversy centers not on the use of assessments for pedagogical purposes, but on their use for system accountability and student certification, the so-called "high stakes" purposes.
Performance assessment, as the term currently is being used, refers to a range of approaches to assessing student performance. These new approaches are variously labeled as follows:
Regardless of the term used, according to Mitchell (1995), performance assessments imply ". . . active student production of evidence of learning, not multiple-choice, which is essentially passive selection among preconstructed answers" (p. 2).
The present focus on performance assessments as a systematic strategy of public education reform owes its origins to three related phenomena, all of which gained momentum during the late 1980s: (1) the reaction on the part of educators against pressures for accountability based upon multiple-choice, norm-referenced testing; (2) the development in the cognitive sciences of a constructivist model of learning; and (3) concern on the part of the business community that students entering the workforce were not competent enough to compete in an increasingly global economy.
In 1983, A Nation at Risk was widely interpreted as a call for school systems to tighten their curricula, and such "tightening" resulted in widespread testing for accountability purposes. Many school systems came to rely upon the use of norm-referenced, multiple-choice tests for school accountability, and educators felt this phenomenon came to have an undue amount of influence on teaching and learning in the classroom. Classroom teachers felt the pressure to prepare their students to do well on such tests and accordingly modified their approach to teaching. Thus, "teaching to the test" became an increasingly prevalent pedagogical strategy (Baker & Stites, 1991; Herman & Golan, 1991; Shepard, 1991).
Multiple-choice tests were based on the assumption that learning and knowledge could be decontextually tested. The effects (and, from many educators' viewpoint, pernicious effects) of the use of such testing models were subsequently highlighted by research (e.g., Cannell, 1987, 1989; Herman & Golan, 1991; Shepard, 1991; Oakes, 1985, 1990), causing many educators to rethink their accountability strategies. Many reformers argued, then, that multiple-choice, norm-referenced testing had assumed too much importance in the classroom, often displacing the more pedagogically sound practice of "assessing for teaching" in favor of "teaching for testing."
At the same time, the constructivist model of cognition began to transform educators' thinking about teaching and learning. Thus, the following idea gained currency in the reform movement: because each individual constructs knowledge in his or her own way, a customized rather than a mass approach is necessary to enable him or her to achieve high standards. Educators came to believe that, in order to strengthen all students' educational experiences and to better meet all students' needs, assessments that concurrently allow for an understanding of students' learning processes and extant knowledge base and that support variations in pedagogy are required. In addition, advocates of performance assessments suggested that the use of performance assessments would have salutary effects on student motivation and learning, especially the use of performance assessments that stress interdisciplinary skills and pose problems contextualized in the "real world" (i.e., assignments that emulate the kinds of multifaceted problems one encounters outside the classroom). They argued that students would be more involved in attempting and completing such assessments.
Concurrent with these trends within the education system, the demands from outside the education system for more workers with sophisticated thinking skills provided the fuel for the rebellion against the widespread use of multiple-choice tests. Business and industry executives demanded that their employees be able to think creatively, solve problems, write well, work flexibly, and possess social competencies in order to be able to operate in groups. The U.S. Department of Labor's Secretary's Commission on Achieving Necessary Skills (SCANS), after an extensive survey of the business community, reported that employers and employees share the belief that all workplaces must "work smarter" [italics added] (p. v). SCANS concluded that for a workplace to work smarter, its employees must possess certain competencies, such as interpersonal skills, and foundation skills, such as basic skills in reading, writing, and thinking. Such pressures added up to the widespread consideration of assessment reform as part of a solution to the problem of the incompetent worker.
Given these concerns, education reformers insisted that in order to function as a lever of education reform, assessments must: (1) be based on a generative view of knowledge; (2) require an active production of student work (not a passive selection from prefabricated choices); and (3) consist of meaningful tasks, rather than of that which can be easily tested and easily scored. Assessment reform also rests, explicitly or implicitly, on the notions that: (1) assessing student performance against established standards is better than assessing it against group norms; (2) teaching and assessing problem-solving, critical thinking skills, and good writing skills are essential for student achievement and growth; and (3) teaching and assessing procedural knowledge, such as the scientific method and writing processes, are as important as teaching and assessing factual knowledge.
Thus, in assessment reform theory, all performance assessments must require students to structure the assessment task, apply information, and construct responses, and, in many cases, students must also be able to explain the processes by which they arrive at the answers.
In theory, such assessments generate a wealth of information about the student, which can then be used for instructional purposes. Such information might shed light on the student's understanding of the problem, his or her involvement with the problem, his or her approach to solving the problem, and his or her ability to express himself or herself.
In sum, proponents of assessment reform argued that performance assessments would motivate and involve students in the learning process; such assessments would help students develop good writing and conceptual skills, establish a meaningful context for learning, and, therefore, achieve higher levels of desired outcomes.
An overview of the efforts in U.S. education systems to reform assessment of student performance is perhaps best organized by their level of initiation: national, state, district, or school. Although this organization is, in some ways, artificial, it nonetheless helps us to impose order on and to understand better a phenomenon that encompasses a wide range of purposes and methods of assessment reform.
National (non-governmental and governmental) involvement in assessment reform shares the limelight with state-level efforts. Several national, non-governmental projects tackling assessment, curricular, and instructional reform have gained national prominence in recent years. For instance, the New Standards Project (NSP), the Coalition of Essential Schools (CES), and the College Board's Pacesetter program have exerted considerable influence on education administrators and teachers across the nation, and have influenced a shift toward the use of performance assessments.
The New Standards Project began in 1991 with the aim of reinvigorating and revamping American education (Resnick & Simmons, 1993). It is jointly managed by the National Center on Education and the Economy and the Learning Research and Development Center at the University of Pittsburgh. The crux of NSP's work involves establishing performance standards and designing curricular, instructional, and assessment strategies. The NSP Board, which guides the formulation of performance standards and assessment strategies, is composed of representatives from NSP's partner states and districts and from professional organizations, such as the National Council of Teachers of Mathematics (NCTM), the American Association for the Advancement of Science (AAAS), and the National Council of Teachers of English (NCTE). As of March 1995, the NSP program listed 17 state and 6 urban district partners.
The NSP assessment system is being formulated for grades 4, 8, and 10. The fully articulated system will consist of student portfolios that will contain NSP-recommended matrix-sampled tasks requiring extended responses, exhibitions, projects, and other student work.
The Coalition of Essential Schools also is a national force in its own right. It was established in 1984, at Brown University, as a school-university partnership to help redesign schools. Coalition members include 150 schools that are actively involved in reform (Coalition of Essential Schools, undated). The reform work of the member schools is guided by a set of nine Common Principles, the sixth of which pertains to assessment. The sixth principle states that students should be awarded a diploma only upon a successful demonstration, an exhibition, of having acquired the skills and knowledge that are central to the school's program: "As the diploma is awarded when earned, the school's program proceeds with no strict age grading and with no system of 'earned credits' by 'time spent' in class. The emphasis is on the students' demonstration that they can do important things" (The Common Principles of the Coalition of Essential Schools) (Sizer, 1989). Several member schools have fashioned their graduation requirements on this principle.
Performance assessments on the national level have always been a feature of the College Board's Advanced Placement (AP) Program, especially the Studio Art Portfolio Evaluation, which has no written or multiple-choice portions. This evaluation, in fact, is an example of a well-established national portfolio examination (Mitchell, 1992).
More recently, the College Board has launched another assessment development effort. The College Board's Pacesetter program is being designed as a national, syllabus-driven examination system for all high school students. It is modeled on the AP examinations, which (with the exception of Studio Art) contain both multiple-choice and partially open-response items. The Pacesetter design incorporates two forms of assessments: classroom assessments and end-of-course assessments. Currently, 60 sites in 21 states are implementing Pacesetter course frameworks and associated assessments in English, mathematics, and Spanish (The College Board News, 1995).
The most visible indication of national-level, governmental involvement in assessment reform came with the passage of the Goals 2000: Educate America Act (P. L. 103-227). Passed in 1994, GOALS 2000 offers states Federal grants to develop standards-based education systems. As a result, Congress allocated $105 million in Fiscal Year (FY) 1994 for Goals 2000, and imposed no funding limits through FY 1999 (Education Daily, May 27, 1994). The law formally authorizes the National Education Goals Panel (NEGP) to monitor progress toward GOALS 2000, and the National Education Standards and Improvement Council (NESIC) was to have reviewed the criteria set for evaluating student performance standards. However, the amount of funding to be allocated for GOALS 2000 is likely to be drastically reduced, and the as yet unappointed NESIC is to be abolished (Education Daily, September 13, 1995). NESIC's role in endorsing state-generated standards is considered to be too intrusive by some members of Congress (Education Daily, January 31, 1995; Olson, 1995).
As of September 1995, 48 states had applied for the U.S. Department of Education's GOALS 2000 grants. Although states' initial applications include only general plans regarding how content and student performance standards would be set, future applications will be required to detail how student performance will be measured, in order to assess whether or not students are meeting set standards.
In another national program, Title I (formerly Chapter I), performance assessments, especially portfolios, stand a chance of being included as options for use beyond norm-referenced multiple-choice testing. Congress reauthorized the Title I compensatory education program in 1994. By law, states are required to use the same or equally rigorous standards and assessments they devise for GOALS 2000 for monitoring the progress of Title I students, but districts can also devise their own standards and assessments as long as they are as rigorous as those of the state. Through these requirements, Title I aims to coax states away from norm-referenced, multiple-choice tests and toward more open-ended, performance-based assessments (Olson, 1995).
In addition to the GOALS 2000 and Title I programs, the work of several national organizations and professional associations in developing content standards for academic areas has implications for assessment reform. The effort of many of these groups (e.g., The National Council of Teachers of Mathematics, The Center for Civic Education, The Consortium of National Arts Education Associations, The National Center for History in the Schools at the University of California at Los Angeles) in establishing content standards is supported by the Federal government. Perhaps the work of the National Council of Teachers of Mathematics (NCTM), which released the mathematics standards in 1989, has been the most prominent among these organizations and has had the greatest impact to date. The NCTM publications, Curriculum and Evaluation Standards for School Mathematics (1989) and Professional Standards for Teaching Mathematics (1991), are guiding the teaching and assessment of mathematics in several states and school districts across the nation. (Published in May 1995, Assessment Standards for School Mathematics is likely to be just as influential.) The NCTM standards, for example, promote the evaluation of students' mathematical problem-solving and communication skills through the use of applied mathematical problems. Similarly, documents published by the American Association for the Advancement of Science (AAAS), such as Science for All Americans (1989) and Benchmarks for Science Literacy (1993), have influenced the teaching and assessment of science.
As previously mentioned, the SCANS reports, too, have been active in prodding school systems toward more performance-based assessments. The SCANS work, in fact, is pertinent to GOALS 2000; SCANS competencies, which, among other things, emphasize interpersonal skills and intelligent use of information and technology, have a direct relationship to what students learn in classrooms. The SCANS commission envisioned setting proficiency levels for SCANS competencies and developing an associated assessment system based on demonstrating SCANS competencies through applied, contextualized problems.
Useful catalogues of performance assessment activity at the state and district levels include The Status of State Student Assessment Programs in the United States (1995) and State Student Assessment Programs Database (1993-1994), both by the Council of Chief State School Officers and the North Central Regional Educational Laboratory, and a survey of local district activity by Hansen and Hathaway (1991). These catalogues highlight the growing popularity of performance assessments. Information about activity at the school level is more difficult to obtain, as it is circulated largely by word-of-mouth, through professional networks, or by an occasional journal or newspaper article.
Similarly, there are many small-scale, pilot, or research and development efforts underway that may be funded by state agencies or even by the Federal government through its national research centers and laboratories. For example, the National Center for Research on Evaluation, Standards, and Student Testing (CRESST) at the University of California at Los Angeles and the Northwest Regional Educational Laboratory are involved in research on the development, implementation, and effects of performance assessments. These small-scale, local-level efforts are very much a part of a national trend, but they are difficult to catalog in a systematic fashion.
Developments at the state level are more dramatic than those at the national level. States committed to performance assessments as public policy are slowly increasing in number. For example, in the 1993-94 academic year, 38 states used writing samples to assess student writing proficiency, 25 included performance-based items in their assessment systems, and 7 included portfolios (Bond & Roeber, 1995).
In the late 1980s and early 1990s, a number of states (e.g., California, Connecticut, Maryland, Vermont) became trailblazers in the development and implementation of more innovative performance-based assessments. Currently, the most notable of these states are Vermont, Kentucky, and Maryland. Vermont, perhaps, is the most innovative of them all, being the first to fully implement a portfolio-based performance assessment system in writing and mathematics. Kentucky and Maryland also administer performance events once a year. Other states, such as Oregon, were not far behind in designing and implementing ambitious performance-based assessment systems. However, despite the great deal of energy going into these reforms, public backlash in some areas has given rise to a hostile climate for such reform endeavors. California's bold move with its California Learning Assessment System (CLAS), for example, was vetoed in 1994 by Governor Wilson. The program ended under an avalanche of criticism from parents that the assessments required their children to read distasteful materials and invaded family privacy by asking intrusive questions. (The future of Oregon's plans for assessment reform, too, is uncertain.)
Most states experimenting with performance-based assessments are explicit in their desire and intention to use the new assessments to influence instruction in the direction of conceptual, holistic teaching and learning, in addition to being interested in program evaluation.
There is some evidence that the use of performance assessment systems has achieved the aim of influencing instruction in the desired direction. For example, Vermont's surveys show that teachers have changed their instructional approach to align with project-based, holistic teaching (Koretz, Stecher, Klein, McCaffrey & Deibert, 1993), and Kentucky teachers have changed their instructional strategies as a result of Kentucky's system of portfolios and performance events (Kentucky Institute for Education Research, 1995). Evaluation of data collected from teachers in these two states indicates that teachers are asking their students to write and to work together in groups. However, most other evidence is anecdotal and is best established in terms of teacher performance rather than in terms of student achievement.
In sum, states have exhibited an extraordinary variety of responses to the advent of performance assessments, from a whole-hearted embrace of portfolios to an apparent lack of interest in new assessment methods. Thus, identifying the factors that facilitate the development and implementation of performance assessments was clearly a challenge for this study.
Assessments being developed at the district level are not as visible as those at the state level because the scale of the reform efforts tends to be smaller. Nonetheless, several districts have taken the lead in developing and implementing performance-based assessment systems and are getting national attention for their efforts. For example, the San Diego City Schools in San Diego, California, is a hotbed of activity. It leads the Southern Consortium of the California Assessment Collaborative with money provided by the California legislature for districts to experiment with performance assessment.
Another example is the public school system in Pittsburgh, which is famous as the site of ARTS PROPEL. It has a Syllabus Examination Project (SEP), 60 percent of which is based on performance assessments (Wolf & Piston, 1988, 1989, 1991). Varona, a school district just outside Milwaukee, uses portfolio assessments with students, teachers, and administrators alike (Pelavin Associates & CCSSO, 1991). South Brunswick, New Jersey; Frederick County, Maryland; Fort Worth, Texas; and Prince William County, Virginia, are other examples of districts that have embraced performance assessments.
Hansen and Hathaway (1991) attempted a systematic survey of assessments at many levels and sent out 433 questionnaires to school districts across the United States. They received only 110 responses, despite a follow-up mailing effort. Short of mailing questionnaires with reply-paid responses to all U.S. school districts, a comprehensive account of district assessment practices does not, at present, seem attainable.
While schools may perceive themselves as powerless to do much in the face of state and district mandates, developing and implementing performance assessments at the school level may be easier than at the district or state levels simply because it is easier to organize change on a small scale. For example, the notable graduation examinations based on performance assessments are at the school level: the Rite of Passage Experience (ROPE) at Walden III in Racine, Wisconsin, and the graduation portfolio at Central Park East Secondary School in New York City. Both schools are members of the Coalition of Essential Schools, which, as mentioned earlier, advocates exhibitions as replacements for norm-referenced multiple-choice tests (Sizer, 1992).
Many schools utilize portfolio assessments for writing, and some use them for mathematics and other subject areas as well. The use of teacher-designed observations or records of literacy development also is becoming popular at the elementary school level. The California Learning Record, for example, is an assessment developed for both informal and formal record-keeping concerning early childhood development in literacy and mathematical ability. It is an adaptation of the Primary Language Record (PLR), which was developed by the Center for Language in Primary Education (CLPE) in London, England. Forms of the California Learning Record are being used in California, and a similar adaptation of the PLR is being promoted by the New York City Assessment Network (NYAN) in New York City schools.
In sum, while states cast a wider assessment net and enjoy more visibility in the reform arena, quieter attempts at reform by districts and schools also are generating fundamental changes in education at the most basic level.
1. This introduction is a modified version of a chapter, Assessment Reform: Promises and Challenges, by Nidhi Khattri and David Sweet, in M.B. Kane & R. Mitchell (Eds.), Implementing Performance Assessment: Promises, Problems, and Challenges (1996). Hillsdale, NJ: Erlbaum.