Performance assessments vary tremendously in the forms they take and the types of demands they make of students and teachers. Indeed, the assessments at the 16 sites included in this study reveal that the only characteristics shared by all performance assessments are the pedagogical assumptions upon which they are based and the fact that they require students, in some fashion or another, to construct responses to tasks.
The format or structure of performance assessments can have important ramifications for the power of assessment systems as a tool to leverage changes in teaching and learning. In this section, we use the performance assessments in our sample to develop and illustrate a framework for understanding and classifying the format of performance assessments and performance assessment systems.
Exhibit 4-5 illustrates our conceptualization of the relationships among the components of performance assessment systems and the factors that influence their power as a tool of education reform. The exhibit illustrates that (1) one or more assessment tasks and one or more scoring methods are linked to create a performance assessment; (2) multiple performance assessments can be linked to create a performance assessment system; and (3) the relationship of performance assessments to teaching and learning may be characterized across several dimensions, including the extent to which they are integrated with instruction, their linkage to performance and content standards, their level of prescription, and the scope of the pedagogical net they cast.4
The taxonomic scheme we develop regarding the format of assessment tasks, scoring methods, performance assessments, and performance assessment systems builds upon a successively broadening framework: the characteristics of tasks and scoring methods determine the characteristics of a performance assessment, and the characteristics of the performance assessments that make up a performance assessment system affect the characteristics of that system.
The analysis in this section is broken into the following subsections:
Assessment Tasks
The first component of a performance assessment is the task that the student must attempt to complete. Tasks range from short, on-demand tasks to research projects that culminate in exhibitions of student work. Five categories of tasks include the following:
Each of these kinds of tasks requires students to be actively engaged in responding to the task. To varying degrees, students must define the parameters of the task, seek information to complete the task, and, in some cases, explain the processes by which they arrived at the answers.
Exhibit 4-6 summarizes the types of tasks used in the assessment systems studied at each of the 16 sites included in our study. The exhibit illustrates that on-demand tasks tend to be used by states for accountability purposes. Most states also have a goal of requiring students to compile comprehensive portfolios that illustrate student performance over the course of the academic year. Among state-level performance assessment systems, Kentucky's and Vermont's already require student portfolios. New York's pilot high school performance assessment system is based on portfolios, and Oregon's original plan for its assessment reform was to require portfolios (but the likelihood of that plan being implemented is now unclear).
Among districts, Prince William County's Applications Assessments consist only of on-demand tasks, Harrison School District 2's Performance-based Literacy Assessments consist of on-demand and extended tasks, and South Brunswick's Research Performance Assessment is based upon extended tasks and demonstrations. School-initiated assessments, in contrast, consist of extended tasks, a combination of demonstrations and on-demand tasks, or portfolios. Schools involved in national-level assessment reform efforts tend to select among the assessment tasks recommended by the national group; for instance, teachers at Ann Chester and Noakes Elementary Schools use portfolios as recommended by NSP. Teachers at two of the elementary schools in our sample also observe their students' involvement in classroom activities in order to plan lessons and to meet individual students' needs.
Dimensions of Task Specification
Individual tasks within each of the broad task categories can be specified along several dimensions, and the combination of those different dimensions determines how a student must respond to the task. In other words, the task dimensions determine the response structure of the task. Hence, represented within each task type is a variety of student response structures (Messick, 1984). In turn, these different response structures yield different types of information about the student's performance.
Exhibit 4-7 summarizes the following dimensions along which the tasks in our sample vary:5 time demands, applied problem-solving skill demands, metacognitive demands, social competencies (i.e., group problem-solving), and student control. Although the five dimensions are not mutually exclusive, each merits a definition and illustration through examples.
The five task dimensions are defined and illustrated below.
Time Demands refers to the amount of time allotted to a student to complete the task. The time allocated to complete the task can vary from a few weeks to less than an hour. For example, on-demand tasks typically require much less time than extended tasks. Preparation time for demonstrations can be quite long, but demonstrations themselves require little time to execute. Nonetheless, even within broad task categories, time demands can exhibit considerable variation.
Cooper Middle School's research assignment, an essay on the intersection of medicine and science, is an example of an extended task that requires several days to complete. Students are asked to research the topic, present sufficient information in their reports to address the topic, and include in their reports visual aids, such as graphs and charts. In contrast, Arizona's, Kentucky's, Maryland's, and Prince William County's on-demand tasks last only a few hours. In each case, students are asked to write an essay or to conduct an experiment without leaving the examination room.
Applied Problem-Solving Skills Demands refers to the degree to which tasks elicit cognitive skills, such as the ability to apply procedural knowledge to complete a complex, multi-step "applied problem-solving" task or the ability to apply factual knowledge to procedures.
Crandall High School's "Residential Zone" is an example of an applied problem-solving task. This one-week project requires students to plan a residential zone with a view to maximizing profits for the real estate developer. For this task, students must work within certain constraints, such as zoning regulations for residential plot size and street widths (thus establishing the task's "applied, real world" scenario). Students are evaluated on their applied mathematical skills, such as conceptual understanding, use of effective procedures and strategies, interpretation of information, and communication of reasoning and results.
Metacognitive Demands refers to the awareness students must exhibit of their own thinking and problem-solving skills. The task may require students to explain their thinking or the procedures they used in solving a problem. Tasks that attempt to capture such complex skills also are intended to invite student engagement and to motivate student involvement in the process of assessment.
Maryland's on-demand tasks require students to respond to a series of questions that lead to a solution or a decision, accompanied by an explanation or rationale for the student's responses. Kentucky's on-demand tasks ask students to work together in groups to solve problems, but they require students to construct individual responses that describe the process their group followed to solve the problem and the reasoning behind the conclusions they drew. Students also are asked to discuss in their responses whether they agreed or disagreed with their group's conclusions and why.
Social Competencies refers to the interpersonal skills students must use in order to complete a task. A task may require a student to work with other students in a group to complete all phases of the task, or he or she may be required to collaborate with others on only one aspect of the task. (If a student must collaborate with a team in order to complete a task, the inferences one could draw regarding the student's performance may be quite different from the inferences one would draw had he or she attempted the task by himself or herself.)
Kentucky's and Maryland's on-demand tasks sometimes require students to work in small groups to understand the task and to collect data prior to recording responses independently. For instance, one Kentucky task involves groups of four students working together to test which of several instruments available to them is the most effective tool for separating oil from water in a simulated oil spill. As described above, students are then asked to construct individual responses that discuss the processes their group followed and the reasoning behind their conclusions. In all such group activities, setting up or performing the task is a group activity, but the final student responses are independent.
The task of constructing a portfolio also can, in some cases, require some elements of group problem-solving. For example, at Maple Leaf Middle School in Vermont, the language arts teacher requires her students to critique and confer with one another about the writing samples that are to be included in their writing portfolios, blurring the line between independent work and group work.
Student Control refers to the degree of judgment a student must exercise in defining and completing the task. The response structure of the assessment tasks can range from being very tight, allowing students little leeway in defining the parameters of the task (i.e., topic, resources, length of procedures, products) to very loose, requiring students to formulate the task themselves. The more directive the task, the less control the student has in the types of "correct responses" he or she can give and in the types of procedures he or she can devise to complete the task.
South Brunswick's Sixth Grade Research Performance Assessment is an example of a system in which students exercise a fair amount of control over the assessment task. Students must decide upon a topic related to the "American Experience," use several sources of information to write their research papers, and determine the length of their paper. The structure of Thoreau's Rite of Passage Experience also is quite loose. Students must write essays and demonstrate their proficiency in a number of subject areas. However, students themselves, within a specified structure, choose the topics for their essays, sources of information they use in their essays, and the design of their demonstrations. At times, students choose to combine two or three topics into one demonstration.
In contrast, the structure of the Maryland Student Performance Assessment Program's on-demand 8th-grade "Birth Dates" mathematics task is tight, allowing students little control in specifying the task. For this task, students are given information on the percentage of people born in each month of the year. Based on this information, they are asked to respond to a series of sub-tasks, such as constructing a graph for the birthday data and calculating how many students in the school are likely to share the student's birthday month (given a certain number of students in the school). Hence, students exercise very little control over either the topic or the length of their response.
The five dimensions of assessment tasks (time demands, applied problem-solving skill demands, metacognitive demands, social competencies, and student control) are not entirely independent. For example, if the task is intended for evaluating a student's competency in mathematical operations, the student control and social competencies aspects of the task may have to be limited. Time demands, too, can limit the amount of control a student has in constructing responses to an assessment task. Hence, one task dimension may necessarily limit other task dimensions. Therefore, the specification of these dimensions depends, in part, on the intended purpose of the task.
Scoring Methods
One part of the assessment is the task. The second part is the scoring method. The scoring or evaluation instrument or method is used to judge the quality of the student's performance on the task. In many cases, these scoring methods are called scoring rubrics. Scoring rubrics are a pivotal feature of assessment reform, as they both specify the knowledge and competencies for which student work must be evaluated and delineate the criteria for determining the quality of student work. Through the combination of the task specification (e.g., on-demand tasks that elicit social competencies and portfolios that elicit metacognitive skills) and the scoring method, states, districts, and schools have attempted to articulate and communicate the skills and competencies that are important to teach and to assess.
Four broad types of scoring or evaluation methods evidenced in the performance assessments included in this study are:
At the state level, specific scoring rubrics are used in the Arizona, Kentucky, and Maryland performance assessment systems with on-demand tasks. Teachers at the New York portfolio pilot site also have developed specific scoring rubrics for extended tasks. Kentucky, Oregon, and Vermont, on the other hand, use generic scoring rubrics to guide teachers in developing and scoring tasks that are included in portfolios. (Note that the Kentucky system uses specific rubrics with one type of assessment task and generic rubrics with another type of task.)
The use of generic scoring rubrics has posed some problems in the implementation of Kentucky's and Vermont's portfolio assessments. Teachers who are responsible for constructing tasks for inclusion in the portfolios have not always been able to develop tasks that conform to and capture the skills and competencies articulated in the generic rubrics. Consequently, the two states have experienced difficulties in standardizing their portfolio assessments and in drawing inferences about the performance assessment results. For example, in some cases in 1993-94, Vermont teachers did not design mathematics tasks that were challenging enough for their students; consequently, students' skills and competencies were not adequately elicited by such tasks.6
The skills and competencies articulated in the two states' generic scoring rubrics in mathematics and language arts are shown in Exhibit 4-8 (for purpose of comparison, Oregon's generic rubric for math and science is included in the exhibit as well). As is illustrated, the generic rubrics are similar in many respects. All three state-level mathematics rubrics stress conceptual understanding, effective problem-solving procedures, and effective mathematical communication strategies. The Kentucky and Vermont language arts scoring rubrics stress purpose, organization, effective development of ideas, and effective and correct language usage.
At the district level, Harrison School District 2 utilizes specific rubrics with extended and on-demand tasks, and Prince William County utilizes them with on-demand tasks. In contrast, generic scoring rubrics are used in South Brunswick with the 6th Grade Research Performance Assessment. The difference in choice of rubric type in these three cases is clearly driven by the features of the assessment tasks employed, for the South Brunswick performance assessment allows students to select their own topics to conduct a series of writing and presentation tasks (only a generic rubric could be used in this scenario), while the other two districts specify the tasks students respond to more tightly and can write rubrics specific to those tasks.
School-initiated performance assessments, whether supported by national assessment reforms or not, use a mix of specific and generic scoring rubrics. At these schools, specific rubrics serve a dual function of guiding students in their assignments and helping teachers in determining student grades. On the other hand, generic rubrics help teachers design different tasks that tap into broad content knowledge and critical thinking domains. Because most of the rubrics in school-initiated systems are developed at the local level, teachers understand what skills and competencies are to be evaluated using the rubric. Hence, rubric-task alignment appears to have posed fewer problems at these schools than it did at some schools working with the Kentucky and Vermont performance assessment systems (as described above).
The implementation of two performance assessments included in this study involves no formal scoring method. The Primary Learning Record is a qualitative method of observing, recording, and analyzing student progress and intellectual development. Certain questions guide teachers' observations of their students' classroom behaviors, but the evaluation itself includes no method for quantifying these observations. Thoreau High School's Rite of Passage Experience (ROPE) also has no formalized scoring method associated with it. Students receive grades on ROPE and its components, but teachers use their own criteria and standards to judge student performance.
In sum, scoring methods used to judge student performance on tasks range from generic scoring rubrics, which are applicable to any number of tasks in a given domain, to specific scoring rubrics, which are applicable only to one or a few tasks, to implicit, unarticulated scoring criteria. The performance assessments in our sample include a variety of scoring methods, ranging from those that are quite detailed and clear in the criteria they explicate for judging the quality of student work, to others that are no more than checklists of the elements ("existence proofs") that must be present in student work. The data collected in this study suggest that generic rubrics, in particular, can function as highly sophisticated instruments of education reform, for, if well constructed, they articulate the general skills and competencies that the state or other education agency believes are important to assess, and, therefore, to teach.
5 An in-depth analysis of precisely where each task for each of the sampled performance assessments or performance assessment systems falls along the dimensions outlined in this subsection is not possible. It was beyond the scope of this study to collect the massive quantities of data required for conducting such an analysis. Hence, only representative tasks are classified in the table. The sampled tasks do not necessarily characterize the entire performance assessment system to which they belong.
6 Assessment Results. Writing and Mathematics. 1993-1994. Vermont Assessment Program.