Help ED Improve How We Evaluate State Assessment Systems

We are in the midst of an important shift in K-12 education. Nearly all states are beginning to implement college- and career-ready content standards and are in the process of developing new aligned assessment systems to measure whether their students have the knowledge and critical skills they need to be ready for tomorrow’s jobs. These new systems are in direct response to educators and parents asking for assessments that are more than just “bubble tests,” and provide better information to inform and improve teaching and learning in our classrooms. Do you have ideas for how ED should evaluate states’ assessment systems? Do you have thoughts on how we should support states during a time of transition to new, higher standards? We are asking for your input between now and September 30, 2013.

As required by the Elementary and Secondary Education Act (ESEA), ED reviews and approves certain state assessments through panels of peer experts. More information about ED’s process is available here. This peer review process has been instrumental in helping states improve the reliability of their assessment systems and the accessibility of these assessments for all students, including students with disabilities and English learners. But to keep up with the new and more robust demands on high-quality assessments, ED suspended this peer review process on December 21, 2012, so that it could be updated to align with the vision of what high-quality assessments should be. Specifically, in ESEA Flexibility, ED defined a high-quality assessment as one “that is valid, reliable, and fair for its intended purposes; and measures student knowledge and skills against college- and career-ready standards in a way that:

  • covers the full range of those standards, including standards against which student achievement has traditionally been difficult to measure;
  • as appropriate, elicits complex student demonstrations or applications of knowledge and skills;
  • provides an accurate measure of student achievement across the full performance continuum, including for high- and low-achieving students;
  • provides an accurate measure of student growth over a full academic year or course;
  • produces student achievement data and student growth data that can be used to determine whether individual students are college and career ready or on track to being college and career ready;
  • assesses all students, including English Learners and students with disabilities;
  • provides for alternate assessments based on grade-level academic achievement standards or alternate assessments based on alternate academic achievement standards for students with the most significant cognitive disabilities, consistent with 34 C.F.R. § 200.6(a)(2); and
  • produces data, including student achievement data and student growth data, that can be used to inform: determinations of school effectiveness for purposes of accountability under Title I; determinations of individual principal and teacher effectiveness for purposes of evaluation; determinations of principal and teacher professional development and support needs; and teaching, learning, and program improvement.”

ED is asking the public, and in particular experts in assessment, to respond to the following questions related to our peer review of state assessment systems. This is the first step in our process to review and revise our system to evaluate state tests.

  1. What types of evidence can and should a state provide to demonstrate that its system meets the elements of a high-quality assessment system? What benchmarks or rubrics can ED establish to help evaluate the evidence submitted by states? What are best and most-promising practices that ED should consider with respect to the topics below for providing guidance to states in documenting the quality of their assessment systems and to peers regarding how to evaluate that documentation?
    1. Alignment of tests and items with college- and career-ready content standards.
    2. Measuring higher-order thinking skills.
    3. Demonstrating the validity of assessment results for their intended purposes, both for the first operational administration of the assessments and on-going evaluations.
    4. Accessibility for English learners and students with disabilities.
    5. Measuring performance across the full performance continuum, including high- and low-achieving students.
    6. Measuring individual student growth.
    7. College- and career-readiness and academic achievement standards-setting.
    8. Computer-adaptive assessment algorithms.
    9. Additional specific documentation necessary to confirm the quality of alternate assessments based on alternate academic achievement standards for students with the most significant cognitive disabilities.
    10. Test security and integrity, including maintaining the security of computer-administered assessments.
    11. Any additional aspects of a high-quality assessment aligned to college- and career-ready standards that ED should include in its review of state assessment systems.
  2. ED is considering how to improve its process to conduct assessment peer reviews.
    1. Are there components of ED’s current process that can or should be revised or are there aspects ED should add?
    2. Documenting the technical quality of assessments is an on-going activity (e.g., documenting that the assessment results provide valid inferences of college- and career-readiness likely requires more than one year of operational data or longitudinal studies). How should ED consider a state’s on-going assessment development and documentation of the quality of its assessment system? Should ED establish criteria at various points in time of the assessment lifecycle?
    3. Are there models or best practices in conducting peer reviews that are applicable and practical for state assessment systems?
    4. How can ED use the peer review process to support states as they continually improve their assessment systems over time?

We encourage all interested parties to submit opinions, ideas, suggestions, and comments pertaining to how best to measure the quality of educational assessments. Respondents are encouraged, but not required, to address all of the questions above. All responses should be emailed to ESEA.Assessment@ed.gov by September 30, 2013. Please use the subject “Title I Peer Review” in your email. Please clearly identify the question(s) to which you are responding.

The fine print: Responses must be related to Title I assessment peer review, should be as specific as possible, and, as appropriate, be supported by data/relevant research. All opinions, ideas, suggestions and comments are considered informal input. ED will not respond to individual comments or emails, will publicly display all those that are appropriate, and may or may not reflect input provided in the policies and requirements of the Department. If you include a link to additional information in your response, please ensure that the linked-to information is accessible to all individuals, including individuals with disabilities. This is a moderated site. That means all responses will be reviewed before posting. Additionally, please do not include links to advertisements or endorsements; we will delete all such links before posting your comment.

ED intends to post all responsive submissions in a timely manner. We reserve the right not to post comments that are unrelated to this request, are inconsistent with ED’s Web site policies, are advertisements or endorsements, or are otherwise inappropriate. To protect your own privacy and the privacy of others, please do not include personally identifiable information such as Social Security numbers, addresses, phone numbers or email addresses in the body of your comment. For more information, please be sure to read the comments policy.

Thank you for helping ED consider how to better evaluate and support states as they develop the next generation of assessment systems.

13 Comments

  1. The American Speech-Language-Hearing Association (ASHA) is pleased to have the opportunity to comment on the U.S. Department of Education’s blog on evaluating state assessment systems. ASHA is the national professional, scientific, and credentialing association for more than 166,000 members and affiliates who are audiologists, speech-language pathologists, speech, language, and hearing scientists, audiology and speech-language pathology support personnel, and students. Audiologists specialize in preventing and assessing hearing and balance disorders as well as providing audiologic treatment, including hearing aids. Speech-language pathologists (SLPs) identify, assess, and treat speech and language problems, including swallowing disorders.

    ASHA is pleased to submit the following comments to be considered in the Department’s peer review of state assessment systems:

    General Comments
    Here are some general areas we’d like for you to consider as you review and revise your system to evaluate state tests:
    • How much instructional time is taken away for test preparation?
    • Can the results of the tests drive instruction? Are they discriminating enough to break down skills to drive remediation?

    Additional considerations include the following.
    • Utilize multiple measures, as opposed to a single measure.
    • Align assessments to the common core state standards, language, and literacy so that there is a high correlation between the test and common core standards in order for instruction to occur broadly across all common core state standards.
    • Provide true accessibility for students so that everything is individualized.

    Accessibility for English learners and students with disabilities.

    Comments:
    It is important that states recognize the need to update and revise their English language proficiency development standards to ensure that they correlate with the common core state standards (CCSS). CCSS require a level of English language proficiency that exceeds that of students whose English language skills are still developing. When a student’s English language proficiency is not accounted for and proper testing accommodations/modifications are not given, an accurate assessment of the student’s skills cannot be determined.

    Recommendations:
    1. English language proficiency development standards in each state should be revised to ensure that they correlate with the CCSS.
    2. Appropriate testing accommodations/modifications for English Language Learners (ELL) and students with disabilities should be provided. Those accommodations/modifications should be appropriate to the individual and based on recommendations by second language acquisition experts for ELL students.

    Rationale:
    Students who are English Language Learners and students with disabilities are provided with a variety of accommodations and adaptations for their day-to-day assessments and activities because of the need for this level of support. It is important to extend those supports to all types of assessments to ensure that the assessment provides a clear picture of the student’s abilities.

    Comments:
    In order to succeed in the general education curriculum, students with hearing loss will need individualized accommodations and supports that are unique to their disability. This includes the need for sophisticated personal and classroom technology and educational support from professionals with expertise in their technology, communication, and hearing related needs.

    Recommendations:
    1. Linking individualized education program (IEP) activities to content standards will help to ensure that students with hearing loss–as well as other students with disabilities–have opportunities to reinforce the CCSS addressed in their classrooms.
    2. Because of their unique needs, students with hearing problems should be supported by related services professionals (e.g., educational audiologists, speech-language pathologists, teachers of the deaf or hard of hearing) who have expertise in hearing loss and how this will impact classroom learning.

    Rationale:
    Students with hearing loss can meet the high expectations of the CCSS but many will require individualized accommodations and related services that allow them to access instruction in the auditory learning environment of today’s classroom.

    Measuring performance across the full performance continuum, including high- and low-achieving students.

    Recommendation:
    Incorporate Universal Design for Learning (UDL) principles in all state assessments for all students. UDL should be inherent in content and administration. Students should demonstrate success in different modalities (i.e., curriculum-based classroom assessments), and tests should provide the opportunity to demonstrate success in different modalities.

    Rationale:
    UDL is a set of principles for curriculum development that gives all individuals equal opportunities to learn. Individuals bring a huge variety of skills, needs, and interests to learning. Neuroscience reveals that these differences are as varied and unique as our DNA or fingerprints. From pre-kindergarten to graduate school, classrooms usually include learners with diverse abilities and backgrounds, including students with physical, sensory, and learning disabilities, differing cultural and linguistic backgrounds, varied preferences and motivations for learning, students who are unusually gifted, and many others.

    UDL supports teachers’ efforts to meet the challenge of diversity by providing flexible instructional materials, techniques, and strategies that help teachers differentiate instruction to meet these varied needs. It does this by providing the following options.
    • Presenting information and content in different ways (the “what” of learning)
    • Differentiating the ways that students can express what they know (the “how” of learning)
    • Stimulating interest and motivation for learning (the “why” of learning)
    For more information on UDL, visit the CAST website.

    Measuring individual student growth.

    Recommendation:
    Student assessments (particularly those for students with disabilities) should be based on the current level of functioning and aligned with the CCSS, not on the assigned grade level.

    Rationale:
    It is important to test students at their functional level, which may or may not be aligned with their current grade in school. No assessment of skills should be determined by the grade of enrollment; it should be determined by the level at which the child is performing. This is an assessment of knowledge and skills, not an assessment of enrollment. For students functioning below grade level, including students with disabilities, utilizing an assessment at a lower grade aligned with the student’s functional level and CCSS may be more reflective of their growth and predictive of the program that is needed for continued achievement and sustained growth. Taking a child who reads at the third-grade level and giving him/her an assessment of sixth-grade level content is an unreasonable and inappropriate use of assessments.

    Recommendation:
    Student progress should be measured using growth models, based on the current level of functioning and aligned with the common core state standards, that hold schools accountable for helping students move closer to grade-level achievement.

    Rationale:
    We encourage the Department to continue 1) encouraging states to submit high-quality and innovative growth-model proposals, recognizing that “growth models may show promise for measuring school accountability, giving schools credit for improvement over time, and measuring individual student progress,” and 2) its research on effective growth models, and to engage in dialogue with special education and related service provider organizations to develop such models.

  2. There is far too much focus and complication put on assessments and testing. I agree with the other comments that one size does not fit all. We no longer live in an industrialized world, where we can put students through a one-size-fits-all system. We need to focus on developing independent thinkers, self-reliance, personal responsibility and leadership. No longer can a student graduate with a degree and expect to find work in their field. Massive student debt and underemployment are clear signs that something needs to change.

    Students are disengaged because they are bored and see no value in what they are being taught. We must embrace their ideas, allow them to identify and follow their passions and think outside of the “traditional” career paths that are clearly not working for today’s youth.

    A big missing link in education is promoting entrepreneurship. Developing the entrepreneurial mindset is critical to opening new doors of opportunity, reducing underemployment and creating self-reliant leaders of the future. This is the greatest gift we can give our youth.

  3. I think we have to make a distinction as to what the goal is for secondary education, and I don’t believe it can sustain itself as ‘career-ready’ institutions.

    In the 21st century, time and time again, we see that jobs require post-secondary education. If this is the case, why would we throw good money after bad trying to create career-ready secondary education? It doesn’t make sense.

    I believe the goal of secondary education should be to create critical thinking citizens. Attempting to do so would allow our country to grow, allow post-secondary students to be ready for college, and add value to our communities overall.

    This distinction needs to be recognized because the sustainability of secondary education depends on setting an achievable goal, and measuring success based on actual results.

    I do not believe that career-readiness is something that can, and will, be achieved in secondary education now, or in the future.

  4. The problem with tests today is that all we know how to do historically is provide a score based on how many questions on the test were right or wrong. By comparison, questions asked of one human by another have many degrees of right and wrong and responses can be tailored specifically to each of those degrees. Unfortunately the only way we can add reliability to a traditional test is to keep adding more questions and hope that answer consistency will help us to better predict a student’s relative skill level as compared to the expected knowledge framework of the entire test. When we max out the number of questions that can reasonably fit on a test, then we have to resort to more testing for the same reason.
    There is nothing wrong with scoring based on standards, assuming that is what actually gets done. Assuming each test item is aligned to a learning goal and associated with some scaled level of knowledge requirement in reaching that goal, then the results can be used both summatively, in judging the system, and formatively, in helping the student and their life coaches see what understandings are missing.
    The problem is that a scaled score just can’t offer that level of information. Even a report of which or how many items related to a specific standard a student answered right or wrong can’t help much without specific data on what level of knowledge each of those items required. And what if a student answered the harder questions about a topic but somehow was unable to answer the easier questions? Does this really indicate a reliable and proper mental model of understanding?
    To the degree that these tests are being scored by some logical conjunctive methods by standard, indicating clearly the estimates of where each student sits with their level of understanding on each learning goal, and not just by adding up the number of right or wrong answers and comparing that to other students who took the same test, I would think they will provide significant value. However, if all they provide is more scaled scores and grade level predictions, I suspect they are potentially going to fall pretty short of the mark.
    Methods exist to build and score more reliable measurement without resorting to larger tests and more testing events. These methods leverage additional data points like level of knowledge and knowledge confidence to help really assess students beyond the bubble and offer detailed individualized analysis. I think it will be critical for States to explain how their EOC exams will leverage these new data points to derive fair, reliable results on the knowledge and critical skills owned by students, and how they will offer these more student-centric outputs for timely use in both formative and summative applications.

  5. Please. Please stop trying to impose uniform standards on young people who are different in what they think and how they learn. Standardized assessment yields nothing more than a standard bell curve — high, medium and low — as the last 20 years have shown again. Personalize learning and measure the extent to which each student can attain her goals.

  6. I am concerned that children aren’t meeting basic skills requirements and now we are switching directions and testing for aptitude in the workforce? Until last year, my children’s high school had two tracks: a college course track and a career course track or technical track. The college course track was seen as the more desired track and often students who struggled in school were ushered into the technical track. Is that where we are headed again? If so, why not make the course curriculum more relevant to the real world? Students should be able to take what they learn in a class and apply it to the real world, not just take a course to satisfy the state’s idea of what a student needs to know. If they are learning economics then they should learn how to apply it, not just memorize terms, dates and theories. Knowledge without application is useless.

  7. “Alignment” is a necessary but far from sufficient condition for tests intended to evaluate schools or teachers. When we employ test results to make evaluative decisions about teachers or schools, we assume that students’ higher scores indicate better instruction has taken place, while lower scores reflect lower quality instruction. Yet, a test can be in complete alignment with a set of intended curricular outcomes, and be accompanied by no evidence whatsoever that students’ scores are reflective of instructional quality. One of the legitimate purposes of state tests is to evaluate–both schools and teachers. Accordingly, those tests should be accompanied by evidence, both judgmental and empirical, that the tests are sensitive to instruction, that is, are able to distinguish between well taught and badly taught students. Absent such evidence of instructional sensitivity, we cannot determine the evaluative appropriateness of such tests. Hence, we should not use them to evaluate–anything.

    • You might want to read the following: “How Standardized Tests Are Affecting Public Schools” by Valerie Strauss. Please read the entire article. However, I believe that you will all find the research information about Additional Concerns Relative to Testing Costs in the article quite interesting. The Answer Sheet article was posted on May 18, 2012. Ms. Strauss is a writer and education blogger who started writing for the Washington Post in 1988. http://www.washingtonpost.com

  8. Are State Required Tests, for instance, the End Of Course Exams (EOC), encouraged and recommended type tests that are implemented to show progress and/or what area a student needs to focus on more, or are the tests being used instead to forcefully demand students to sit for hours at a time to complete them in order to satisfy some other unknown agenda? Instructions are given to students to make a guess if they don’t understand a question and that they must answer all questions whether certain about the answer or not. Every student is ushered in to take the exam even if he or she is passing the course with an A, B, or C average.

    There are some teachers that are willing to ignore all previous passing course work grades and penalize the student according to the outcome of the EOC exam, meaning a student who may have actually passed the course will lose a letter grade or fail if the EOC exam results are less than perfect. Students are placed in a pressure cooker of a test room, where so many other students share the same feelings of anxiety and are just as nervous, knowing that their emotions are at an all-time high but that they must score well to maintain what is, for many, an already passing grade. How devastating it must be for a child to do their best, pass their class, and then to be told “Sorry but your best just wasn’t good enough after all.”

    I keep hearing about School Budgets and that there aren’t enough funds for decent nutritious lunch programs, supplies, and other much needed school related equipment. There are also other related education programs that have suffered. And then there are those programs in areas like music and the arts that have been neglected for so long because of lack of funding that they will be phased out until they no longer have a chance to exist.

    Therefore, why waste money on tests for a course which a student has clearly passed and in which he or she has demonstrated understanding of the material throughout the school year? Why not allocate the money saved from the abundance of tests, the not so required ones like the EOC, towards teachers’ salaries and the areas that are in jeopardy? Besides, a teacher’s salary is not near where it should be compared to other professions that require less commitment.

    • In Addition To the Recent Comment: According to teachers I spoke with and comments made by other educators in reference to the EOC Exam, exams are also comprised of trick questions, questions for material not covered during the school year, as well as advanced questions to determine if a student has an innate gifted ability that exists in a course of study. In essence the EOC could end up doing more harm than good. Depending on a teacher’s perception of exam scores based solely on the day a child took the exam, their previous GPA in a class could become null and void. This action alone can cause a GPA shift to determine who gets what upon graduation, aid in determining class standing, and who gets scholarships, etc. Such a perception could actually stunt a child’s academic growth instead of encouraging it. For some students it could very well be the determining factor to drop-out.

    • As a high school English teacher, I have been frustrated with the lack of writing ability, historical context, and vocabulary in my students. Part of this issue stems from too much standardized testing. Teachers are forced into a position of choosing test over content. Gone are the mathematical drills that secured student knowledge, in favor of making sure students have a brief glimpse of the different types of math that will appear on the test. Spelling and vocabulary are out as well, as they do not pertain to passing a basic level test. The one thing I will say is that, having taken a long look at the new eleventh grade federal test, I am excited that it is relevant to what I actually do teach in the classroom. But it is a college readiness test and I am not sure where that will leave those students who are not going to college. How will it impact their graduation?

  9. First of all, let’s address the standards to which you are referring (Common Core). These standards are ridiculous in every sense of the word! You want 6 year olds to know what ziggurats are and a kindergartener to know what molecules are (look at NY ELA modules)? They are developmentally inappropriate! They are learning their name and the names of their classmates. Be realistic.

    Secondly, we are testing too much! My kids sit through hours and hours of useless testing. Assessments are important when they can inform the teacher on a student’s growth. Tell me how standardized tests do this? They don’t. The results come out 5 months after the test. They give a score, but nothing else. Teachers can’t see the tests, so how can they see what Junior answered? How was Junior confused? Then, there is the expectation that a 4 year old can sit in front of a computer to take a test. This is child abuse! It is child abuse for a 3rd grader to sit through hours of testing. And it is poor judgement that a special needs student be tortured with these as well.

    All of this has resulted in kids hating school. If all they are ever told is they are a failure, then why would we wonder why they drop out of school? Kids need choices. Maybe they aren’t cut out to be a rocket scientist, but would make a great carpenter. Let’s foster that talent and stop thinking “one size fits all”.

    Why do states have to have testing? To compare with other states? To compare with the rest of the world? How much does this testing cost? Lots! Why don’t we take that money, put it into programs that foster economic growth, like art and music? Offer children after school activities that are inspiring, like music, dance, art, sports. You know, things that are fun, but foster student worth.

    We are not moving in the right direction with education “reform”. It is causing more problems. If you want to improve education so the US compares to other countries, like Finland, then take a good hard look at how they do it! We will not get the same results by trying to implement the opposite ideas! Stop demoralizing teachers and students! You are getting nowhere on that stationary bike!
