High-Stakes Standardized Testing: A Panacea or a Pest?
If you talk to any person in the field of education today about high-profile and controversial issues in their field, the topic of high-stakes standardized testing will inevitably come up. This issue has been around for a long time, but was really introduced into the minds of the public through the standards-based reform movement of the 1990s. The notion of standards-based reform focuses on three main principles: state-level content standards in core areas, corresponding tests, and accountability for results (Smith & O’Day, 1990). This reform model has strongly guided education reform in this country for the past two decades.
The effect was first seen on a large scale in President Clinton’s Improving America’s Schools Act of 1994, followed by President Bush’s No Child Left Behind Act (NCLB) of 2001. Both acts were reauthorizations of the Elementary and Secondary Education Act of 1965 (Cohen & Moffitt, 2009). Bush’s NCLB is what generally comes to mind when people think of high-stakes tests. This act set a definition of adequate yearly progress (AYP) that schools must meet. AYP is measured by student achievement on state-specific standardized tests. States and districts are held accountable for students’ performance, and if schools fail to meet AYP, there are certain measures that are taken. The accountability and the consequences are what make these tests “high-stakes.” Closing the achievement gap is one of the main priorities of NCLB, which specifies methods of school improvement (K. McDermott, personal communication, November 15, 2010).
The mandated tests are probably the most controversial piece of this legislation. State-wide tests have been around since the 1970s, when students took tests of minimum competency. These tests have evolved since then, but only became truly required nation-wide in 2001. The act required that students be tested in language arts and mathematics annually in third through eighth grade and once in tenth grade, and be tested in science at least three times, once in each grade band (K. McDermott, personal communication, November 8, 2010). These strict requirements put a lot of focus onto these tests. Both the education community and the general public quickly chose sides on whether the tests were good or bad. The reality of the situation is that there is no clear answer, and it is more valuable to look at both the pros and cons of high-stakes testing to get a better understanding of its strengths and weaknesses. From this information, it will be possible to come to general conclusions and implement necessary changes in order to ultimately increase student learning and achievement. But first, it is important to understand the many purposes of the tests.
There is a multitude of goals and purposes for both low-stakes and high-stakes tests. Primarily, tests provide evidence of whether taxpayers’ money is being well spent. Tests are a way to show policymakers how well schools are doing. Some level of accountability is seen as valuable for assessing the schools themselves so that poorly performing schools can be properly dealt with. Tests may also be used as a vehicle for change. When performance can be assessed, other decisions can be made based on those evaluations. Another idea is that a correct mix of rewards and punishments for test scores will motivate students and teachers to do better and work harder (Madaus & Russell, 2010).
These tests provide students, parents, and teachers, as well as educators, reformers, policymakers and administrators, with feedback on students’ educational performance and progress. Tests are used to compile data for monitoring changes in student and school performance. High-stakes tests also provide data for accountability purposes such as NCLB requirements (R. Hambleton, personal communication, November 30, 2010). Overall, tests measure outcomes at the student, school, district, state and national levels. Clearly, there are many hopes for what these tests can do. The question is whether they are truly succeeding. Policymakers, researchers, teachers, students, reformers, educators and the general public all have different opinions on this question. I will now go into both sides of the argument and try to explore various points.
The argument against high-stakes tests seems to be more prominent than the argument for them. Especially in the past few years, it seems that more and more people believe that high-stakes tests are harmful to our education system. There are numerous reasons for holding this opinion. Alfie Kohn, in his 1999 book The Schools Our Children Deserve, makes a thorough argument against these tests. He starts off by saying that scores in a particular year reflect multiple years of learning and that it does not make sense to hold one teacher accountable for the years of past education that a student has had. He also claims that attaching high stakes to tests encourages “teaching to the test.” This phrase refers to spending excessive amounts of time on test preparation, which allows students who may not be fully prepared academically to still do well because they have learned test-taking skills. This commonly cited phenomenon undermines scores as measures of student knowledge, since they instead measure students’ test-taking ability (Kohn, 1999). Diane Ravitch also talks about teaching to the test, stating that “excessive test preparation distorts the very purpose of tests, which is to assess learning and knowledge, not just to produce higher test scores” (2010, p. 160).
Kohn also discusses specific negative effects on teachers. These include setting teachers against each other with an atmosphere of competition, skewing teachers’ priorities, and making teachers defensive as they try to show that low scores are not necessarily their fault. He states that unfortunately “high-stakes testing routinely drives good people out of the profession” (1999, p. 99). Whether this statement is fact or opinion, at this point it is clear that the tests affect teachers and their work so much that they have changed the very nature of the profession.
Kohn continues by bringing up various unintended consequences for teaching behavior. These include focusing all attention on students right below the proficient level and ignoring everyone else, failing low achievers so that they get another chance to take the exam, and assigning low achievers to special education classes (1999). Low-performing students may be excluded from schools or kicked out of schools they currently attend since they lower overall scores (Ravitch, 2010). Other extreme consequences may even include behaviors as severe as cheating and altering answer sheets (Kohn, 1999). States also have other, less obvious ways of cheating. They may make their tests easier, or simply lower the cut scores (Ravitch, 2010). All of these practices lead to a decrease in both the validity and the reliability of the tests. These practices distort the true meaning of scores and are clearly not in the best interests of the students.
Many consequences directly affect the students taking these tests. Kohn states that some studies have found that the use of graduation tests results in increased drop-out rates (1999). In Florida, there are certain state policies that encourage limited English proficiency (LEP) students to drop out of high school (Giambo, 2010). In Massachusetts, a state that has received accolades for its high-quality exams, low-income urban students who just barely fail the mathematics exam have a graduation rate 8 percentage points lower than similar students who just barely pass (Papay et al., 2010). If these tests are directly related to students leaving school, either because they get discouraged or because they are unable to pass the tests, clearly something must be changed about the graduation aspect. Kohn says that another problem with the tests is that they are so publicly presented: “it’s hard to get an accurate sense of how children (or schools) are doing by using a test that is also going to be the basis for public judgment” (1999, p. 197). One commonly used and commonly criticized practice is posting exam results in local papers, a practice that adds unnecessary stress to everyone involved.
Moreover, the accuracy of the scores themselves has been questioned. Some believe that the scores that students receive on these tests reflect more on the education and wealth of a student’s family than on their academic achievement (Kohn, 1999). Ravitch agrees with this claim, and says that a student’s score is based on many other factors, including student motivation, parental engagement, the student’s state of mind, and distractions. The results vary randomly in ways that have little to do with achievement (2010). Once again, the validity and reliability of the scores are questioned. What is the point of these tests if they simply reflect how much money your family has? If this is really the case, then clearly another measure needs to be developed to actually measure academic achievement. Similarly, scores may be misinterpreted at the school level as well. Score changes likely reflect changes in school composition and not necessarily changes in instructional practice (Wheelock, 2002). School populations change from year to year, especially at schools in low-income communities with highly transient populations. The students taking the test may be a completely different group of students than the one that entered the school at the beginning of the year.
Another issue that Ravitch brings up is a phenomenon similar to “teaching to the test” called “curriculum narrowing.” This involves focusing time on subjects that will be on exams, and not spending a significant amount of time on subjects that are not tested (2010). In practice that means spending hours on reading and math, and cutting out valuable programs such as art, foreign language and music. In the end students may excel in some areas but not have any knowledge base for others. Another claim is that tests contain bias and are not fair to all students. Although each state test is different, it has been found that some tests use culturally biased language that disadvantages English language learner (ELL) students (Shannon, 2008). In addition, most tests cause increased levels of student stress and anxiety (Madaus & Russell, 2010). Not only is this harmful for the health of students, but once again, the scores that an overly anxious child receives may not accurately reflect his or her true knowledge. In these arguments, both the validity and reliability of these tests are questioned. A general way to sum up all these negative aspects of high-stakes tests is that schools may benefit from practices that are not in the best interests of children, and that is simply not acceptable (Wheelock, 2002).
On the other hand, there are many people who are supportive of high-stakes tests. They believe that the benefits of these tests outweigh the costs, and that high-stakes tests are an important way to improve the education system. I personally believe this side of the argument is not as commonly heard as the other, but that makes it even more important to understand. Gregory Cizek would disagree with my point: “concerns about the extent and cost of testing are overblown, and there is evidence that the depth of anti-testing sentiment in the populace has been overstated” (2001). Cizek writes about positive unintended consequences in his article “More Unintended Consequences of High-Stakes Testing.” He begins by noting the increase in professional development and its quality, greater educator intimacy with disciplines, and positive effects on classroom assessment practices. Greater educator knowledge is due to higher accountability for teachers, and therefore more pressure for them to be effective teachers. The positive effects on classroom assessment practices are due to increased levels of professional development. He also notes the creation of accountability systems, which in theory have many benefits.