High-Stakes Standardized Testing: A Panacea or a Pest?
Ask anyone working in education today about the most high-profile and controversial issues in the field, and high-stakes standardized testing will inevitably come up. The issue has a long history, but it entered public consciousness largely through the standards-based reform movement of the 1990s. Standards-based reform rests on three main principles: state-level content standards in core subjects, tests aligned to those standards, and accountability for results (Smith & O’Day, 1990). This model has strongly guided education reform in this country for the past two decades.
The influence of this model was first seen on a large scale in President Clinton’s Improving America’s Schools Act of 1994, followed by President Bush’s No Child Left Behind Act (NCLB) of 2001. Both acts were reauthorizations of the Elementary and Secondary Education Act of 1965 (Cohen & Moffitt, 2009). NCLB is what generally comes to mind when people think of high-stakes tests. The act defined a standard of adequate yearly progress (AYP) that schools must meet, measured by student achievement on state-specific standardized tests. States and districts are held accountable for students’ performance, and schools that fail to meet AYP face prescribed consequences. This accountability, and the consequences attached to it, are what make these tests “high-stakes.” Closing the achievement gap is one of the main priorities of NCLB, which specifies methods of school improvement (K. McDermott, personal communication, November 15, 2010).
The mandated tests are probably the most controversial piece of this legislation. Statewide tests have existed since the 1970s, when students took minimum competency tests. These tests have evolved since then, but they became truly required nationwide only in 2001. NCLB required that students be tested in language arts and mathematics annually in third through eighth grade and once in tenth grade, and in science at least three times, once in each grade band (K. McDermott, personal communication, November 8, 2010). These strict requirements placed the tests at the center of attention, and both the education community and the general public quickly took sides on whether the tests were good or bad. In reality there is no clear answer, and it is more valuable to examine both the pros and cons of high-stakes testing to better understand its strengths and weaknesses. From this information, it will be possible to reach general conclusions and implement the changes needed to increase student learning and achievement. But first, it is important to understand the many purposes of the tests.
Both low-stakes and high-stakes tests serve a multitude of goals and purposes. First, tests provide evidence of whether taxpayers’ money is being well spent. Tests are a way to show policymakers how well schools are doing. Some level of accountability is seen as valuable for assessing the schools themselves so that poorly performing schools can be identified and addressed. Tests may also be used as a vehicle for change: once performance can be assessed, other decisions can be made based on those evaluations. Another idea is that the correct mix of rewards and punishments tied to test scores will motivate students and teachers to work harder and do better (Madaus & Russell, 2010).
These tests provide students, parents, and teachers, as well as educators, reformers, policymakers, and administrators, with feedback on students’ educational performance and progress. Tests are used to compile data for monitoring changes in student and school performance. High-stakes tests also provide data for accountability purposes such as meeting NCLB requirements (R. Hambleton, personal communication, November 30, 2010). Overall, tests measure outcomes at the student, school, district, state, and national levels. Clearly, there are many hopes for what these tests can do. The question is whether they are truly succeeding. Policymakers, researchers, teachers, students, reformers, educators, and the general public all have different opinions on this question. I will now examine both sides of the argument in turn.
The case against high-stakes testing seems to be more prominent than the case for it. Especially in the past few years, more and more people seem to believe that high-stakes tests are harmful to our education system, and there are numerous reasons for holding this opinion. Alfie Kohn, in his 1999 book The Schools Our Children Deserve, makes a thorough argument against these tests. He begins by pointing out that scores in a particular year reflect multiple years of learning, so it does not make sense to hold one teacher accountable for a student’s entire educational history. He also claims that attaching high stakes to tests encourages “teaching to the test,” a phrase that refers to spending excessive amounts of time on test preparation. This allows students who may not be fully prepared academically to do well anyway, because they have learned test-taking skills. This commonly cited phenomenon undermines scores as measures of student knowledge, since the scores end up measuring students’ test-taking ability instead (1999). Diane Ravitch also discusses teaching to the test, stating that “excessive test preparation distorts the very purpose of tests, which is to assess learning and knowledge, not just to produce higher test scores” (2010, p. 160).
Kohn also discusses specific negative effects on teachers. These include setting teachers against each other in an atmosphere of competition, skewing teachers’ priorities, and making teachers defensive as they try to show that low scores are not necessarily their fault. He states that, unfortunately, “high-stakes testing routinely drives good people out of the profession” (1999, p. 99). Whether this statement is fact or opinion, it is clear that the tests affect teachers and their work so much that they have changed the very nature of the profession.
Kohn continues by describing various unintended consequences for teaching behavior. These include focusing all attention on students just below the proficient level while ignoring everyone else, failing low achievers so that they get another chance to take the exam, and assigning low achievers to special education classes (1999). Low-performing students may also be turned away from schools, or pushed out of schools they already attend, because they lower overall scores (Ravitch, 2010). The most extreme consequences include outright cheating, such as altering answer sheets (Kohn, 1999). States also have other, less obvious ways of cheating: they may make their tests easier or simply lower the cut scores (Ravitch, 2010). All of these practices undermine both the validity and the reliability of the tests. They distort the true meaning of scores and are clearly not in the best interests of students.
Many consequences directly affect the students taking these tests. Kohn reports that some studies have found that graduation tests increase dropout rates (1999). In Florida, certain state policies encourage limited English proficient (LEP) students to drop out of high school (Giambo, 2010). In Massachusetts, a state that has received accolades for its high-quality exams, low-income urban students who just barely fail the mathematics exam have a graduation rate 8 percentage points lower than similar students who just barely pass (Papay et al., 2010). If these tests are directly related to students leaving school, whether because they become discouraged or because they cannot pass, then the way the tests are used as graduation requirements clearly must change. Kohn argues that another problem is that the tests are so publicly presented: “it’s hard to get an accurate sense of how children (or schools) are doing by using a test that is also going to be the basis for public judgment” (1999, p. 197). One common and commonly criticized practice is publishing exam results in local newspapers, which adds unnecessary stress for everyone involved.
Moreover, the accuracy of the scores themselves has been questioned. Some critics believe that students’ scores reflect the education and wealth of their families more than their own academic achievement (Kohn, 1999). Ravitch agrees, noting that a student’s score depends on many other factors, including motivation, parental engagement, state of mind, and distractions; the results vary randomly in ways that have little to do with achievement (2010). Once again, the validity and reliability of the scores are called into question. What is the point of these tests if they simply reflect how much money a student’s family has? If this is really the case, then another instrument needs to be developed to measure academic achievement itself. Scores may also be misinterpreted at the school level. Changes in a school’s scores likely reflect changes in the composition of its student body rather than changes in instructional practice (Wheelock, 2002). School populations change from year to year, especially at schools in low-income communities with highly transient populations. The students taking the test may be a completely different group from those who entered the school at the beginning of the year.
Another issue Ravitch raises is a phenomenon related to “teaching to the test” called “curriculum narrowing”: concentrating instructional time on the subjects that appear on exams and spending little time on subjects that are not tested (2010). In practice, that means spending hours on reading and math and cutting valuable programs such as art, foreign language, and music. In the end, students may excel in some areas but lack any foundation in others. Critics also claim that the tests are biased and unfair to some students. Although each state’s test is different, some tests have been found to use culturally biased language that disadvantages English language learners, or ELLs (Shannon, 2008). In addition, most tests increase levels of student stress and anxiety (Madaus & Russell, 2010). Not only is this harmful to students’ health, but once again, the scores an overly anxious child receives may not accurately reflect his or her true knowledge. In all of these arguments, the validity and reliability of the tests are called into question. One way to sum up these negative aspects is that schools may benefit from practices that are not in the best interests of children, and that is simply not acceptable (Wheelock, 2002).
On the other hand, many people support high-stakes tests. They believe that the benefits of these tests outweigh the drawbacks and that high-stakes tests are an important way to improve the education system. I personally believe this side of the argument is heard less often than the other, which makes it all the more important to understand. Gregory Cizek would disagree with my point: “concerns about the extent and cost of testing are overblown, and there is evidence that the depth of anti-testing sentiment in the populace has been overstated” (2001). Cizek writes about positive unintended consequences in his article “More Unintended Consequences of High-Stakes Testing.” He begins by noting increases in the amount and quality of professional development, educators’ greater intimacy with their disciplines, and positive effects on classroom assessment practices. Educators’ greater knowledge results from increased accountability, which puts more pressure on teachers to be effective. The positive effects on classroom assessment practices result from increased professional development. He also notes the creation of accountability systems, which in theory offer many benefits.
High-stakes tests also increase knowledge about testing and improve the quality of the tests themselves: the higher the stakes, the more thoroughly the tests are developed, monitored, and researched. Also, because high-stakes exams require high-quality content standards, those standards have generally improved, which in turn has improved student learning by specifying exactly what students ought to know. In addition, these exams require increased data collection and quality control, which makes future research and program implementation much easier (Cizek, 2001).
Cizek states that once NCLB passed and made high-stakes testing a national requirement, “failure was no longer acceptable and there was a stake in helping all students succeed” (2001). The key word in this phrase is “all,” which points to one of the biggest positive consequences of the tests. Because schools must report scores disaggregated by subgroup, every subgroup’s results matter equally. Schools must therefore make sure that each subgroup of students, including special education and limited English proficient students, passes the exams. In this way, the tests work toward educational equity (Luna & Turner, 2001).
There are other, more basic benefits to high-stakes tests as well. Precisely because high stakes are attached to them, they give students an incentive to study and to take their education seriously. Without tests, students can go through high school exerting little effort and still graduate. In addition, tests can show what students know and where they need to improve. Tests give both parents and teachers a sense of how students are doing individually and in comparison to others. On a larger scale, tests inform leaders and policymakers about the progress of the educational system. They show administrators which programs are working and which are not, so that support, training, and resources can be directed to the schools that need extra help (Ravitch, 2010). This is exactly what NCLB intends to do.
Another equally basic argument is that high-stakes tests actually improve instruction and learning. This is probably one of the main goals of these exams, and if they truly accomplish it, this is one of the strongest arguments for testing. The argument runs as follows. Since these tests became mandatory, statewide content standards have been revised and strengthened. Because the content standards have changed, it should logically follow that instruction changes as well, presumably for the better. Ron Hambleton argues that if a test is well designed, the worry about “teaching to the test” makes little sense, because teaching to such a test simply means teaching the curriculum. This is something all teachers should be doing anyway, and if they are not, the enforcement of the tests should push their instruction in that direction (R. Hambleton, personal communication, November 30, 2010).
The question that follows is whether this actually happens in practice. Hambleton points to NAEP scores as an answer. The National Assessment of Educational Progress (NAEP) assesses the academic achievement of a representative sample of students across the country. In Massachusetts, NAEP scores as well as SAT scores have clearly increased since the passage of NCLB (R. Hambleton, personal communication, November 30, 2010). Nationwide, although scores have not improved as steadily as they have in Massachusetts, NAEP scores have generally increased since 2004. Specifically, reading scores have increased for 9-, 13-, and 17-year-olds, and math scores have increased for 9- and 13-year-olds (Rampey et al., 2009). These gains cannot be attributed to testing alone, but they suggest that the potential benefits of high-stakes testing are important to take into account when evaluating these tests.
This controversial issue is made even more complicated by the numerous arguments for and against high-stakes tests. It seems that everyone in the education community, and even in the general public, has an opinion on this topic. When making policy, or even when choosing a school for your child, it is important to weigh the various arguments and to understand this issue as deeply as possible.
On the one hand, mandated high-stakes tests have many negative consequences. They may encourage “teaching to the test” and “curriculum narrowing.” They may have negative effects on teachers and unintended effects on teaching behavior, such as neglecting high achievers or, conversely, low achievers. They may promote cheating in various forms or increase the dropout rate. Critics have claimed that the scores do not even measure academic achievement or the quality of instruction. There is also the possibility of cultural bias and a widely acknowledged increase in student stress.
However, positive consequences also exist. Tests increase the amount and quality of professional development and have positive effects on classroom assessment practices. Tests create accountability systems and encourage increased data collection. Tests may lead to improved content standards, improved instruction, and improved student learning. They demonstrate student and school performance and progress to parents, teachers, administrators, and policymakers. And they may draw increased attention to students with special needs and LEP students.
Both the pros and cons are abundant. Although I do not foresee agreement between these two sides anytime soon, there are some important conclusions that people at both ends of the spectrum can reach. The first is that “standardized tests are not precise instruments” (Ravitch, 2010, p. 152). Unlike a thermometer, which measures temperature precisely, a test can only provide an estimate of a student’s knowledge at a given moment. Using tests to measure teachers, schools, districts, states, and the country as a whole is even more unreliable and prone to error. Exams are much better at showing trends than at providing precise measurements. That is why it is important to always keep in mind the capabilities and limitations of tests and to use them only for their intended purposes.
Along the same lines are two other conclusions: that people’s objections are generally not to the tests themselves but to their misuse, and that test scores should not be used in isolation to make important decisions (Ravitch, 2010). The tests that states currently use are high-quality instruments that required a great deal of money, time, and expertise to design. With a few exceptions, such as claims of bias, the problems lie not in the tests but in how they are used. If a test was not designed to assess teacher performance, it should not be used to do so. Tests are also often used as graduation requirements; if a test was not specifically designed for this purpose, it should not be the sole determinant of whether a student graduates. Ultimately, scores should be used only for the purpose for which that particular test was designed (Ravitch, 2010).
These seem to be conclusions that both sides could agree on. High-stakes tests are no panacea, but when used properly they can be very useful in school reform. Diane Ravitch puts it well: “tests are necessary and helpful. But tests must be supplemented by human judgment” (2010, p. 166). As long as human judgment remains part of the picture, our schools will be fine, with or without high-stakes tests.
Cizek, G. J. (2001). More Unintended Consequences of High-Stakes Testing. Educational Measurement: Issues and Practice, 20(4), 19-27.
Cohen, D. K., & Moffitt, S. L. (2009). The Ordeal of Equality: Did Federal Regulation Fix the Schools?
Giambo, D. (2010). High-Stakes Testing, High School Graduation, and Limited English Proficient Students: A Case Study. American Secondary Education, 38(2), 44-56. Retrieved from ERIC database.
Kohn, A. (1999). The Schools Our Children Deserve.
Luna, C., & Turner, C. L. (2001). The Impact of the MCAS: Teachers Talk About High-Stakes Testing. The English Journal, 91(1), 79-87.
Madaus, G., & Russell, M. (2010). Paradoxes of High-Stakes Testing. Journal of Education, 190(1/2), 21-30. Retrieved from Academic Search Premier database.
Papay, J. P., Murnane, R. J., & Willett, J. B. (2010). The Consequences of High School Exit Examinations for Low-Performing Urban Students: Evidence from Massachusetts. Educational Evaluation and Policy Analysis, 32(1), 5-23. Retrieved from ERIC database.
Rampey, B. D., Dion, G. S., & Donahue, P. L. (2009, April). The Nation’s Report Card: Trends in Academic Progress in Reading and Mathematics 2008. Retrieved from http://nces.ed.gov/nationsreportcard/pubs/main2008/2009479.asp#pdflist
Ravitch, D. (2000). Left Back: A Century of Battles Over School Reform.
Ravitch, D. (2010). The Death and Life of the Great American School System.
Shannon, J. (2008). Reading Results: A Critical Look at Standardized Testing and the Linguistic Minority (Doctoral dissertation). Retrieved from ERIC database.
Smith, M. S., & O’Day, J. A. (1990). Systemic School Reform. In S. Fuhrman & B. Malen (Eds.), The Politics of Curriculum and Testing.
Wheelock, A. (2002). School Awards Programs and Accountability in Massachusetts: Misusing MCAS Scores To Assess School Quality.