Appropriate Use of High-Stakes
Testing in Our Nation's Schools
Website address: http://www.apa.org/pubinfo/testing.html
How Should Student Learning and
Achievement Be Measured?
Measuring what and how well students
learn is an important building
block in the process of strengthening and improving our nation's
schools. Tests, along with student grades and teacher evaluations, can
provide critical measures of students' skills, knowledge, and
abilities. Therefore, tests should be part of a system in which broad
and equitable access to educational opportunity and advancement is
provided to all students. Tests, when used properly, are among the most
sound and objective ways to measure student performance. But, when test
results are used inappropriately or as a single measure of performance,
they can have unintended adverse consequences.
Today, many school districts are mandating tests to measure student
performance and to hold individual schools and school systems
accountable for that performance. Knowing whether students are learning,
and what they are learning, is important. Test results give classroom teachers important
information on how well individual students are learning and provide
feedback to the teachers themselves on their teaching methods and
It is important to remember, however, that no test is valid for all
purposes. Indeed, tests vary in their intended uses and in their
ability to provide meaningful assessments of student learning.
Therefore, while the goal of using large-scale testing to measure and
improve student and school system performance is laudable, it is also
critical that such tests are sound, are scored properly, and are used
appropriately.
Some public officials and educational administrators are increasingly
calling for the use of tests to make high-stakes decisions, such as
whether a student will move on to the next grade level or receive a
diploma. School officials using such tests must ensure that students
are tested on a curriculum they have had a fair opportunity to learn,
so that certain subgroups of students, such as racial and ethnic
minority students or students with a disability or limited English
proficiency, are not systematically excluded or disadvantaged by the
test or the test-taking conditions. Furthermore, high-stakes decisions
should not be made on the basis of a single test score, because a
single test can only provide a "snapshot" of student achievement and
may not accurately reflect an entire year's worth of student progress.
The potential problem with the current increased emphasis on testing is
not necessarily the test, per se, but the instances when tests have
unintended and potentially negative consequences for individual
students, groups of students, or the educational system more broadly.
But, it is also critical to remember that, in many instances, without
tests, low-performing students and schools could remain invisible and
therefore not get the extra resources or remedial help that they need.
The Appropriate Use of Tests
The measurement validity of a test is
an extremely important concept.
Measurement validity simply means whether a test provides useful
information for a particular purpose. Said another way: Will the test
accurately measure the test taker's knowledge in the content area being
tested?
When tests are developed and used appropriately, they are among the
most sound and objective knowledge and performance measures available.
But, appropriate development and use are critical. Fairness in testing
begins when tests are being developed. Test developers should provide
to those using their tests (school systems, for example) specific
information about the potential limitations of the test, including
situations in which the use of the test scores would be inappropriate.
For example, a test that has been validated only for diagnosing
strengths and weaknesses of individual students should not be used to
evaluate the educational quality of a school. Furthermore, those using
a particular test should have an appreciation for how the test
performance of some students, such as students with a disability or those
with limited English-speaking ability, should be interpreted.
The Standards for Educational and Psychological Testing,* created by
the American Psychological Association, the American Educational
Research Association, and the National Council on Measurement in
Education, present a number of principles that are designed to promote
fairness in testing and avoid unintended consequences. They include:
Any decision about a student's continued education, such as retention,
tracking, or graduation, should not be based on the results of a single
test, but should include other relevant and valid information.
When test results substantially contribute to decisions made about
student promotion or graduation, there should be evidence that the test
addresses only the specific or generalized content and skills that
students have had an opportunity to learn. For tests that will
determine a student's eligibility for promotion to the next grade or
for high school graduation, students should be granted, if needed,
multiple opportunities to demonstrate mastery of materials through
equivalent testing procedures.
When a school district, state, or some other authority mandates a test,
the ways in which the test results are intended to be used should be
clearly described. It is also the responsibility of those who
mandate the test to monitor its impact, particularly on racial and
ethnic-minority students or students of lower socioeconomic status, and
to identify and minimize potential negative consequences of such
testing.
In some cases, special accommodations for students with limited English
proficiency may be necessary to obtain valid test scores. If students
with limited English skills are to be tested in English, their test
scores should be interpreted in light of their limited English skills.
For example, when a student lacks proficiency in the
language in which the test is given (students for whom English is a
second language, for example), the test could become a measure of their
ability to communicate in English rather than a measure of other skills.
Likewise, special accommodations may be needed to ensure that test
scores are valid for students with disabilities. Not enough is
currently known about how particular test modifications may affect the
test scores of students with disabilities; more research is needed. As
a first step, test developers should include students with
disabilities in field testing of pilot tests and document the impact of
particular modifications (if any) for test users.
Gaps Between Testing Principles and
Calls to improve educational outcomes
by measuring student and school
performance are based on good intentions. And, as previously stated,
tests, when used appropriately, can be valid measures of student
achievement. However, test users must ensure that results are truly
indicative of student achievement rather than a reflection of the
quality of school resources or instruction. It is only fair to use test
results in high-stakes decisions when students have had a real
opportunity to master the materials upon which the test is based.
Therefore, in conjunction with supporting the use of tests to evaluate
performance, public policymakers should also support research on the
consequences of such testing, and localities should
work to provide the resources necessary for schools to provide quality
educational opportunities and achieve real student growth and learning,
not just "teaching to the test" skills acquisition. Test results should
also be reported by sex, race/ethnicity, income level,
disability status, and degree of English proficiency for evaluation
purposes.
More Research Is Needed on the Impact
of Large-Scale Testing
In summary, testing is an extremely
valuable part of educational
assessment, but it is only a part of the formula for quality learning.
When tests are used in high-stakes circumstances, a number of
safeguards must be in place. Test developers must ensure that certain
groups of students are not disadvantaged by a test, and test users must
guard against allowing the testing process--the need for students to
pass a certain test--to overwhelm the rest of a student's mastery of a
wide curriculum. Furthermore, remedial programs should be in place for
students who score low or fail such tests.
Because the stakes are so high for so many students, additional
research should begin immediately to learn more about the intended and
unintended consequences of testing in educational decision making. If
tests are going to be used to determine which students will advance and
what subjects schools will teach, it is imperative that we understand
how best to measure student learning and how the use of high-stakes
testing will affect student drop-out rates, graduation rates, course
content, levels of student anxiety, and teaching practices. The
bottom-line question, as yet unanswered, is: What will be the long-term
effect of high-stakes testing on student achievement? Will it enhance
or diminish broad-based learning?
* American Educational Research Association, American Psychological
Association, and National Council on Measurement in Education (1999).
Standards for Educational and Psychological Testing. Washington, DC:
American Educational Research Association.