I. INTRODUCTION
A language test is only useful if the test itself is of good quality. A good
test has two characteristics: validity and reliability. Validity was discussed
in an earlier summary; this summary covers reliability.
II. SUMMARY OF CONTENT
Reliability refers to the consistency of a measure. A test is considered
reliable if it gives the same result on repeated administrations. For example,
if a test is designed to measure a trait (such as introversion), then each time
the test is administered to a subject, the results should be approximately the
same. Unfortunately, it is impossible to calculate reliability exactly, but
there are several different ways to estimate it.
Reliability does not imply validity. That is, a reliable measure is measuring
something consistently, but you may not be measuring what you want to be
measuring. For example, while there are many reliable tests of specific
abilities, not all of them would be valid for predicting, say, job performance.
In terms of accuracy and precision, reliability is analogous to precision,
while validity is analogous to accuracy.
An example often used to illustrate the difference between reliability and
validity in the experimental sciences involves a common bathroom scale. If
someone who weighs 200 pounds steps on a scale 10 times and gets readings of
15, 250, 95, 140, etc., the scale is not reliable. If the scale consistently
reads "150", then it is reliable, but not valid. If it reads "200" each time,
then the measurement is both reliable and valid. This is what is meant by the
statement, "Reliability is necessary but not sufficient for validity."
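The bathroom scale example can be sketched in a few lines of Python. This is only an illustration: the function name `describe` and the 5-pound `tolerance` cutoff are assumptions made for the sketch, not part of any standard definition. Spread between readings stands in for reliability, and distance from the true weight stands in for validity.

```python
from statistics import mean, pstdev

TRUE_WEIGHT = 200  # pounds, taken from the example in the text


def describe(readings, true_value=TRUE_WEIGHT, tolerance=5):
    """Return (reliable, valid) flags for a list of scale readings.

    tolerance is a hypothetical cutoff in pounds, chosen only for illustration.
    """
    spread = pstdev(readings)                 # how much the readings disagree
    bias = abs(mean(readings) - true_value)   # how far they are from the truth
    reliable = spread <= tolerance
    valid = reliable and bias <= tolerance    # validity requires reliability
    return reliable, valid


erratic = [15, 250, 95, 140]   # the inconsistent readings given in the text
stuck_at_150 = [150] * 10      # consistent but wrong
correct = [200] * 10           # consistent and right

print(describe(erratic))       # (False, False)
print(describe(stuck_at_150))  # (True, False)
print(describe(correct))       # (True, True)
```

Note that `valid` can never be true when `reliable` is false, which mirrors the statement that reliability is necessary but not sufficient for validity.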
You learned in the Theory of Reliability that it's not possible to calculate
reliability exactly. Instead, we have to estimate reliability, and this is
always an imperfect endeavor. Here, I want to introduce the major reliability
estimators and talk about their strengths and weaknesses. There are four
general classes of reliability estimates, each of which estimates reliability
in a different way. They are:
1. Test-Retest Reliability
To gauge test-retest reliability, the test is administered twice at two different
points in time. This kind of reliability is used to assess the consistency of a
test across time. This type of reliability assumes that there will be no change
in the quality or construct being measured. Test-retest reliability is best
used for things that are stable over time, such as intelligence. Generally,
reliability will be higher when little time has passed between tests.
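One common way to put a number on test-retest reliability is to correlate the scores from the two administrations. The sketch below assumes a Pearson correlation is used, and the scores are made-up illustrative data for five test takers, not results from a real test.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for the same five test takers at two points in time.
first_administration = [72, 85, 60, 90, 78]
second_administration = [70, 88, 62, 91, 75]

r = pearson(first_administration, second_administration)
print(round(r, 3))  # a value close to 1 suggests scores are stable over time
```

A coefficient near 1 means test takers kept roughly the same relative standing on both occasions; a low coefficient suggests either an unreliable test or a real change in the quality being measured.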
2. Inter-rater Reliability
This type of reliability is assessed by having two or more independent judges
score the test. The scores are then compared to determine the consistency of
the raters' estimates. One way to test inter-rater reliability is to have each
rater assign each test item a score. For example, each rater might score items
on a scale from 1 to 10. Next, you would calculate the correlation between the
two sets of ratings to determine the level of inter-rater reliability. Another
means of testing inter-rater reliability is to have raters determine which
category each observation falls into and then calculate the percentage of
agreement between the raters. So, if the raters agree 8 out of 10 times, the
test has an 80% inter-rater reliability rate.
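Both approaches described above can be sketched briefly. The ratings below are invented for illustration; the category labels ("pass"/"fail") and the function names are assumptions, and the correlation is assumed to be a Pearson correlation.

```python
from math import sqrt


def percent_agreement(rater_a, rater_b):
    """Percentage of observations that two raters placed in the same category."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must rate the same observations")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * matches / len(rater_a)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Category ratings: the raters disagree on 2 of the 10 observations.
categories_a = ["pass", "pass", "fail", "pass", "fail",
                "pass", "pass", "fail", "pass", "pass"]
categories_b = ["pass", "pass", "fail", "pass", "pass",
                "pass", "pass", "fail", "fail", "pass"]
print(percent_agreement(categories_a, categories_b))  # 80.0

# Numeric ratings on a 1 to 10 scale for five test items.
scores_a = [7, 5, 9, 6, 8]
scores_b = [6, 5, 9, 7, 8]
print(round(pearson(scores_a, scores_b), 2))  # 0.9
```

The first result reproduces the "8 out of 10" case from the text. Percentage agreement is simple to read but does not correct for agreement that would happen by chance.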
3. Parallel-Forms Reliability
Parallel-forms reliability is gauged by comparing two different tests that were
created using the same content. This is accomplished by creating a large pool
of test items that measure the same quality and then randomly dividing the
items into two separate tests. The two tests should then be administered to the
same subjects at the same time.
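The random division of an item pool into two forms might look like the sketch below. The item names, the pool size of 20, and the function name `split_into_parallel_forms` are hypothetical; a fixed `seed` is used only so that the split can be reproduced.

```python
import random


def split_into_parallel_forms(item_pool, seed=None):
    """Randomly divide a pool of test items into two forms of equal size."""
    rng = random.Random(seed)      # local generator, so global state is untouched
    shuffled = list(item_pool)     # copy, so the original pool is not reordered
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical pool of 20 items that all measure the same quality.
item_pool = [f"item_{i:02d}" for i in range(1, 21)]

form_a, form_b = split_into_parallel_forms(item_pool, seed=42)
print(len(form_a), len(form_b))  # 10 10
```

Once both forms have been given to the same subjects, the two sets of scores can be correlated; a high correlation suggests the forms are interchangeable.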
4. Internal Consistency Reliability
This form of reliability is used to judge the consistency of results across
items on the same test. Essentially, you are comparing test items that measure
the same construct to determine the test's internal consistency. When you see a
question that seems very similar to another test question, it may indicate that
the two questions are being used to gauge reliability. Because the two
questions are similar and designed to measure the same thing, the test taker
should answer both questions the same way, which would indicate that the test
has internal consistency.
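The text does not name a specific statistic for internal consistency. One widely used choice is Cronbach's alpha, shown here as an assumption rather than as the method the summary has in mind. For k items, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The responses below are invented for illustration.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of scores.

    scores: one row per test taker, one column per item.
    """
    k = len(scores[0])                      # number of items
    items = list(zip(*scores))              # transpose: one tuple per item
    item_variance_sum = sum(pvariance(item) for item in items)
    total_variance = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical answers from five test takers on three similar items (1 to 5).
responses = [
    [4, 4, 5],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
    [3, 3, 3],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 2))  # values near 1 indicate that the items move together
```

Because each test taker answers the three items in a similar way, alpha comes out high for this data; items that did not measure the same construct would pull it down.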
III. CONCLUSION
Much has been written about the characteristics of language tests. This summary
has focused on reliability, which is an important part of constructing a good
test, and on the main ways of estimating it.
REFERENCES
Hughes, Arthur. (1983). Testing for Language Teachers. UK: Cambridge University Press.