The inconsistency in these test results would indicate an unreliable instrument. We want our test instruments to measure whatever they are supposed to measure with consistency. That is, any errors, or differences between an individual's test scores, should result from chance and not from systematic error.
Think of reliability as the degree to which an
individual's test performance remains nearly the same despite being
tested on numerous occasions.
You can approach reliability from two different
points of view. One approach looks at the amount of variation
expected within a set of repeated measures of a single individual.
The other approach looks at the extent to which each individual
maintains the same relative position in the group.
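As a rough illustration of these two viewpoints, the sketch below computes the spread of one person's repeated scores and checks whether a small group keeps the same rank order across two occasions. All names and scores are hypothetical, and the sample standard deviation and a simple rank comparison are only one possible way to express each idea.

```python
from statistics import stdev

# Viewpoint 1: variation within repeated measures of a single individual.
# Hypothetical scores for one person on five administrations of a test.
repeated_scores = [72, 75, 71, 74, 73]
within_person_spread = stdev(repeated_scores)  # smaller spread = more consistent

# Viewpoint 2: does each individual keep the same relative position in the group?
# Hypothetical group scores on two occasions.
occasion_1 = {"Ana": 88, "Ben": 75, "Cy": 92, "Dee": 64}
occasion_2 = {"Ana": 85, "Ben": 78, "Cy": 95, "Dee": 66}


def ranking(scores):
    """Return names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


same_order = ranking(occasion_1) == ranking(occasion_2)

print(f"Within-person spread: {within_person_spread:.2f}")
print(f"Same rank order on both occasions: {same_order}")
```

Here the raw scores shift a little between occasions, yet every person holds the same position in the group, which is the sense of reliability the second viewpoint captures.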
Suppose you give a test to a group of individuals, and some time later you administer the same test to the same group. How well each individual's paired scores from the first and second tests match each other is a measure of the test's reliability. Could an individual's memory of test answers influence their results on the retest? It might.
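One common way to quantify how well the paired scores match is a Pearson correlation between the first and second administrations. The sketch below assumes that choice; the scores and the helper name `pearson_r` are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for the same five people on two administrations.
first_test = [78, 85, 62, 90, 71]
second_test = [80, 83, 65, 92, 70]

test_retest_r = pearson_r(first_test, second_test)
print(f"Test-retest correlation: {test_retest_r:.3f}")
```

A value near 1 means the paired scores track each other closely; a value near 0 means the first score says little about the second.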
The split-half procedure measures the internal consistency
reliability of a test. It requires that you divide a test into two
comparable halves and then correlate each individual's scores on
these halves. It is often used on large tests for which a test-retest
isn't possible.
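The two-halves procedure above can be sketched as follows. The sketch assumes an odd/even item split as the way to form "comparable halves" and a Pearson correlation between the half scores; the response matrix is hypothetical (1 = correct, 0 = incorrect).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: each row is one person, each column one test item.
responses = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]

# Assumed split: items 1, 3, 5, 7 form one half; items 2, 4, 6, 8 the other.
odd_half_scores = [sum(row[0::2]) for row in responses]
even_half_scores = [sum(row[1::2]) for row in responses]

split_half_r = pearson_r(odd_half_scores, even_half_scores)
print(f"Split-half correlation: {split_half_r:.3f}")
```

An odd/even split is only one option; any division that yields two halves of similar content and difficulty would serve the same purpose.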
Another internal measure of reliability
involves looking at an individual's correct and incorrect responses
to each item on the test. Rational equivalence reliability is
essentially the average of all possible split-half
correlations.
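Rational equivalence is commonly computed with the Kuder-Richardson formula 20 (KR-20), which works directly from right/wrong item responses. The text does not name a formula, so treat the choice of KR-20, the population-variance convention, and the hypothetical response matrix below as assumptions.

```python
from statistics import pvariance

# Hypothetical data: each row is one person, each column one item (1 = correct).
responses = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]


def kr20(matrix):
    """KR-20 = (k / (k - 1)) * (1 - sum(p * q) / variance of total scores)."""
    n_people = len(matrix)
    k = len(matrix[0])  # number of items
    totals = [sum(row) for row in matrix]
    total_variance = pvariance(totals)  # population variance of total scores
    sum_pq = 0.0
    for item in range(k):
        p = sum(row[item] for row in matrix) / n_people  # proportion correct
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / total_variance)


reliability = kr20(responses)
print(f"KR-20 reliability: {reliability:.3f}")
```

Unlike a single split-half correlation, this figure does not depend on which items happen to land in which half.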
Figure skaters and Olympic divers are assigned
scores whose consistency depends on interrater reliability. When a
well-defined behavior is evaluated by well-trained judges, the extent
of agreement among the judges' scores (the reliability) will be high.
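One simple way to express the extent of agreement between two judges is the proportion of performances on which their scores match, either exactly or within a small tolerance. The judges' scores, the half-point tolerance, and the function name `agreement_rate` below are hypothetical; other agreement indices exist and may suit a given scoring scheme better.

```python
# Hypothetical scores from two judges for the same five dives.
judge_a = [8.5, 7.0, 9.0, 6.5, 8.0]
judge_b = [8.5, 7.5, 9.0, 6.0, 8.0]


def agreement_rate(scores_1, scores_2, tolerance=0.0):
    """Proportion of performances where the two scores differ by at most `tolerance`."""
    matches = sum(
        1 for a, b in zip(scores_1, scores_2) if abs(a - b) <= tolerance
    )
    return matches / len(scores_1)


exact_agreement = agreement_rate(judge_a, judge_b)
near_agreement = agreement_rate(judge_a, judge_b, tolerance=0.5)

print(f"Exact agreement: {exact_agreement:.0%}")
print(f"Agreement within half a point: {near_agreement:.0%}")
```

With these made-up numbers the judges give identical scores on three of five dives and are never more than half a point apart, the kind of pattern expected from well-trained judges rating a well-defined behavior.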