Reliability Activity


Background

The inconsistency in these test results would indicate an unreliable instrument. We want our test instruments to measure whatever they are supposed to measure consistently. That is, any errors, or differences between an individual's test scores, should result from chance and not from systematic error.

Think of reliability as the degree to which an individual's test performance remains nearly the same despite being tested on numerous occasions.


Three types of chance, or random, influence affect reliability:

  1. The test taker may actually change from one day to the next. One day she may feel better than the next, or may feel more or less alert.

  2. The test conditions may change. Maybe you are allowed to use notes one time but not the next. This may help some students more than others.

  3. The test may use only a small sample of questions. Small samples are highly vulnerable to chance influences.

You can approach reliability from two different points of view. One approach looks at the amount of variation expected within a set of repeated measures of a single individual. The other approach looks at the extent to which each individual maintains the same relative position in the group.
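The two points of view can be illustrated with a minimal sketch. The names and scores below are hypothetical, made up only for illustration: three people are each tested on three occasions. The first view looks at the spread of each person's own scores; the second looks at whether each person keeps the same rank in the group.

```python
from statistics import stdev

# Hypothetical scores: each person is tested on three occasions.
scores = {
    "Ana": [81, 79, 83],
    "Ben": [70, 74, 72],
    "Cy": [60, 58, 62],
}

# View 1: how much one individual's repeated scores vary
# (sample standard deviation of that person's own scores).
within_person_spread = {name: stdev(vals) for name, vals in scores.items()}


# View 2: each person's rank in the group on a given occasion (1 = highest).
def ranks_on_occasion(all_scores, occasion):
    ordered = sorted(all_scores, key=lambda name: all_scores[name][occasion],
                     reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}


rank_by_occasion = [ranks_on_occasion(scores, i) for i in range(3)]

print(within_person_spread)
print(rank_by_occasion)
```

In this made-up data every person's scores move a little from occasion to occasion, yet the ranks never change, so the test would look reliable from the second point of view.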


   

Types of Reliability

Test-Retest

Suppose you give a test to a group of individuals, and some time later you administer the same test to the same group. How well each individual's paired scores from the first and second test match each other is a measure of the test's reliability. Could an individual's memory of test answers influence their results on the retest? It might.

To get around the possible contaminating effects of the first test on the retest, try correlating the results of the first test with the results of an equivalent form used for the retest.
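As a minimal sketch of how paired scores might be compared, the example below computes a Pearson correlation between a first test and a retest. The choice of the Pearson coefficient, the function name, and the scores are assumptions made for illustration; the same calculation would apply if the second list came from an equivalent form.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores in each list must vary")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for the same five test takers, in the same order.
first_test = [70, 82, 90, 65, 77]
retest = [72, 80, 93, 63, 79]

test_retest_r = pearson_r(first_test, retest)
print(round(test_retest_r, 2))
```

A value close to 1 means the paired scores match well; a value near 0 means the first score tells you little about the second.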


Split-Half

This procedure measures the internal consistency reliability of a test. It requires you to divide a test into two comparable halves, and then correlate each individual's scores on the two halves. It is often used on large tests for which a test-retest isn't possible.
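The split-half procedure can be sketched as follows, under some assumptions that are not part of the text above: the halves are formed from odd-numbered and even-numbered items, items are scored 1 (correct) or 0 (incorrect), and the response data is hypothetical. The Spearman-Brown step at the end is a commonly used adjustment that estimates the reliability of the full-length test from the half-test correlation; it is included here as an optional extra.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def split_half(item_scores):
    """item_scores: one row per test taker, one 0/1 entry per item."""
    # Odd-numbered items (1st, 3rd, ...) versus even-numbered items.
    odd_totals = [sum(row[0::2]) for row in item_scores]
    even_totals = [sum(row[1::2]) for row in item_scores]
    r_half = pearson_r(odd_totals, even_totals)
    # Spearman-Brown adjustment to full test length (optional extra step).
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full


# Hypothetical responses: six test takers, six items.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

r_half, r_full = split_half(responses)
print(round(r_half, 2), round(r_full, 2))
```

An odd/even split is only one way to form comparable halves; a different split of the same test will usually give a somewhat different correlation.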

Rational Equivalence Reliability

Another internal measure of reliability involves looking at each individual's correct and incorrect responses to every item on the test. Rational equivalence reliability is essentially the average of all possible split-half correlations.
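One widely used formula for this kind of item-level reliability is Kuder-Richardson formula 20 (KR-20). Treating KR-20 as the formula intended here is an assumption, and the response data below is hypothetical. The sketch assumes each item is scored 1 (correct) or 0 (incorrect).

```python
def kr20(item_scores):
    """KR-20 for a table of 0/1 item scores (one row per test taker).

    KR-20 = (k / (k - 1)) * (1 - sum(p * q) / variance of total scores)
    where k is the number of items, p is the proportion answering an
    item correctly, and q = 1 - p.
    """
    n_people = len(item_scores)
    k = len(item_scores[0])

    # Population variance of the total scores.
    totals = [sum(row) for row in item_scores]
    mean_total = sum(totals) / n_people
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people
    if var_total == 0:
        raise ValueError("total scores must vary")

    # Sum of p * q over all items.
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n_people
        sum_pq += p * (1 - p)

    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical responses: six test takers, six items.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

kr20_value = kr20(responses)
print(round(kr20_value, 2))
```

Unlike a single split-half correlation, this value does not depend on how the test happens to be divided.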


Scorer/Rater Reliability

The scores assigned to figure skaters and Olympic divers depend on interrater reliability. When a well-defined behavior is evaluated by well-trained judges, the extent of agreement among the judges' scores (the reliability) will be high.
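Agreement among judges can be summarized in several ways. The sketch below uses one simple option, the average Pearson correlation over every pair of judges. That choice of statistic, the judge names, and the scores are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_pairwise_r(judge_scores):
    """Average correlation over every pair of judges."""
    pairs = list(combinations(judge_scores.values(), 2))
    return sum(pearson_r(a, b) for a, b in pairs) / len(pairs)


# Hypothetical scores from three judges for the same five performers,
# listed in the same order for each judge.
judge_scores = {
    "judge_a": [9.0, 8.5, 7.0, 9.5, 6.5],
    "judge_b": [8.8, 8.4, 7.2, 9.6, 6.9],
    "judge_c": [9.1, 8.0, 7.1, 9.4, 6.4],
}

interrater_r = mean_pairwise_r(judge_scores)
print(round(interrater_r, 2))
```

Note that a correlation measures whether judges order the performers the same way; two judges can correlate highly even if one consistently scores lower than the other.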


On to the reliability activity.