Reliability in Psychometrics: What it is and How it is Estimated in Tests

If you have studied psychology or a related field, you are probably familiar with the concept of reliability. But what exactly does it consist of? Reliability in psychometrics is a property of measurement instruments (for example, tests) that tells us whether their measurements are precise, consistent and stable.

In this article we explain what this property consists of, give some examples to clarify the concept, and describe the different ways to calculate the reliability coefficient in psychometrics.

Table of Contents

1 What is reliability in psychometrics?

1.1 Examples

2 The variability of measurements

3 The calculation: reliability coefficient

3.1 1. Two applications

3.2 2. A single application

3.3 3. Other methods

What is reliability in psychometrics?

Reliability is a concept from psychometrics, the discipline responsible for measuring the psychological variables of human beings through different techniques, methods and tools. Reliability is thus a psychometric property: it refers to the extent to which the scores of a given instrument (for example, a test) are free of measurement error.

It is also defined as the degree of consistency and stability of the scores obtained in different measurements with the same instrument or test. Another term for reliability in psychometrics is “precision”. Thus, we say that a test is reliable when it is precise, contains little measurement error, and gives stable and consistent results over repeated measurements.

Beyond psychology, in what fields is this concept used? In several, such as social research and education.

Examples

To better illustrate what this psychometric concept consists of, let’s think about the following example: we use a thermometer to measure the daily temperature in a classroom. We take the measurement at ten in the morning every day, for a week.

We will say that the thermometer is reliable (it has high reliability) if, when the temperature is more or less the same each day, the thermometer reflects this (that is, the measurements are close to each other, with no large jumps or big differences).

If, instead, the measurements are very different from each other (even though the temperature is more or less the same each day), the instrument does not have good reliability, because its measurements are not stable or consistent over time.

Another example to understand the concept of reliability in psychometrics: imagine that we weigh a basket with three apples every day, for several days, and we write down the results. If these results vary a lot across the successive measurements (that is, as we repeat them), this would indicate that the reliability of the scale is not good, since the measurements would be inconsistent and unstable (the opposites of reliability).

Thus, a reliable instrument is one that shows consistent and stable results in repeated measurement processes of a certain variable.

The variability of measurements

How do we know if an instrument is reliable? One way is to look at the variability of its measurements. That is, if the scores we obtain with the instrument (measuring the same thing repeatedly) vary widely among themselves, we will consider that its values are not precise, and therefore that the instrument does not have good reliability (it is not reliable).

Extrapolating this to psychological tests: if a subject answered the same test repeatedly under the same conditions, the variability in their scores would give us an indicator of the reliability of the test.
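To make the idea concrete, here is a minimal sketch in Python based on the basket of apples example. The readings are invented for illustration, and using the sample standard deviation as the measure of spread is our assumption (the article does not prescribe a specific statistic).

```python
from statistics import mean, stdev

# Hypothetical readings (in grams) of the same basket on seven days.
scale_a = [452, 451, 453, 452, 450, 452, 451]
scale_b = [452, 431, 470, 445, 462, 439, 466]

def spread(readings):
    """Return the sample standard deviation of repeated readings."""
    return stdev(readings)

# The basket does not change, so a larger spread points to a less reliable scale.
for name, readings in (("scale A", scale_a), ("scale B", scale_b)):
    print(f"{name}: mean = {mean(readings):.1f} g, spread = {spread(readings):.1f} g")
```

Both scales give a similar average, but scale B's readings are far more scattered, which is exactly the instability described above.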

The calculation: reliability coefficient

How do we calculate reliability in psychometrics? Through the reliability coefficient, which can be calculated in two broad ways: with procedures that involve two applications of the test, or with procedures that involve just one. Let’s see the different methods within these two large blocks:

1. Two applications

In the first group we find the different ways (or procedures) that allow us to calculate the reliability coefficient from two applications of a test. Let’s get to know them, as well as their disadvantages:

1.1. Parallel or equivalent forms

With this method, we obtain a measure of reliability that is also called “equivalence”. The method consists of applying two tests simultaneously: X (the original test) and X’ (the equivalent test that we have created). The disadvantages of this procedure are basically two: examinee fatigue and the need to construct two tests.
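In practice, the coefficient for this kind of procedure is commonly estimated as the Pearson correlation between the two sets of scores. The article does not name the statistic, so treat this as a sketch under that assumption. The function name `pearson_r` and the scores are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2 scores)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from each mean.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Note: this divides by zero if either list has no variation at all.
    return cov / sqrt(var_x * var_y)

# Hypothetical scores of six examinees on form X and on the equivalent form X'.
form_x = [12, 15, 9, 20, 17, 11]
form_x_prime = [13, 14, 10, 19, 18, 10]

r_xx = pearson_r(form_x, form_x_prime)
print(f"estimated reliability coefficient: {r_xx:.2f}")
```

The same computation is typically used for the other two-application procedures. Only the meaning of the two lists changes (for test-retest, the scores from the first and the second administration of test X).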

1.2. Test-retest

The second method, within the procedures to calculate the reliability coefficient from two applications, is the test-retest, which allows us to estimate the stability of the test. It basically consists of applying test X, allowing a period of time to pass, and reapplying the same test.

The disadvantages of this procedure include the learning that the examinee may have acquired in that period of time and the person’s own development, both of which can alter the results.

1.3. Test-retest with alternative forms

Finally, another way to calculate reliability in psychometrics is to use the test-retest with alternative forms. It is a combination of the two previous procedures, so although it can be useful in certain cases, it accumulates the disadvantages of both.

The procedure consists of administering test X, allowing a period of time to pass, and administering test X’ (that is, the equivalent test created from the original).

2. A single application

On the other hand, the procedures to calculate reliability in psychometrics (the reliability coefficient) from a single application of the test or measurement instrument are divided into two subgroups: the two halves and the covariance between items. Let’s look at them in more detail:

2.1. Two halves

In this case, simply put, the test is divided into two halves. Within this section, there are three types of procedures (ways of dividing the test).

2.2. Covariance between items

The covariance between items involves analyzing the relationship between all the test items. Within it, we also find three methods or formulas specific to psychometrics:

Cronbach’s alpha coefficient: its value usually ranges between 0 and 1 (it can fall below 0 when the items correlate negatively with each other).

Kuder-Richardson (KR20): it is applied when the items are dichotomous (that is, when they can only take two values).

Guttman.

3. Other methods

Beyond the procedures that involve one or two applications of the test to calculate the reliability coefficient, we find other methods, such as inter-rater reliability (which measures the consistency between the scores given by different evaluators), Hoyt’s method, etc.
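For inter-rater reliability with categorical judgments, one common statistic is Cohen’s kappa, which compares the observed agreement between two raters with the agreement expected by chance. The article does not name a specific index, so the choice of kappa is our assumption, and the ratings below are invented.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who classify the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same, non-empty set of cases")
    n = len(rater_a)
    # Proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    # Note: undefined (division by zero) if chance agreement is exactly 1.
    return (observed - expected) / (1 - expected)

# Hypothetical pass/fail judgments by two evaluators on eight cases.
rater_a = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass"]
rater_b = ["pass", "pass", "fail", "fail", "fail", "fail", "pass", "pass"]

print(f"Cohen's kappa: {cohens_kappa(rater_a, rater_b):.2f}")
```

A value of 1 means perfect agreement, while a value near 0 means the evaluators agree no more often than chance would predict.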

Bibliographic references:


PsychologyFor. (2024). Reliability in Psychometrics: What it is and How it is Estimated in Tests. https://psychologyfor.com/reliability-in-psychometrics-what-it-is-and-how-it-is-estimated-in-tests/