Item Response Theory (IRT)

Item response theory holds that the probability of a given answer to a test item depends on the level of the trait that the test measures.

Written and verified by the psychologist Paula Villasante.

Last update: 21 December, 2022

One of the most important tasks of psychological intervention is evaluation, which often relies on test results. In this regard, item response theory (IRT) is a theory of test measurement that complements classical test theory.

Classical test theory (CTT) and IRT can both be used to evaluate the same test, and each can establish the relevance or score of each item. Because the two approaches weigh items differently, the same person could get a different result under each one. IRT generally leads to better-calibrated instruments, but it often costs more and requires the participation of specialized professionals.

These two test theories have the same goal: to create instruments that measure what you want to measure with the least possible error. This matters because psychometrics requires a certain level of reliability and validity.

The better a test replicates the performance of two test-takers with the same level of expertise, or the same test-taker on different occasions, the more reliable it is. On the other hand, validity refers to the degree to which empirical evidence and theory support the interpretation of test scores.

The limitations of CTT that led to the emergence of IRT

Although it has been very valuable, classical test theory has some limitations. In CTT, measurements are not invariant across instruments. For example, imagine that a psychologist evaluates the intelligence of three people with a different test for each one. In this case, you couldn’t compare the results directly. Why?

Well, because each test has its own scale. Therefore, in order to compare the intelligence of a group of people, for example, you’d need to transform the scores from different scales.
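As a rough illustration of what transforming scores from different scales can look like, the sketch below converts raw scores to standard (z) scores. This is one common transformation, not one the article prescribes, and the function name and the score values are hypothetical.

```python
from statistics import mean, pstdev

def to_z_scores(raw_scores):
    """Rescale raw scores to mean 0 and standard deviation 1 within their own group."""
    m = mean(raw_scores)
    s = pstdev(raw_scores)  # population standard deviation of this group
    return [(x - m) / s for x in raw_scores]

# Hypothetical raw scores from two tests that use different scales.
test_a = [85, 100, 115]
test_b = [10, 20, 30]

z_a = to_z_scores(test_a)
z_b = to_z_scores(test_b)

# After the transformation both lists are on the same unitless scale.
print(z_a)
print(z_b)
```

Note that each z-score depends on the group used to compute the mean and standard deviation, so the transformed scores are still tied to a particular sample.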

IRT, on the other hand, lets you compare results obtained with different instruments because it places them on the same scale. Another limitation of classical test theory is that the properties of a test are not invariant with respect to the people used to estimate them: in CTT, item difficulty, for example, depends on the sample that took the test. IRT can also improve that aspect.

Assumptions of item response theory (IRT)

In order to resolve these limitations, IRT has to make stronger and more restrictive assumptions than CTT.

First assumption

The most important assumption of item response theory is that there is a functional relationship between the values of the variable that the items measure and the probability of answering those items correctly. This function is called the item characteristic curve (ICC).

Thus, we can say that IRT improves on CTT with this new idea. For example, in an intelligence test, the probability of answering the most difficult questions correctly is high only for the most intelligent people. On the other hand, if everyone taking the test gave the same answer to an item, that item couldn’t tell test-takers apart, so it wouldn’t help determine a subject’s level of expertise.
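To make the idea of an item characteristic curve concrete, here is a minimal sketch that assumes a two-parameter logistic form. The article doesn't say which function the curve follows, so the logistic shape, the parameter names (theta for the trait level, difficulty, discrimination), and the item values are all assumptions chosen for illustration.

```python
import math

def icc(theta, difficulty, discrimination=1.0):
    """Probability of a correct answer under an assumed two-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

# Hypothetical items: one easy, one hard, with equal discrimination.
easy_item = {"difficulty": -1.0, "discrimination": 1.0}
hard_item = {"difficulty": 2.0, "discrimination": 1.0}

# The probability of a correct answer rises with the trait level,
# and it rises later for the harder item.
for theta in (-2.0, 0.0, 2.0):
    p_easy = icc(theta, **easy_item)
    p_hard = icc(theta, **hard_item)
    print(f"theta={theta:+.1f}  P(easy)={p_easy:.2f}  P(hard)={p_hard:.2f}")
```

In this sketch, an item whose discrimination is close to zero has an almost flat curve: people at every trait level answer it in roughly the same way, so it says little about the test-taker.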

Second assumption

The second assumption, made by most models, is that the items measure a single dimension. In other words, they’re unidimensional. Thus, before using these types of models, you must make sure that the data meets this unidimensionality requirement. Unfortunately, many of the instruments that psychologists frequently use collect multidimensional data.

Third assumption

The third assumption of item response theory is local independence. In other words, to use these models, the items must be independent of each other once the test-taker’s level on the measured trait is taken into account: the answer to one item can’t affect the response to other items. If unidimensionality is met, local independence is also fulfilled, because the only variance the items share is the variance related to the measured dimension. Thus, both assumptions are related.
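One practical consequence of local independence is that, for a fixed trait level, the probability of a whole pattern of answers is just the product of the individual item probabilities. The sketch below shows this under an assumed one-parameter logistic model; the function names, item difficulties, and responses are hypothetical.

```python
import math

def p_correct(theta, difficulty):
    """Probability of a correct answer under an assumed one-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def pattern_probability(theta, difficulties, responses):
    """Probability of a response pattern, assuming local independence.

    Given theta, the items are treated as independent, so the
    per-item probabilities are simply multiplied together.
    """
    prob = 1.0
    for b, u in zip(difficulties, responses):
        p = p_correct(theta, b)
        prob *= p if u == 1 else (1.0 - p)
    return prob

# Hypothetical three-item test; 1 = correct, 0 = incorrect.
difficulties = [-1.0, 0.0, 1.0]
responses = [1, 1, 0]

likelihood = pattern_probability(0.0, difficulties, responses)
print(likelihood)
```

If answering one item changed the chances on another beyond what the trait level explains, this simple product would no longer describe the data.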

Muñiz (2010) pointed out the importance of advances in the field of psychometrics and test interpretation. The logical next step is to keep moving in this direction, since tests analyzed under IRT show worrisome results about how well they currently measure.

Cuesta, M., & Muñiz, J. (1999). Robustness of item response logistic models to violations of the unidimensionality assumption. Psicothema, 11, 175-182.
Muñiz, J. (1997). Introducción a la teoría de respuesta a los ítems. Madrid: Pirámide.
Muñiz, J. (2000). Teoría Clásica de los Tests. Madrid: Pirámide.
Muñiz Fernández, J. (2010). Las teorías de los tests: teoría clásica y teoría de respuesta a los ítems. Papeles del Psicólogo: Revista del Colegio Oficial de Psicólogos.