Validation of psychometric instruments in social and health sciences: a practical guide

  1. López-Pina, Jose-Antonio ¹
  2. Veas, Alejandro ²

¹ Department of Basic Psychology and Methodology, University of Murcia (Spain)
² Department of Developmental and Educational Psychology, University of Murcia (Spain)
Journal: Anales de psicología

ISSN: 0212-9728, 1695-2294

Year of publication: 2024

Issue title: January - April

Volume: 40

Issue: 1

Pages: 163-170

Type: Article

DOI: 10.6018/ANALESPS.583991

Abstract

The number of psychometric studies has grown substantially in recent years, together with crucial statistical advances for assessing the reliability and validity of tests. Given the importance of providing more accurate procedures, both in methodology and in the interpretation of scores, the editors of the journal Anales de Psicología propose this guide to address the most relevant topics in the field of applied psychometrics. To this end, the present manuscript reviews the main topics of Classical Test Theory (e.g., exploratory/confirmatory factor analysis, reliability, validity, bias, etc.) with a view to synthesizing and clarifying their practical applications and improving the publication standards of this type of work.
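
As an illustration of one of the Classical Test Theory procedures the guide addresses, the following is a minimal sketch (not taken from the article) of how coefficient alpha can be computed from a persons-by-items score matrix using the standard formula; the response data shown are made up for the example.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Coefficient alpha for a persons-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = scores.shape[1]                          # number of items
    item_var = scores.var(axis=0, ddof=1)        # sample variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the total test score
    return k / (k - 1) * (1 - item_var.sum() / total_var)

# Made-up Likert-type responses: 6 persons x 4 items (illustrative only)
X = np.array([
    [4, 5, 4, 5],
    [2, 3, 2, 3],
    [5, 5, 4, 4],
    [1, 2, 2, 1],
    [3, 3, 4, 3],
    [4, 4, 5, 5],
])
print(round(cronbach_alpha(X), 3))
```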
