Metrics for estimating validity, reliability and bias in peer assessment
ISSN: 0949-149X
Year of publication: 2018
Volume: 34
Issue: 3
Pages: 968-980
Type: Article
Other publications in: The International journal of engineering education
Abstract
Peer assessment is a widespread way of evaluating and rating the quality of work in education. Although it proves to be a very effective learning instrument, it is subject to possible problems of reliability and validity and to some potential biases. Most studies that examine and try to solve these problems focus on specific cases, and the statistics they use to measure reliability, validity or bias are global: they give a measure of these values for the whole process, but they do not allow an individual study. This work takes a different approach. It proposes metrics for the reliability and validity of each reviewer, as well as an approximation to the possible biases that may appear in the assessment process, so that the review process can itself be assessed. An analogy is proposed between the work of a reviewer in a peer assessment process and the operation of an automatic classifier. This has allowed us to leverage the usual measures for evaluating the quality of automatic classifiers to establish the quality of peer assessment. The reviewers are characterized by their confusion matrices and six new indicators: success rate (which estimates validity); agreement degree (as a measure of reliability); assessment median and its interquartile range (to estimate central tendency and restriction of range biases); and average distance to the diagonal and its standard deviation (to detect possible leniency and harshness biases). This method provides indicators of the reviewer's task and allows different reviewer profiles to be detected, so that the teacher can assess the work of the students as reviewers and introduce correction mechanisms in the final assessment of the works. A practical example of application to an engineering degree is provided to illustrate the potential of the method.
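The reviewer-as-classifier analogy can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything here is an assumption made for illustration: grades are integer levels from 0 to n_levels - 1, the reference grade is some trusted grade such as the teacher's, success rate is taken as the share of assessments on the diagonal of the confusion matrix, and distance to the diagonal is taken as the signed difference between the assigned and the reference grade. The function names are hypothetical, and the agreement degree is left out because its definition cannot be inferred from the abstract.

```python
from statistics import mean, median, pstdev, quantiles


def confusion_matrix(reference, assigned, n_levels):
    """Count assessments: rows are reference grades, columns are assigned grades."""
    matrix = [[0] * n_levels for _ in range(n_levels)]
    for ref, given in zip(reference, assigned):
        matrix[ref][given] += 1
    return matrix


def reviewer_indicators(reference, assigned, n_levels):
    """Per-reviewer indicators under the assumptions stated in the lead-in."""
    matrix = confusion_matrix(reference, assigned, n_levels)
    # Assumed success rate: fraction of assessments that match the reference.
    hits = sum(matrix[k][k] for k in range(n_levels))
    # Assumed signed distance to the diagonal: positive means a grade above
    # the reference (leniency), negative means below it (harshness).
    offsets = [given - ref for ref, given in zip(reference, assigned)]
    q1, _, q3 = quantiles(assigned, n=4)  # quartiles of the assigned grades
    return {
        "success_rate": hits / len(assigned),
        "median": median(assigned),
        "iqr": q3 - q1,
        "mean_distance": mean(offsets),
        "std_distance": pstdev(offsets),
    }


# Made-up data for one reviewer on a four-level scale (not from the article).
reference = [0, 1, 2, 3, 2, 1]
assigned = [1, 2, 2, 3, 3, 1]
print(reviewer_indicators(reference, assigned, n_levels=4))
```

Under these assumptions, a reviewer with a narrow interquartile range would be using only a small part of the scale, and a mean distance well above zero would point to leniency. The article's own definitions may differ from this sketch.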