The fairness of raters in music performance assessment has become an important concern in the field of music. The assessment of students' music performance depends in a fundamental way on rater judgements. The quality of rater judgements is crucial to provide fair, meaningful and informative assessments of music performance. There are many external factors that can influence the quality of rater judgements. Previous research has used different measurement models to examine the quality of rater judgements (e.g., generalizability theory). There are limitations with the previous analysis methods that are based on classical test theory and its extensions. In this study, we use modern measurement theory (Rasch measurement theory) to examine the quality of rater judgements. The many-facets Rasch rating scale model is employed to investigate the extent of rater-invariant measurement in the context of music performance assessments related to university degrees in Malaysia (159 students rated by 24 raters). We examine the rating scale structure, the severity levels of the raters, and the judged difficulty of the items. We also examine the interaction effects across musical instrument subgroups (keyboard, strings, woodwinds, brass, percussions and vocal). The results suggest that there were differences in severity levels among the raters. The results of this study also suggest that raters had different severity levels when rating different musical instrument subgroups. The implications for research, theory and practice in the assessment of music performance are included in this paper.
* Title and MeSH Headings from MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine.