Investigating reliability and validity in rating scripts for standardisation purposes in onscreen marking

  • Kwai Mun Amy Cheung, Hong Kong Examinations and Assessment Authority, Hong Kong
  • Rui Chang, Hong Kong
  • This study investigated the reliability and validity of Assistant Examiners' (AEs') ratings of the standardised scripts used as benchmarks in onscreen marking (OSM) of the written component of Primary 6 English Language in the Territory-wide System Assessment (TSA) in the Hong Kong Special Administrative Region. The written component was marked against two criteria, ‘content’ and ‘language’. Standardised scripts served three purposes: 1) training markers, 2) qualifying markers before they started rating, and 3) check-marking markers at random intervals throughout the OSM period. These scripts therefore remained vital to monitoring marking quality, even with the cutting-edge technology of OSM. The scripts were drawn from a stratified sample of some 580 participating schools. Having every AE mark all 245 standardisation scripts would have been time-consuming and would have caused rater fatigue, which in turn would have reduced marking reliability. The study therefore adopted ‘overlapping marking’, in which each AE rated only a limited number of scripts (i.e. 60–70). For each rater, 20–30 scripts overlapped with those of one other rater, so that the raters formed an unbroken chain of overlap. From these data, the inter-rater reliability coefficient was calculated, and the multi-faceted Rasch model was run to obtain the ‘fair average’ (FA) across all AEs and the ‘infit’ statistic for each rater. To validate the ratings, verifiable quantitative measures (VQM) were used as external validity measures and correlated against both the FA and the individual ratings. The VQM comprised ‘T-units’, ‘lexical variation’ and ‘tokens’. The results showed that the method used to rate scripts for standardisation purposes was valid and reliable.
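The ‘overlapping marking’ design can be illustrated with a minimal sketch. It is not the study's actual allocation procedure: the function names `chain_windows` and `is_linked` are hypothetical, and the figures of 65 scripts per rater and 20 shared scripts are assumed values picked from within the reported ranges. The sketch also assumes a simple linear chain in which each rater shares scripts with the next; the abstract does not specify the exact linking pattern or the number of AEs.

```python
def chain_windows(n_scripts, per_rater, overlap):
    """Split script indices 0..n_scripts-1 into one window per rater.

    Consecutive windows share `overlap` scripts, so every rater is linked
    to the next one (assumed linear chain; values are illustrative only).
    """
    if not 0 < overlap < per_rater:
        raise ValueError("overlap must be positive and smaller than per_rater")
    step = per_rater - overlap  # new scripts introduced by each further rater
    windows = []
    start = 0
    while True:
        end = min(start + per_rater, n_scripts)
        windows.append(list(range(start, end)))
        if end == n_scripts:
            break
        start += step
    return windows


def is_linked(windows):
    """True if every pair of consecutive raters shares at least one script."""
    return all(set(a) & set(b) for a, b in zip(windows, windows[1:]))


# Assumed figures: 245 scripts in total (from the abstract),
# 65 scripts per rater and 20 shared scripts (chosen for illustration).
windows = chain_windows(245, 65, 20)
print(len(windows), [len(w) for w in windows], is_linked(windows))
```

Under these assumed figures, every script is rated at least once and each rater shares scripts with a neighbour. That linkage is what allows a multi-faceted Rasch analysis to place all raters on a common scale without each of them marking every script.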