ISSN 0439-755X
CN 11-1911/B
Sponsors: Chinese Psychological Society
   Institute of Psychology, Chinese Academy of Sciences
Publisher: Science Press

Acta Psychologica Sinica, 2009, Vol. 41, Issue (08): 773-784.


A Comparison of GT and IRT: An Analysis of Men's 10 m Platform Diving at the Beijing Olympic Games

YU Zong-Huo; TANG Xiao-Juan; WANG Deng-Feng

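  (1 Department of Psychology, Peking University, Beijing 100871) (2 College of Mathematics and Information Science, Nanchang Hangkong University, Nanchang 330063)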
  • Received: 2008-11-14 Online: 2009-08-30 Published: 2009-08-30
  • Corresponding author: WANG Deng-Feng

A Comparison of GT and IRT: An Analysis of Performance Ratings in Men's 10 m Platform Diving at the Beijing Olympic Games

YU Zong-Huo; TANG Xiao-Juan; WANG Deng-Feng

  (1 Department of Psychology, Peking University, Beijing 100871, China)
  (2 College of Mathematics and Information Science, Nanchang Hangkong University, Nanchang 330063, China)
  • Received: 2008-11-14 Online: 2009-08-30 Published: 2009-08-30
  • Contact: WANG Deng-Feng

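Abstract: Generalizability theory (GT) and item response theory (IRT) have developed classical test theory in two different directions. In subjective rating, both GT and the multi-facet Rasch measurement model (MFRM) from IRT can be used to estimate how much each source of variation contributes to the variance of the ratings, to estimate the reliability of the assessment, and to suggest how it could be improved. Twelve athletes competed in the final of the men's 10 m platform diving at the 2008 Beijing Olympic Games; the final had six rounds, and seven judges independently scored each athlete's performance in every round. GT and MFRM agreed closely that the athlete, the round, and the athlete × round interaction are important sources of variation in the athletes' scores, whereas the judges' contribution to differences in the athletes' scores was not significant. MFRM additionally estimated that the degree of difficulty is an important source of variation in men's platform diving scores, revealed disordered step calibrations around the rating category 6.5, and produced a ranking of the athletes that differs somewhat from the actual 2008 Olympic ranking. In GT, difficulty is a hidden facet, so its effect could not be separated. GT and MFRM offer suggestions for improving the measurement from two different angles: GT shows that the g coefficient can be raised by increasing the number of rounds, whereas adding judges has little effect on it. MFRM gives an estimate and a standard error for each element of each facet (e.g., a particular judge or athlete), and its diagnostic fit statistics also help identify aberrant scores or rating patterns.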

Keywords: generalizability theory, multi-facet Rasch measurement model, subjective rating
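The D-study reasoning in the abstract (more rounds raise the g coefficient, more judges help little) can be illustrated with a minimal sketch. It assumes a fully crossed, random athlete (p) × round (r) × judge (j) design and uses made-up variance components chosen only to mimic the qualitative pattern described; they are not the estimates reported in the paper. The function name `g_coefficient` and the `COMPONENTS` values are hypothetical.

```python
def g_coefficient(var_p, var_pr, var_pj, var_prj_e, n_rounds, n_judges):
    """Relative G coefficient (E rho^2) for a fully crossed p x r x j random design.

    Only components that interact with the athlete (p) enter relative error;
    round and judge main effects shift every athlete equally.
    """
    rel_error = (var_pr / n_rounds
                 + var_pj / n_judges
                 + var_prj_e / (n_rounds * n_judges))
    return var_p / (var_p + rel_error)


# Hypothetical variance components (NOT the paper's estimates): a large
# athlete x round component and a tiny athlete x judge component.
COMPONENTS = dict(var_p=1.0, var_pr=2.0, var_pj=0.02, var_prj_e=0.3)

# Compare the actual design (6 rounds, 7 judges) with doubling each facet.
for n_r, n_j in [(6, 7), (6, 14), (12, 7)]:
    g = g_coefficient(**COMPONENTS, n_rounds=n_r, n_judges=n_j)
    print(f"rounds={n_r:2d} judges={n_j:2d} g={g:.3f}")
```

With these illustrative numbers, doubling the judges moves g only from about 0.74 to 0.75, while doubling the rounds lifts it to about 0.85, because the athlete × round component is divided by the number of rounds alone.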

Abstract: Generalizability Theory (GT) and Item Response Theory (IRT) have extended Classical Test Theory (CTT) in different directions, focusing on the macro level and the micro level of measurement, respectively. Both GT and the Multi-Facet Rasch Measurement model (MFRM, one of the IRT models) can be used to decompose the variance of performance ratings into its sources (including error) and to estimate the reliability of the ratings, and the results of both can suggest how the rating procedure might be improved. By comparing GT and MFRM, this paper examines how the two methods differ in the improvements they suggest for the rating process at the Beijing Olympic Games.
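For reference, the standard random-effects model behind a GT analysis of a fully crossed athlete (p) × round (r) × judge (j) design, its relative G coefficient, and one common parameterization of the multi-facet Rasch model can be written as below. That the paper used exactly these forms is an assumption, since the abstract names the facets but gives no equations. Here n'_r and n'_j are the numbers of rounds and judges in a decision (D) study; B_n is the ability of athlete n, R_r the difficulty of round r, C_j the severity of judge j, D_d the effect of difficulty level d, and F_k the step calibration for moving from category k-1 to category k.

```latex
\begin{aligned}
X_{prj} &= \mu + \nu_p + \nu_r + \nu_j + \nu_{pr} + \nu_{pj} + \nu_{rj} + \nu_{prj,e} \\
\sigma^2(X_{prj}) &= \sigma^2_p + \sigma^2_r + \sigma^2_j + \sigma^2_{pr} + \sigma^2_{pj} + \sigma^2_{rj} + \sigma^2_{prj,e} \\
E\rho^2 &= \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{pr}/n'_r + \sigma^2_{pj}/n'_j + \sigma^2_{prj,e}/(n'_r n'_j)} \\
\ln \frac{P_{nrjdk}}{P_{nrjd(k-1)}} &= B_n - R_r - C_j - D_d - F_k
\end{aligned}
```

Because difficulty is not crossed with the other facets in a way GT can exploit, it is absorbed into the round and interaction components of the GT model, whereas the Rasch model above gives it its own parameter D_d.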
The data are the athletes' scores in the final of the men's 10 m platform diving at the 2008 Beijing Olympic Games. Twelve athletes took part in the final; each dived six times and was scored independently by seven judges on every dive, giving 12 × 6 × 7 = 504 data points. Both GT and MFRM are applied to this dataset to analyze four facets of the scores: round, athlete, judge, and difficulty. In GT, however, difficulty is a hidden facet, so its effect cannot be separated.
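A G study on a 12 × 6 × 7 array like this one can be sketched with the classical ANOVA (expected-mean-square) estimators for a fully crossed random p × r × j design with one score per cell. This is a minimal sketch, assuming that design; the function name `g_study_prj` is hypothetical, negative estimates are set to zero (a common convention), and the data below are simulated with made-up effect sizes, not the Olympic scores.

```python
import numpy as np


def g_study_prj(x):
    """ANOVA estimates of variance components for a crossed p x r x j design.

    x has shape (n_p, n_r, n_j): athletes x rounds x judges, one score per cell.
    """
    n_p, n_r, n_j = x.shape
    m = x.mean()
    m_p, m_r, m_j = x.mean(axis=(1, 2)), x.mean(axis=(0, 2)), x.mean(axis=(0, 1))
    m_pr, m_pj, m_rj = x.mean(axis=2), x.mean(axis=1), x.mean(axis=0)

    # Sums of squares for main effects, two-way interactions and residual.
    ss = {
        "p": n_r * n_j * np.sum((m_p - m) ** 2),
        "r": n_p * n_j * np.sum((m_r - m) ** 2),
        "j": n_p * n_r * np.sum((m_j - m) ** 2),
        "pr": n_j * np.sum((m_pr - m_p[:, None] - m_r[None, :] + m) ** 2),
        "pj": n_r * np.sum((m_pj - m_p[:, None] - m_j[None, :] + m) ** 2),
        "rj": n_p * np.sum((m_rj - m_r[:, None] - m_j[None, :] + m) ** 2),
    }
    resid = (x - m_pr[:, :, None] - m_pj[:, None, :] - m_rj[None, :, :]
             + m_p[:, None, None] + m_r[None, :, None] + m_j[None, None, :] - m)
    ss["prj,e"] = np.sum(resid ** 2)

    df = {
        "p": n_p - 1, "r": n_r - 1, "j": n_j - 1,
        "pr": (n_p - 1) * (n_r - 1), "pj": (n_p - 1) * (n_j - 1),
        "rj": (n_r - 1) * (n_j - 1),
        "prj,e": (n_p - 1) * (n_r - 1) * (n_j - 1),
    }
    ms = {k: ss[k] / df[k] for k in ss}

    # Solve the expected-mean-square equations for each component.
    var = {
        "prj,e": ms["prj,e"],
        "pr": (ms["pr"] - ms["prj,e"]) / n_j,
        "pj": (ms["pj"] - ms["prj,e"]) / n_r,
        "rj": (ms["rj"] - ms["prj,e"]) / n_p,
        "p": (ms["p"] - ms["pr"] - ms["pj"] + ms["prj,e"]) / (n_r * n_j),
        "r": (ms["r"] - ms["pr"] - ms["rj"] + ms["prj,e"]) / (n_p * n_j),
        "j": (ms["j"] - ms["pj"] - ms["rj"] + ms["prj,e"]) / (n_p * n_r),
    }
    return {k: max(v, 0.0) for k, v in var.items()}


# Simulated scores with hypothetical effect SDs (NOT the Olympic data).
rng = np.random.default_rng(2008)
n_p, n_r, n_j = 12, 6, 7
x = (7.0
     + rng.normal(0, 0.8, (n_p, 1, 1))      # athlete
     + rng.normal(0, 0.5, (1, n_r, 1))      # round
     + rng.normal(0, 0.1, (1, 1, n_j))      # judge
     + rng.normal(0, 1.0, (n_p, n_r, 1))    # athlete x round
     + rng.normal(0, 0.3, (n_p, n_r, n_j))) # residual
components = g_study_prj(x)
for name, value in components.items():
    print(f"sigma^2_{name}: {value:.3f}")
```

With only 12 athletes and 6 rounds the estimates are noisy, so the printed values will only roughly track the generating SDs.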
GT and MFRM consistently indicate that the athlete, the round, and the athlete × round interaction are important sources of variation in the scores, whereas the judges do not contribute significantly to the variance in athletes' scores. MFRM further indicates that difficulty is also a significant source of variation. These results point to ways of improving the scoring from different angles. For example, the g coefficient depends substantially on the number of rounds but hardly on the number of judges, so increasing the number of rounds is an effective way to raise the reliability of the ratings. MFRM provides a measure for each individual element within each facet, a standard error for each element, and diagnostic fit statistics for detecting aberrant responses. The MFRM analysis shows that the judges' ratings produced disordered step calibrations of the scale around the category 6.5. MFRM also yields a ranking of the athletes that differs from the official ranking at the 2008 Beijing Olympic Games.
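What "disordered step calibrations" means can be shown with a small rating-scale sketch. Diving judges award 0 to 10 in half-point steps, but this example assumes a toy five-category scale (0 to 4) with made-up thresholds; `theta` stands in for the combined MFRM measure (athlete ability minus round, judge and difficulty terms). The function names are hypothetical. When a later step calibration is lower than an earlier one, the category between them is never the most probable rating at any measure.

```python
import numpy as np


def category_probs(theta, thresholds):
    """Rating-scale-model probabilities of categories 0..m at logit measure theta.

    thresholds[k-1] is the step calibration F_k between category k-1 and k,
    so that log(P_k / P_{k-1}) = theta - F_k.
    """
    steps = theta - np.asarray(thresholds, dtype=float)
    logits = np.concatenate(([0.0], np.cumsum(steps)))
    p = np.exp(logits - logits.max())  # subtract max for numerical stability
    return p / p.sum()


def is_ordered(thresholds):
    """True if every step calibration is higher than the one before it."""
    return bool(np.all(np.diff(thresholds) > 0))


def modal_categories(thresholds, grid=None):
    """Categories that are the most probable rating somewhere on the grid."""
    if grid is None:
        grid = np.linspace(-10.0, 10.0, 2001)
    return {int(np.argmax(category_probs(t, thresholds))) for t in grid}


# Hypothetical thresholds, NOT the judges' estimated calibrations.
ordered = [-2.0, -0.5, 0.5, 2.0]
disordered = [-2.0, 0.8, -0.3, 2.0]  # F_2 > F_3: step into 2 is harder than into 3

for name, f in [("ordered", ordered), ("disordered", disordered)]:
    print(name, is_ordered(f), sorted(modal_categories(f)))
```

In the disordered case category 2 drops out of the set of modal categories, which is the practical symptom Rasch analysts look for when they report a disordered region of a rating scale.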
In sum, GT and MFRM are fully consistent in identifying the sources of variation that both can estimate, but each method has its own advantages. GT is more helpful for designing the measurement procedure, whereas MFRM is more helpful for measuring individual elements within each facet and for detecting aberrant responses. Moreover, MFRM separates the effects of round, judge, and difficulty more successfully, and produces a more precise estimate of the athletes' ranking than the method used at the 2008 Beijing Olympic Games.

Key words: generalizability theory, multi-facet Rasch measurement, performance rating
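The diagnostic fit statistics that MFRM uses to flag aberrant ratings are usually the infit and outfit mean squares. A minimal sketch follows, assuming a rating-scale model with a toy five-category scale, hypothetical thresholds and measures, and a hypothetical function name `fit_mean_squares`; it is not the paper's software or data. Outfit averages squared standardized residuals, so a single unexpected rating inflates it strongly; infit weights each squared residual by its model variance. Values near 1 are consistent with the model, and values well above 1 signal noisy or aberrant ratings.

```python
import numpy as np


def category_probs(theta, thresholds):
    """Rating-scale-model probabilities of categories 0..m at logit measure theta."""
    steps = theta - np.asarray(thresholds, dtype=float)
    logits = np.concatenate(([0.0], np.cumsum(steps)))
    p = np.exp(logits - logits.max())
    return p / p.sum()


def fit_mean_squares(observed, thetas, thresholds):
    """Infit and outfit mean squares for one element (e.g. one judge).

    observed[i] is the category awarded on occasion i and thetas[i] the
    model's combined logit measure for that occasion.
    """
    observed = np.asarray(observed, dtype=float)
    cats = np.arange(len(thresholds) + 1)
    expected = np.empty_like(observed)
    variance = np.empty_like(observed)
    for i, theta in enumerate(thetas):
        p = category_probs(theta, thresholds)
        expected[i] = np.sum(cats * p)                         # model expectation
        variance[i] = np.sum((cats - expected[i]) ** 2 * p)    # model variance
    sq_resid = (observed - expected) ** 2
    outfit = np.mean(sq_resid / variance)    # unweighted mean of z^2
    infit = sq_resid.sum() / variance.sum()  # information-weighted
    return infit, outfit


# Hypothetical scale, measures and ratings (NOT taken from the paper).
thresholds = [-2.0, -0.5, 0.5, 2.0]
thetas = np.array([-1.5, -0.5, 0.0, 0.5, 1.0, 1.5, 2.5, 3.0])
typical = [1, 1, 2, 2, 3, 3, 4, 4]
aberrant = [1, 1, 2, 2, 3, 3, 4, 0]  # a 0 where the model expects about 4
print("typical ", fit_mean_squares(typical, thetas, thresholds))
print("aberrant", fit_mean_squares(aberrant, thetas, thresholds))
```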