Acta Psychologica Sinica, 2009, Vol. 41, Issue (08): 773-784. cstr: 32110.14.2009.00773

A Comparison of GT and IRT: An Analysis of Performance Rating of Men’s 10 Meters Platform Diving in Beijing Olympic Games

YU Zong-Huo; TANG Xiao-Juan; WANG Deng-Feng

(¹ Department of Psychology, Peking University, Beijing 100871, China)
(² College of Mathematics and Information Science, Nanchang Hangkong University, Nanchang 330063, China)

Received: 2008-11-14; Online: 2009-08-30; Published: 2009-08-30

Abstract

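Chinese abstract: Generalizability theory (GT) and item response theory (IRT) have developed classical test theory in two different directions. In subjective rating, both GT and the multi-facet Rasch measurement model (MFRM), an IRT model, can be used to estimate how much each source of variation contributes to score variance, to estimate the reliability of the rating, and to suggest how the rating could be improved. Twelve athletes took part in the final of the men's 10-meter platform diving at the 2008 Beijing Olympic Games. The final had six rounds, and seven judges independently scored each athlete's performance in every round. GT and MFRM agreed fairly well that the athlete, the round, and the athlete × round interaction are important sources of variation in athletes' scores, whereas the judges did not contribute significantly to score differences between athletes. MFRM additionally estimated that the degree of difficulty is an important source of variation in men's platform diving scores, found disordered step calibrations around rating category 6.5, and produced a ranking of athletes that differs somewhat from the official 2008 Olympic ranking. In GT, difficulty is a hidden facet, so its effect could not be separated. GT and MFRM suggest improvements from two different angles. GT shows that the g coefficient can be raised by increasing the number of rounds, while increasing the number of judges has little effect. MFRM gives estimates and standard errors for the elements of each facet (such as an individual judge or athlete), and its diagnostic fit statistics help identify aberrant scores or rating patterns.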

Key words: generalizability theory, multi-facet Rasch measurement model, subjective rating
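The abstract reports disordered step calibrations around category 6.5. As a minimal sketch of what "disordered" means, assume the Rasch-Andrich step calibrations (the points on the latent scale where category k and category k-1 are equally probable) have already been estimated. In a well-functioning rating scale they should increase with the category, so any step that does not rise above the previous one is flagged. The function name `disordered_steps` and all numbers below are hypothetical and are not the paper's estimates.

```python
def disordered_steps(categories, calibrations):
    """Hypothetical helper: return the categories whose step calibration is
    not higher than the calibration of the preceding category."""
    if len(categories) != len(calibrations):
        raise ValueError("categories and calibrations must have equal length")
    flagged = []
    for k in range(1, len(calibrations)):
        # Ordered scale: each step should lie above the previous one
        if calibrations[k] <= calibrations[k - 1]:
            flagged.append(categories[k])
    return flagged


# Invented calibrations (NOT the paper's estimates) on a 0.5-point diving scale
cats = [5.5, 6.0, 6.5, 7.0, 7.5]
steps = [-1.8, -0.9, -1.1, 0.4, 1.2]
print(disordered_steps(cats, steps))  # -> [6.5]
```

Here the step into 6.5 sits below the step into 6.0, so 6.5 is never the single most probable rating anywhere on the scale, which is the usual symptom of a disordered category.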

Abstract: Generalizability theory (GT) and item response theory (IRT) have extended classical test theory (CTT) in different directions, focusing on the macro level and the micro level of measurement, respectively. Both GT and the multi-facet Rasch measurement model (MFRM, one of the IRT methods) can be applied to decompose the variance of performance ratings into its sources (including error) and to estimate the reliability of the ratings. The results of both methods can also give researchers recommendations on how to improve the rating process. This paper compares GT and MFRM to find out how they differ in the improvements they suggest for the rating process at the Beijing Olympic Games.
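For reference, the many-facet Rasch rating scale model in Linacre's general form can be written as below. This is a sketch under assumptions: the four facets are taken from this abstract, but the symbols and the exact parameterization are illustrative and not necessarily the model the authors fitted.

```latex
\ln\!\left(\frac{P_{nrjdk}}{P_{nrjd(k-1)}}\right) = B_n - R_r - C_j - D_d - F_k
```

Here $P_{nrjdk}$ is the probability that athlete $n$, in round $r$, on a dive in difficulty element $d$, receives rating category $k$ from judge $j$; $B_n$ is the athlete's ability, $R_r$ the round measure, $C_j$ the judge's severity, $D_d$ the difficulty measure, and $F_k$ the step calibration from category $k-1$ to $k$. Because every facet enters additively on the log-odds scale, each element (a judge, a round, an athlete) gets its own measure and standard error.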
The data analyzed are the athletes' scores in the final of the men's 10-meter platform diving at the 2008 Beijing Olympic Games. Twelve athletes took part in the final. Each athlete dived six times and was scored independently by seven judges on each dive, giving 12 × 6 × 7 = 504 data points. Based on this dataset, both GT and MFRM are applied to analyze four facets of these scores: round, athlete, judge, and difficulty. However, difficulty is a hidden facet in GT and cannot be separated there.
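The GT side of this design can be illustrated with a G-study sketch. It assumes the simplest fully crossed random design, athlete (p) × round (r) × judge (j) with one score per cell, which is how GT sees the 12 × 6 × 7 data when difficulty stays hidden. The helper `g_study_prj` is hypothetical, and the paper's exact estimation procedure is not stated in this abstract; the sketch solves the standard expected-mean-square equations for this design.

```python
import numpy as np


def g_study_prj(x):
    """Hypothetical helper: estimate random-effects variance components for a
    fully crossed person x round x judge (p x r x j) design.

    x: array-like of shape (n_p, n_r, n_j), one score per cell.
    Negative estimates are returned unchanged (no truncation to zero).
    """
    x = np.asarray(x, dtype=float)
    n_p, n_r, n_j = x.shape
    m = x.mean()

    # Marginal means for each main effect and two-way combination
    mp = x.mean(axis=(1, 2))   # athletes, shape (n_p,)
    mr = x.mean(axis=(0, 2))   # rounds, shape (n_r,)
    mj = x.mean(axis=(0, 1))   # judges, shape (n_j,)
    mpr = x.mean(axis=2)       # shape (n_p, n_r)
    mpj = x.mean(axis=1)       # shape (n_p, n_j)
    mrj = x.mean(axis=0)       # shape (n_r, n_j)

    # Sums of squares: main effects, two-way interactions, residual (prj,e)
    ss_p = n_r * n_j * np.sum((mp - m) ** 2)
    ss_r = n_p * n_j * np.sum((mr - m) ** 2)
    ss_j = n_p * n_r * np.sum((mj - m) ** 2)
    ss_pr = n_j * np.sum((mpr - mp[:, None] - mr[None, :] + m) ** 2)
    ss_pj = n_r * np.sum((mpj - mp[:, None] - mj[None, :] + m) ** 2)
    ss_rj = n_p * np.sum((mrj - mr[:, None] - mj[None, :] + m) ** 2)
    resid = (x - mpr[:, :, None] - mpj[:, None, :] - mrj[None, :, :]
             + mp[:, None, None] + mr[None, :, None] + mj[None, None, :] - m)
    ss_prj = np.sum(resid ** 2)

    # Mean squares (sum of squares / degrees of freedom)
    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_j = ss_j / (n_j - 1)
    ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))
    ms_pj = ss_pj / ((n_p - 1) * (n_j - 1))
    ms_rj = ss_rj / ((n_r - 1) * (n_j - 1))
    ms_prj = ss_prj / ((n_p - 1) * (n_r - 1) * (n_j - 1))

    # Solve the expected-mean-square equations of the random p x r x j design
    return {
        "p": (ms_p - ms_pr - ms_pj + ms_prj) / (n_r * n_j),
        "r": (ms_r - ms_pr - ms_rj + ms_prj) / (n_p * n_j),
        "j": (ms_j - ms_pj - ms_rj + ms_prj) / (n_p * n_r),
        "pr": (ms_pr - ms_prj) / n_j,
        "pj": (ms_pj - ms_prj) / n_r,
        "rj": (ms_rj - ms_prj) / n_p,
        "prj,e": ms_prj,
    }


# Usage with simulated scores (NOT the Olympic data): 12 athletes, 6 rounds, 7 judges
rng = np.random.default_rng(0)
scores = rng.normal(7.0, 1.0, size=(12, 6, 7))
print(g_study_prj(scores))
```

Negative estimates can arise from sampling error; in practice they are often set to zero, and the sketch leaves that choice to the caller. Note that in real diving data each round is a different dive with its own degree of difficulty, so in this design the difficulty effect is absorbed into the round and athlete × round components.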
The GT and MFRM results consistently suggest that the athlete, the round, and their interaction are important sources of variation in these scores, and that the judges do not contribute significantly to the variance in athletes' scores. In addition, the MFRM results indicate that difficulty is also a significant source of variation. These results point to several ways of improving the scoring. For example, the g coefficient is strongly affected by the number of rounds but not by the number of judges, so increasing the number of rounds should help improve the reliability of the ratings. MFRM gives the measure of each element within each facet, the standard error of each element, and diagnostic fit statistics for detecting aberrant responses. The MFRM analysis shows that the judges' ratings produced disordered step calibrations around the 6.5 category. MFRM also yields a ranking of athletes that differs from the official ranking of the 2008 Beijing Olympic Games.
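Why rounds matter more than judges can be seen from the D-study formula for the generalizability coefficient. For relative decisions in a p × R × J random design with n_r rounds and n_j judges, Eρ² = σ²(p) / [σ²(p) + σ²(pr)/n_r + σ²(pj)/n_j + σ²(prj,e)/(n_r·n_j)]. When σ²(pr) is large and the judge-related components are small, adding rounds shrinks the dominant error term while adding judges barely changes it. The helper name and the variance components below are invented for illustration and are not the paper's estimates.

```python
def g_coefficient(var, n_r, n_j):
    """Hypothetical helper: generalizability coefficient for relative decisions
    in a p x R x J random design with n_r rounds and n_j judges.

    var: dict of variance components with keys "p", "pr", "pj", "prj,e"
    (for example, estimates from a G study).
    """
    # Relative error variance: each interaction averaged over its D-study facets
    rel_error = var["pr"] / n_r + var["pj"] / n_j + var["prj,e"] / (n_r * n_j)
    return var["p"] / (var["p"] + rel_error)


# Invented variance components (NOT the paper's estimates): a large
# athlete x round interaction and small judge-related terms.
vc = {"p": 1.0, "pr": 1.0, "pj": 0.02, "prj,e": 0.1}
for n_r in (3, 6, 9):
    for n_j in (5, 7, 9):
        print(f"rounds={n_r} judges={n_j} g={g_coefficient(vc, n_r, n_j):.3f}")
```

With these assumed components, moving along the rows (more rounds) raises g far more than moving along the columns (more judges), which is the pattern the abstract reports.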
In sum, GT and MFRM agree on the main sources of variation, but each method has its own advantages. GT is more helpful for designing the measurement, whereas MFRM is more helpful for measuring individual elements within each facet and for detecting aberrant responses. Moreover, MFRM separates the effects of round, judge, and difficulty more successfully and produces a more precise ranking of athletes than the method used at the 2008 Beijing Olympic Games.
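The abstract does not name the specific fit statistics used; common Rasch diagnostics for aberrant ratings are the infit and outfit mean squares. A minimal sketch, assuming each observation's model-expected score and model variance are already available from a fitted MFRM: outfit is the unweighted mean of squared standardized residuals, and infit is the sum of squared residuals divided by the sum of model variances. The function name and all numbers are hypothetical.

```python
def fit_mean_squares(observed, expected, variance):
    """Hypothetical helper: infit and outfit mean-square statistics for one
    element (e.g. one judge), given each observation's model-expected score
    and model variance from an already-fitted Rasch model."""
    if not (len(observed) == len(expected) == len(variance)) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    sq_resid = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: unweighted mean of squared standardized residuals
    outfit = sum(r / w for r, w in zip(sq_resid, variance)) / len(sq_resid)
    # Infit: squared residuals weighted by model variance (information)
    infit = sum(sq_resid) / sum(variance)
    return infit, outfit


# Invented values for one judge's four ratings (NOT from the paper)
observed_scores = [7.0, 8.5, 6.0, 9.0]
expected_scores = [7.4, 7.9, 7.1, 8.2]
model_var = [0.6, 0.5, 0.7, 0.4]
infit, outfit = fit_mean_squares(observed_scores, expected_scores, model_var)
print(f"infit={infit:.2f} outfit={outfit:.2f}")
```

Values near 1 indicate ratings consistent with the model; markedly larger values suggest erratic ratings, and markedly smaller values suggest ratings more predictable than the model expects. Outfit is more sensitive to isolated surprising ratings far from the element's measure, infit to unexpected patterns near it.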

Key words: generalizability theory, multi-facet Rasch measurement, performance rating

YU Zong-Huo, TANG Xiao-Juan, WANG Deng-Feng. (2009). A Comparison of GT and IRT: An Analysis of Performance Rating of Men’s 10 Meters Platform Diving in Beijing Olympic Games. Acta Psychologica Sinica, 41(08), 773-784.
