A Comparison of GT and IRT: An Analysis of Performance Rating of Men’s 10 Meters Platform Diving in Beijing Olympic Games

2009, Vol. 41, Issue (08): 773-784.

YU Zong-Huo;TANG Xiao-Juan;WANG Deng-Feng

(¹ Department of Psychology, Peking University, Beijing 100871, China)
(² College of Mathematics and Information Science, Nanchang Hangkong University, Nanchang 330063, China)

Received: 2008-11-14 Revised: 1900-01-01 Published: 2009-08-30 Online: 2009-08-30
Contact: WANG Deng-Feng

Abstract

Generalizability Theory (GT) and Item Response Theory (IRT) each improve on Classical Test Theory (CTT) in different respects: GT focuses on the macro level of measurement, IRT on the micro level. Both GT and the Multi-Facet Rasch Measurement model (MFRM, one member of the IRT family) can be used to decompose the variance in performance ratings into its sources (including error) and to estimate the reliability of the ratings. The results of either analysis can suggest ways to improve performance rating. This paper compares GT and MFRM to find out how they differ in the guidance they offer for improving the rating process used at the Beijing Olympic Games.
The data analyzed are the athletes’ scores from the men’s 10 meters platform diving final at the 2008 Beijing Olympic Games. Twelve athletes took part in the final. Each athlete dived six times, and each dive was scored independently by seven referees, giving 12 × 6 × 7 = 504 data points in total. Both GT and MFRM are applied to this dataset to analyze four facets of the scores: round, person, referee, and difficulty. However, difficulty is a hidden facet that cannot be separated in GT.
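The GT side of such an analysis can be sketched as follows. This is a minimal illustration, assuming a fully crossed person × round × referee random-effects design with one score per cell and the standard ANOVA (expected mean square) estimators; the abstract does not state which estimation procedure the authors used. The scores are synthetic, not the Olympic data, and the function name `variance_components` is hypothetical.

```python
import numpy as np

def variance_components(x):
    """ANOVA estimates for a crossed p x r x j design, one score per cell.

    x has shape (persons, rounds, referees). Estimates can come out
    slightly negative; in practice they are often set to zero.
    """
    n_p, n_r, n_j = x.shape
    g = x.mean()
    m_p, m_r, m_j = x.mean(axis=(1, 2)), x.mean(axis=(0, 2)), x.mean(axis=(0, 1))
    m_pr, m_pj, m_rj = x.mean(axis=2), x.mean(axis=1), x.mean(axis=0)

    # Sums of squares for main effects and two-way interactions.
    ss = {
        "p": n_r * n_j * np.sum((m_p - g) ** 2),
        "r": n_p * n_j * np.sum((m_r - g) ** 2),
        "j": n_p * n_r * np.sum((m_j - g) ** 2),
        "pr": n_j * np.sum((m_pr - m_p[:, None] - m_r[None, :] + g) ** 2),
        "pj": n_r * np.sum((m_pj - m_p[:, None] - m_j[None, :] + g) ** 2),
        "rj": n_p * np.sum((m_rj - m_r[:, None] - m_j[None, :] + g) ** 2),
    }
    # The three-way interaction is confounded with error (one score per cell).
    ss["prj,e"] = np.sum((x - g) ** 2) - sum(ss.values())

    df = {
        "p": n_p - 1, "r": n_r - 1, "j": n_j - 1,
        "pr": (n_p - 1) * (n_r - 1),
        "pj": (n_p - 1) * (n_j - 1),
        "rj": (n_r - 1) * (n_j - 1),
        "prj,e": (n_p - 1) * (n_r - 1) * (n_j - 1),
    }
    ms = {k: ss[k] / df[k] for k in ss}
    e = ms["prj,e"]

    # Solve the expected-mean-square equations for each component.
    return {
        "p": (ms["p"] - ms["pr"] - ms["pj"] + e) / (n_r * n_j),
        "r": (ms["r"] - ms["pr"] - ms["rj"] + e) / (n_p * n_j),
        "j": (ms["j"] - ms["pj"] - ms["rj"] + e) / (n_p * n_r),
        "pr": (ms["pr"] - e) / n_j,
        "pj": (ms["pj"] - e) / n_r,
        "rj": (ms["rj"] - e) / n_p,
        "prj,e": e,
    }

# Synthetic scores (assumed effect sizes, NOT the Olympic data).
rng = np.random.default_rng(0)
n_p, n_r, n_j = 12, 6, 7
scores = (
    8.0
    + rng.normal(0, 1.0, (n_p, 1, 1))      # athlete effect
    + rng.normal(0, 0.5, (1, n_r, 1))      # round effect
    + rng.normal(0, 0.1, (1, 1, n_j))      # referee effect
    + rng.normal(0, 1.0, (n_p, n_r, 1))    # athlete x round interaction
    + rng.normal(0, 0.3, (n_p, n_r, n_j))  # residual
)
print(scores.size)  # 504
for name, value in variance_components(scores).items():
    print(f"{name:6s} {value: .3f}")
```

Because each athlete chooses the dive for each round, dive difficulty varies only with the athlete-round combination, so in this layout it is absorbed into the `pr` component. This is one way to read the statement that difficulty cannot be separated in GT.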
The GT and MFRM results consistently suggest that the athlete, the round, and their interaction are important sources of variation in the scores, and that the referees do not contribute significantly to the variance in athletes’ scores. The MFRM results also indicate that difficulty is a significant source of variation. These results point to several ways of improving scoring. For example, the G coefficient is influenced significantly by the number of rounds but not by the number of referees, so increasing the number of rounds would improve the reliability of the ratings. MFRM provides a measure for each individual element within each facet, a standard error for each element, and diagnostic fit statistics for detecting aberrant responses. The MFRM analysis shows that the referees’ ratings produced disordered step calibrations of the scale around category 6.5. The MFRM results also yield a new ranking that differs markedly from the one awarded at the 2008 Beijing Olympic Games.
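The dependence of the G coefficient on the number of rounds and referees can be illustrated with a small D-study calculation. The sketch below assumes the relative G coefficient for a crossed person × round × referee design with persons as the object of measurement; the abstract does not give the exact design or the estimated components, so the variance components here are hypothetical values chosen only so that the person × round interaction dominates the error.

```python
def g_coefficient(vc, n_rounds, n_referees):
    """Relative G coefficient for persons in a crossed p x r x j design."""
    relative_error = (
        vc["pr"] / n_rounds
        + vc["pj"] / n_referees
        + vc["prj,e"] / (n_rounds * n_referees)
    )
    return vc["p"] / (vc["p"] + relative_error)

# Hypothetical variance components, NOT estimates from the paper.
vc = {"p": 1.0, "pr": 2.0, "pj": 0.05, "prj,e": 0.3}

# Compare the actual design (6 rounds, 7 referees) with two alternatives:
# doubling the rounds versus doubling the referees.
for n_rounds, n_referees in [(6, 7), (12, 7), (6, 14)]:
    g = g_coefficient(vc, n_rounds, n_referees)
    print(f"rounds={n_rounds:2d} referees={n_referees:2d} G={g:.3f}")
```

With components of this shape, doubling the rounds raises the coefficient noticeably while doubling the referees barely changes it, because the referee-related terms are already a small share of the error.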
In sum, GT and MFRM agree fully in identifying the sources of variation, but each method has its own advantages. GT is more helpful for designing the measurement procedure, whereas MFRM is more helpful for measuring individual elements within each facet and for detecting aberrant responses. Moreover, MFRM separates the effects of round, referee, and difficulty more successfully and produces a more precise estimate of the athletes’ ranking than the method used at the 2008 Beijing Olympic Games.
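For orientation, a many-facet Rasch model with the four facets named above is commonly written in the following rating-scale form. This is the standard textbook formulation, given here as an assumption; the abstract does not state the exact specification the authors fitted, and the symbols are chosen for illustration.

```latex
\ln\!\left(\frac{P_{nrjdk}}{P_{nrjd(k-1)}}\right) = B_n - R_r - C_j - D_d - F_k
```

Here $P_{nrjdk}$ is the probability that athlete $n$, in round $r$, judged by referee $j$ on a dive of difficulty level $d$, receives score category $k$ rather than $k-1$; $B_n$ is the athlete's ability, $R_r$ the effect of the round, $C_j$ the severity of the referee, $D_d$ the effect of dive difficulty, and $F_k$ the step calibration for moving from category $k-1$ to $k$. Because each facet enters as a separate additive term on the logit scale, each element receives its own measure and standard error. Step calibrations are called disordered when $F_k$ does not increase with $k$.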

Key words: generalizability theory, multi-facet Rasch measurement, performance rating

YU Zong-Huo, TANG Xiao-Juan, WANG Deng-Feng. (2009). A Comparison of GT and IRT: An Analysis of Performance Rating of Men’s 10 Meters Platform Diving in Beijing Olympic Games. , 41(08), 773-784.
