Estimating Variance Components of Missing Data for Generalizability Theory

Acta Psychologica Sinica ›› 2014, Vol. 46 ›› Issue (12): 1897-1909. doi: 10.3724/SP.J.1041.2014.01897

ZHANG Minqiang¹; ZHANG Wenyi²; LI Guangming¹; LIU Xiaoyu³; HUANG Feifei¹

(¹ School of Psychology, Center for Studies of Psychological Application, South China Normal University, Guangzhou 510631, China) (² Management School, Jinan University, Guangzhou 510632, China) (³ School of Education Science, South China Normal University, Guangzhou 510631, China)

Received: 2013-07-18 Online: 2014-12-25 Published: 2014-12-25
Contact: ZHANG Minqiang, E-mail: 2640726401@qq.com; ZHANG Wenyi, E-mail: Zhangwenyi25@hotmail.com; LI Guangming, E-mail: Lgm2004100@sina.com
Funding:
Key Project of the Ministry of Education under the National Education Science 12th Five-Year Plan (GFA111009); 2014 General Program of the National Natural Science Foundation of China (31470050); General Project in Education under the National Social Science Foundation 12th Five-Year Plan (BHA130053); Youth Fund Project for Humanities and Social Sciences Research of the Ministry of Education (12YJC190016); 2011 Research Project of the Guangdong Province Education Science 12th Five-Year Plan (2011TJK161); Seedling Project (Humanities and Social Sciences) of the Guangdong Province Special Fund for Discipline Construction in Higher Education Institutions (2012WYM_0108); 2012 General Project of the Guangzhou Education Science 12th Five-Year Plan (12A019).

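Abstract

Abstract (translated from the Chinese):

Missing data are ubiquitous in psychological surveys and experiments. Missing data create a series of problems when generalizability theory is used to analyze the variance components of unbalanced data. Within the framework of generalizability theory, custom programs written in Matlab 7.0 were used to simulate missing data for the random two-facet crossed p×i×r design, and the formulas method, the REML method, the subdividing method, and the MCMC method were compared in terms of how well they estimate each variance component. The results show that: (1) for estimating the variance components of missing data in the random two-facet crossed p×i×r design, the MCMC method has a stronger advantage than the other three methods; (2) items and raters are important factors influencing the estimation of variance components with missing data.

Key words: generalizability theory, missing data, variance component estimation, p×i×r design, MCMC method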

Abstract:

Missing observations are common in operational performance assessments and in psychological surveys and experiments. Because these assessments are time-consuming to administer and score, examinees seldom respond to all test items and raters seldom evaluate all examinee responses. As a result, those who apply generalizability theory to large-scale performance assessments frequently have to work with missing data; data from such examinations form a missing data matrix. Researchers are usually concerned with making good use of the complete data and often ignore the missing data. A common practice is to delete the missing records or to impute them; however, this can cause problems in two respects. First, deleting or imputing missing data may render the statistical analysis ineffective. Second, it is difficult for researchers to choose an unbiased method among the many imputation rules. Missing data therefore cause a series of problems when the variance components of unbalanced data are estimated in generalizability theory, and a key issue is how to use the available incomplete data to their full statistical capacity. This article examines four methods for estimating variance components of missing data in the unbalanced random p×i×r design of generalizability theory: the formulas method, the restricted maximum likelihood (REML) method, the subdividing method, and the Markov Chain Monte Carlo (MCMC) method. The formulas method derives variance component estimating formulas for the p×i×r design with missing data from the p×i design estimating formulas of Brennan (2001). The aim of this article is to investigate which method estimates the variance components of missing data most rapidly and effectively. MATLAB 7.0 was used to simulate data, and generalizability theory was used to estimate variance components.
Three conditions were simulated: (1) person samples of small (200 students), medium (1000 students), and large (5000 students) size; (2) item samples of 2, 4, and 6 items; (3) rater samples of 5, 10, and 20 raters. The authors also developed programs for MATLAB, WinBUGS, SAS, and urGENOVA to estimate the variance components of p×i×r missing data with the four methods. Criteria were established for comparing the four methods; for example, bias was the criterion for the variance component estimates, and results were considered more reliable as the absolute bias decreased. The results indicate that: (1) The MCMC method has a strong advantage over the other three methods for estimating variance components of p×i×r missing data. It is superior to the formulas method because its variance component estimates deviate less. It is better than the REML method because the iterations of the MCMC method converge, whereas those of the REML method do not. Unlike the subdividing method, the MCMC method does not require variance components to be combined in order to obtain accurate estimates. (2) Items and raters are two important factors influencing the estimation of variance components of missing data. If manpower and material resources are limited, priority should be given to increasing the number of items in order to improve estimation accuracy. If the number of items cannot be increased, the next-best option is to increase the number of raters. However, the number of raters should be controlled cautiously.

Key words: Generalizability Theory, missing data, estimating variance components, p×i×r design, Markov Chain Monte Carlo (MCMC)
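
The simulate-then-measure-bias workflow summarized in the abstract can be illustrated with a short Python sketch. It is not the authors' code: the study used MATLAB 7.0, WinBUGS, SAS, and urGENOVA, and none of the four compared methods is implemented here. The function names (`simulate_pir`, `delete_at_random`, `anova_components`, `bias`), the variance values, and the 20% missing rate are all assumptions made for illustration. The estimator shown is the standard expected-mean-squares (ANOVA) estimator, which is valid only for complete balanced data; it stands in as a baseline where a missing-data estimator would be plugged in.

```python
import numpy as np

def simulate_pir(n_p, n_i, n_r, var, rng, mu=0.0):
    """Scores x[p, i, r] from the random p x i x r model with normal effects."""
    def eff(key, shape):
        # One independent effect per level (or level combination), broadcast over the rest.
        return rng.normal(0.0, np.sqrt(var[key]), shape)
    return (mu + eff("p", (n_p, 1, 1)) + eff("i", (1, n_i, 1)) + eff("r", (1, 1, n_r))
            + eff("pi", (n_p, n_i, 1)) + eff("pr", (n_p, 1, n_r))
            + eff("ir", (1, n_i, n_r)) + eff("pir", (n_p, n_i, n_r)))

def delete_at_random(x, rate, rng):
    """Copy of x with roughly a proportion `rate` of cells set to NaN (assumed MCAR)."""
    out = x.copy()
    out[rng.random(x.shape) < rate] = np.nan
    return out

def anova_components(x):
    """Expected-mean-squares estimates; valid for COMPLETE balanced data only."""
    n_p, n_i, n_r = x.shape
    g = x.mean()
    m_p, m_i, m_r = x.mean(axis=(1, 2)), x.mean(axis=(0, 2)), x.mean(axis=(0, 1))
    m_pi, m_pr, m_ir = x.mean(axis=2), x.mean(axis=1), x.mean(axis=0)
    ms = {
        "p": n_i * n_r * np.sum((m_p - g) ** 2) / (n_p - 1),
        "i": n_p * n_r * np.sum((m_i - g) ** 2) / (n_i - 1),
        "r": n_p * n_i * np.sum((m_r - g) ** 2) / (n_r - 1),
        "pi": n_r * np.sum((m_pi - m_p[:, None] - m_i[None, :] + g) ** 2)
              / ((n_p - 1) * (n_i - 1)),
        "pr": n_i * np.sum((m_pr - m_p[:, None] - m_r[None, :] + g) ** 2)
              / ((n_p - 1) * (n_r - 1)),
        "ir": n_p * np.sum((m_ir - m_i[:, None] - m_r[None, :] + g) ** 2)
              / ((n_i - 1) * (n_r - 1)),
    }
    resid = (x - m_pi[:, :, None] - m_pr[:, None, :] - m_ir[None, :, :]
             + m_p[:, None, None] + m_i[None, :, None] + m_r[None, None, :] - g)
    ms["pir"] = np.sum(resid ** 2) / ((n_p - 1) * (n_i - 1) * (n_r - 1))
    # Solve the expected-mean-square equations for the seven variance components.
    return {
        "pir": ms["pir"],
        "pi": (ms["pi"] - ms["pir"]) / n_r,
        "pr": (ms["pr"] - ms["pir"]) / n_i,
        "ir": (ms["ir"] - ms["pir"]) / n_p,
        "p": (ms["p"] - ms["pi"] - ms["pr"] + ms["pir"]) / (n_i * n_r),
        "i": (ms["i"] - ms["pi"] - ms["ir"] + ms["pir"]) / (n_p * n_r),
        "r": (ms["r"] - ms["pr"] - ms["ir"] + ms["pir"]) / (n_p * n_i),
    }

def bias(estimates, true_value):
    """Bias criterion: mean estimate over replications minus the true value."""
    return float(np.mean(estimates) - true_value)

if __name__ == "__main__":
    rng = np.random.default_rng(2014)
    # Hypothetical true variance components, chosen only for the demo.
    true_var = {"p": 1.0, "i": 0.5, "r": 0.5, "pi": 0.5, "pr": 0.5, "ir": 0.25, "pir": 1.0}
    x_missing = delete_at_random(simulate_pir(200, 4, 5, true_var, rng), 0.2, rng)
    print("share of missing cells:", np.isnan(x_missing).mean())
    reps = [anova_components(simulate_pir(200, 4, 5, true_var, rng))["p"] for _ in range(100)]
    print("bias of the person component (complete data):", bias(reps, true_var["p"]))
```

To compare methods in the spirit of the study, one would replace `anova_components` with an estimator that accepts the NaN-masked array and repeat the bias calculation for each variance component and each sample-size condition.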

ZHANG Minqiang; ZHANG Wenyi; LI Guangming; LIU Xiaoyu; HUANG Feifei. (2014). Estimating Variance Components of Missing Data for Generalizability Theory. Acta Psychologica Sinica, 46(12), 1897-1909.

