An IRT Analysis of Rater Bias in Structured Interview of National Civilian Candidates

Acta Psychologica Sinica (心理学报) ›› 2006, Vol. 38 ›› Issue (04): 614-625. cstr: 32110.14.2006.00614

Sun Xiaomin, Zhang Houcan

School of Psychology, Beijing Normal University, Beijing 100875, China

Received: 2005-08-10 Revised: 1900-01-01 Online: 2006-07-30 Published: 2006-07-30

Abstract

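Summary: Using the many-facet Rasch model from item response theory (IRT), this study analyzed rater bias among 12 raters in two panels in a structured interview for national civil servants. Two kinds of rater bias were proposed and verified: differences between raters in severity, and problems with each rater's own consistency. The results showed that raters differed significantly in severity, and that raters also differed in how consistent their own ratings were across candidates, dimensions, gender, and time. The study shows that analysis at the level of the individual rater goes beyond the limitation of classical test theory (CTT), which analyzes raters only as a group. It provides detailed, specific diagnostic information on each rater's bias behavior, and thus offers a modern measurement approach to targeted rater training and to building a rater pool.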

Keywords: structured interview, rater bias, item response theory

Abstract:
Introduction
The structured interview is one of the most important methods in personnel selection. Rater bias, however, can seriously threaten its reliability and validity. Different test theories offer different solutions to this problem. The many-facet Rasch model (MFRM), an extension of the Rasch model, overcomes shortcomings of Classical Test Theory (CTT). By parameterizing not only interviewee ability and item difficulty but also judge severity, MFRM offers an effective way to estimate rater bias and provides detailed information about each rater’s bias behavior. This study divided rater bias into two kinds, intra-rater inconsistency and between-rater differences in stringency, and used MFRM to analyze both.
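The abstract does not write the model out. For orientation, the many-facet Rasch model is commonly stated in the rating-scale form below. The symbols are the conventional ones rather than notation taken from this paper, and treating the rating dimensions as the "items" is an assumption about how the facets were set up.

```latex
% Many-facet Rasch model, rating-scale form (conventional notation, assumed here).
% P_{nijk}: probability that rater j gives interviewee n category k on dimension i
% B_n: ability of interviewee n      D_i: difficulty of dimension (item) i
% C_j: severity of rater j           F_k: threshold between categories k-1 and k
\[
  \ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
\]
```

Because rater severity $C_j$ enters as its own parameter on the same logit scale as ability, a more severe rater lowers the log-odds of every higher category by the same amount, which is what allows severity to be estimated for each rater individually.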
Method
Data came from a structured interview of national civil servant candidates. There were 200 interviewees and 21 raters, who were randomly assigned to 3 panels on the morning of each of two days. Rating scores from two panels were used in this study. The first 34 interviewees (numbers 1 through 34) were interviewed by raters A, B, C, D, E, F, and G on the first morning. Interviewees 35 to 66 were interviewed by raters A, E, H, I, J, K, and L on the second morning. Using a 10-point rating scale (1 to 10), each rater rated each interviewee independently on five dimensions.
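The rating design described above can be laid out explicitly to see how much data it implies and which raters connect the two sessions. The sketch below is only an illustration: the session labels and the function name `build_design` are hypothetical, and it assumes no ratings were missing, which the abstract does not state.

```python
from itertools import product

DIMENSIONS = 5  # each rater scored five dimensions per interviewee

# Panel composition as described in the abstract; the session keys are
# hypothetical labels chosen for this sketch.
sessions = {
    "day1_morning": {"interviewees": range(1, 35), "raters": list("ABCDEFG")},
    "day2_morning": {"interviewees": range(35, 67), "raters": list("AEHIJKL")},
}


def build_design(sessions, n_dimensions=DIMENSIONS):
    """Return one (interviewee, rater, dimension) triple per expected rating."""
    rows = []
    for session in sessions.values():
        rows.extend(
            product(
                session["interviewees"],
                session["raters"],
                range(1, n_dimensions + 1),
            )
        )
    return rows


design = build_design(sessions)
all_raters = {rater for _, rater, _ in design}
# Raters who sat on both panels link the two sessions onto one scale.
linking_raters = set(sessions["day1_morning"]["raters"]) & set(
    sessions["day2_morning"]["raters"]
)

print(len(design))             # ratings expected if none are missing
print(sorted(all_raters))      # distinct raters across both sessions
print(sorted(linking_raters))  # raters common to both sessions
```

Under these assumptions the design has 34 × 7 × 5 + 32 × 7 × 5 = 2310 ratings from 12 distinct raters, with raters A and E appearing in both sessions.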
Using FACETS 3.55.0, a computer program based on MFRM, we examined between-rater differences in stringency and intra-rater inconsistency. Rater bias across candidates, rating dimensions, gender, and time periods was further examined with the bias analysis provided by FACETS.
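To make "difference in stringency" concrete, the sketch below computes category probabilities under a rating-scale many-facet Rasch model and compares the expected score that a lenient and a severe rater would give the same interviewee. It is not FACETS and does not estimate anything: the function names, the evenly spaced thresholds, and the severity values are all made-up assumptions for illustration.

```python
import math


def category_probabilities(ability, difficulty, severity, thresholds):
    """Category probabilities under a rating-scale many-facet Rasch model.

    thresholds holds the K step thresholds (in logits) of a scale with
    K + 1 ordered categories; the result has K + 1 entries summing to 1.
    """
    eta = ability - difficulty - severity
    # Log-numerator of category k is the cumulative sum of (eta - F_h), h <= k.
    log_numerators = [0.0]
    for step in thresholds:
        log_numerators.append(log_numerators[-1] + eta - step)
    peak = max(log_numerators)  # subtract the maximum for numerical stability
    weights = [math.exp(value - peak) for value in log_numerators]
    total = sum(weights)
    return [weight / total for weight in weights]


def expected_score(probabilities, lowest_category=1):
    """Expected rating when categories are scored lowest_category, +1, ..."""
    return sum((lowest_category + k) * p for k, p in enumerate(probabilities))


# Hypothetical values: a 10-point scale needs 9 thresholds.
thresholds = [-2.0 + 0.5 * k for k in range(9)]
lenient = category_probabilities(0.0, 0.0, -0.5, thresholds)
severe = category_probabilities(0.0, 0.0, 0.5, thresholds)

print(round(expected_score(lenient), 2))
print(round(expected_score(severe), 2))
```

With ability and dimension difficulty held fixed, the rater with the higher severity value yields the lower expected score, which is the kind of between-rater difference the model separates from candidate ability.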
Results
(1) In the structured interview, raters differed significantly from one another in severity.
(2) In the structured interview, raters demonstrated different levels of internal consistency. The specific behaviors of the inconsistent raters and the overly consistent raters were identified.
(3) Raters also showed different patterns of rater bias across candidates, rating dimensions, gender, and time periods.
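The abstract does not say which statistics were used to label raters as inconsistent or overly consistent. In Rasch analyses this is commonly done with infit and outfit mean squares, so the sketch below shows that computation as an assumption about the approach, not as the authors' procedure. The function name and the sample numbers are hypothetical; the model-expected scores and variances are taken as given inputs.

```python
def fit_mean_squares(observed, expected, variance):
    """Return (infit, outfit) mean squares for one rater's ratings.

    observed: ratings the rater actually gave
    expected: model-expected rating for each observation
    variance: model variance of each observation (must be positive)
    """
    if not observed or not (len(observed) == len(expected) == len(variance)):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_residuals = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / w for r, w in zip(squared_residuals, variance)) / len(observed)
    # Infit: squared residuals weighted by the information in each observation.
    infit = sum(squared_residuals) / sum(variance)
    return infit, outfit


# Made-up numbers for illustration only.
observed = [6, 8, 5, 9]
expected = [6.4, 7.1, 5.8, 7.9]
variance = [1.2, 1.0, 1.3, 0.9]
infit, outfit = fit_mean_squares(observed, expected, variance)
print(round(infit, 2), round(outfit, 2))
```

Values near 1 mean the ratings vary about as much as the model predicts. Values well above 1 point to erratic ratings, and values well below 1 point to ratings that vary less than expected, which corresponds to the "overly consistent" pattern.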
Conclusions
The results suggest the existence of two sources of rater bias. Applying MFRM to the analysis of structured interviews offers an effective way to select competent national civil servant candidates fairly. It also provides valuable information for selecting qualified raters and identifying each rater’s strengths and weaknesses, which is useful for constructing a rater bank and for further training of incompetent raters.

Key words: structured interview, rater bias, IRT

CLC number: B841

Sun Xiaomin, Zhang Houcan. (2006). An IRT Analysis of Rater Bias in Structured Interview of National Civilian Candidates. Acta Psychologica Sinica, 38(04), 614-625.
