The Development of Multiple-Attempt, Multiple-Item Test Models and Their Applications

Acta Psychologica Sinica (心理学报), 2007, Vol. 39, Issue 04: 730-736.

Ding Shuliang (丁树良), Luo Fen (罗芬), Dai Haiqi (戴海琦), Zhu Wei (朱玮)

College of Computer Information Engineering, Jiangxi Normal University, Nanchang 330027, China

Received: 2004-11-26 Revised: 1900-01-01 Online: 2007-07-30 Published: 2007-07-30
Contact: Ding Shuliang

Abstract

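Abstract (translated from the Chinese): Within the IRT framework, a unidimensional two-parameter logistic multiple-attempt, multiple-item (MAMI) test model is established for dichotomous (0-1) scoring. Compared with the multiple-attempt, single-item (MASI) model given by Spray, the MAMI model is more refined, has a wider range of application, and uses a different parameter estimation method: item parameters are obtained with the EM algorithm. Monte Carlo simulation results show that, compared with correspondingly increasing the number of test items, the MAMI test model yields the same precision of ability estimates but higher precision of item parameter estimates. Compared with correspondingly increasing the number of examinees, the MAMI model yields the same precision of item parameter estimates but higher precision of ability estimates. These findings indicate that, under certain conditions, if examinees are allowed to modify their answers and a cumulative scoring scheme is used, the precision of ability estimation can match that of a test with twice as many items even though the number of items is unchanged, and the precision of item parameter estimation also improves. The findings are of reference value not only for skill assessment and cognitive ability assessment but also for the way data are processed.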

Keywords: multiple-attempt multiple-item model, EM algorithm, parameter estimation accuracy

Abstract: Three one-parameter item response theory (IRT) models were proposed by Spray to describe the score probabilities of an examinee who takes a multiple-attempt, single-item (MASI) test of a psychomotor skill. However, if students are encouraged to check and modify their answers in a test, the situation can be regarded as a multiple-attempt, multiple-item (MAMI) test. To describe the MAMI test, a two-parameter IRT MAMI model (a binomial trials model) was proposed and an item parameter estimation procedure was formulated in this paper. Three assumptions were made about the model. The first two are the same as in ordinary IRT: unidimensionality and local independence. The third assumption is another kind of local independence, which requires that the individual trials or attempts be independent for a given examinee.
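The binomial trials idea can be illustrated with a short sketch. It is not taken from the paper: it assumes that each attempt succeeds with the usual two-parameter logistic probability, that the m attempts on an item are independent for a given examinee (the third assumption above), and that no scaling constant D is used. The function names `p_single_attempt` and `mami_score_prob` are hypothetical.

```python
import math


def p_single_attempt(theta, a, b):
    """2PL probability of success on one attempt (assumed form, no scaling constant D)."""
    z = a * (theta - b)
    # Numerically stable logistic function for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def mami_score_prob(x, m, theta, a, b):
    """Probability of a cumulative score x out of m independent attempts on one item.

    Assumption: the attempts are independent Bernoulli trials with the same
    success probability, so the cumulative score is binomially distributed.
    """
    p = p_single_attempt(theta, a, b)
    return math.comb(m, x) * p ** x * (1.0 - p) ** (m - x)


# Score distribution for two attempts when ability equals item difficulty.
probs = [mami_score_prob(x, 2, theta=0.0, a=1.2, b=0.0) for x in range(3)]
print(probs)
```

With ability equal to difficulty, the single-attempt probability is 0.5, so the three cumulative scores 0, 1, and 2 have the binomial probabilities 0.25, 0.5, and 0.25.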
The model and the estimation procedure developed in this article were evaluated using simulated data. In the simulation, the test consisted of 60 items and the sample size was 1000. The simulated data were generated 50 times. The ability parameters, the difficulty parameters, and the logarithms of the discrimination parameters were drawn from the standard normal distribution N(0,1). Three estimation procedures (MMLE/EM for the MAMI model, MMLE/EM for BILOG, and MMLE/EM for ordinary IRT) were used to analyze the simulated data. MMLE/EM for the MAMI model means that, when the examinees modified their answers, each element of the score matrix was the sum of the original score and the modified score (the cumulative scoring scheme). MMLE/EM for ordinary IRT means that, when the examinees modified their answers, the original score matrix and the modified score matrix obtained from the repeated responses were combined in two separate ways: stacked by rows, as if the number of examinees were doubled (the lengthened score matrix), and joined by columns, as if the length of the test were doubled (the widened score matrix).
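The three data layouts described above can be sketched as follows. This is a minimal illustration and not the authors' simulation code: it assumes exactly two attempts per item (original and modified), independent attempts with the same 2PL success probability, and no scaling constant; `simulate_attempts` and the seed value are hypothetical choices.

```python
import math
import random


def p_correct(theta, a, b):
    """2PL success probability for a single attempt (assumed form)."""
    z = a * (theta - b)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def simulate_attempts(n_examinees=1000, n_items=60, seed=0):
    """Draw parameters from N(0,1) and simulate two independent 0-1 attempts."""
    rng = random.Random(seed)
    theta = [rng.gauss(0, 1) for _ in range(n_examinees)]   # abilities
    b = [rng.gauss(0, 1) for _ in range(n_items)]           # difficulties
    a = [math.exp(rng.gauss(0, 1)) for _ in range(n_items)]  # log a ~ N(0,1)

    def one_attempt():
        return [[int(rng.random() < p_correct(t, a[j], b[j]))
                 for j in range(n_items)] for t in theta]

    return one_attempt(), one_attempt()


original, modified = simulate_attempts()

# Cumulative scoring: each cell is original score + modified score (0, 1, or 2).
cumulative = [[x + y for x, y in zip(r1, r2)]
              for r1, r2 in zip(original, modified)]

# Lengthened matrix: rows stacked, as if the number of examinees were doubled.
lengthened = original + modified

# Widened matrix: columns joined, as if the test length were doubled.
widened = [r1 + r2 for r1, r2 in zip(original, modified)]

print(len(cumulative), len(cumulative[0]))   # examinees x items
print(len(lengthened), len(lengthened[0]))   # 2 * examinees x items
print(len(widened), len(widened[0]))         # examinees x 2 * items
```

All three layouts carry the same responses; they differ only in whether the second attempt is treated as extra score on the same item, as extra examinees, or as extra items.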
The mean absolute difference between the estimated and the corresponding simulated parameter values (ABS), the bias, and the root mean square error (SD) of the estimates were computed for each item parameter across the 50 replications.
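The three accuracy indices can be written out as a small helper. This is a sketch under the usual textbook definitions, which the abstract does not spell out; the function name `accuracy_indices` and the example numbers are hypothetical.

```python
import math


def accuracy_indices(estimates, true_value):
    """Return (ABS, bias, RMSE) of one parameter's estimates across replications.

    Assumed definitions:
      ABS  = mean of |estimate - true value|
      bias = mean of (estimate - true value)
      RMSE = square root of the mean of (estimate - true value)^2
    """
    n = len(estimates)
    errors = [e - true_value for e in estimates]
    abs_dev = sum(abs(d) for d in errors) / n
    bias = sum(errors) / n
    rmse = math.sqrt(sum(d * d for d in errors) / n)
    return abs_dev, bias, rmse


# Made-up estimates of a difficulty parameter whose simulated value is 1.0.
abs_dev, bias, rmse = accuracy_indices([0.9, 1.1, 1.3], 1.0)
print(abs_dev, bias, rmse)
```

In a study like the one described, the list passed in would hold one estimate per replication (50 values) for a single item parameter.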
The simulation results showed that:
1. In terms of ABS and SD, the ability parameters were estimated more accurately under the MAMI model than by MMLE/EM for ordinary IRT applied to the lengthened score matrix.
2. The item parameters were estimated more accurately under the MAMI model than by MMLE/EM for ordinary IRT applied to the widened score matrix.
These findings indicate that in a MAMI situation the cumulative scoring scheme is more reasonable than the traditional scoring scheme, in which only the last response is collected. They may motivate researchers to consider how to score a skill test that allows examinees to review and change their answers.

Key words: multiple-attempt multiple-item model, EM algorithm, accuracy of parameter estimation

CLC number: B841

Ding Shuliang, Luo Fen, Dai Haiqi, Zhu Wei. (2007). The Development of Multiple-Attempt, Multiple-Item Test Models and Their Applications. Acta Psychologica Sinica (心理学报), 39(04), 730-736.
