ISSN 0439-755X
CN 11-1911/B
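Sponsored by: Chinese Psychological Society; Institute of Psychology, Chinese Academy of Sciences
Published by: Science Press

Acta Psychologica Sinica, 2020, 52(9): 1132-1142 doi: 10.3724/SP.J.1041.2020.01132

Research Report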

The multidimensional log-normal response time model: An exploration of the multidimensionality of latent processing speed

ZHAN Peida1, Hong JIAO2, Kaiwen MAN3

1Department of Psychology, College of Teacher Education, Zhejiang Normal University, Jinhua, 321004, China

2Measurement, Statistics, and Evaluation, Department of Human Development and Quantitative Methodology, University of Maryland, College Park, Maryland, United States

3Educational Studies in Psychology, Research Methodology, and Counseling, The University of Alabama, Tuscaloosa, United States

Corresponding author: ZHAN Peida (詹沛达), E-mail: pdzhan@gmail.com

Received: 2020-03-2   Online: 2020-09-25

Funding: Young Scientists Fund of the National Natural Science Foundation of China (31900795)

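Abstract

In psychological and educational measurement, latent processing speed reflects how efficiently students apply their latent abilities to solve problems. To explore the multidimensionality of latent processing speed in multidimensional tests and to estimate the associated parameters, this study proposes the multidimensional log-normal response time model. Empirical data analysis and simulation studies showed that (1) latent processing speed has a multidimensional structure that matches the structure of latent ability; (2) the new model accurately estimates person-level multidimensional latent processing speeds and the RT-related item parameters; and (3) the negative impact of redundantly specifying latent processing speed as multidimensional is smaller than that of ignoring its multidimensionality.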

Extended abstract

With the growing popularity of computer-based testing, collecting item response times (RTs) and other process data has become routine in large- and small-scale psychological and educational assessments. RTs not only provide information about respondents' processing speed but can also be used to improve measurement accuracy, because they are considered to convey a fuller picture of participants' performance than responses alone. In multidimensional assessments, several skills are often required to answer questions. The speed at which a person applies a set of skills reflecting distinct cognitive dimensions can therefore be regarded as multidimensional as well. In other words, in a multidimensional test each latent ability is measured together with the corresponding efficiency of applying that facet of skills. For example, the latent speed associated with the decoding ability required by an algebra question may differ from that associated with encoding. A multidimensional RT model is therefore needed to accommodate this scenario, extending the existing RT models that assume unidimensional processing speed.

To model the multidimensional structure of latent processing speed, this study proposed a multidimensional log-normal response time model (MLRTM), an extension of the unidimensional log-normal response time model (ULRTM) proposed by van der Linden (2006). Model parameters were estimated via a full Bayesian approach with Markov chain Monte Carlo (MCMC). A PISA 2012 computer-based mathematics RT dataset was analyzed as a real data example. This dataset contains the RTs of 1581 participants on 9 items. A Q-matrix (see Table 1) was prespecified based on the PISA 2012 mathematics assessment framework (see Zhan, Jiao, & Liao, 2018); three dimensions were defined based on mathematical content knowledge: 1) change and relationships (θ1), 2) space and shape (θ2), and 3) uncertainty and data (θ3). Note that the Q-matrix links items to the corresponding latent abilities and thus expresses the multidimensional structure of the latent abilities. First, exploratory factor analysis (EFA) was conducted on the real dataset to reveal the multidimensional structure of processing speed. Second, two RT models, the ULRTM and the MLRTM, were fitted to the data and the results were compared. Third, a simulation study was conducted to evaluate the psychometric properties of the proposed model.

The results of the EFA indicated that latent processing speed has a three-dimensional structure, which matches the theoretical multidimensional structure of the latent abilities (i.e., the Q-matrix in Table 1). Both the ULRTM and the MLRTM showed adequate model-data fit according to posterior predictive model checking (ppp = 0.597 for the ULRTM and ppp = 0.633 for the MLRTM). A comparison of the -2LL, DIC, and WAIC values across the two models indicated that the MLRTM fits the data better. In addition, the results showed that (1) the correlations among the three dimensions were medium to large (from 0.751 to 0.855); and (2) the time-intensity parameter estimates of the two models were similar, whereas the time-discrimination parameter estimates of the ULRTM were slightly lower than those of the MLRTM. Moreover, the simulation study showed that 1) the model parameters were well recovered by the Bayesian MCMC estimation algorithm; and 2) the item time-discrimination parameters are underestimated if the multidimensionality of latent processing speed is ignored, as expected, whereas the item time-intensity parameters are essentially unaffected.

Overall, the proposed MLRTM performed well with the empirical data and was verified by the simulation study. In addition, the proposed model could facilitate practitioners in the use of the RT data to understand participants’ complex behavioral characteristics.

Keywords: item response times ; multidimensional latent processing speed ; item response theory ; computer-based testing ; PISA


Cite this article as

ZHAN Peida, Hong JIAO, Kaiwen MAN. The multidimensional log-normal response time model: An exploration of the multidimensionality of latent processing speed. Acta Psychologica Sinica[J], 2020, 52(9): 1132-1142 doi:10.3724/SP.J.1041.2020.01132

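1 Introduction

In recent years, with the spread of computer-based testing, the collection of item response times (RTs) and other process data has become routine. For example, since 2012 the Programme for International Student Assessment (PISA) has used computer-based tests to collect students' RT data. Previous research has shown that RT data, as a supplement to traditional response accuracy data, not only provide information about students' processing speed during problem solving but can also improve the measurement precision of latent ability in joint analyses (Bolsinova & Tijmstra, 2018; van der Linden, Klein Entink, & Fox, 2010; Zhan, 2019). The analysis of RT data has consequently become one of the new focal topics in psychological and educational measurement, both in China and internationally.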

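Drawing on theories and experiments in cognitive psychology, researchers have proposed a variety of RT models (see de Boeck & Jeon, 2019; Guo, Shang, & Xia, 2017). The speed-accuracy trade-off was the main issue addressed by some early RT models (e.g., Ferrando & Lorenzo-Seva, 2007; Wang & Hanson, 2005): for a given task, the faster a respondent's processing speed, the lower the processing accuracy (or success rate), and the slower the speed, the higher the accuracy. This trade-off, however, describes a within-person relationship between processing speed and processing accuracy (van der Linden, 2009) and cannot be evaluated with cross-sectional studies or tests (Curran & Bauer, 2011). In general, for a fixed set of tasks or items, once a respondent's processing speed is fixed, the processing accuracy is fixed as well. It is therefore recommended to model processing speed and processing accuracy separately and to construct the relationship between the corresponding latent processing speed and latent ability at a higher level (van der Linden, 2006, 2007, 2009). At present the most widely used model is the log-normal RT model (LRTM; van der Linden, 2006), and several studies have extended it further (e.g., Meng, 2016; Klein Entink, van der Linden, & Fox, 2009; Wang, Chang, & Douglas, 2013).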

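To further investigate the relationship between latent processing speed and latent ability, van der Linden (2007) proposed a Bayesian hierarchical modeling framework. Its basic logic is that, within persons, the effect of latent processing speed on RT and the effect of latent ability on response accuracy (RA) are independent of each other, whereas within a population (i.e., between persons) latent processing speed and latent ability are correlated. Owing to the flexibility of this framework, a variety of joint models have been formed by substituting different measurement models (e.g., Zhan, 2019; Guo, Luo, & Yu, 2020; Lu, Wang, Zhang, & Tao, 2019; Man, Harring, Jiao, & Zhan, 2019; Wang & Xu, 2015; Wang, Zhang, Douglas, & Culpepper, 2018; Zhan, Jiao, & Liao, 2018). At present, however, the vast majority of joint models apply only to unidimensional tests: they use a unidimensional item response theory (IRT) model for the RA data and a unidimensional RT model for the RT data. The few models that do address the multidimensionality of latent ability still assume that latent processing speed is unidimensional, analyzing the RA data with a multidimensional IRT (MIRT) model while retaining a unidimensional RT model for the RT data (Zhan, 2019; Man et al., 2019; Wang, Weiss, & Su, 2019; Zhan, Jiao et al., 2018). The main reason is that researchers have not yet attended to the possible multidimensionality of latent processing speed and that corresponding analysis models are lacking.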

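In psychological and educational measurement, an appropriate notion of latent processing speed is the speed of labor (van der Linden, 2009). Latent processing speed can accordingly be defined as "a rate of the amount of labor performed on the items with respect to time" (van der Linden, 2011). It reflects how efficiently students apply their latent abilities (e.g., knowledge or skills) to solve problems. For a given item, the less time a student spends responding, the faster the student's latent processing speed and the more efficiently the student applies the knowledge or skills the item requires. In a multidimensional test, because latent ability is multidimensional, latent processing speed should be discussed together with latent ability within specific test dimensions; that is, latent processing speed may also have a multidimensional structure that matches the structure of latent ability. In other words, a respondent's latent processing speed on each test dimension is matched to the latent ability required by that dimension. For example, the latent processing speed in a decoding task is matched to the decoding ability the task requires, and the latent processing speed in an encoding task is matched to the encoding ability it requires. As another example, when non-native English speakers take a GRE Subject Test (e.g., mathematics or literature in English), at least two latent abilities are needed, one for understanding the items (e.g., English reading ability) and one for solving the problems (e.g., subject-matter ability). Two corresponding latent processing speeds are involved, one reflecting the speed of understanding the items and one reflecting the speed of solving the problems.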

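This study therefore hypothesizes that, in multidimensional tests, latent processing speed has a multidimensional structure that matches the structure of latent ability. Some evidence from cognitive psychology may support this hypothesis. First, different brain regions serve different cognitive processing functions, and appropriate behavioral performance depends on interactions among specific brain regions (Horwitz, Tagamets, & McIntosh, 1999; Mesulam, 1990); this is also the basic logic behind functional magnetic resonance imaging (fMRI) and electroencephalography (EEG). Conceptually, the different cognitive processing functions required by different cognitive tasks have different cognitive processing speeds. Second, unlike the simple stimulus tasks used in experimental psychology to record reaction time (e.g., the digit-span task and other stimulus tasks that involve no specific declarative or procedural knowledge), items in psychological and educational measurement always assess a specific cognitive construct or ability. The RT observed in psychological and educational measurement should therefore comprise two parts: the basic reaction time for processing all the information and the time consumed in applying a specific latent ability. Because item-level RTs cannot separate the two, they must be treated as a whole. We can then use "dimension-specific processing time" to refer to item-level RT and "dimension-specific processing speed" to refer to processing speed on a specific dimension of the multidimensional latent space. Consequently, as with latent ability, the number of dimensions of latent processing speed can be determined by the number of dimensions the test contains.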

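Although MIRT models for RA are by now well developed (Reckase, 2009), multidimensional RT (MRT) models capable of analyzing multidimensional latent processing speed are still lacking. As noted above, some recent studies have used MIRT models to analyze multidimensional latent abilities while still using unidimensional RT (URT) models for a latent processing speed that may be multidimensional (Zhan, 2019; Man et al., 2019; Wang et al., 2019). Without an MRT model, these studies could estimate only multiple latent abilities and a single latent processing speed per student. Logically, different latent abilities should be matched with different latent processing speeds; forcing several latent processing speeds into one variable is therefore limiting and may lead to inaccurate inferences. In a multidimensional test, a unidimensional latent processing speed can be interpreted as the respondent's general or higher-order latent processing speed, but in practice we still want to know the respondent's latent processing speed on each subdimension. Developing a corresponding MRT model is therefore necessary.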

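To address these issues, this study proposes the multidimensional log-normal RT model (multidimensional LRTM, MLRTM), which can be viewed as an extension of the unidimensional log-normal RT model (unidimensional LRTM, ULRTM; van der Linden, 2006). The remainder of the paper is organized as follows. First, the ULRTM is briefly reviewed. Second, the MLRTM is presented. Third, an exploratory factor analysis of the RT data from the PISA 2012 computer-based mathematics assessment is conducted to explore the multidimensional structure of latent processing speed; the data are then analyzed with the proposed model and the results compared with those of the ULRTM to illustrate the model's practical applicability and relative advantages. Fourth, simulation studies examine the psychometric properties of the new model. Finally, the findings are summarized and directions for future research are discussed.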

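2 The multidimensional log-normal response time model

2.1 Model formulation

Before introducing the MLRTM, we briefly review the ULRTM. Let Tni denote the response time of student n (n = 1, …, N) on item i (i = 1, …, I). The ULRTM can be expressed as

$\log T_{ni} = \xi_i - \tau_n + \varepsilon_{ni},\ \varepsilon_{ni} \sim N(0,\ \omega_i^{-2})$,

$\log T_{ni} \sim N(\xi_i - \tau_n,\ \omega_i^{-2})$.

Here ξi is the item time-intensity parameter, representing the time required to solve item i; τn is the latent processing speed of student n, assumed to follow $\tau_n \sim N(0, \sigma_\tau^2)$; εni is the residual; and ωi is the reciprocal of the standard deviation of the residual, which can be regarded as the item time-discrimination parameter. One basic assumption of the ULRTM is that logTni is conditionally independent given the unidimensional τn.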

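In psychological and educational measurement there are two main types of multidimensional tests: within-item and between-item (Adams, Wilson, & Wang, 1997). In a between-item multidimensional test, each item measures only one dimension of latent ability, but different items may measure different dimensions; in a within-item multidimensional test, a single item may measure several dimensions simultaneously. Theoretically, between-item multidimensionality is a special case of within-item multidimensionality, so this study adopts the within-item formulation to construct the MLRTM. The MLRTM can be expressed as

$\log T_{ni} = \xi_i - \sum\limits_{k=1}^{K} \tau_{nk} q_{ik} + \varepsilon_{ni},\ \varepsilon_{ni} \sim N(0,\ \omega_i^{-2})$,

$\log T_{ni} \sim N\left( \xi_i - \sum\limits_{k=1}^{K} \tau_{nk} q_{ik},\ \omega_i^{-2} \right)$,

where τnk is the latent processing speed of student n on dimension k (k = 1, 2, …, K), reflecting how efficiently student n applies the k-th latent ability to solve problems; τn = (τn1, …, τnk, …, τnK)′ is the vector of multidimensional latent processing speeds, which follows a multivariate normal distribution, $\tau_n \sim N(\mu_\tau, \mathbf{\Sigma}_\tau)$, with mean vector μτ = (μ1, …, μk, …, μK)′ and variance-covariance matrix Στ; μk is the mean processing speed of the student population on dimension k. For model identification, μτ is set to the zero vector. The Q-matrix (Tatsuoka, 1983) is an I × K confirmatory matrix in which qik = 1 indicates that item i belongs to dimension k and qik = 0 otherwise. For between-item multidimensionality, exactly one element of qi equals 1; for within-item multidimensionality, more than one element of qi equals 1. The remaining parameters are the same as in the ULRTM. In the MLRTM, logTni is assumed to be conditionally independent given τn. Moreover, if all items in a test are assumed to measure the same single dimension, the MLRTM is equivalent to the ULRTM.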
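The model equation can be made concrete with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' MultiBUGS implementation: the function names `mlrtm_mean` and `simulate_log_rt` and all numeric values are hypothetical, and the person parameters are supplied directly rather than drawn from the multivariate normal distribution.

```python
import random


def mlrtm_mean(xi_i, q_i, tau_n):
    """Mean of log T_ni under the MLRTM: xi_i - sum_k q_ik * tau_nk."""
    return xi_i - sum(q * t for q, t in zip(q_i, tau_n))


def simulate_log_rt(xi, omega, q_matrix, tau, rng):
    """Draw an N x I table of log RTs; the residual SD of item i is 1 / omega_i."""
    return [
        [
            rng.gauss(mlrtm_mean(xi[i], q_matrix[i], tau_n), 1.0 / omega[i])
            for i in range(len(xi))
        ]
        for tau_n in tau
    ]


# Hypothetical toy values (not taken from the paper).
q_matrix = [[1, 0], [0, 1], [1, 1]]  # items 1-2: between-item; item 3: within-item
xi = [4.0, 4.2, 3.8]                 # time-intensity parameters
omega = [2.0, 1.8, 2.2]              # time-discrimination parameters
tau = [[0.5, -0.3], [-0.2, 0.1]]     # two students, K = 2 latent speeds

log_rt = simulate_log_rt(xi, omega, q_matrix, tau, random.Random(1))
```

With a single column of ones as the Q-matrix (K = 1), `mlrtm_mean` reduces to ξi − τn, which is the ULRTM special case described in the text.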

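2.2 Bayesian parameter estimation

The MLRTM parameters were estimated with a full Bayesian Markov chain Monte Carlo algorithm implemented in MultiBUGS (version 1.0) (Goudie, Turner, de Angelis, & Thomas, 2017). Interested readers may request the MultiBUGS code from the corresponding author; the prior distributions specified for the parameters of the MLRTM are given in the Appendix.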

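3 Empirical data analysis

3.1 Exploring the multidimensional structure of latent processing speed

As stated above, the basic hypothesis of this study is that, in multidimensional tests, latent processing speed has a multidimensional structure that matches the structure of latent ability. To explore the multidimensionality of latent processing speed, and to examine whether its multidimensional structure matches that of latent ability, we conducted an exploratory factor analysis of an empirical RT dataset.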

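3.1.1 Data description

This study used RT data from the PISA 2012 computer-based mathematics assessment. The dataset was first used by Zhan, Jiao et al. (2018) and contains the responses of N = 1581 students to I = 9 items. The raw RTs were log-transformed beforehand, and all zeros were treated as missing data. Zhan, Jiao et al. (2018) specified a Q-matrix based on the PISA 2012 mathematics assessment framework (OECD, 2013). (Footnote 1: A Q-matrix is essentially just a confirmatory matrix that defines the relationships between items and latent variables; its use is not restricted to cognitive diagnosis, and the latent variables involved are not limited to fine-grained attributes such as knowledge or skills.) This study selected the three dimensions belonging to mathematical content knowledge, namely change and relationships (θ1), space and shape (θ2), and uncertainty and data (θ3); see Table 1. It should be emphasized that this Q-matrix defines the relationships between items and latent abilities; that is, it expresses the multidimensional structure of the latent abilities underlying the RA data. If this Q-matrix matches the latent structure found by an exploratory factor analysis of the RT data (i.e., the structure of the latent processing speed underlying the RT data), this indicates that latent processing speed has a multidimensional structure matching that of latent ability.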

Table 1   Q-matrix for the PISA 2012 computer-based mathematics assessment

Item  θ1  θ2  θ3
CM015Q02D  1
CM015Q03D  1
CM020Q01  1
CM020Q02  1
CM020Q03  1
CM020Q04  1
CM038Q03T  1
CM038Q05  1
CM038Q06  1

Note: Blank cells denote 0.



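3.1.2 Exploratory factor analysis

The exploratory factor analysis was conducted in Mplus (version 8.1) (Muthén & Muthén, 2019). By default, Mplus performs exploratory factor analysis within a confirmatory factor analysis framework; the number of factors to retain was set from 1 to 5. The number of factors and the corresponding latent structure were determined from model-data fit indices (e.g., AIC and BIC). Because the dimensions should in theory be correlated, an oblique rotation was used. All other settings were left at their defaults.

Table 2 reports the model-data fit indices of the exploratory factor analysis. Previous research indicates that TLI > 0.95, CFI > 0.95, SRMR < 0.08, and RMSEA < 0.05 imply good model-data fit (Hu & Bentler, 1999; Steiger, 1990). Taking all indices together, the three-factor model fits the data better than the other models, indicating a three-dimensional latent structure underlying the RT data.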

Table 2   Model-data fit indices from the exploratory factor analysis of the PISA 2012 computer-based mathematics data

| Model | χ2 | df | TLI | CFI | AIC | BIC | SRMR | RMSEA [90% CI] |
|---|---|---|---|---|---|---|---|---|
| 1-factor | 462.79** | 27 | 0.896 | 0.922 | 24592.15 | 24737.03 | 0.045 | 0.101 [0.093, 0.109] |
| 2-factor | 225.49** | 19 | 0.930 | 0.963 | 24370.85 | 24558.65 | 0.032 | 0.083 [0.073, 0.093] |
| 3-factor | 32.66** | 12 | 0.989 | 0.996 | 24192.02 | 24417.38 | 0.010 | 0.033 [0.020, 0.047] |
| 4-factor | 5.56 | 6 | 1.000 | 1.000 | 24176.92 | 24434.48 | 0.004 | 0.000 [0.000, 0.031] |
| 5-factor | 0.09 | 1 | 1.006 | 1.000 | 24181.44 | 24465.83 | 0.000 | 0.000 [0.000, 0.045] |

Note: **p < 0.01; TLI = Tucker-Lewis index; CFI = comparative fit index; AIC = Akaike information criterion; BIC = Bayesian information criterion; SRMR = standardized root mean square residual; RMSEA = root mean square error of approximation; 90% CI = 90% confidence interval.
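The cutoff rules quoted above can be applied mechanically to the values in Table 2. The sketch below is only one simplified reading of "taking all indices together": it assumes that a model is acceptable when all four cutoffs hold and that the most parsimonious acceptable model (or, alternatively, the model with the lowest BIC) is preferred. The variable and function names are hypothetical.

```python
# Fit indices copied from Table 2 (keyed by number of factors).
fit = {
    1: {"TLI": 0.896, "CFI": 0.922, "BIC": 24737.03, "SRMR": 0.045, "RMSEA": 0.101},
    2: {"TLI": 0.930, "CFI": 0.963, "BIC": 24558.65, "SRMR": 0.032, "RMSEA": 0.083},
    3: {"TLI": 0.989, "CFI": 0.996, "BIC": 24417.38, "SRMR": 0.010, "RMSEA": 0.033},
    4: {"TLI": 1.000, "CFI": 1.000, "BIC": 24434.48, "SRMR": 0.004, "RMSEA": 0.000},
    5: {"TLI": 1.006, "CFI": 1.000, "BIC": 24465.83, "SRMR": 0.000, "RMSEA": 0.000},
}


def meets_cutoffs(m):
    """Cutoffs cited in the text: TLI > 0.95, CFI > 0.95, SRMR < 0.08, RMSEA < 0.05."""
    return m["TLI"] > 0.95 and m["CFI"] > 0.95 and m["SRMR"] < 0.08 and m["RMSEA"] < 0.05


# Models passing every cutoff, in order of increasing complexity.
passing = [k for k in sorted(fit) if meets_cutoffs(fit[k])]
smallest_passing = passing[0]

# Model favored by BIC (lower is better).
lowest_bic = min(fit, key=lambda k: fit[k]["BIC"])
```

Under these assumed rules both criteria point to the three-factor solution, in line with the conclusion stated in the text; the AIC alone would favor four factors, which is why the indices need to be weighed jointly.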



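Table 3 reports the rotated factor loading matrix of the three-factor model. Compared with the Q-matrix in Table 1, the loading matrix differs only for item CM038Q03T, whose loading on factor 3 was 0.300 (p < 0.05). The theoretically constructed multidimensional structure of the latent abilities (i.e., the Q-matrix) therefore matches the latent structure found by the exploratory factor analysis of the RT data. This result supports the core hypothesis of this study, namely that in multidimensional tests latent processing speed has a multidimensional structure that matches the structure of latent ability. Subsequent analyses can thus use the Q-matrix in Table 1 directly to express the common multidimensional latent structure underlying the RA and RT data. Because of the inherent limitations of exploratory factor analysis, however, we cannot obtain an estimate of each student's latent processing speed or the item parameters of each item. It is therefore necessary to analyze the data further with the MLRTM proposed in this study.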

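Table 3   Rotated factor loading matrix of the three-factor model

Item  Factor 1  Factor 2  Factor 3
CM015Q02D  0.695*
CM015Q03D  0.609*
CM020Q01  0.565*
CM020Q02  0.801*
CM020Q03  0.642*
CM020Q04  0.943*
CM038Q03T  0.502*
CM038Q05  0.985*
CM038Q06  0.621*

Note: * p < 0.05; factor loadings below 0.4 are not shown.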



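3.2 Analysis with the multidimensional log-normal response time model

3.2.1 Analysis

To analyze the RT data in greater depth, both the ULRTM and the MLRTM were fitted to the data. The exploratory factor analysis indicated that the Q-matrix in Table 1 is suitable for describing the relationships between items and latent processing speeds. In the Bayesian MCMC estimation, two Markov chains were run with 5000 iterations each (the first 2000 as burn-in), and the remaining 6000 iterations from the two chains were retained for parameter inference. Convergence was checked with the MC_error index (Ntzoufras, 2009); the MC_error of every parameter in this study was below 0.05, indicating that the parameter estimates had converged.

The DIC and WAIC (Gelman et al., 2013, Chapter 7) were used as relative model-data fit indices for model selection. Posterior predictive model checking (PPMC) was used to evaluate absolute model-data fit, where a posterior predictive probability (ppp) value close to 0.5 indicates that the model fits the data. PPMC requires an appropriate discrepancy measure; this study used the sum of the standardized error function over persons n and items i (i.e., the sum of squared standardized residuals) as the discrepancy measure (Fox & Marianti, 2017) to evaluate the overall fit of the RT model: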

$D(\log T;\upsilon) = D(\log T_{ni};\xi_i,\tau_n,\omega_i) = \sum\limits_{n=1}^{N} \sum\limits_{i=1}^{I} \left( \omega_i \left( \log T_{ni} - \left( \xi_i - \sum\limits_{k=1}^{K} q_{ik}\tau_{nk} \right) \right) \right)^2$.
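The discrepancy measure translates directly into code. The sketch below is illustrative only and makes two assumptions that the text does not spell out: missing RTs (coded here as `None`) are skipped, and the ppp value is taken to be the share of posterior draws in which the discrepancy of the replicated data is at least as large as that of the observed data, which is the usual PPMC convention. The names `discrepancy` and `ppp_value` are hypothetical.

```python
def discrepancy(log_t, xi, omega, tau, q_matrix):
    """Sum over persons and items of (omega_i * (log T_ni - (xi_i - sum_k q_ik tau_nk)))^2."""
    total = 0.0
    for n, row in enumerate(log_t):
        for i, y in enumerate(row):
            if y is None:  # assumption: missing responses contribute nothing
                continue
            mean = xi[i] - sum(q * t for q, t in zip(q_matrix[i], tau[n]))
            total += (omega[i] * (y - mean)) ** 2
    return total


def ppp_value(d_replicated, d_observed):
    """Share of posterior draws with replicated discrepancy >= observed discrepancy."""
    pairs = list(zip(d_replicated, d_observed))
    return sum(rep >= obs for rep, obs in pairs) / len(pairs)


# Tiny hypothetical example: one student, two items, second RT missing.
d_example = discrepancy(
    log_t=[[3.5, None]],
    xi=[4.0, 4.2],
    omega=[2.0, 1.0],
    tau=[[0.0, 0.0]],
    q_matrix=[[1, 0], [0, 1]],
)
```

In an actual analysis, `d_observed` and `d_replicated` would each hold one value per retained MCMC draw, computed with that draw's parameter values.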

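3.2.2 Results

Table 4 presents the model-data fit indices. The ppp values of the ULRTM and the MLRTM were 0.597 and 0.633, respectively, indicating that both models fit the data. Furthermore, the -2LL, DIC, and WAIC all indicate that the MLRTM fits the data better, suggesting that accounting for the multidimensionality of latent processing speed is more appropriate in multidimensional tests.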

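Table 4   Model-data fit indices for the analysis of the PISA 2012 computer-based mathematics data

| Model | -2LL | DIC | WAIC | ppp |
|---|---|---|---|---|
| MLRTM | 19305 | 22505 | 22055 | 0.633 |
| ULRTM | 21310 | 22890 | 22770 | 0.597 |

Note: ULRTM = unidimensional log-normal response time model; MLRTM = multidimensional log-normal response time model; -2LL = -2 log likelihood; DIC = deviance information criterion; WAIC = widely applicable information criterion; ppp = posterior predictive probability.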



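Table 5 presents the estimated variance-covariance matrix. The correlations among the three latent processing speeds ranged from 0.751 to 0.855, indicating moderately high correlations: the three are highly consistent yet still clearly distinguishable. The main reason is that all three belong to the higher-order dimension of mathematical content knowledge. In addition, the estimated variance of the unidimensional latent processing speed in the ULRTM was 0.216 (95% CI = [0.197, 0.231]); the ULRTM thus not only fails to distinguish latent processing speeds on different dimensions but also underestimates the between-person variability of latent processing speed on dimension 1 (change and relationships) and dimension 3 (uncertainty and data) (i.e., the variances are underestimated).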

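Table 5   Estimated variance-covariance matrix of the multidimensional latent processing speeds for the PISA 2012 computer-based mathematics data

| Στ | τ1 | τ2 | τ3 |
|---|---|---|---|
| τ1 | 0.301 (0.016) [0.270, 0.334] | 0.751 | 0.767 |
| τ2 | 0.185 (0.010) [0.167, 0.204] | 0.202 (0.010) [0.184, 0.220] | 0.855 |
| τ3 | 0.227 (0.012) [0.206, 0.250] | 0.208 (0.009) [0.190, 0.226] | 0.292 (0.013) [0.266, 0.317] |

Note: τ = latent processing speed; Στ = variance-covariance matrix of the multidimensional latent processing speeds; the upper triangle contains correlations and the lower triangle covariances; standard errors (posterior standard deviations) are in parentheses; 95% Bayesian credible intervals are in brackets.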
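The relationship between the two triangles of Table 5 can be checked by converting the covariances to correlations. This is a rough sketch using the posterior means from the table; the function name `cov_to_corr` is hypothetical, and the results need not reproduce the reported correlations exactly, since those were presumably summarized from the posterior draws rather than computed from the rounded point estimates (an assumption).

```python
import math

# Posterior means from Table 5, written out as a full symmetric matrix.
cov = [
    [0.301, 0.185, 0.227],
    [0.185, 0.202, 0.208],
    [0.227, 0.208, 0.292],
]


def cov_to_corr(cov):
    """Convert a covariance matrix to correlations: r_jk = s_jk / (s_j * s_k)."""
    sd = [math.sqrt(cov[k][k]) for k in range(len(cov))]
    return [
        [cov[j][k] / (sd[j] * sd[k]) for k in range(len(cov))]
        for j in range(len(cov))
    ]


corr = cov_to_corr(cov)
```

The resulting off-diagonal values agree with the reported 0.751, 0.767, and 0.855 to within rounding.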



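Figure 1 shows the estimated latent processing speeds of the first 20 respondents. According to the MLRTM estimates, each respondent's latent processing speeds differ across the three dimensions, and for some respondents (e.g., respondents 2, 6, 7, 12, and 15) even the signs of the estimates differ across dimensions. In such cases, using the unidimensional ULRTM estimate as feedback (or even labeling respondents as "speedsters" or "slowpokes" on that basis) would be too coarse to reveal the differences among a respondent's latent processing speeds on different dimensions.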

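Figure 1   Estimated latent processing speeds of the first 20 respondents in the analysis of the PISA 2012 computer-based mathematics data

Note: ULRTM = unidimensional log-normal response time model; MLRTM = multidimensional log-normal response time model; τ = latent processing speed.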


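Table 6 presents the item parameter estimates. For the time-intensity parameters, the estimates from the two models were essentially identical, indicating that accounting for the multidimensionality of latent processing speed does not affect the estimation of the time-intensity parameters. By contrast, the MLRTM estimates of the time-discrimination parameters were slightly larger than those of the ULRTM; that is, the ULRTM underestimates the kurtosis (peakedness) of log RT.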

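Table 6   Item parameter estimates for the analysis of the PISA 2012 computer-based mathematics data

| Item | ULRTM ξ: M | SE | 95% CI | ULRTM ω: M | SE | 95% CI | MLRTM ξ: M | SE | 95% CI | MLRTM ω: M | SE | 95% CI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 4.470 | 0.020 | [4.432, 4.508] | 1.617 | 0.031 | [1.558, 1.678] | 4.469 | 0.020 | [4.433, 4.510] | 1.845 | 0.045 | [1.760, 1.936] |
| 2 | 4.630 | 0.019 | [4.592, 4.667] | 1.697 | 0.032 | [1.635, 1.762] | 4.629 | 0.019 | [4.594, 4.668] | 1.976 | 0.051 | [1.874, 2.076] |
| 3 | 4.778 | 0.016 | [4.750, 4.811] | 2.423 | 0.050 | [2.327, 2.519] | 4.778 | 0.015 | [4.747, 4.807] | 2.505 | 0.055 | [2.397, 2.612] |
| 4 | 3.860 | 0.018 | [3.825, 3.895] | 1.866 | 0.036 | [1.793, 1.934] | 3.859 | 0.017 | [3.825, 3.894] | 1.915 | 0.038 | [1.841, 1.991] |
| 5 | 4.258 | 0.016 | [4.226, 4.291] | 2.186 | 0.044 | [2.104, 2.274] | 4.258 | 0.016 | [4.224, 4.287] | 2.202 | 0.047 | [2.112, 2.295] |
| 6 | 3.739 | 0.017 | [3.707, 3.774] | 2.031 | 0.040 | [1.958, 2.116] | 3.739 | 0.017 | [3.706, 3.771] | 2.097 | 0.043 | [2.012, 2.179] |
| 7 | 4.190 | 0.016 | [4.158, 4.220] | 2.314 | 0.047 | [2.221, 2.406] | 4.189 | 0.017 | [4.156, 4.222] | 2.516 | 0.063 | [2.393, 2.638] |
| 8 | 4.522 | 0.018 | [4.487, 4.557] | 1.879 | 0.036 | [1.809, 1.950] | 4.522 | 0.018 | [4.488, 4.558] | 2.091 | 0.047 | [1.995, 2.180] |
| 9 | 4.377 | 0.020 | [4.338, 4.417] | 1.600 | 0.031 | [1.533, 1.656] | 4.379 | 0.021 | [4.339, 4.420] | 1.701 | 0.036 | [1.632, 1.771] |
| μξ | 4.316 | 0.202 | [3.901, 4.701] | | | | 4.315 | 0.199 | [3.914, 4.708] | | | |
| σξ2 | 0.367 | 0.217 | [0.103, 0.751] | | | | 0.366 | 0.219 | [0.113, 0.763] | | | |

Note: ULRTM = unidimensional log-normal response time model; MLRTM = multidimensional log-normal response time model; M = posterior mean; SE = standard error (posterior standard deviation); 95% CI = 95% Bayesian credible interval.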



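4 Simulation studies

The empirical study above illustrated the practical usefulness of the MLRTM. We next use two simulation studies to examine the psychometric properties of the new model and to further verify the conclusions drawn from the empirical data analysis. Both simulation studies were based on the empirical setting. Study 1 examined (1) the parameter recovery of the MLRTM and (2) the consequences of ignoring the multidimensionality of latent processing speed; the MLRTM served as the data-generating model, and both the MLRTM and the ULRTM were used for estimation. Study 2 examined the consequences of redundantly specifying latent processing speed as multidimensional; the ULRTM served as the data-generating model, and both the MLRTM and the ULRTM were used for estimation.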

4.1 Simulation study 1

4.1.1 Data generation and analysis

In simulation study 1, 30 items measuring four dimensions were specified; the corresponding Q-matrix is shown in Figure 2. True parameter values were set with reference to the estimates from the empirical study. For the item parameters, the time-intensity parameters were generated from ξi ~ N(4, 0.25) and the time-discrimination parameters from ωi ~ N(2, 0.25). The sample size was N = 1000, and the multidimensional latent processing speed parameters were generated from a four-variate normal distribution,

$\left( \begin{matrix} \tau_{n1} \\ \tau_{n2} \\ \tau_{n3} \\ \tau_{n4} \end{matrix} \right) \sim N\left( \left( \begin{matrix} 0 \\ 0 \\ 0 \\ 0 \end{matrix} \right),\left( \begin{matrix} 0.25 & & & \\ 0.15 & 0.25 & & \\ 0.15 & 0.15 & 0.25 & \\ 0.15 & 0.15 & 0.15 & 0.25 \end{matrix} \right) \right)$,

Under this specification, ρττ = 0.6. Fifty RT datasets were generated from the MLRTM.
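The stated correlation follows from the covariance matrix, as a short worked step shows. The symbols σjk (covariance between dimensions j and k) and σj (standard deviation on dimension j) are introduced here only for illustration. For any two distinct dimensions j and k,

$\rho_{\tau_j \tau_k} = \frac{\sigma_{jk}}{\sigma_j \sigma_k} = \frac{0.15}{\sqrt{0.25}\,\sqrt{0.25}} = \frac{0.15}{0.25} = 0.6$.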

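Figure 2   The K × I Q′-matrix in simulation study 1

Note: Gray cells denote 1 and white cells denote 0.

Both the MLRTM and the ULRTM were fitted to the generated data. For each dataset, the number of Markov chains, iterations, and burn-in iterations were the same as in the empirical study. Bias and root mean square error (RMSE) were used to evaluate parameter recovery; in addition, the correlation (Cor) between the estimates and the true values of each parameter was computed.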
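The recovery indices can be sketched as follows. The text does not give their formulas, so the definitions here are the conventional ones and should be read as assumptions: for one parameter, bias is the mean of (estimate − true value) over replications, RMSE is the square root of the mean squared difference, and Cor is the Pearson correlation between estimates and true values across parameters. The function names and example numbers are hypothetical.

```python
import math


def bias(estimates, truth):
    """Mean signed error of one parameter over replications."""
    return sum(e - truth for e in estimates) / len(estimates)


def rmse(estimates, truth):
    """Root mean square error of one parameter over replications."""
    return math.sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical example: three replications of one time-discrimination
# parameter whose true value is 2.0.
omega_hat = [1.7, 1.9, 1.8]
example_bias = bias(omega_hat, 2.0)
example_rmse = rmse(omega_hat, 2.0)
```

A negative bias, as in this toy example, corresponds to the underestimation pattern discussed for the ULRTM.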


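4.1.2 Results

Figure 3 shows the recovery of the item parameters. First, the overall recovery of the MLRTM was good. Second, for the time-intensity parameters, the recovery of the two models was similar. For the time-discrimination parameters, the MLRTM recovered the true values better than the ULRTM, especially for within-item multidimensional items. Specifically, for the time-discrimination parameters the bias and RMSE of the ULRTM were about -0.30 and 0.35 for the between-item multidimensional items (items 1 to 20), about -0.60 and 0.65 for the within-item two-dimensional items (items 21 to 28), and about -1.0 and 1.0 for the within-item three-dimensional items (items 29 to 30). That is, the ULRTM underestimated the time-discrimination parameters overall, consistent with the conclusion of the empirical data analysis; moreover, its recovery of the time-discrimination parameters deteriorated as the number of dimensions measured by an item increased.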

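Figure 3   Recovery of item parameters in simulation study 1 (item level)

Note: U = unidimensional log-normal response time model; M = multidimensional log-normal response time model; RMSE = root mean square error.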


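Table 7 summarizes the recovery of the person parameters. On each dimension, the mean absolute bias and the mean RMSE across all respondents were about 0.016 and 0.145, respectively, and the correlation between true values and estimates across all respondents exceeded 0.95. Table 8 shows the recovery of the variance-covariance matrix of the person parameters. The bias and RMSE of all its elements were close to 0, indicating very good recovery.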

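Table 7   Summary of person parameter recovery in simulation study 1

| Parameter | MA_bias | M_RMSE | Cor |
|---|---|---|---|
| τ1 | 0.016 | 0.147 | 0.956 |
| τ2 | 0.017 | 0.147 | 0.955 |
| τ3 | 0.016 | 0.144 | 0.957 |
| τ4 | 0.017 | 0.143 | 0.958 |

Note: τ = latent processing speed; MA_bias = mean absolute bias across all respondents; M_RMSE = mean RMSE across all respondents; Cor = correlation between true values and estimates across all respondents.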



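Table 8   Recovery of the variance-covariance matrix of the person parameters in simulation study 1

| Στ | τ1 | τ2 | τ3 | τ4 |
|---|---|---|---|---|
| τ1 | 0.00003 (0.00000) | | | |
| τ2 | 0.00023 (0.00003) | 0.00069 (-0.00010) | | |
| τ3 | 0.00031 (0.00004) | 0.00015 (0.00002) | 0.00015 (0.00002) | |
| τ4 | 0.00015 (0.00002) | 0.00041 (-0.00006) | 0.00020 (0.00003) | 0.00079 (-0.00011) |

Note: τ = latent processing speed; Στ = variance-covariance matrix of the multidimensional latent processing speeds; values outside parentheses are root mean square errors (RMSE); values in parentheses are bias.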



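In sum, the simulation results show that the MLRTM achieves good parameter recovery. When the data contain multidimensional latent processing speed, using the ULRTM underestimates the time-discrimination parameters, whereas the time-intensity parameters are almost unaffected.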

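4.2 Simulation study 2

4.2.1 Data generation and analysis

In simulation study 2, 30 items measuring a single dimension were specified. True parameter values were again set with reference to the estimates from the empirical study. For the item parameters, the time-intensity parameters were generated from ξi ~ N(4, 0.25) and the time-discrimination parameters from ωi ~ N(2, 0.25). The sample size was N = 1000, and the unidimensional latent processing speed parameters were generated from τn ~ N(0, 0.25). Fifty RT datasets were generated from the ULRTM. As before, both the MLRTM and the ULRTM were fitted to the generated data; for the MLRTM, the unidimensional latent structure was redundantly specified as the multidimensional latent structure in Figure 2. The analysis procedure and evaluation indices were the same as in simulation study 1.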

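4.2.2 Results

Figure 4 shows the recovery of the item parameters in Study 2. For the time-intensity parameters, the recovery of the two models was essentially the same. For the time-discrimination parameters, the recovery of the MLRTM was slightly worse than that of the ULRTM. Combined with the results of Study 1 (see Table 7), this shows that the negative impact of redundantly specifying latent processing speed as multidimensional is smaller than that of ignoring its multidimensionality.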

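Figure 4   Recovery of item parameters in simulation study 2 (item level)

Note: U = unidimensional log-normal response time model; M = multidimensional log-normal response time model; RMSE = root mean square error.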


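Table 9 presents the recovery of the person parameters in Study 2. The recovery of the MLRTM was slightly worse than that of the ULRTM. Nevertheless, the Cor index shows that even when the unidimensional structure was redundantly specified as four dimensions, the estimates on each dimension remained very highly correlated with the true values. We also computed the correlations between the estimated latent processing speeds on the four MLRTM dimensions and the estimated unidimensional latent processing speed from the ULRTM: ρτ, τ1 = 0.990, ρτ, τ2 = 0.989, ρτ, τ3 = 0.987, and ρτ, τ4 = 0.989; that is, the latent processing speed estimates from the two models were highly consistent. In addition, the correlations among the four MLRTM dimensions were ρτ1, τ2 = 0.979, ρτ1, τ3 = 0.977, ρτ1, τ4 = 0.981, ρτ2, τ3 = 0.975, ρτ2, τ4 = 0.978, and ρτ3, τ4 = 0.977; the estimates on the four dimensions were thus very highly correlated, suggesting that they most likely measure or describe the same latent variable.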

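Table 9   Person parameter recovery in simulation study 2

| Model | Parameter | MA_bias | M_RMSE | Cor |
|---|---|---|---|---|
| ULRTM | τ | 0.013 | 0.088 | 0.985 |
| MLRTM | τ1 | 0.023 | 0.197 | 0.974 |
| | τ2 | 0.026 | 0.226 | 0.973 |
| | τ3 | 0.027 | 0.235 | 0.971 |
| | τ4 | 0.023 | 0.199 | 0.974 |

Note: For the MLRTM, the true value used in every recovery index is the generated unidimensional latent processing speed; τ = latent processing speed; MA_bias = mean absolute bias across all respondents; M_RMSE = mean RMSE across all respondents; Cor = correlation between true values and estimates across all respondents.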



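5 Summary and outlook

To explore and analyze the multidimensionality of latent processing speed in multidimensional tests, this study proposed the MLRTM, which can be viewed as a multidimensional extension of the unidimensional log-normal response time model. Using RT data from the PISA 2012 computer-based mathematics assessment as an example, an exploratory factor analysis showed that the multidimensional latent structure underlying the RT data (i.e., the multidimensional structure of latent processing speed) matched the theoretical structure of the multidimensional latent abilities (i.e., the expert-defined Q-matrix), supporting the basic hypothesis of this study: in multidimensional tests, latent processing speed has a multidimensional structure that matches the structure of latent ability. The data were then analyzed with the new model and the results compared with those of the ULRTM; the comparison indicated that accounting for the multidimensionality of latent processing speed in multidimensional tests is appropriate and necessary. Finally, two simulation studies examined the psychometric properties of the new model. Simulation study 1 showed that (1) the Bayesian MCMC algorithm provides good parameter recovery for the MLRTM; and (2) ignoring the multidimensionality of latent processing speed has almost no effect on the time-intensity parameters but substantially underestimates the time-discrimination parameters, with recovery deteriorating as the number of dimensions measured by an item increases. Simulation study 2 showed that (1) redundantly specifying latent processing speed as multidimensional has almost no effect on the time-intensity parameters but underestimates the time-discrimination parameters; and (2) under such redundant specification, the MLRTM estimates of the multidimensional latent processing speeds are very highly correlated with one another. Taken together, the two simulation studies show that (1) the negative impact of redundantly specifying latent processing speed as multidimensional is smaller than that of ignoring its multidimensionality; and (2) when latent processing speed has a multidimensional structure (i.e., the MLRTM is the data-generating model), the ULRTM underestimates the time-discrimination parameters, whereas when latent processing speed is unidimensional (i.e., the ULRTM is the data-generating model), the MLRTM also underestimates the time-discrimination parameters. Hence, for the time-discrimination parameters, ULRTM estimates that are smaller than the MLRTM estimates suggest that latent processing speed is multidimensional; conversely, ULRTM estimates that are larger than the MLRTM estimates suggest that it is unidimensional. In the empirical study, the ULRTM estimates of the time-discrimination parameters were smaller than the MLRTM estimates, from which it can be inferred that latent processing speed in the empirical data has a multidimensional structure.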

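Although the study produced encouraging results, it has several limitations that merit further investigation. First, the MLRTM is a multidimensional extension of the classical ULRTM. Because log-transformed RTs may still violate the normality assumption, the MLRTM could be extended further, for example with the Box-Cox transformation (Klein Entink et al., 2009), a linear transformation (Wang et al., 2013), or the log-skew-normal transformation (Meng, 2016). Second, the MLRTM is a compensatory model; that is, it assumes that the multidimensional latent processing speeds compensate for one another. In a within-item multidimensional test, a slow latent processing speed on one dimension can be offset by the latent processing speed on another. Whether non-compensatory (or partially compensatory) relationships exist among latent processing speeds deserves further study, along with the development of corresponding models. Third, given its focus, this study analyzed only RT data and did not jointly analyze RA and RT data. Because RA and RT data both carry information about respondents and items, future work could build on the Bayesian hierarchical modeling framework to construct multidimensional joint models that analyze multidimensional latent abilities and multidimensional latent processing speeds simultaneously (Zhan, Jiao, Wang, & Man, 2018). Fourth, the MLRTM was formulated for within-item multidimensionality and can handle both within-item and between-item multidimensional tests. However, because the empirical data involved only between-item multidimensionality, strictly speaking the empirical results provide evidence only that "latent processing speed has a between-item multidimensional structure that matches the structure of latent ability." Evidence that "latent processing speed has a within-item multidimensional structure that matches the structure of latent ability" is still lacking and awaits future research. Fifth, the number of items in the empirical study was small, which may affect the precision of the parameter estimates and the accuracy of the conclusions; the generalizability of the conclusions therefore remains to be verified in further empirical studies. Finally, this study used relatively simple simulation studies to examine the psychometric properties of the MLRTM, mainly to further support the conclusions of the empirical study. Although the results indicate that the new model's parameter recovery is good and support the empirical findings (e.g., ignoring the multidimensionality of latent processing speed does not affect the time-intensity parameters but underestimates the time-discrimination parameters), future studies could include more independent variables (conditions) to examine the model's psychometric properties in more complex and varied settings and provide a richer reference for subsequent empirical research.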

Appendix: Prior distributions for the parameters of the MLRTM

For the MLRTM, first, under the conditional independence assumption,

$\log T_{ni} \sim N\left( \xi_i - \sum\limits_{k=1}^{K} q_{ik}\tau_{nk},\ \omega_i^{-2} \right)$.

The prior distribution of the multidimensional latent processing speed vector is

$\tau_n \sim N(\mathbf{0}, \Sigma_\tau)$,

where the hyperprior of the variance-covariance matrix is

$\Sigma_\tau \sim \text{InvWishart}(R, K)$,

and R is a K-dimensional diagonal matrix.

For the item parameters,

$\xi_i \sim N(\mu_\xi, \sigma_\xi^2)$,

where the hyperpriors of the mean and variance are

$\mu_\xi \sim N(4.3,\ 2)$ and $\sigma_\xi^2 \sim \text{InvGamma}(1,\ 1)$.

Zhan et al. (2018) showed that, in the PISA 2012 computer-based mathematics data, the mean log RT across all respondents and items was about 4.301; we therefore set the mean of μξ to 4.3. In addition, $\omega_i^{-2} \sim \text{InvGamma}(1,\ 1)$.

参考文献

Adams, R. J., Wilson, M., & Wang, W. (1997).

The multidimensional random coefficients multinomial logit model

Applied Psychological Measurement, 21(1), 1-23.

DOI:10.1177/0146621697211001      URL     [本文引用: 1]

Bolsinova, M., & Tijmstra, J. (2018).

Improving precision of ability estimation: Getting more from response times

British Journal of Mathematical and Statistical Psychology, 71(1), 13-38.

DOI:10.1111/bmsp.12104      URL     PMID:28635139      [本文引用: 1]

By considering information about response time (RT) in addition to response accuracy (RA), joint models for RA and RT such as the hierarchical model (van der Linden, 2007) can improve the precision with which ability is estimated over models that only consider RA. The hierarchical model, however, assumes that only the person's speed is informative of ability. This assumption of conditional independence between RT and ability given speed may be violated in practice, and ignores collateral information about ability that may be present in the residual RTs. We propose a posterior predictive check for evaluating the assumption of conditional independence between RT and ability given speed. Furthermore, we propose an extension of the hierarchical model that contains cross-loadings between ability and RT, which enables one to take additional collateral information about ability into account beyond what is possible in the standard hierarchical model. A Bayesian estimation procedure is proposed for the model. Using simulation studies, the performance of the model is evaluated in terms of parameter recovery, and the possible gain in precision over the standard hierarchical model and an RA-only model is considered. The model is applied to data from a high-stakes educational test.

Curran, P. J., & Bauer, D J. (2011).

The disaggregation of within-person and between-person effects in longitudinal models of change

Annual Review of Psychology, 62, 583-619.

DOI:10.1146/annurev.psych.093008.100356      URL     [本文引用: 1]

Longitudinal models are becoming increasingly prevalent in the behavioral sciences, with key advantages including increased power, more comprehensive measurement, and establishment of temporal precedence. One particularly salient strength offered by longitudinal data is the ability to disaggregate between-person and within-person effects in the regression of an outcome on a time-varying covariate. However, the ability to disaggregate these effects has not been fully capitalized upon in many social science research applications. Two likely reasons for this omission are the general lack of discussion of disaggregating effects in the substantive literature and the need to overcome several remaining analytic challenges that limit existing quantitative methods used to isolate these effects in practice. This review explores both substantive and quantitative issues related to the disaggregation of effects over time, with a particular emphasis placed on the multilevel model. Existing analytic methods are reviewed, a general approach to the problem is proposed, and both the existing and proposed methods are demonstrated using several artificial data sets. Potential limitations and directions for future research are discussed, and recommendations for the disaggregation of effects in practice are offered.

de Boeck, P., & Jeon, M. (2019).

An overview of models for response times and processes in cognitive tests

Frontiers in Psychology, 10, 102.

DOI:10.3389/fpsyg.2019.00102      URL     PMID:30787891      [本文引用: 1]

Response times (RTs) are a natural kind of data to investigate cognitive processes underlying cognitive test performance. We give an overview of modeling approaches and of findings obtained with these approaches. Four types of models are discussed: response time models (RT as the sole dependent variable), joint models (RT together with other variables as dependent variable), local dependency models (with remaining dependencies between RT and accuracy), and response time as covariate models (RT as independent variable). The evidence from these approaches is often not very informative about the specific kind of processes (other than problem solving, information accumulation, and rapid guessing), but the findings do suggest dual processing: automated processing (e.g., knowledge retrieval) vs. controlled processing (e.g., sequential reasoning steps), and alternative explanations for the same results exist. While it seems well-possible to differentiate rapid guessing from normal problem solving (which can be based on automated or controlled processing), further decompositions of response times are rarely made, although possible based on some of model approaches.

Ferrando, P. J., & Lorenzo-Seva, U. (2007).

A measurement model for Likert responses that incorporates response time

Multivariate Behavioral Research, 42(4), 675-706.

DOI:10.1080/00273170701710247      URL     [本文引用: 1]

Fox, J.-P. & Marianti, S. (2017).

Person-fit statistics for joint models for accuracy and speed

Journal of Educational Measurement, 54(2), 243-262.

DOI:10.1111/jedm.12143      URL     [本文引用: 1]

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013).

Bayesian data analysis

New York: Chapman & Hall.

[本文引用: 1]

Goudie, R. J., Turner, R. M., de Angelis, D., & Thomas, A. (2017).

MultiBUGS: A parallel implementation of the BUGS modelling framework for faster Bayesian inference

arXiv Preprint arXiv:1704.03216.

[本文引用: 1]

Guo, L. Shang, P., & Xia, L. (2017).

Advantages and illustrations of application of response time model in psychological and educational testing

Advances in Psychological Science, 25(4), 701-712.

DOI:10.3724/SP.J.1042.2017.00701      URL     [本文引用: 1]

[ 郭磊, 尚鹏丽, 夏凌翔. (2017).

心理与教育测验中反应时模型应用的优势与举例

心理科学进展, 25(4), 701-712.]

[本文引用: 1]

Guo, X., Luo, Z., & Yu, X. (2020).

A speed-accuracy tradeoff hierarchical model based on cognitive experiment

Frontiers in Psychology, 10, 2910.

DOI:10.3389/fpsyg.2019.02910      URL     PMID:31969855      [本文引用: 1]

Most tests are administered within an allocated time. Due to the time limit, examinees might have different trade-offs on different items. In educational testing, the traditional hierarchical model cannot adequately account for the tradeoffs between response time and accuracy. Because of this, some joint models were developed as an extension of the traditional hierarchical model based on covariance. However, they cannot directly reflect the dynamic relationship between response time and accuracy. In contrast, response moderation models took the residual response time as the independent variable of the response model. Nevertheless, the models enlarge the time effect. Alternatively, the speed-accuracy tradeoff (SAT) model is superior to other experimental models in the SAT experiment. Therefore, this paper incorporates the SAT model with the traditional hierarchical model to establish a SAT hierarchical model. The results demonstrated that the Bayesian Markov chain Monte Carlo (MCMC) algorithm performed well in the SAT hierarchical model of parameters by using simulation. Finally, the deviance information criterion (DIC) more preferred the SAT hierarchical model than other models in empirical data. This means that it is indispensable to add the effect of response time on accuracy, but likewise should limit the effect on the empirical data.

Horwitz, B., Tagamets, M. A., & McIntosh, A. R. (1999).

Neural modeling, functional brain imaging, and cognition

Trends in Cognitive Sciences, 3(3), 91-98.

DOI:10.1016/s1364-6613(99)01282-6      URL     PMID:10322460      [本文引用: 1]

The richness and complexity of data sets acquired from PET or fMRI studies of human cognition have not been exploited until recently by computational neural-modeling methods. In this article, two neural-modeling approaches for use with functional brain imaging data are described. One, which uses structural equation modeling, estimates the functional strengths of the anatomical connections between various brain regions during specific cognitive tasks. The second employs large-scale neural modeling to relate functional neuroimaging signals in multiple, interconnected brain regions to the underlying neurobiological time-varying activities in each region. Delayed match-to-sample (visual working memory for form) tasks are used to illustrate these models.

Hu, L. T., & Bentler, P. M. (1999).

Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives

Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1-55.

DOI:10.1080/10705519909540118      URL     [本文引用: 1]

Klein Entink, R. H., van der Linden, W. J., & Fox, J.-P. (2009).

A Box-Cox normal model for response times

British Journal of Mathematical and Statistical Psychology, 62(3), 621-640.

DOI:10.1348/000711008X374126

Lu, J., Wang, C., Zhang, J., & Tao, J. (2019).

A mixture model for responses and response times with a higher-order ability structure to detect rapid guessing behaviour

British Journal of Mathematical and Statistical Psychology. Online First, https://doi.org/10.1111/bmsp.12175

Man, K., Harring, J. R., Jiao, H., & Zhan, P. (2019).

Joint modeling of compensatory multidimensional item responses and response times

Applied Psychological Measurement, 43(8), 639-654.

DOI:10.1177/0146621618824853     PMID:31551641

Computer-based testing (CBT) is becoming increasingly popular in assessing test-takers' latent abilities and making inferences regarding their cognitive processes. In addition to collecting item responses, an important benefit of using CBT is that response times (RTs) can also be recorded and used in subsequent analyses. To better understand the structural relations between multidimensional cognitive attributes and the working speed of test-takers, this research proposes a joint-modeling approach that integrates compensatory multidimensional latent traits and response speediness using item responses and RTs. The joint model is cast as a multilevel model in which working speed and accuracy are connected through their variance-covariance structures. The feasibility of this modeling approach is investigated via a Monte Carlo simulation study using a Bayesian estimation scheme. The results indicate that integrating RTs increased model parameter recovery and precision. In addition, Programme for International Student Assessment (PISA) 2015 mathematics standard unit items are analyzed to further evaluate the feasibility of the approach to recover model parameters.

Meng, X.-B. (2016).

A log-skew-normal model for item response times

Journal of Psychological Science, 39(3), 727-734.

[ 孟祥斌. (2016).

项目反应时间的对数偏正态模型

心理科学, 39(3), 727-734.]

Mesulam, M. M. (1990).

Large‐scale neurocognitive networks and distributed processing for attention, language, and memory

Annals of Neurology, 28(5), 597-613.

DOI:10.1002/ana.410280502     PMID:2260847

Cognition and comportment are subserved by interconnected neural networks that allow high-level computational architectures including parallel distributed processing. Cognitive problems are not resolved by a sequential and hierarchical progression toward predetermined goals but instead by a simultaneous and interactive consideration of multiple possibilities and constraints until a satisfactory fit is achieved. The resultant texture of mental activity is characterized by almost infinite richness and flexibility. According to this model, complex behavior is mapped at the level of multifocal neural systems rather than specific anatomical sites, giving rise to brain-behavior relationships that are both localized and distributed. Each network contains anatomically addressed channels for transferring information content and chemically addressed pathways for modulating behavioral tone. This approach provides a blueprint for reexploring the neurological foundations of attention, language, memory, and frontal lobe function.

Muthén, L. K., & Muthén, B. (2019). Mplus: The comprehensive modeling program for applied researchers: User’s guide, 5.

Ntzoufras, I. (2009).

Bayesian modeling using WinBUGS

Hoboken, NJ: John Wiley & Sons.

OECD. (2013).

PISA 2012 assessment and analytical framework: Mathematics, reading, science, problem solving and financial literacy

OECD Publishing. http://dx.doi.org/10.1787/9789264190511-en

Reckase, M. D. (2009).

Multidimensional item response theory

New York: Springer.

Steiger, J. H. (1990).

Structural model evaluation and modification: An interval estimation approach

Multivariate Behavioral Research, 25(2), 173-180.

DOI:10.1207/s15327906mbr2502_4     PMID:26794479

Tatsuoka, K. K. (1983).

Rule Space: An approach for dealing with misconceptions based on item response theory

Journal of Educational Measurement, 20(4), 345-354.

DOI:10.1111/jedm.1983.20.issue-4

van der Linden, W. J. (2006).

A lognormal model for response times on test items

Journal of Educational and Behavioral Statistics, 31(2), 181-204. http://dx.doi.org/10.3102/10769986031002181

DOI:10.3102/10769986031002181

van der Linden, W. J. (2007).

A hierarchical framework for modeling speed and accuracy on test items

Psychometrika, 72, 287-308. http://dx.doi.org/10.1007/s11336-006-1478-z

DOI:10.1007/s11336-006-1478-z

Current modeling of response times on test items has been strongly influenced by the paradigm of experimental reaction-time research in psychology. For instance, some of the models have a parameter structure that was chosen to represent a speed-accuracy tradeoff, while others equate speed directly with response time. Also, several response-time models seem to be unclear as to the level of parametrization they represent. A hierarchical framework for modeling speed and accuracy on test items is presented as an alternative to these models. The framework allows a “plug-and-play approach” with alternative choices of models for the response and response-time distributions as well as the distributions of their parameters. Bayesian treatment of the framework with Markov chain Monte Carlo (MCMC) computation facilitates the approach. Use of the framework is illustrated for the choice of a normal-ogive response model, a lognormal model for the response times, and multivariate normal models for their parameters with Gibbs sampling from the joint posterior distribution.

van der Linden, W. J. (2009).

Conceptual issues in response-time modeling

Journal of Educational Measurement, 46(3), 247-272. http://dx.doi.org/10.1111/j.1745-3984.2009.00080.x

DOI:10.1111/jedm.2009.46.issue-3

van der Linden, W. J. (2011).

Test design and speededness

Journal of Educational Measurement, 48(1), 44-60.

DOI:10.1111/jedm.2011.48.issue-1

van der Linden, W. J., Klein Entink, R., & Fox, J.-P. (2010).

IRT parameter estimation with response times as collateral information

Applied Psychological Measurement, 34(5), 327-347.

DOI:10.1177/0146621609349800

Wang, C., Chang, H. H., & Douglas, J. A. (2013).

The linear transformation model with frailties for the analysis of item response times

British Journal of Mathematical and Statistical Psychology, 66(1), 144-168.

DOI:10.1111/j.2044-8317.2012.02045.x     PMID:22506914

The item response times (RTs) collected from computerized testing represent an underutilized source of information about items and examinees. In addition to knowing the examinees' responses to each item, we can investigate the amount of time examinees spend on each item. In this paper, we propose a semi-parametric model for RTs, the linear transformation model with a latent speed covariate, which combines the flexibility of non-parametric modelling and the brevity as well as interpretability of parametric modelling. In this new model, the RTs, after some non-parametric monotone transformation, become a linear model with latent speed as covariate plus an error term. The distribution of the error term implicitly defines the relationship between the RT and examinees' latent speeds; whereas the non-parametric transformation is able to describe various shapes of RT distributions. The linear transformation model represents a rich family of models that includes the Cox proportional hazards model, the Box-Cox normal model, and many other models as special cases. This new model is embedded in a hierarchical framework so that both RTs and responses are modelled simultaneously. A two-stage estimation method is proposed. In the first stage, the Markov chain Monte Carlo method is employed to estimate the parametric part of the model. In the second stage, an estimating equation method with a recursive algorithm is adopted to estimate the non-parametric transformation. Applicability of the new model is demonstrated with a simulation study and a real data application. Finally, methods to evaluate the model fit are suggested.

Wang, C., Weiss, D. J., & Su, S. (2019).

Modeling response time and responses in multidimensional health measurement

Frontiers in Psychology, 10, 51.

DOI:10.3389/fpsyg.2019.00051     PMID:30761036

This study explored calibrating a large item bank for use in multidimensional health measurement with computerized adaptive testing, using both item responses and response time (RT) information. The Activity Measure for Post-Acute Care is a patient-reported outcomes measure comprised of three correlated scales (Applied Cognition, Daily Activities, and Mobility). All items from each scale are Likert type, so that a respondent chooses a response from an ordered set of four response options. The most appropriate item response theory model for analyzing and scoring these items is the multidimensional graded response model (MGRM). During the field testing of the items, an interviewer read each item to a patient and recorded, on a tablet computer, the patient's responses and the software recorded RTs. Due to the large item bank with over 300 items, data collection was conducted in four batches with a common set of anchor items to link the scale. van der Linden's (2007) hierarchical modeling framework was adopted. Several models, with or without interviewer as a covariate and with or without interaction between interviewer and items, were compared for each batch of data. It was found that the model with the interaction between interviewer and item, when the interaction effect was constrained to be proportional, fit the data best. Therefore, the final hierarchical model with a lognormal model for RT and the MGRM for response data was fitted to all batches of data via a concurrent calibration. Evaluation of parameter estimates revealed that (1) adding response time information did not affect the item parameter estimates and their standard errors significantly; (2) adding response time information helped reduce the standard error of patients' multidimensional latent trait estimates, but adding interviewer as a covariate did not result in further improvement. Implications of the findings for follow up adaptive test delivery design are discussed.

Wang, C., & Xu, G. (2015).

A mixture hierarchical model for response times and response accuracy

British Journal of Mathematical and Statistical Psychology, 68(3), 456-477.

DOI:10.1111/bmsp.12054     PMID:25873487

In real testing, examinees may manifest different types of test-taking behaviours. In this paper we focus on two types that appear to be among the more frequently occurring behaviours - solution behaviour and rapid guessing behaviour. Rapid guessing usually happens in high-stakes tests when there is insufficient time, and in low-stakes tests when there is lack of effort. These two qualitatively different test-taking behaviours, if ignored, will lead to violation of the local independence assumption and, as a result, yield biased item/person parameter estimation. We propose a mixture hierarchical model to account for differences among item responses and response time patterns arising from these two behaviours. The model is also able to identify the specific behaviour an examinee engages in when answering an item. A Monte Carlo expectation maximization algorithm is proposed for model calibration. A simulation study shows that the new model yields more accurate item and person parameter estimates than a non-mixture model when the data indeed come from two types of behaviour. The model also fits real, high-stakes test data better than a non-mixture model, and therefore the new model can better identify the underlying test-taking behaviour an examinee engages in on a certain item.

Wang, S., Zhang, S., Douglas, J., & Culpepper, S. (2018).

Using response times to assess learning progress: A joint model for responses and response times

Measurement: Interdisciplinary Research and Perspectives, 16(1), 45-58.

DOI:10.1080/15366367.2018.1435105

Wang, T., & Hanson, B. A. (2005).

Development and calibration of an item response model that incorporates response time

Applied Psychological Measurement, 29(5), 323-339.

DOI:10.1177/0146621605275984

Zhan, P. (2019).

Joint modeling for response times and response accuracy in computer-based multidimensional assessments

Journal of Psychological Science, 42, 170-178.

[ 詹沛达. (2019).

计算机化多维测验中作答时间和作答精度数据的联合分析

心理科学, 42, 170-178.]

Zhan, P., Jiao, H., & Liao, D. (2018).

Cognitive diagnosis modelling incorporating item response times

British Journal of Mathematical and Statistical Psychology, 71(2), 262-286.

DOI:10.1111/bmsp.12114     PMID:28872185

To provide more refined diagnostic feedback with collateral information in item response times (RTs), this study proposed joint modelling of attributes and response speed using item responses and RTs simultaneously for cognitive diagnosis. For illustration, an extended deterministic input, noisy 'and' gate (DINA) model was proposed for joint modelling of responses and RTs. Model parameter estimation was explored using the Bayesian Markov chain Monte Carlo (MCMC) method. The PISA 2012 computer-based mathematics data were analysed first. These real data estimates were treated as true values in a subsequent simulation study. A follow-up simulation study with ideal testing conditions was conducted as well to further evaluate model parameter recovery. The results indicated that model parameters could be well recovered using the MCMC approach. Further, incorporating RTs into the DINA model would improve attribute and profile correct classification rates and result in more accurate and precise estimation of the model parameters.

Zhan, P., Jiao, H., Wang, W.-C., & Man, K. (2018). A multidimensional hierarchical framework for modeling speed and ability in computer-based multidimensional tests. arXiv:1807.04003. Available online at: https://arxiv.org/abs/1807.04003


