ISSN 0439-755X
CN 11-1911/B

Acta Psychologica Sinica, 2022, Vol. 54, Issue 11: 1416-1423. doi: 10.3724/SP.J.1041.2022.01416

• Research Report •




Joint-cross-loading multimodal cognitive diagnostic modeling incorporating visual fixation counts

ZHAN Peida1,2

  1. Department of Psychology, College of Teacher Education, Zhejiang Normal University
  2. Key Laboratory of Intelligent Education Technology and Application of Zhejiang Province, Zhejiang Normal University, Jinhua 321004, China
  • Received: 2021-06-10 Online: 2022-09-08 Published: 2022-11-25
  • Contact: ZHAN Peida


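Multimodal data make it possible to diagnose cognitive structure precisely and to give comprehensive feedback on other cognitive characteristics (e.g., cognitive style). To jointly analyze item response accuracy, response time (RT), and visual fixation counts (FC), this paper proposes three multimodal cognitive diagnosis models based on the joint-cross-loading modeling approach. The results of an empirical study and simulation studies show that (1) joint analysis is better suited to multimodal data than separate analysis; (2) the new models can directly use the information in RT and FC to improve the estimation accuracy of latent ability or latent attributes; (3) parameter recovery of the new models is good; and (4) ignoring cross-loadings leads to more serious negative consequences than redundantly including them.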



Students’ observed behavior (e.g., learning behavior and problem-solving behavior) comprises activities that reflect complicated cognitive processes and latent conceptions, which are often systematically related to one another. Students with the same cognitive or knowledge structure may still differ in cognitive characteristics such as cognitive style and fluency. However, almost all cognitive diagnosis models (CDMs) analyze only item response accuracy (RA) data and therefore cannot estimate or infer individual differences in these cognitive characteristics. With advances in technology-enhanced assessment, it is now possible to capture multimodal data automatically and simultaneously during problem solving: outcome data (e.g., RA), process data (e.g., response times [RTs]), and biometric data (e.g., visual fixation counts [FCs]). Multimodal data allow precise diagnosis of cognitive structure as well as comprehensive feedback on other cognitive characteristics.

First, using the joint analysis of RA, RT, and FC data as an example, this study elaborated three approaches to multimodal data analysis and their corresponding models: separate modeling (S-MCDM), joint-hierarchical modeling (H-MCDM; Zhan et al., 2022), and joint-cross-loading modeling (C-MCDM). Three C-MCDMs with distinct hypotheses were then proposed based on joint-cross-loading modeling: the C-MCDM-θ, the C-MCDM-D, and the C-MCDM-C. Compared with the H-MCDM, the three C-MCDMs introduce two item-level weight parameters, φi in the RT measurement model and λi in the FC measurement model, to quantify the impact of latent ability or latent attributes on RT and FC. Model parameters were estimated in a fully Bayesian framework using the Markov chain Monte Carlo (MCMC) method. To illustrate the application of the three proposed models and to compare them with the S-MCDM and H-MCDM, multimodal data from a real mathematics test were analyzed. The data were collected in an eye-tracking lab at a prominent university on the East Coast of the United States. A test of I = 10 mathematics items was administered to N = 93 university students with normal or corrected vision. The test measured K = 4 attributes, and the corresponding Q-matrix is shown in Figure 3. The data comprise three modalities (RA, RT, and FC) that were collected simultaneously. All five multimodal models were fitted to the data.
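To make the idea of cross-loading concrete, the following minimal sketch simulates RT and FC data in which latent ability loads directly on both measurement models through item-level weights. The functional forms are assumptions chosen for illustration and are not taken from the paper: a lognormal RT model with mean ξi − τn − φi·θn and a Poisson FC model with log-mean mi − εn + λi·θn. The signs were chosen only so that a negative φi or a positive λi implies that higher ability goes with longer RTs or more fixations, matching the interpretation given in the results. The function name and all parameter names other than φ, λ, and θ are hypothetical.

```python
import numpy as np

def simulate_rt_fc(theta, tau, eps, xi, m, phi, lam, sigma=0.3, seed=0):
    """Simulate log response times and fixation counts with cross-loadings.

    Assumed illustrative forms (NOT the paper's exact specification):
      log RT_ni ~ Normal(xi_i - tau_n - phi_i * theta_n, sigma^2)
      FC_ni     ~ Poisson(exp(m_i - eps_n + lam_i * theta_n))
    theta, tau, eps: person parameters (ability, speed, concentration), length N.
    xi, m, phi, lam: item parameters, length I.
    """
    rng = np.random.default_rng(seed)
    # Person parameters become columns, item parameters become rows,
    # so that broadcasting yields N x I matrices.
    theta, tau, eps = (np.asarray(a, dtype=float)[:, None] for a in (theta, tau, eps))
    xi, m, phi, lam = (np.asarray(a, dtype=float)[None, :] for a in (xi, m, phi, lam))
    log_rt = rng.normal(xi - tau - phi * theta, sigma)
    fc = rng.poisson(np.exp(m - eps + lam * theta))
    return log_rt, fc

# Hypothetical example: two items whose cross-loadings have opposite signs.
rng = np.random.default_rng(1)
N = 2000
theta = rng.normal(size=N)
tau = rng.normal(scale=0.3, size=N)
eps = rng.normal(scale=0.3, size=N)
log_rt, fc = simulate_rt_fc(theta, tau, eps,
                            xi=[4.0, 4.0], m=[3.0, 3.0],
                            phi=[-0.5, 0.5], lam=[0.3, -0.3])
# Correlation of ability with log RT, item by item.
print([np.corrcoef(theta, log_rt[:, i])[0, 1] for i in range(2)])
```

Setting every φi and λi to zero removes the direct paths from ability to RT and FC in this sketch, so the three modalities would then be linked, if at all, only through correlations among the person parameters.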

In addition, two simulation studies were conducted to further explore the psychometric performance of the proposed models. Simulation study 1 examined whether the parameter estimates of the proposed models converge and how well parameters are recovered under different simulated test conditions. Simulation study 2 examined the relative merits of the C-MCDMs and the H-MCDM, that is, whether cross-loadings need to be considered in multimodal data analysis.
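Parameter recovery in a simulation study is commonly summarized by comparing estimates with the generating values across replications. The sketch below computes bias and root mean square error (RMSE) per parameter. These two indices are a common choice and an assumption here, since the abstract does not state which recovery criteria were used; the function name is hypothetical.

```python
import numpy as np

def recovery_summary(true_values, estimates):
    """Per-parameter bias and RMSE across replications.

    true_values: generating values, shape (n_parameters,).
    estimates:   point estimates, shape (n_replications, n_parameters).
    """
    true_values = np.asarray(true_values, dtype=float)
    estimates = np.asarray(estimates, dtype=float)
    error = estimates - true_values          # broadcasts over replications
    bias = error.mean(axis=0)                # average signed error
    rmse = np.sqrt((error ** 2).mean(axis=0))
    return {"bias": bias, "rmse": rmse}

# Hypothetical example: two parameters, two replications.
summary = recovery_summary([1.0, 2.0], [[1.1, 2.0], [0.9, 2.0]])
print(summary["bias"], summary["rmse"])
```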

The results of the empirical study showed that (1) the C-MCDM-θ fitted the data best, followed by the H-MCDM and the S-MCDM. Although the DIC indicated that the C-MCDM-D and C-MCDM-C also fitted the data well, these results are for reference only because some parameter estimates in these two models did not converge. (2) The correlations between latent ability and latent processing speed and between latent ability and latent concentration were weak, making it difficult to fully exploit the theoretical advantages of the H-MCDM over the S-MCDM (Ranger, 2013). By contrast, because the C-MCDM-θ can directly use the information in the RT and FC data, the standard errors of its latent ability estimates were significantly lower than those of the two competing models. (3) The median of the estimates of φi was less than 0, indicating that for most items, the higher a participant’s latent ability, the longer the time taken to solve the item; the median of the estimates of λi was greater than 0, indicating that for most items, the higher a participant’s latent ability, the more fixations he or she made during problem solving. It should also be noted that the estimates of φi and λi do not always have the same sign across items, indicating that the direction of the influence of latent ability on RT and FC (i.e., facilitation or inhibition) differs across items. Furthermore, simulation study 1 indicated that the parameter estimates of the three proposed models converged and that model parameters were recovered well under different simulated test conditions. Simulation study 2 indicated that ignoring possible cross-loadings has more severe adverse effects than redundantly including them.
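The model comparison above relies on the deviance information criterion (DIC), for which smaller values indicate better fit after penalizing complexity. A minimal sketch of the classic definition, DIC = D̄ + pD with pD = D̄ − D(θ̄), is given below. Whether the study used this variant of the penalty is an assumption, since Bayesian software packages differ in how they compute pD; the function name is hypothetical.

```python
import numpy as np

def dic(deviance_draws, deviance_at_posterior_mean):
    """Classic DIC from MCMC output.

    deviance_draws: deviance (-2 * log-likelihood) evaluated at each posterior draw.
    deviance_at_posterior_mean: deviance evaluated at the posterior mean of the parameters.
    Returns (DIC, pD), where pD is the effective number of parameters.
    """
    d_bar = float(np.mean(deviance_draws))      # posterior mean deviance
    p_d = d_bar - deviance_at_posterior_mean    # complexity penalty
    return d_bar + p_d, p_d

# Hypothetical numbers for illustration only.
value, penalty = dic([100.0, 102.0, 104.0], 98.0)
print(value, penalty)
```

As the results note, a DIC computed from chains that have not converged is not trustworthy, so convergence should be checked before comparing models on this index.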

Overall, the results of this study indicate that (1) for multimodal data that provide parallel information, fusion analysis is more suitable than separate analysis; (2) through cross-loadings, the proposed models can directly use information from RT and FC data to improve the estimation accuracy of latent ability or latent attributes; (3) the results of the proposed models can be used to diagnose cognitive structure and to infer other cognitive characteristics such as cognitive style and fluency; and (4) the proposed models are more compatible with different test situations than the H-MCDM.

Key words: cognitive diagnosis, multimodal data, item response times, fixation counts, cognitive style, eye-tracking