Acta Psychologica Sinica, 2019, Vol. 51, Issue 9: 1057-1067. DOI: 10.3724/SP.J.1041.2019.01057
Reports of Empirical Studies
Make adaptive testing know examinees better: The item selection strategies based on recommender systems
WANG Pujue1, LIU Hongyun1,2
1. Faculty of Psychology, Beijing Normal University
2. Beijing Key Laboratory of Applied Experimental Psychology, Faculty of Psychology, Beijing Normal University, Beijing, 100875, China
Abstract  

Item selection strategies for computerized adaptive testing (CAT) can be improved by making fuller use of previous examinees’ responses. Past examinees’ data are a valuable reference for selecting items more accurately and more evenly for new examinees. However, most existing strategies, developed within the framework of item response theory (IRT), use information from the current examinee only and fail to take full advantage of past examinees’ data. Collaborative filtering, an approach from the recommender system literature, finds the items that best match a user’s preferences by drawing on information from other users, a goal similar to that of item selection in CAT. The present study therefore adapted the underlying assumptions of collaborative filtering to propose new item selection strategies that exploit past examinees’ data, and then investigated factors that might affect the performance of these strategies.

Drawing on user-based collaborative filtering, we defined similar examinees as a group of examinees who answered the same items consistently, and proposed two strategies: the Direct Examinee-Based Recommender (DEBR) and the Indirect Examinee-Based Recommender (IEBR). Two simulation studies examined the measurement accuracy and item exposure control of the new strategies under different conditions. Study 1 used a simulated item bank. The recommender-based strategies drew on two types of past examinees’ data, generated by FMI and BAS respectively, to select items in two fixed-length CATs. Study 2 used a real item bank to test the new strategies in a more realistic setting, and also investigated the effect of combining two batches of past examinees’ data from different recommender-based strategies.
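
The general idea of examinee-based recommendation can be illustrated with a minimal Python sketch. This is not the authors' DEBR or IEBR algorithm, whose details are not given in this abstract. It assumes, purely for illustration, that "similar" means a past examinee gave identical responses to every item the current examinee has answered so far, and that the next item is the unused item most often administered to those similar examinees. The function names and the record format, a list of (item_id, response) pairs, are hypothetical.

```python
from collections import Counter


def similar_examinees(current, past_records):
    """Return past records that agree with the current examinee on every
    item answered so far (an assumed similarity rule, for illustration)."""
    matches = []
    for record in past_records:
        answers = dict(record)  # item_id -> response for this past examinee
        if all(answers.get(item) == resp for item, resp in current):
            matches.append(record)
    return matches


def recommend_next_item(current, past_records, item_bank):
    """Pick the unused item most often administered to similar examinees.

    Returns None when similar examinees offer no unused item, in which case
    a caller would need some fallback rule (for example random selection).
    """
    used = {item for item, _ in current}
    votes = Counter()
    for record in similar_examinees(current, past_records):
        for item, _ in record:
            if item in item_bank and item not in used:
                votes[item] += 1
    if not votes:
        return None
    # Highest vote count wins; ties go to the smallest item id.
    return min(votes, key=lambda item: (-votes[item], item))


# Hypothetical past data: each record lists (item_id, response) pairs.
past = [
    [(1, 1), (2, 0), (3, 1)],
    [(1, 1), (2, 0), (4, 1)],
    [(1, 1), (2, 0), (3, 0)],
    [(1, 0), (5, 1)],
]
bank = {1, 2, 3, 4, 5}
next_item = recommend_next_item([(1, 1), (2, 0)], past, bank)
```

In this toy data, three past examinees match the current responses on items 1 and 2; two of them went on to item 3 and one to item 4, so item 3 is recommended. A frequency vote of this kind tends to reproduce whatever exposure pattern the past data already contain, which is one reason the strategy used to generate the past data matters.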

In both studies, when the past examinees’ data had high accuracy but poor item exposure control (generated by FMI), the recommender-based strategies greatly remedied unbalanced item utilization with an acceptable loss of accuracy. When the past examinees’ data offered a better trade-off between measurement precision and test security (generated by BAS), the recommender-based strategies kept accuracy at the same level and further improved item exposure control. More specifically, DEBR focused on maintaining accuracy and had lower measurement error than IEBR, whereas IEBR was better at controlling item exposure and made fuller use of the whole item bank than all the other strategies. These features of the two recommender-based strategies were stable and consistent across item banks and CAT lengths. The extent to which DEBR and IEBR showed these features was influenced by the quality of the item bank, the test length, the number of past examinees, and the strategy used to generate the data.
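
As a rough illustration of how accuracy and exposure indices of this kind are commonly computed in CAT simulation studies, the sketch below uses widely used definitions: mean squared error, mean absolute error, and the Pearson correlation between true and estimated abilities, plus a chi-square index of exposure-rate imbalance and a pairwise test overlap rate. The exact formulas used in the paper are not stated in this abstract, so these definitions, the function names, and the toy inputs are assumptions, and the sketch is not expected to reproduce the reported values.

```python
import math


def accuracy_metrics(true_theta, est_theta):
    """Return (MSE, MAE, Pearson r) between true and estimated abilities."""
    n = len(true_theta)
    errors = [e - t for t, e in zip(true_theta, est_theta)]
    mse = sum(d * d for d in errors) / n
    mae = sum(abs(d) for d in errors) / n
    mean_t = sum(true_theta) / n
    mean_e = sum(est_theta) / n
    cov = sum((t - mean_t) * (e - mean_e)
              for t, e in zip(true_theta, est_theta))
    var_t = sum((t - mean_t) ** 2 for t in true_theta)
    var_e = sum((e - mean_e) ** 2 for e in est_theta)
    return mse, mae, cov / math.sqrt(var_t * var_e)


def exposure_metrics(administered, bank_size, test_length):
    """Return (chi-square, overlap rate) for a fixed-length CAT.

    administered: one list of item indices (0-based) per examinee.
    Requires at least two examinees.
    """
    n_examinees = len(administered)
    counts = [0] * bank_size
    for test in administered:
        for item in test:
            counts[item] += 1
    rates = [c / n_examinees for c in counts]
    # Under perfectly even use, every item has exposure rate L / N.
    expected = test_length / bank_size
    chi_square = sum((r - expected) ** 2 for r in rates) / expected
    # An item seen by c examinees is shared by c * (c - 1) / 2 examinee pairs.
    shared = sum(c * (c - 1) / 2 for c in counts)
    pairs = n_examinees * (n_examinees - 1) / 2
    overlap = shared / (pairs * test_length)
    return chi_square, overlap


mse, mae, corr = accuracy_metrics([0.0, 1.0, 2.0], [0.0, 1.0, 3.0])
chi_square, overlap = exposure_metrics([[0, 1], [0, 2], [0, 1]],
                                       bank_size=4, test_length=2)
```

Lower chi-square and overlap values indicate more even use of the item bank, which is the sense in which a strategy is said to improve item exposure control.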

In general, this research combined recommender systems with CAT item selection methods to establish a new, flexible framework, an innovation over traditional item selection strategies. It also provided empirical evidence for the value of past examinees’ data and for the recommender system approach as a feasible alternative for selecting items in CAT. Finally, suggestions for future studies were offered: investigating the proposed strategies in a wider range of situations and extending them to more CAT conditions, including finding diverse measures of similarity between examinees or items and employing more complex recommender system algorithms to meet the demands of large-scale tests.

Keywords: selection strategy; past examinees’ data; recommender system; collaborative filtering recommender; simulation study
ZTFLH:  B841  
Corresponding Author: Hongyun LIU, E-mail: hyliu@bnu.edu.cn
Issue Date: 24 July 2019
Cite this article:   
Pujue WANG,Hongyun LIU. Make adaptive testing know examinees better: The item selection strategies based on recommender systems[J]. Acta Psychologica Sinica, 2019, 51(9): 1057-1067.
URL:  
http://journal.psych.ac.cn/xlxb/EN/10.3724/SP.J.1041.2019.01057     OR     http://journal.psych.ac.cn/xlxb/EN/Y2019/V51/I9/1057
Item selection strategy  MSE    MAE    Ability estimate correlation  Chi-square  Test overlap rate  Underexposed  Overexposed  Examinee usage rate
Fixed length: 20 items
Random selection         0.323  0.449  0.829                         2.595       5.56%              0             0
FMI                      0.090  0.234  0.954                         127.852     40.80%             315           41
DEBR (FMI)               0.141  0.291  0.930                         66.341      21.83%             22            29           14.12%
IEBR (FMI)               0.242  0.383  0.872                         8.712       7.09%              1             2            2.53%
BAS                      0.224  0.370  0.882                         14.164      9.00%              46            6
DEBR (BAS)               0.217  0.365  0.884                         11.246      8.25%              44            4            4.25%
IEBR (BAS)               0.222  0.369  0.882                         11.187      8.15%              42            4            4.66%
Fixed length: 40 items
Random selection         0.198  0.354  0.890                         4.572       11.05%             0             0
FMI                      0.052  0.178  0.974                         118.335     45.72%             240           80
DEBR (FMI)               0.089  0.228  0.956                         95.045      34.38%             37            78           19.77%
IEBR (FMI)               0.126  0.277  0.937                         7.571       11.80%             0             15           5.19%
BAS                      0.126  0.278  0.932                         18.962      15.03%             14            36
DEBR (BAS)               0.125  0.276  0.933                         15.930      14.27%             13            27           6.98%
IEBR (BAS)               0.128  0.280  0.931                         12.012      13.25%             14            17           7.22%
  
Item selection strategy  MSE    MAE    Ability estimate correlation  Chi-square  Test overlap rate  Underexposed  Overexposed  Examinee usage rate
Random selection         0.320  0.440  0.830                         2.551       8.02%              0             0
FMI                      0.152  0.307  0.922                         150.511     58.48%             214           33
DEBR (FMI)               0.190  0.341  0.901                         101.793     40.81%             53            38           25.04%
DEBR (FMI+DEBR)          0.233  0.380  0.875                         47.426      21.10%             29            35           12.69%
IEBR (FMI)               0.265  0.408  0.855                         43.395      19.63%             0             24           5.24%
IEBR (FMI+IEBR)          0.274  0.414  0.852                         11.830      8.19%              0             0            2.86%
BAS                      0.259  0.404  0.861                         42.965      19.48%             20            27
DEBR (BAS)               0.253  0.395  0.869                         43.449      19.65%             12            33           9.75%
DEBR (BAS+DEBR)          0.262  0.403  0.865                         39.684      18.29%             13            26           9.51%
IEBR (BAS)               0.266  0.408  0.858                         37.491      17.49%             17            24           9.96%
IEBR (BAS+IEBR)          0.267  0.407  0.855                         25.305      13.07%             8             18           5.13%
  
  
  