[1] Arthur W., Edwards B. D., & Barrett G. V. (2002). Multiple-choice and constructed response tests of ability: Race-based subgroup performance differences on alternative paper-and-pencil test formats.Personnel Psychology, 55(4), 985-1008. [2] Bacon, D. R. (2003). Assessing learning outcomes: A comparison of multiple-choice and short-answer questions in a marketing context.Jounal of Marketing Education, 25(1), 31-36. [3] Basu, T., & Murthy, C. A. (2013, December). Effective text classification by a supervised feature selection approach. IEEE 12th International Conference on Data Mining Workshops (ICDM), 918-925, Brussels, Belgium. [4] Burrows S., Gurevych I., & Stein B. (2015). The eras and trends of automatic short answer grading.Int J Artif Intell Educ, 25(1), 60-117. [5] Burrus J., Betancourt A., Holtzman S., Minsky J., MacCann C., & Roberts R. D. (2012). Emotional intelligence relates to well-being: Evidence from the situational judgment test of emotional management.Applied Psychology: Health and Well-Being, 4(2), 151-166. [6] Cucina J. M., Su C., Busciglio H. H., Thomas P. H., & Peyton S. T. (2015). Video-based testing: A high-fidelity job simulation that demonstrates reliability, validity, and utility. International Journal of Selection and Assessment, 23(3), 197-209. [7] Downer K., Wells C., & Crichton C. (2019). All work and no play: A text analysis.International Journal of Market Research, 61(3), 236-251. [8] Edwards B. D., & Arthur W., Jr. (2007). An examination of factors contributing to a reduction in subgroup differences on a constructed-response paper-and-pencil test of scholastic achievement.Journal of Applied Psychology, 92(3), 794-801. [9] Finch W. H., Finch M. E. H., Mcintosh C. E., & Braun C. (2018). The use of topic modeling with latent dirichlet analysis with open-ended survey items. Translational Issues in Psychological Science, 4(4), 403-424. [10] Funke, U., & Schuler, H. (1998). Validity of stimulus and response components in a video test of social competence.International Journal of Selection and Assessment, 6(2), 115-123. [11] Gu, H. L., & Wen, Z. L. (2017). Reporting and interpreting multidimensional test scores: A bi-factor perspective. Psychological Development and Education, 33(4), 504-512. [顾红磊, 温忠麟. (2017). 多维测验分数的报告与解释: 基于双因子模型的视角. 心理发展与教育, 33(4), 504-512.] [12] Guo F., Gallagher C. M., Sun T., Tavoosi S., & Min H. (2021). Smarter people analytics with organizational text data: Demonstrations using classic and advanced NLP models. Human Resource Management Journal. Advance Online Publication. [13] Iliev R., Dehghani M., & Sagi E. (2015). Automated text analysis in psychology: Methods, applications, and future developments.Language and Cognition, 7(2), 265-290. [14] Kastner, M., & Stangla, B. (2011). Multiple choice and constructed response tests: Do test format and scoring matter?Procedia-Social and Behavioral Sciences. 12, 263-273. [15] Kim, Y. (2014). Convolutional neural networks for sentence classification. Proceedings of the 19th Conference on Empirical Methods in Natural Language Processing, 1746-1751. [16] Kjell O. E., Kjell K., Garcia D., & Sikstrom S. (2019). Semantic measures: Using natural language processing to measure, differentiate, and describe psychological constructs.Psychological Methods, 24(1), 92-115. [17] Lai S., Xu L., Liu K., & Zhao J. (2015). Recurrent convolutional neural networks for text classification.Proceedings of the 29th AAAI Conference on Artificial Intelligence, 2267-2273. [18] Lee, B. C., & Kim, B. Y. (2021). Development of an AI-based interview system for remote hiring.International Journal of Advanced Research in Engineering and Technology, 12(3), 654-663. [19] Lievens F., de Corte W., & Westerveld L. (2015). Understanding the building blocks of selection procedures: Effects of response fidelity on performance and validity.Journal of Management, 41(6), 1604-1627. [20] Lievens F., Sackett P. R., Dahlke J. A., Oostrom J. K., & de Soete B. (2019). Constructed response formats and their effects on minority-majority differences and validity.Journal of Applied Psychology, 104(5), 715-726. [21] Ling, C. (2020). Development of Classroom Observation Scale to Promote the Professional Development of New Teachers (Unpublished master's thesis). Beijing Normal University. [凌晨. (2020). 课堂观察量表的开发——促进初任教师专业发展 (硕士学位论文). 北京师范大学.] [22] Lubis F. F., Mutaqin, Putri A., Waskita D., Sulistyaningtyas T., Arman A. A., & Rosmansyah Y. (2021). Automated short-answer grading using semantic similarity based on word embedding.International Journal of Technology. 12(3), 571-581. [23] Marentette B. J., Meyers L. S., Hurtz G. M., & Kuang D. C. (2012). Order effects on situational judgment test items: A case of construct-irrelevant difficulty.International Journal of Selection and Assessment, 20(3), 319-332. [24] McDaniel M. A., Hartman N. S., Whetzel D. L., & Grubb W. L. (2007). Situational judgment tests, response instructions, and validity: A meta‐analysis.Personnel Psychology, 60(1), 63-91. [25] McDaniel M. A., Morgeson F. P., Finnegan E. B., Campion M. A., & Braverman E. P. (2001). Use of situational judgment tests to predict job performance: A clarification of the literature.Journal of Applied Psychology, 86(4), 730-740. [26] McDaniel M. A., Psotka J., Legree P. J., Yost A. P., & Weekley J. A. (2011). Toward an understanding of situational judgment item validity and group differences.Journal of Applied Psychology, 96(2), 327-336. [27] Oostrom J. K., Born M. P., Serlie A. W., & van der Molen, H. T. (2010). Webcam testing: Validation of an innovative open-ended multimedia test.European Journal of Work and Organizational Psychology, 19(5), 532-550. [28] Oostrom J. K., Born M. P., Serlie A. W., & van der Molen, H. T. (2011). A multimedia situational test with a constructed-response format: Its relationship with personality, cognitive ability, job experience, and academic performance.Journal of Personnel Psychology, 10(2), 78-88. [29] Oostrom J. K., Born M. P., Serlie A. W., & van der Molen, H. T. (2012). Implicit trait policies in multimedia situational judgment tests for leadership skills: Can they predict leadership behavior?Human Performance, 25(4), 335-353. [30] Pang N., Zhao X., Wang W., Xiao W., & Guo D. (2021). Few-shot text classification by leveraging bi-directional attention and cross-class knowledge.Science China Information Sciences. 64(3), 130103. [31] Qi, S. Q., & Dai, H. Q. (2003). The property, function and the development of situational judgment tests.Psychological Exploration, 23(4), 42-46. [漆书青, 戴海琦. (2003). 情景判断测验的性质、功能与开发编制.心理学探新, 23(4), 42-46.] [32] Ramesh, D., & Sanampudi, S. K. (2022). An automated essay scoring systems: A systematic literature review.Artificial Intelligence Review, 55(3), 2495-2527. [33] Ramineni C., Trapani C. S., Williamson D. M., David T., & Bridgeman B. (2012). Evaluation of the e-rater® scoring engine for the GRE® Issue and Argument Prompts. ETS Research Report Series,(1), i-106. [34] Robson S. M., Jones A., & Abraham J. (2007). Personality, faking, and convergent validity: A warning concerning warning statements.Human Performance, 21(1), 89-106. [35] Rogers, W. T., & Harley, D. (1999). An empirical comparison of three-and four-choice items and tests: Susceptibility to testwiseness and internal consistency reliability.Educational and Psychological Measurement, 59(2), 234-247. [36] Rudner, L. M., & Liang, T. (2002). Automated essay scoring using Bayes’ theorem.The Journal of Technology, Learning and Assessment, 1(2), 1-22. [37] Slaughter J. E., Christian M. S., Podsakoff N. P., Sinar E. F., & Lievens F. (2014). On the limitations of using situational judgment tests to measure interpersonal skills: The moderating influence of employee anger.Personnel Psychology, 67(4), 847-885. [38] Süzen N., Gorban A. N., Levesley J., & Mirkes E. M. (2020). Automatic short answer grading and feed-back using text mining methods.Procedia Computer Science, 169, 726-743. [39] Tavoosi, S. (2022). Development and Validation of a Counterproductive Work Behavior Situational Judgment Test With an Open-ended Response Format: A Computerized Scoring Approach (Unpublished master’s thesis). University of Central Florida. [40] Wang, Y., & Peng, H. L. (2019). Validation on Automatic Scoring for Open-ended Questions in Chinese Oral Tests.China Examinations, 9, 63-71. [王妍, 彭恒利. (2019). 汉语口语开放性试题计算机自动评分的效度验证.中国考试, 9, 63-71.] [41] Weekley, J. A., & Ployhart, R. E. (2005). Situational judgment: Antecedents and relationships with performance.Human Performance, 18(1), 81-104. [42] Whetzel, D. L., & McDaniel, M. A. (2009). Situational judgment tests: An overview of current research. Human Resource Management Review, 19(3), 188-202. [43] Williamson D. M., Xi X., & Breyer F. J. (2012). A framework for evaluation and use of automated scoring.Educational Measurement: Issues and Practice, 31(1), 2-13. [44] Xie, X. Q. (2013). Validation: From reasonable to plausible interpretation of test score.China Examinations, 7, 3-8. [谢小庆. (2013). 效度: 从分数的合理解释到可接受解释.中国考试, 7, 3-8.] [45] Xu, J. P. (2004). Research on Teacher Competency Model and evaluation (Unpublished doctorial dissertation). Beijing Normal University. [徐建平. (2004). 教师胜任力模型与测评研究 (博士学位论文). 北京师范大学.] [46] Yang L., Xin T., Luo F., Zhang S., & Tian X. (2022). Automated evaluation of the quality of ideas in compositions based on concept maps.Natural Language Engineering, 28(4), 449-486. [47] Zhang Y., Lin C., & Chi M. (2020). Going deeper: Automatic short-answer grading by combining student and question models.User Modeling and User-Adapted Interaction, 30(1), 51-80. [48] Zhao Y., Shen Y., & Yao J. (2019, August). Recurrent neural network for text classification with hierarchical multiscale dense connections.Proceedings of the 28th International Joint Conference on Artificial Intelligence, 5450-5456. |