ISSN 0439-755X
CN 11-1911/B
Sponsored by: Chinese Psychological Society
   Institute of Psychology, Chinese Academy of Sciences
Published by: Science Press

Acta Psychologica Sinica ›› 2026, Vol. 58 ›› Issue (3): 558-568. doi: 10.3724/SP.J.1041.2026.0558 cstr: 32110.14.2026.0558

• Research Reports •

Factor retention in exploratory factor analysis based on LSTM

GUO Lei (1,2), QIN Haijiang (1)

  1. Faculty of Psychology, Southwest University;
  2. Southwest University Branch, Collaborative Innovation Center of Assessment toward Basic Education Quality, Chongqing 400715, China
  • Received: 2025-04-02 Online: 2025-12-26 Published: 2026-03-25
  • Corresponding author: GUO Lei, E-mail: happygl1229@swu.edu.cn
  • Funding:
    Fundamental Research Funds for the Central Universities (SWU2109222; SWU-XJLJ202307); Southwest University 2035 Pilot Plan project (SWUPilotPlan006)

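Abstract (Chinese version): In psychological research, determining the dimensions of psychological traits and their characteristics is essential. Exploratory factor analysis (EFA) is an important statistical method for identifying latent dimensions. Accurately identifying the number of factors is a key step in EFA, and both underestimating and overestimating it have adverse consequences. To identify the number of factors accurately, this study treats eigenvalues as sequential data and builds a deep neural network with a long short-term memory (LSTM) network; all of its evaluation metrics (accuracy, precision, recall, F1, Kappa) exceed 83%. Large-scale simulation experiments and an empirical study verified the performance of the LSTM under different data conditions. The results show that the LSTM achieves higher accuracy than the Comparison Data Forest (CDF), Empirical Kaiser Criterion (EKC), and Parallel Analysis (PA) methods, with an average improvement of 48.50% and a maximum improvement of 171.09%. The LSTM also shows smaller bias than CDF, EKC, and PA, indicating better robustness. Researchers can use the R package LSTMfactors to apply the LSTM trained in this study to empirical data.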
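
The five evaluation metrics named above can be computed from true and predicted factor counts treated as class labels. The following is a minimal sketch, not the study's evaluation code: the function name `classification_metrics` is hypothetical, and macro averaging of precision, recall, and F1 across classes is an assumption, since the abstract does not say how the per-class values were combined.

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes):
    """Accuracy, macro precision/recall/F1, and Cohen's kappa from integer labels."""
    # Confusion matrix: rows are true classes, columns are predicted classes.
    cm = np.zeros((n_classes, n_classes), dtype=float)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1

    total = cm.sum()
    tp = np.diag(cm)                 # correct predictions per class
    pred_totals = cm.sum(axis=0)     # how often each class was predicted
    true_totals = cm.sum(axis=1)     # how often each class truly occurred

    # Per-class scores; a class that is never predicted or observed scores 0.
    precision = np.divide(tp, pred_totals, out=np.zeros_like(tp), where=pred_totals > 0)
    recall = np.divide(tp, true_totals, out=np.zeros_like(tp), where=true_totals > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)

    accuracy = tp.sum() / total
    # Cohen's kappa: observed agreement corrected for chance agreement
    # implied by the row and column marginals.
    expected = (pred_totals * true_totals).sum() / total**2
    kappa = (accuracy - expected) / (1 - expected)

    return {
        "accuracy": accuracy,
        "macro_precision": precision.mean(),  # assumption: macro average
        "macro_recall": recall.mean(),
        "macro_f1": f1.mean(),
        "kappa": kappa,
    }

# Hypothetical labels for illustration only (not data from the study).
metrics = classification_metrics([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2], n_classes=3)
print(metrics)
```

Kappa is worth reporting next to accuracy here because the true number of factors may be unevenly distributed across conditions, and kappa discounts agreement that the marginal frequencies alone would produce.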

Keywords (Chinese version): exploratory factor analysis, long short-term memory, factor retention, deep learning

Abstract: Psychological research focuses on the latent traits of individuals, which requires clear operational definitions of the constructs of interest, followed by an exploration and description of the dimensions and characteristics of those traits. Exploratory Factor Analysis (EFA) is a pivotal statistical method for identifying these latent dimensions and is widely used, especially in the development of psychological scales and instruments.
A critical aspect of employing EFA is the accurate determination of the number of factors. Underestimating the number of factors may result in the omission of theoretically significant psychological structures or sub-dimensions, leading to the loss of critical information, increased estimation errors in factor loadings, and diminished accuracy of factor scores. Conversely, overestimating the number of factors may lead to factor splitting, where the primary loadings of manifest variables are dispersed across multiple factors, thereby weakening the association between the manifest variables and the intended factor. Overestimation may also produce an unduly complex model whose structures have limited practical or theoretical utility. To address these challenges, researchers have proposed various methods, including the Kaiser criterion (i.e., eigenvalues greater than one), the Empirical Kaiser Criterion, Parallel Analysis, the Hull method, Comparison Data, Factor Forest, and Comparison Data Forest. With the rapid advancement of machine learning, its application in EFA has begun to attract attention. This study introduces an innovative approach that treats eigenvalues as sequential data and uses Long Short-Term Memory (LSTM) networks to construct a predictive model. The performance of the LSTM-based method was then evaluated through extensive simulations and empirical studies under diverse data conditions, demonstrating its robustness and applicability.
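
Two of the traditional methods listed above operate directly on the sorted eigenvalues of the correlation matrix, the same kind of sequence the study feeds to its LSTM. The sketch below illustrates the Kaiser criterion and a basic form of Horn's parallel analysis with NumPy. It is an illustration under stated assumptions, not the implementation compared in the study: the function names are hypothetical, the 95th-percentile threshold and the use of random normal reference data are common choices assumed here, and the simulated two-factor data set is invented for the demonstration.

```python
import numpy as np

def eigenvalue_sequence(data):
    """Eigenvalues of the correlation matrix, sorted from largest to smallest."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]

def kaiser_rule(eigenvalues):
    """Kaiser criterion: retain as many factors as there are eigenvalues above one."""
    return int(np.sum(eigenvalues > 1.0))

def parallel_analysis(data, n_sims=200, quantile=0.95, seed=0):
    """Basic parallel analysis against random normal data of the same shape.

    Assumption: the reference value at each position is the given quantile
    of the simulated eigenvalues (some variants use the mean instead).
    """
    rng = np.random.default_rng(seed)
    n_obs, n_vars = data.shape
    observed = eigenvalue_sequence(data)
    simulated = np.empty((n_sims, n_vars))
    for i in range(n_sims):
        simulated[i] = eigenvalue_sequence(rng.standard_normal((n_obs, n_vars)))
    threshold = np.quantile(simulated, quantile, axis=0)
    # Retain factors up to the first observed eigenvalue that does not
    # exceed its reference value.
    not_above = np.nonzero(observed <= threshold)[0]
    return int(not_above[0]) if not_above.size else n_vars

# Hypothetical data: 500 respondents, 8 items, 2 uncorrelated factors
# with loadings of 0.8 on four items each.
rng = np.random.default_rng(1)
factors = rng.standard_normal((500, 2))
loadings = np.zeros((2, 8))
loadings[0, :4] = 0.8
loadings[1, 4:] = 0.8
data = factors @ loadings + 0.6 * rng.standard_normal((500, 8))

eigenvalues = eigenvalue_sequence(data)   # the sequence a sequence model would read
n_kaiser = kaiser_rule(eigenvalues)
n_parallel = parallel_analysis(data)
print(eigenvalues.round(2), n_kaiser, n_parallel)
```

Both rules reduce the eigenvalue sequence to a count by comparing each value with a fixed or simulated threshold. A sequence model can instead learn from the whole pattern of eigenvalues, which is the motivation the abstract gives for the LSTM approach.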
The findings of the study indicate that: (1) After hyperparameter tuning, an optimal combination was identified, enabling the LSTM model to achieve excellent performance across accuracy, precision, and other evaluation metrics, demonstrating high classification capability. (2) In the simulation study, the LSTM model significantly outperformed Comparison Data Forest, the Empirical Kaiser Criterion, and Parallel Analysis under nearly all data conditions, with an average improvement in estimation accuracy of 48.50% and a maximum improvement of 171.09%.
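
The abstract does not state how the improvement percentages were computed. Because accuracy is bounded by 100%, a maximum gain of 171.09% can only be a relative gain over the comparison method, so the sketch below assumes the usual relative-improvement formula. The function name and the example accuracies are hypothetical and are not values from the study.

```python
def relative_improvement(baseline_acc, new_acc):
    """Percentage gain of new_acc over baseline_acc, relative to the baseline."""
    if baseline_acc <= 0:
        raise ValueError("baseline accuracy must be positive")
    return (new_acc - baseline_acc) / baseline_acc * 100.0

# Hypothetical accuracies: a comparison method at 0.50 and a new method at 0.75.
print(relative_improvement(0.50, 0.75))  # prints 50.0
```

Under this definition, a relative gain above 100% means the new method's accuracy is more than double the baseline's, which can only occur in conditions where the baseline accuracy is below 50%.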
Furthermore, an empirical study was conducted using data from a parental psychological control scale administered to a cohort of 987 high school students in a city in 2022. Both traditional methods and the LSTM approach were applied to assess ecological validity. The results showed that the LSTM provided the most accurate estimate of the number of factors, while the Comparison Data Forest (CDF) method exhibited a marked tendency to overestimate. Overall, the LSTM proposed in this study demonstrates strong practical value and is worthy of broader adoption. Researchers can use the R package LSTMfactors to apply the LSTM trained in this study to empirical data.

Key words: exploratory factor analysis, LSTM, factor retention, deep learning

CLC number: