ISSN 0439-755X
CN 11-1911/B
Sponsored by: Chinese Psychological Society
   Institute of Psychology, Chinese Academy of Sciences
Published by: Science Press

Acta Psychologica Sinica, 2016, Vol. 48, Issue (11): 1489-1498. doi: 10.3724/SP.J.1041.2016.01489


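The mechanism of auxiliary variables in full information maximum likelihood-based structural equation models with missing data

WANG Meng-Cheng; DENG Qiaowen

  1. (Department of Psychology, Guangzhou University; Center for Psychometric and Latent Variable Modeling, Guangzhou University, Guangzhou 510006, China)
  • Received: 2015-06-05  Online: 2016-11-25  Published: 2016-11-25
  • Corresponding author: WANG Meng-Cheng, E-mail: wmcheng2006@126.com
  • Funding:

    Supported by the National Natural Science Foundation of China (31400904) and the Guangzhou University "Innovation and Strengthening University Project" (2014WQNCX069).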


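Chinese abstract:

Using Monte Carlo simulation, this study examined the role of auxiliary variables when missing data are modeled with full information maximum likelihood (FIML) estimation. Specifically, it examined how the common missing mechanism of the auxiliary and study variables, the common missing rate, the strength of their correlation, the number of auxiliary variables, and sample size affect the accuracy of parameter estimates. The results show that when the auxiliary and study variables have missing values at the same time: (1) auxiliary variables that are missing completely at random (MCAR) are more likely to produce biased results; (2) under the combined mechanism in which both are missing at random (MAR-MAR), including a single auxiliary variable is beneficial, whereas under the MAR-MCAR or MAR-MNAR combined mechanisms (MNAR = missing not at random), including more than one auxiliary variable works better; (3) including auxiliary variables that correlate only weakly with the study variables is also beneficial.

Keywords: missing data, missing mechanism, structural equation, full information maximum likelihood, auxiliary variable, Monte Carlo simulation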
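Why an auxiliary variable helps under MAR can be seen in a deliberately simplified case. The sketch below is not the authors' Mplus setup and does not reproduce their design; it assumes bivariate normal data, a single auxiliary variable `a` that is fully observed, and an outcome `y` that is deleted whenever `a` falls in its upper 20% (an MAR rule chosen only for illustration). In this monotone bivariate normal case the maximum likelihood estimate of the mean of `y` has a closed form: regress `y` on `a` in the complete cases, then shift by how far the complete-case mean of `a` sits from its full-sample mean. The saturated correlates model used with FIML extends the same idea to auxiliary variables that are themselves incomplete. All function names are hypothetical.

```python
import numpy as np


def simulate_mar(n, rho, miss_rate, rng):
    """Draw (a, y) from a standard bivariate normal with correlation rho and
    delete y whenever the auxiliary a is in its top miss_rate fraction (MAR)."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    a, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    cutoff = np.quantile(a, 1.0 - miss_rate)
    observed = a < cutoff  # y is observed only for the lower values of a
    return a, y, observed


def complete_case_mean(y, observed):
    """Mean of y over complete cases only; ignores a and is biased under MAR."""
    return y[observed].mean()


def ml_mean_with_auxiliary(a, y, observed):
    """ML estimate of E[y] for a monotone bivariate normal pattern
    (a fully observed, y partly missing), using a as the auxiliary variable."""
    a_obs, y_obs = a[observed], y[observed]
    # Complete-case regression slope of y on a (ML, i.e. divide-by-n, moments).
    slope = np.cov(a_obs, y_obs, bias=True)[0, 1] / a_obs.var()
    # Shift the complete-case mean by the information carried by all values of a.
    return y_obs.mean() + slope * (a.mean() - a_obs.mean())


rng = np.random.default_rng(2016)
a, y, observed = simulate_mar(n=100_000, rho=0.6, miss_rate=0.2, rng=rng)
print(complete_case_mean(y, observed))         # expected near -0.21, well below the true mean 0
print(ml_mean_with_auxiliary(a, y, observed))  # expected close to the true mean 0
```

The complete-case mean is pulled downward because the retained cases have low values of `a` and, through the correlation, low values of `y`; using `a` corrects for exactly that selection.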

Abstract:

In social and behavioral studies, missing data are unavoidable during data collection, especially in longitudinal studies. Because samples with missing data lose the balanced structure of their complete counterparts, they can distort parameter estimates and degrade the performance of confidence intervals, so special analysis methods are required. Two modern missing data techniques, maximum likelihood estimation and multiple imputation, have been widely studied in the methodological literature over the last decade. Because both methods rest on the missing at random (MAR) assumption, including auxiliary variables can improve the handling of missing data, either by reducing bias or by increasing power. A useful auxiliary variable is a potential cause or a correlate of the incomplete variables in the analysis model. Notably, Graham (2003) proposed the "saturated correlates model", which makes it easy to include auxiliary variables in structural equation models estimated by full information maximum likelihood (FIML). However, several questions about including auxiliary variables still require study. The main research question was under what conditions auxiliary variables are effective in FIML-based structural equation modeling. Using Monte Carlo simulation, the current study investigates the effect of including auxiliary variables when structural equation model parameters are estimated with FIML, focusing on the case in which the auxiliary variables and the variables of interest have missing values simultaneously. The simulation was replicated 5,000 times in each of 576 conditions formed by crossing common missing rates (5%, 10%, 15%, and 20%), combinations of missing mechanisms, where MCAR and MNAR denote missing completely at random and missing not at random (MCAR-MCAR, MCAR-MAR, MCAR-MNAR, MAR-MCAR, MAR-MAR, and MAR-MNAR), correlations (low vs. moderate to high), numbers of auxiliary variables (1, 3, 5), and sample sizes (100, 200, 500, 1000).
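The size of the simulation design follows from fully crossing the five factors listed above. The sketch below enumerates the cells with `itertools.product`; the factor labels are shorthand chosen here, not the authors' variable names. The correlation factor is given two levels (low; moderate to high) as an assumption, because that is the only reading of the listed levels that yields the stated 576 cells (three correlation levels would give 864).

```python
from itertools import product

# Factor levels as listed in the abstract; the labels are illustrative shorthand.
COMMON_MISSING_RATES = [0.05, 0.10, 0.15, 0.20]
MECHANISM_COMBINATIONS = ["MCAR-MCAR", "MCAR-MAR", "MCAR-MNAR",
                          "MAR-MCAR", "MAR-MAR", "MAR-MNAR"]
# Assumption: two correlation levels, the only reading consistent with 576 cells.
CORRELATION_LEVELS = ["low", "moderate-to-high"]
NUM_AUXILIARY = [1, 3, 5]
SAMPLE_SIZES = [100, 200, 500, 1000]
REPLICATIONS_PER_CELL = 5_000


def design_cells():
    """Return every combination of the five design factors as a tuple."""
    return list(product(COMMON_MISSING_RATES, MECHANISM_COMBINATIONS,
                        CORRELATION_LEVELS, NUM_AUXILIARY, SAMPLE_SIZES))


cells = design_cells()
print(len(cells))                          # 576 design cells
print(len(cells) * REPLICATIONS_PER_CELL)  # 2880000 simulated data sets
```

Each tuple identifies one condition to be generated and analyzed 5,000 times, about 2.88 million data sets in total.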
Parameter estimates were evaluated by bias and confidence interval coverage. Data were generated from the model of Enders (2008), and all data were generated and analyzed in Mplus 7.0. Auxiliary variables without missing values outperformed auxiliary variables with missing values, but including auxiliary variables that themselves had missing values still improved parameter estimation in most cases. Bias was more severe when the auxiliary variables were MCAR than when they were MNAR. In FIML-based structural equation modeling, including more than one auxiliary variable was beneficial under the MAR-MCAR and MAR-MNAR combined mechanisms, whereas a single auxiliary variable worked better under the MAR-MAR combined mechanism. Including auxiliary variables with low correlations with the variables of interest was also beneficial in this model. By contrast, the common missing rate had little impact on bias. Overall, this study indicates that including incomplete auxiliary variables is beneficial, even when both the auxiliary variables and the variables of interest contain a sizable proportion of missing data.

Key words: missing data, missing mechanism, SEM, full information maximum likelihood, auxiliary variable, Monte Carlo simulation
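The two evaluation criteria, bias and confidence interval coverage, are computed per parameter and per design cell from the replicated estimates. The abstract does not give the exact formulas the authors used, so the sketch below assumes common definitions: bias as the mean estimate minus the population value (also reported relative to that value), and coverage as the share of symmetric 95% Wald intervals, estimate ± 1.96 × SE, that contain the population value. The function name `evaluate_replications` is hypothetical.

```python
import numpy as np


def evaluate_replications(estimates, std_errors, true_value, z=1.96):
    """Summarize Monte Carlo replications of one parameter.

    Returns the mean bias, the relative bias (bias / true_value, NaN when the
    true value is 0), and the proportion of Wald intervals
    estimate +/- z * SE that contain the true value."""
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    bias = estimates.mean() - true_value
    relative_bias = bias / true_value if true_value != 0 else float("nan")
    lower = estimates - z * std_errors
    upper = estimates + z * std_errors
    coverage = np.mean((lower <= true_value) & (true_value <= upper))
    return {"bias": float(bias),
            "relative_bias": float(relative_bias),
            "coverage": float(coverage)}


rng = np.random.default_rng(2016)
# Unbiased estimates with correct standard errors: coverage should be near 0.95.
est = rng.normal(loc=0.5, scale=0.1, size=5_000)
se = np.full(5_000, 0.1)
print(evaluate_replications(est, se, true_value=0.5))
```

In a full study this summary would be applied once per parameter in each of the design cells, so that bias and coverage can be compared across missing mechanisms, numbers of auxiliary variables, and sample sizes.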