ISSN 0439-755X
CN 11-1911/B

Acta Psychologica Sinica, 2016, Vol. 48, Issue (11): 1489-1498. doi: 10.3724/SP.J.1041.2016.01489

The mechanism of auxiliary variables in full information maximum likelihood–based structural equation models with missing data

WANG Meng-Cheng; DENG Qiaowen   

  1. (Department of Psychology, Guangzhou University; Center for Psychometric and Latent Variable Modeling, Guangzhou University, Guangzhou 510006, China)
  • Received: 2015-06-05 Published: 2016-11-25 Online: 2016-11-25
  • Contact: WANG Meng, E-mail:


In social and behavioral studies, missing data are unavoidable during data collection, especially in longitudinal studies. Because samples with missing data lose the balance characteristics of their complete counterparts, which may distort parameter estimates and degrade the performance of confidence intervals, special methods are needed for their analysis. Two modern missing data techniques, maximum likelihood estimation and multiple imputation, have been widely studied in the methodological literature during the last decade. Because both techniques require the missing at random (MAR) assumption, including auxiliary variables can help fine-tune the missing data handling procedure, either by reducing bias or by increasing power. A useful auxiliary variable is a potential cause or a correlate of the incomplete variables in the analysis model. Notably, Graham (2003) proposed a “saturated correlates model”, which makes it easy to include auxiliary variables in structural equation models estimated with full information maximum likelihood (FIML). However, several questions about the inclusion of auxiliary variables need further study. The main research question is under what conditions auxiliary variables are effective in FIML-based structural equation modeling. The current study uses Monte Carlo simulation to investigate the effect of including auxiliary variables when structural equation model parameters are estimated with FIML. It focuses on the case in which the auxiliary variables and the variables of interest have missing values simultaneously. The simulation was replicated 5,000 times for each of 576 conditions, formed by crossing common missing rates (5 percent, 10 percent, 15 percent, and 20 percent), missing mechanism combinations (MCAR-MCAR, MCAR-MAR, MCAR-MNAR, MAR-MCAR, MAR-MAR, and MAR-MNAR), correlations (low, moderate to high), number of auxiliary variables (1, 3, 5), and sample sizes (100, 200, 500, 1000).
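As a quick check on the design size, the factors listed above can be fully crossed in a few lines of Python. This is only an illustrative sketch, not the authors' Mplus setup: the variable names are hypothetical, and treating the correlation factor as two levels ("low" and "moderate to high") is an assumption, chosen because it is the reading that reproduces the stated 576 conditions.

```python
from itertools import product

# Factor levels as listed in the abstract (names are illustrative).
missing_rates = [0.05, 0.10, 0.15, 0.20]
mechanism_pairs = [
    "MCAR-MCAR", "MCAR-MAR", "MCAR-MNAR",
    "MAR-MCAR", "MAR-MAR", "MAR-MNAR",
]
# Assumption: two correlation levels, which matches the reported total.
correlations = ["low", "moderate to high"]
n_auxiliary = [1, 3, 5]
sample_sizes = [100, 200, 500, 1000]

# Fully crossed design: one tuple per simulation condition.
design = list(product(missing_rates, mechanism_pairs, correlations,
                      n_auxiliary, sample_sizes))

replications_per_condition = 5000
total_datasets = len(design) * replications_per_condition

print(len(design))      # number of conditions
print(total_datasets)   # number of simulated data sets overall
```

With 4 × 6 × 2 × 3 × 4 levels the crossing yields 576 conditions, so 5,000 replications per condition corresponds to 2,880,000 simulated data sets.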
The evaluation criteria were bias and confidence interval coverage of the parameters. Data were generated according to the model of Enders (2008). All data were generated and analyzed with Mplus 7.0. Auxiliary variables without missing values outperformed auxiliary variables with missing values. Nevertheless, including auxiliary variables that had missing values in the analysis was found to improve parameter estimation effectively in most cases. Results showed that the bias was more serious when the missing mechanism of the auxiliary variables was MCAR than when it was MNAR. In FIML-based structural equation modeling, including more than a single auxiliary variable is beneficial under the MAR-MCAR and MAR-MNAR combined mechanisms, whereas under the MAR-MAR combined mechanism a single auxiliary variable works better. In addition, it is beneficial to include auxiliary variables that have a low correlation with the variables of interest in this model. The common missing rates, by contrast, had little impact on bias. Overall, this study indicates that the inclusion of incomplete auxiliary variables is beneficial, even if both the auxiliary variables and the variables of interest have a certain proportion of missing data.
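The two evaluation criteria named above, bias and confidence interval coverage, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' procedure: bias is taken as the mean estimate minus the true value (the abstract does not say whether raw or relative bias was used), coverage uses a 95% Wald interval (estimate ± 1.96 × standard error), and the function names and toy numbers are hypothetical.

```python
from statistics import mean

def bias(estimates, true_value):
    """Raw bias: average estimate across replications minus the true value."""
    return mean(estimates) - true_value

def coverage(estimates, std_errors, true_value, z=1.96):
    """Share of replications whose Wald interval contains the true value.

    Assumption: interval = estimate +/- z * standard error (95% for z=1.96).
    """
    hits = [abs(est - true_value) <= z * se
            for est, se in zip(estimates, std_errors)]
    return sum(hits) / len(hits)

# Toy replications (made-up numbers, not results from the study).
true_value = 0.50
estimates = [0.48, 0.52, 0.55, 0.45, 0.80]
std_errors = [0.05, 0.05, 0.05, 0.05, 0.05]

print(round(bias(estimates, true_value), 3))
print(coverage(estimates, std_errors, true_value))
```

In the toy data, four of the five intervals contain the true value, so coverage is 0.8, well below the nominal 0.95; in a simulation this kind of shortfall is what flags a poorly performing condition.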

Key words: missing data, missing mechanism, SEM, full information maximum likelihood, auxiliary variable, Monte Carlo simulation