LGM-based analyses with missing data: Comparison between ML method and Diggle-Kenward selection model
ZHANG Shanshan1; CHEN Nan2,3; LIU Hongyun2
(1 School of Labor Economics, Capital University of Economics and Business, Beijing, 100070, China) (2 Beijing Key Laboratory of Applied Experimental Psychology, School of Psychology, Beijing Normal University, Beijing, 100875, China) (3 QuintilesIMS Incorporated, Beijing, 100005, China)
Missing data are common in longitudinal studies. Data that are missing not at random (MNAR) may lead to biased parameter estimates and can even distort the results of analyses. In this article, we used Monte Carlo simulation to compare two techniques for handling missing data that rest on different missingness assumptions: the maximum likelihood (ML) approach, which assumes the missing at random (MAR) mechanism, and the Diggle-Kenward selection model, which assumes the MNAR mechanism. Parameter estimates and standard errors from each method were contrasted under different model assumptions. Four potentially influential factors were considered: the proportion of dropout missingness, the sample size, the distribution shape (i.e., skewness and kurtosis), and the missingness mechanism. The results indicated the following. (1) The Diggle-Kenward selection model was less affected by the missingness mechanism than the ML approach. Under the MAR condition, the Diggle-Kenward selection model, although based on the MNAR mechanism, remained stable and provided estimates similar to those of the ML approach based on the MAR assumption. Under the MNAR condition, the ML approach differed little from the Diggle-Kenward selection model in the variances of the latent variables (σ_i² and σ_s²) but showed greater discrepancy in the means of the latent variables (μ_i and μ_s). (2) The distribution shape had more impact on the Diggle-Kenward selection model. For the mean and variance of the intercept and the variance of the slope, the sample size and the degrees of skewness and kurtosis had significant interactions: with large sample sizes, the influence of distribution shape on estimation precision decreased. The ML approach was not easily affected by the distribution shape. (3) When a growth curve model was fitted, the variances (σ_i² and σ_s²) were influenced much more by the distribution shape (i.e., the degree of skewness and kurtosis) than were the means of the latent variables (μ_i and μ_s). (4) The proportion of dropout missingness was the major factor affecting the precision of parameter estimation. A larger sample size improved estimation precision in most cases.
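To make the contrast between the two dropout mechanisms concrete, the following is a minimal Python sketch of how data of this kind might be generated. It is not the simulation code used in the study. It assumes a linear latent growth model with four waves and a logistic dropout model of the form commonly used in Diggle-Kenward selection models, in which the probability of dropping out at a wave depends on the previous outcome and, under MNAR only, on the current (possibly unobserved) outcome. The function name `simulate_lgm_dropout`, all parameter names, and all numeric values are hypothetical choices for illustration.

```python
import numpy as np


def simulate_lgm_dropout(n=500, n_waves=4, beta0=-2.0, beta_prev=0.5,
                         beta_curr=0.0, seed=0):
    """Simulate linear growth data with monotone dropout.

    beta_curr == 0 -> dropout depends only on the previous outcome (MAR).
    beta_curr != 0 -> dropout also depends on the current outcome (MNAR).
    All default values are illustrative assumptions, not values from the study.
    """
    rng = np.random.default_rng(seed)
    times = np.arange(n_waves)

    # Latent intercept and slope: means (mu_i, mu_s) and covariance matrix
    # containing the variances (sigma_i^2, sigma_s^2). Hypothetical values.
    mu = np.array([0.0, 1.0])
    cov = np.array([[1.0, 0.2],
                    [0.2, 0.5]])
    eta = rng.multivariate_normal(mu, cov, size=n)  # shape (n, 2)

    # Complete data: y_it = intercept_i + slope_i * t + residual_it
    y = eta[:, [0]] + eta[:, [1]] * times + rng.normal(0.0, 1.0, (n, n_waves))

    # Monotone dropout: the first wave is always observed; once a case
    # drops out, it stays missing at all later waves.
    observed = np.ones((n, n_waves), dtype=bool)
    for t in range(1, n_waves):
        logit = beta0 + beta_prev * y[:, t - 1] + beta_curr * y[:, t]
        p_drop = 1.0 / (1.0 + np.exp(-logit))
        drops = rng.random(n) < p_drop
        observed[:, t] = observed[:, t - 1] & ~drops

    y_obs = np.where(observed, y, np.nan)
    return y, y_obs


# MAR condition: dropout ignores the current outcome.
y_full, y_mar = simulate_lgm_dropout(beta_curr=0.0, seed=1)
# MNAR condition: dropout also depends on the current outcome.
_, y_mnar = simulate_lgm_dropout(beta_curr=0.5, seed=1)
```

In a sketch like this, the dropout proportion could be varied through `beta0`, and the sample size through `n`. Non-normal outcomes (the skewness and kurtosis factor) would require replacing the normal draws with a non-normal generator, which is omitted here.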