
## Regression mixture modeling: Advances in method and its implementation

WANG Meng-Cheng 1,2,3, BI Xiangyang 4

1. Department of Psychology, Guangzhou University

2. The Center for Psychometric and Latent Variable Modeling, Guangzhou University

3. The Key Laboratory for Juveniles Mental Health and Educational Neuroscience in Guangdong Province, Guangzhou University, Guangzhou 510006, China

4. School of Sociology, China University of Political Science and Law, Beijing 102249, China

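Funding: National Natural Science Foundation of China (31400904); Guangzhou University "Innovation and Strong University Project" Young Innovative Talents Program (2014WQNCX069); Guangzhou University Young Top-notch Talent Training Program (BJ201715).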

Abstract

Person-centered methods, including latent class analysis (LCA) and latent profile analysis (LPA), have become increasingly popular in recent years. Researchers often add covariates (i.e., predictors and distal outcome variables) to LCA and LPA models. Such models are also called regression mixture models. In this paper, we introduce several new methods, including (1) the LTB method proposed by Lanza, Tan, and Bray (2013) for modeling categorical outcome variables, and (2) the BCH method proposed by Bolck, Croon, and Hagenaars (2004) for dealing with continuous distal variables. Using an empirical example, we demonstrate the analysis process in Mplus. Future directions for these new methods are also discussed.

Keywords: person-centered method; mixture modeling; latent class analysis; latent variable modeling; Mplus

WANG Meng-Cheng, BI Xiangyang. (2018). Regression mixture modeling: Advances in method and its implementation. Advances in Psychological Science, 26(12), 2272-2280

## 1 Latent class model

### Figure 1

$p(Y_i \mid c_i = k) = \prod_{j=1}^{J} p(Y_{ij} \mid c_i = k)$

$P(Y_i) = \sum_{t=1}^{T} P(C = t) \prod_{k=1}^{K} P(Y_{ik} \mid C = t)$

$p(c_i = k)$ denotes the proportion of the population that belongs to class k, also called the latent class probability.
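The mixture formula above can be made concrete with a small numeric sketch. It assumes binary indicators and uses made-up class proportions and item-response probabilities (all values and function names are hypothetical, not taken from the empirical example); it evaluates $P(Y_i)$ for one response pattern and, via Bayes' rule, the posterior class probabilities.

```python
from math import prod

def class_conditional(y, item_probs):
    """P(Y_i | C = t) for binary items under local independence."""
    return prod(p if y_k == 1 else 1.0 - p for y_k, p in zip(y, item_probs))

def marginal_and_posterior(y, class_probs, item_probs_by_class):
    """Return P(Y_i) and the posterior P(C = t | Y_i) for every class t."""
    joint = [pi_t * class_conditional(y, probs)
             for pi_t, probs in zip(class_probs, item_probs_by_class)]
    marginal = sum(joint)                      # sum over classes
    posterior = [j / marginal for j in joint]  # Bayes' rule
    return marginal, posterior

# Hypothetical two-class model with three binary indicators.
class_probs = [0.3, 0.7]
item_probs_by_class = [[0.9, 0.8, 0.7],   # P(Y_k = 1 | C = 1)
                       [0.2, 0.1, 0.3]]   # P(Y_k = 1 | C = 2)

marginal, posterior = marginal_and_posterior([1, 1, 0], class_probs, item_probs_by_class)
print(round(marginal, 4), [round(p, 3) for p in posterior])
```

The posterior probabilities computed this way are the quantities that the stepwise methods in the following sections rely on when assigning individuals to classes.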

### 2.1 Latent class models with predictors

(1) One-step method

$P(Y_i \mid Z_i) = \sum_{t=1}^{T} P(C = t \mid Z_i) \prod_{k=1}^{K} P(Y_{ik} \mid C = t)$

$P(C = t \mid Z_i)$ is the probability of belonging to latent class t given the covariate Z. It can be obtained by multinomial logistic regression (Bakk & Vermunt, 2016):

$P(C = t \mid Z_i) = \frac{e^{\alpha_t + \beta_t Z_i}}{\sum_{s=1}^{T} e^{\alpha_s + \beta_s Z_i}}$
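As a minimal sketch of the multinomial logistic expression above, the following function turns class-specific intercepts and slopes into class membership probabilities for a given covariate value. The parameter values are hypothetical; fixing the last class's parameters at zero is one common identification choice and is an assumption of this sketch.

```python
from math import exp

def class_probs_given_z(z, alphas, betas):
    """P(C = t | Z = z) from a multinomial logistic regression."""
    scores = [a + b * z for a, b in zip(alphas, betas)]
    top = max(scores)                        # subtract the max for numerical stability
    expd = [exp(s - top) for s in scores]
    total = sum(expd)
    return [e / total for e in expd]

# Hypothetical parameters for three classes; the last class is the reference.
alphas = [0.5, -0.2, 0.0]
betas = [1.0, 0.3, 0.0]

probs = class_probs_given_z(2.0, alphas, betas)
print([round(p, 3) for p in probs])
```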

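(2) Simple three-step method

### Figure 3

(3) Probability regression and weighted probability regression methods

(4) Pseudoclass method

LCA assigns individuals to classes according to the posterior probabilities from a single analysis, which raises the problem of sampling error (this is analogous to point estimation of a parameter; interval estimation is usually used to take sampling error into account). The pseudoclass method (PC method) works like the multiple imputation used in missing-data analysis: it randomly draws several (usually 20) plausible values from each individual's posterior probability distribution (because classification is uncertain, multiple plausible values are drawn to represent classification error), assigns individuals to classes according to each draw, and then averages the results across draws to obtain the final classification result (Wang, Brown, & Bandeen-Roche, 2005).

Clark and Muthén (2009) found in a simulation that the method performs well when classification accuracy is high (entropy > 0.8). A more recent simulation study, however, found that under the same conditions the pseudoclass method performed worst compared with the robust three-step method and the one-step method (Asparouhov & Muthén, 2014), so it is not recommended in practice.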
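The pseudoclass idea can be sketched in a few lines. The function below is an illustration under simplifying assumptions, not the Mplus implementation: it takes posterior class probabilities as given, draws a class for every individual 20 times, computes the class-specific mean of an external variable in each draw, and averages across draws. The function name and the toy data are hypothetical.

```python
import random

def pseudoclass_means(posteriors, z, n_draws=20, seed=0):
    """Average class-specific means of z over repeated pseudoclass draws."""
    rng = random.Random(seed)
    n_classes = len(posteriors[0])
    mean_sums = [0.0] * n_classes
    draws_used = [0] * n_classes
    for _ in range(n_draws):
        totals = [0.0] * n_classes
        counts = [0] * n_classes
        for post, z_i in zip(posteriors, z):
            # Draw a class from this individual's posterior distribution.
            c = rng.choices(range(n_classes), weights=post)[0]
            totals[c] += z_i
            counts[c] += 1
        for c in range(n_classes):
            if counts[c] > 0:               # skip classes that are empty in this draw
                mean_sums[c] += totals[c] / counts[c]
                draws_used[c] += 1
    return [mean_sums[c] / draws_used[c] if draws_used[c] else float("nan")
            for c in range(n_classes)]

# Toy data: with perfectly certain posteriors every draw gives the same classes.
certain = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
outcome = [1.0, 3.0, 10.0, 14.0]
print(pseudoclass_means(certain, outcome))
```

With uncertain posteriors, individuals move between classes from draw to draw, which pulls the class-specific means toward each other; this is consistent with the weak performance of the method at low entropy reported above.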

(5) Robust three-step method or MML method

### Figure 4

$p_{c_1,c_2} = P(C = c_2 \mid N = c_1) = \frac{1}{N_{c_1}} \sum_{N_i = c_1} P(C_i = c_2 \mid U_i)$

$q_{c_2,c_1} = P(N = c_1 \mid C = c_2) = \frac{p_{c_1,c_2} N_{c_1}}{\sum_{c} p_{c,c_2} N_{c}}$

$N_c$ is the number of individuals assigned to class $c$ according to the assigned class variable $N$. The robust three-step method uses $\log(q_{c_1,c_2}/q_{c_1,K})$ as the weights with which $N$ measures $C$, where the first subscript of $q$ indexes the latent class $C$, the second indexes the assigned class $N$, and the last class $K$ serves as the reference category.
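The two classification tables can be computed directly from the posterior probabilities. The sketch below assumes modal assignment (each individual is assigned to the class with the highest posterior probability) and takes the last class as the reference category for the logits; both are assumptions of this illustration, and the posterior values are made up.

```python
from math import log

def classification_tables(posteriors):
    """Return p[n][c] = P(C = c | N = n), q[c][n] = P(N = n | C = c), and N_c."""
    k = len(posteriors[0])
    # Modal assignment: most likely class for each individual.
    assigned = [max(range(k), key=post.__getitem__) for post in posteriors]
    n_assigned = [assigned.count(c) for c in range(k)]

    # p: average posterior among individuals assigned to class n.
    p = [[0.0] * k for _ in range(k)]
    for n_i, post in zip(assigned, posteriors):
        for c in range(k):
            p[n_i][c] += post[c] / n_assigned[n_i]

    # q: reverse the conditioning with Bayes' rule.
    # (Assumes every latent class receives some posterior mass.)
    q = [[0.0] * k for _ in range(k)]
    for c in range(k):
        denom = sum(p[n][c] * n_assigned[n] for n in range(k))
        for n in range(k):
            q[c][n] = p[n][c] * n_assigned[n] / denom
    return p, q, n_assigned

posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]  # hypothetical
p, q, n_assigned = classification_tables(posteriors)

# Logits of the assigned class given the true class, last class as reference.
logits = [[log(q[c][n] / q[c][-1]) for n in range(len(q))] for c in range(len(q))]
print(p, q, logits, sep="\n")
```

In the third step these logits are held fixed, so that the assigned class acts as a single nominal indicator of the latent class with known measurement error.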

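(6) Modified BCH method

The BCH method was first proposed by Bolck et al. (2004) for LCA with categorical predictors. Its logic is similar to that of the robust three-step method; the difference is that the third step of the robust three-step method is estimated by maximum likelihood, whereas BCH recasts it as a weighted analysis of variance with the classification errors as weights.

A drawback of the BCH method is that the within-class error variance may be negative when the classes are poorly separated or the sample is small. In that case, constraining the within-class variances to be equal still yields correct class-specific means of the outcome variable (Bakk & Vermunt, 2016).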
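To make the weighting idea concrete, the sketch below computes BCH-type weights as the inverse of the classification table $D$, with $D_{ts} = P(N = s \mid C = t)$, and uses them to estimate class-specific means of an external variable. It assumes modal assignment and a known $D$; the function names and the toy data are hypothetical, and the code is an illustration of the principle rather than the estimator as implemented in any particular software.

```python
def invert(matrix):
    """Gauss-Jordan inverse of a small square matrix with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("classification table is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def bch_class_means(assigned, z, d):
    """Weighted class means of z; d[t][s] = P(N = s | C = t)."""
    d_inv = invert(d)
    means = []
    for t in range(len(d)):
        # An individual assigned to class s gets weight d_inv[s][t] for class t.
        weights = [d_inv[s][t] for s in assigned]
        means.append(sum(w * z_i for w, z_i in zip(weights, z)) / sum(weights))
    return means

# Toy data: true class means are 0 and 10, and 20% of each class is misassigned.
assigned = [0] * 5 + [1] * 5
z = [0, 0, 0, 0, 10, 10, 10, 10, 10, 0]
d = [[0.8, 0.2], [0.2, 0.8]]
print(bch_class_means(assigned, z, d))   # naive group means would be 2 and 8
```

Some of the weights are negative (here, the off-diagonal elements of the inverse), which is why weighted variance estimates can turn out negative when classes are poorly separated.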

### 2.2 LCA with outcome variables

2.2.1 Continuous outcome variables

(1) One-step method

$P(Y_i, Z_i) = \sum_{t=1}^{T} P(C = t) \prod_{k=1}^{K} P(Y_{ik} \mid C = t)\, f(Z_i \mid C = t)$

$f(Z_i \mid C = t)$ is the class-specific distribution of the variable Z: a normal distribution for a continuous variable, and a multivariate normal distribution if there are several continuous variables.

(2) LTB method

Lanza et al. (2013) recently proposed a new method that avoids the inaccurate results of the one-step method when its assumptions are violated, because the method makes no specific distributional assumption. In the LTB method, the outcome variable Z is first entered into the LCA as a covariate (the procedure is the same as the one-step method with predictors); the workflow is shown in Figure 5.

### Figure 5

$\mu_t = \int_{Z} Z \, f(Z \mid C = t) \, dZ$

$f(Z \mid C = t) = \frac{f(Z) \, P(C = t \mid Z)}{P(C = t)}$

$\mu_t = \sum_{i=1}^{N} Z_i \frac{P(C = t \mid Z_i)}{N \, P(C = t)}$
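The last expression is a simple weighted sum and can be evaluated directly once $P(C = t \mid Z_i)$ is available. The sketch below takes those probabilities as given (in practice they come from the fitted model) and, as an assumption of this illustration, estimates $P(C = t)$ by their sample average. The function name and the data are hypothetical.

```python
def ltb_class_means(z, probs_given_z):
    """mu_t = sum_i Z_i * P(C = t | Z_i) / (N * P(C = t))."""
    n = len(z)
    n_classes = len(probs_given_z[0])
    # Assumption: P(C = t) is estimated by averaging P(C = t | Z_i) over the sample.
    class_props = [sum(p[t] for p in probs_given_z) / n for t in range(n_classes)]
    return [sum(z_i * p[t] for z_i, p in zip(z, probs_given_z)) / (n * class_props[t])
            for t in range(n_classes)]

z = [1.0, 3.0, 10.0, 14.0]
probs_given_z = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]  # hypothetical
print(ltb_class_means(z, probs_given_z))
```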

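Lanza et al. (2013) did not give a formula for the standard error of $\mu_t$. Asparouhov and Muthén (2014) suggested obtaining it as the square root of the class-specific variance divided by the class-specific sample size, but simulation studies found that this underestimates the standard error (Bakk & Vermunt, 2016). Subsequently, Bakk, Oberski, and Vermunt (2016) proposed jackknife and bootstrap resampling standard errors.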

(3) Modified LTB method

$P(N_i = s \mid Z_i) = \sum_{t=1}^{T} P(C = t \mid Z_i) \, P(N_i = s \mid C = t)$

### Figure 6

$P(C = t \mid Z_i) = \frac{\exp(\alpha_t + \beta_t Z_i + \gamma_t Z_i^2)}{\sum_{t'=1}^{T} \exp(\alpha_{t'} + \beta_{t'} Z_i + \gamma_{t'} Z_i^2)}$

(4) Modified BCH method

(5) Robust three-step method

$P(N = s, Z_i) = \sum_{t=1}^{T} P(C = t) \, f(Z_i \mid C = t) \, P(N = s \mid C = t)$

$P(N = s \mid C = t)$ is fixed at the classification-accuracy parameters estimated in the second step, and $f(Z_i \mid C = t)$ is usually assumed to be normal. As noted above, LCA with a continuous outcome variable aims to estimate differences in the outcome mean across latent classes, but the variance of the outcome may be equal or may differ across classes (similar to the homogeneity-of-variance assumption in analysis of variance). Accordingly, the robust three-step method has two variants: homogeneous within-class variances and heterogeneous within-class variances.

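2.2.2 Categorical outcome variables

The LTB method performs well with categorical outcome variables and does not suffer from the estimation bias that arises with continuous outcomes when the normality and homogeneity-of-variance assumptions are violated. The simulation study of Asparouhov and Muthén (2014) examined LTB under three sample sizes (N = 200, 500, and 2000) and two levels of classification accuracy (entropy = 0.5 and 0.65) and found noticeable bias only when N = 200 and entropy = 0.5.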
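Entropy is used throughout as the index of classification accuracy. The text does not state its formula; the sketch below uses the usual relative entropy, $E = 1 - \frac{\sum_i \sum_k (-p_{ik} \ln p_{ik})}{n \ln K}$, where $p_{ik}$ is the posterior probability of individual $i$ for class $k$. Treat the formula choice as an assumption of this illustration; the posterior values are made up.

```python
from math import log

def relative_entropy(posteriors):
    """Relative entropy in [0, 1]; 1 means perfectly certain classification."""
    n = len(posteriors)
    k = len(posteriors[0])
    # Total classification uncertainty; terms with p = 0 contribute nothing.
    uncertainty = -sum(p * log(p) for post in posteriors for p in post if p > 0)
    return 1.0 - uncertainty / (n * log(k))

print(relative_entropy([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]))
```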

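### 2.3 Summary of the contexts in which each regression mixture method applies

| Method | AUXILIARY = ( ) option | Remarks |
|---|---|---|
| LTB | DCAT | One of the best methods for categorical outcome variables; recommended. |
| BCH | BCH | One of the best methods for continuous outcome variables; use it when DU3STEP does not report results. |
| LTB | DCON | Sensitive to its assumptions; estimates are distorted when they are violated; not recommended. |
| PC method | E | Poor accuracy; not recommended in practice. |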

## 3 Empirical example

(1) Latent class analysis

    Title: Latent Class Analysis
    Data: File is older_survey.dat;
    Variable:
      Names = C2A C2B C2C C2D C2E C2F C2G C2H C2I C2J C2K C2L
              C2M C2N C2P C2Q ifold age gds agesq;   ! agesq: age squared (/100)
      USEVARIABLES = C2A-C2Q;
      MISSING are all (-9999);
      CATEGORICAL = C2A-C2Q;
      CLASSES = C (2);
    Analysis:
      TYPE = MIXTURE;
      Starts = 50 3;
      PROCESSORS = 4;   ! set according to your machine
    PLOT:
      TYPE = PLOT3;
      SERIES = C2A-C2Q (*);
    Savedata:
      file is older_survey.txt;
      save is cprob;
    Output: tech11 tech14;

### Figure 7

(2) Regression mixture model with a predictor

    Title: Regression Mixture Modeling with Predictive Variable
    Data: File is older_survey.dat;
    Variable:
      Names = C2A C2B C2C C2D C2E C2F C2G C2H C2I C2J C2K C2L
              C2M C2N C2P C2Q ifold age gds agesq;
      USEVARIABLES = C2A-C2Q;
      MISSING are all (-9999);
      CATEGORICAL = C2A-C2Q;
      CLASSES = C (2);
      AUXILIARY = age (R3STEP);   ! selects the robust three-step method
    Analysis:
      TYPE = MIXTURE;
      PROCESSORS = 4;
    PLOT:
      TYPE = PLOT3;
      SERIES = C2A-C2Q (*);
    Savedata:
      file is older_survey.txt;
      save is cprob;
    Output: tech11 tech14;

    TESTS OF CATEGORICAL LATENT VARIABLE MULTINOMIAL LOGISTIC REGRESSIONS
    USING THE 3-STEP PROCEDURE

                                                  Two-Tailed
                   Estimate     S.E.   Est./S.E.    P-Value
     C#1 ON
        AGE           0.153    0.014     11.219      0.000

     Intercepts
        C#1         -12.935    1.031    -12.541      0.000
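The reported coefficients can be turned into more interpretable quantities. The sketch below assumes that, with two classes, class 2 is the reference category (so the equation describes the log-odds of class 1 versus class 2); the helper name is hypothetical and the two constants are the estimates from the output above.

```python
from math import exp

INTERCEPT = -12.935   # Intercepts C#1 from the output above
SLOPE = 0.153         # C#1 ON AGE from the output above

def prob_class1(age):
    """Predicted probability of class 1 at a given age (two-class logistic form)."""
    return 1.0 / (1.0 + exp(-(INTERCEPT + SLOPE * age)))

odds_ratio_per_year = exp(SLOPE)        # multiplicative change in odds per year of age
age_at_even_odds = -INTERCEPT / SLOPE   # age at which both classes are equally likely

print(round(odds_ratio_per_year, 3), round(age_at_even_odds, 1))
for age in (70, 80, 90):
    print(age, round(prob_class1(age), 3))
```

Under these assumptions each additional year of age multiplies the odds of belonging to class 1 by roughly 1.17.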

(3) Regression mixture model with a categorical outcome variable

    Title: Regression Mixture Modeling with categorical outcome variable
    Data: File is older_survey.dat;
    Variable:
      Names = C2A C2B C2C C2D C2E C2F C2G C2H C2I C2J C2K C2L
              C2M C2N C2P C2Q ifold age gds agesq;
      USEVARIABLES = C2A-C2Q;
      MISSING are all (-9999);
      CATEGORICAL = C2A-C2Q;
      CLASSES = C (2);
      AUXILIARY = ifold (DCAT);   ! selects the DCAT method
    Analysis:
      TYPE = MIXTURE;
      PROCESSORS = 4;
      LRTSTARTS = 2 1 80 16;
    PLOT:
      TYPE = PLOT3;
      SERIES = C2A-C2Q (*);
    Savedata:
      file is older_survey.txt;
      save is cprob;
    Output: tech11 tech14;

    EQUALITY TESTS OF MEANS/PROBABILITIES ACROSS CLASSES

    IFOLD
                       Prob     S.E.   Odds Ratio    S.E.   2.5% C.I.  97.5% C.I.
      Class 1
        Category 1    0.265    0.033      1.000     0.000     1.000      1.000
        Category 2    0.735    0.0337     2.133     0.389     1.492      3.049
      Class 2
        Category 1    0.435    0.016      1.000     0.000     1.000      1.000
        Category 2    0.565    0.016      1.000     0.000     1.000      1.000
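The odds ratio in this output can be reproduced approximately from the class-specific probabilities. The short sketch below assumes that class 2 and category 1 are the reference class and reference category, which is how the rows of 1.000 read; the variable names are hypothetical.

```python
# Class-specific probabilities of IFOLD category 2, copied from the output above.
p_cat2_class1 = 0.735
p_cat2_class2 = 0.565

def odds(p):
    """Odds of category 2 versus category 1."""
    return p / (1.0 - p)

odds_ratio = odds(p_cat2_class1) / odds(p_cat2_class2)
print(round(odds_ratio, 3))   # close to the reported 2.133; inputs are rounded
```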

(4) Regression mixture model with a continuous outcome variable

    Title: Regression Mixture Modeling with continuous outcome variable
    Data: File is older_survey.dat;
    Variable:
      Names = C2A C2B C2C C2D C2E C2F C2G C2H C2I C2J C2K C2L
              C2M C2N C2P C2Q ifold age gds agesq;
      USEVARIABLES = C2A-C2Q;
      MISSING are all (-9999);
      CATEGORICAL = C2A-C2Q;
      CLASSES = C (2);
      AUXILIARY = gds (BCH);   ! selects the BCH method
    Analysis:
      TYPE = MIXTURE;
      PROCESSORS = 4;
      LRTSTARTS = 2 1 80 16;   ! used together with tech14
    PLOT:
      TYPE = PLOT3;
      SERIES = C2A-C2Q (*);
    Savedata:
      file is older_survey.txt;
      save is cprob;
    Output: tech11 tech14;

    EQUALITY TESTS OF MEANS ACROSS CLASSES USING THE BCH PROCEDURE
    WITH 1 DEGREE(S) OF FREEDOM FOR THE OVERALL TEST

    GDS
                       Mean     S.E.
      Class 1         4.540    0.211
      Class 2         2.903    0.075

                     Chi-Square   P-Value
      Overall test     52.233      0.000

## 4 Summary and outlook

The authors have declared that no competing interests exist.

## References

Asparouhov, T., & Muthén, B. (2014). Auxiliary variables in mixture modeling: Three-step approaches using Mplus. Structural Equation Modeling, 21(3), 329-341.

Asparouhov, T., & Muthén, B. (2015). Auxiliary variables in mixture modeling: Using the BCH method in Mplus to estimate a distal outcome model and an arbitrary secondary model.

Bakk, Z., Oberski, D. L., & Vermunt, J. K. (2016). Relating latent class membership to continuous distal outcomes: Improving the LTB approach and a modified three-step implementation. Structural Equation Modeling, 23(2), 278-289.

Bakk, Z., Tekle, F. B., & Vermunt, J. K. (2013). Estimating the association between latent class membership and external variables using bias-adjusted three-step approaches. Sociological Methodology, 43(1), 272-311.

Bakk, Z., & Vermunt, J. K. (2016). Robustness of stepwise latent class modeling with continuous distal outcomes. Structural Equation Modeling, 23(1), 20-31.

Bauer, D. J., & Curran, P. J. (2003). Distributional assumptions of growth mixture models: Implications for overextraction of latent trajectory classes. Psychological Methods, 8(3), 338-363.

Bolck, A., Croon, M., & Hagenaars, J. (2004). Estimating latent structure models with categorical variables: One-step versus three-step estimators. Political Analysis, 12(1), 3-27.

Clark, S. L., & Muthén, B. (2009). Relating latent class analysis results to variables not included in the analysis.

Collins, L. M., & Lanza, S. T. (2010). Latent class and latent transition analysis: With applications in the social, behavioral, and health sciences. New York: Wiley.

Lanza, S. T., Tan, X., & Bray, B. C. (2013). Latent class analysis with distal outcomes: A flexible model-based approach. Structural Equation Modeling, 20(1), 1-26.

Morin, A. J. S., Morizot, J., Boudrias, J.-S., & Madore, I. (2011). A multifoci person-centered perspective on workplace affective commitment: A latent profile/factor mixture analysis. Organizational Research Methods, 14(1), 58-90.

Sterba, S. K. (2013). Multivariate Behavioral Research, 48(6), 775-815.

Vermunt, J. K. (2010). Latent class modeling with covariates: Two improved three-step approaches. Political Analysis, 18, 450-469.

Wang, C.-P., Brown, C. H., & Bandeen-Roche, K. (2005). Residual diagnostics for growth mixture models: Examining the impact of a preventive intervention on multiple trajectories of aggressive behavior. Journal of the American Statistical Association, 100(471), 1054-1076.
