Confidence interval width contours: Sample size planning for linear mixed-effects models

doi:10.3724/SP.J.1041.2024.00124

Abstract

Hierarchical data, which are observed frequently in psychological experiments, are usually analyzed with linear mixed-effects models (LMEMs), because these models can account for multiple sources of random effects due to participants, items, and/or predictors simultaneously. However, it is still unclear how to determine the sample size and number of trials for LMEMs. Historically, sample size planning was based purely on power analysis. Later, the influential article by Maxwell et al. (2008) made clear that sample size planning should consider statistical power and accuracy in parameter estimation (AIPE) simultaneously. In this paper, we develop a confidence interval width contour plot, together with the code to generate it, that provides power and AIPE information simultaneously. With this plot, sample size requirements for LMEMs can be determined from both power and AIPE criteria. We also demonstrate how to run simulation studies to assess the impact of the experimental effect size and the random slope variance on statistical power, AIPE, and the results of sample size planning.

Two sets of simulation studies were conducted, each based on a different LMEM. Simulation study 1 investigated how the experimental effect size influenced power, AIPE, and the required sample size in a within-subject experimental design. Simulation study 2 investigated the impact of the random slope variance on the optimal sample size, based on power and AIPE analysis, for the cross-level interaction effect; results for binary and continuous between-subject variables were compared. In both studies, two sample size factors were varied: the number of subjects (I = 10, 30, 50, 70, 100, 200, 400, 600, 800) and the number of trials (J = 10, 20, 30, 50, 70, 100, 150, 200, 250, 300). The additional manipulated factor was the experimental effect size in simulation study 1 (standardized coefficient of the experimental condition = 0.2, 0.5, 0.8) and the magnitude of the random slope variance in simulation study 2 (0.01, 0.09, and 0.25). In addition, data were generated under a balanced design (equal numbers of trials across levels of the independent variable) and an unbalanced design (unequal numbers of trials across levels). A random slope model was used in simulation study 1, and a random slope model with a level-2 independent variable was used in simulation study 2. The data-generating model and the fitted model were the same. Estimation performance was evaluated in terms of convergence rate, power, and AIPE for the fixed effect and for the random effects.
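The within-subject design of simulation study 1 can be illustrated with a small Monte Carlo loop. The Python sketch below is not the authors' code and makes several assumptions that are not in the abstract: the function names, the random intercept and residual standard deviations (0.5 and 1.0), the default random slope standard deviation (0.3), and the use of a raw rather than standardized coefficient are all hypothetical. To stay dependency-free it also replaces the LMEM fit with a simpler two-stage estimator (per-subject condition differences followed by a one-sample interval on their mean) and uses a normal critical value, which understates the interval width when the number of subjects is small.

```python
import math
import random
import statistics

# Normal critical value for a 95% interval (an approximation to the t value).
Z_975 = statistics.NormalDist().inv_cdf(0.975)


def subject_effects(n_subjects, n_trials, beta, slope_sd, intercept_sd, resid_sd, rng):
    """Simulate one balanced data set; return each subject's condition difference.

    Hypothetical model: y_ij = u0_i + (beta + u1_i) * x_ij + e_ij,
    with x_ij = 0 for half of the trials and x_ij = 1 for the other half.
    """
    half = n_trials // 2
    effects = []
    for _ in range(n_subjects):
        u0 = rng.gauss(0.0, intercept_sd)  # random intercept of this subject
        u1 = rng.gauss(0.0, slope_sd)      # random slope of this subject
        control = [u0 + rng.gauss(0.0, resid_sd) for _ in range(half)]
        treated = [u0 + beta + u1 + rng.gauss(0.0, resid_sd) for _ in range(half)]
        effects.append(statistics.fmean(treated) - statistics.fmean(control))
    return effects


def power_and_width(n_subjects, n_trials, beta, slope_sd=0.3,
                    intercept_sd=0.5, resid_sd=1.0, n_reps=200, seed=1):
    """Estimate power and mean 95% interval width over n_reps replications."""
    rng = random.Random(seed)
    hits = 0
    widths = []
    for _ in range(n_reps):
        d = subject_effects(n_subjects, n_trials, beta,
                            slope_sd, intercept_sd, resid_sd, rng)
        estimate = statistics.fmean(d)
        se = statistics.stdev(d) / math.sqrt(n_subjects)
        lower, upper = estimate - Z_975 * se, estimate + Z_975 * se
        if lower > 0.0 or upper < 0.0:  # interval excludes zero: effect detected
            hits += 1
        widths.append(upper - lower)
    return hits / n_reps, statistics.fmean(widths)


if __name__ == "__main__":
    for n_subjects in (10, 30, 100):
        power, width = power_and_width(n_subjects, n_trials=20, beta=0.5)
        print(f"I={n_subjects:4d} J=20 power={power:.2f} mean width={width:.3f}")
```

Looping the same function over the full grid of I and J values, and over the effect sizes 0.2, 0.5, and 0.8, gives the kind of power and interval width table that a contour plot summarizes. A faithful replication would fit the LMEM itself to each simulated data set.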

The results are as follows. First, there were no convergence problems in any condition, except when the random slope variance was small and a maximal model was fitted to the data. Second, power increased as the sample size, number of trials, or effect size increased. However, the number of trials played the key role in the power for the within-subject effect, whereas the sample size was more important for the power of the cross-level effect. Power was larger for a continuous between-subject variable than for a binary one, and larger under the balanced design than under the unbalanced design. Third, although the fixed effect was estimated accurately in all simulation conditions, the width of the 95% confidence interval (95% width) was extremely large in some conditions. Lastly, AIPE for the random effects increased as the sample size and/or number of trials increased. The residual variance was estimated accurately. As the random slope variance increased, the accuracy of the random intercept variance estimates decreased and the accuracy of the random slope variance estimates increased. To simplify the results of these simulation studies, a final set of summary guidelines is presented in the form of confidence interval width contours. Take Figure 1 as an example. In Figure 1(a), the shaded area represents the conditions in which power is higher than 0.8. In Figure 1(b), the shaded area represents the conditions in which power is higher than 0.8 and the relative bias (rbias) of every random effect is less than 0.1. Color represents different levels of the width of the 95% confidence interval. The shaded area therefore shows the sample sizes recommended in terms of power, and practitioners can choose, within the shaded area, a sample size that also meets their requirement for the width of the confidence interval, which reflects the accuracy of the parameter estimate for the fixed experimental effect.
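The logic of reading such a contour plot can be sketched numerically. The Python example below is an illustration only, not the authors' procedure: instead of simulation it uses a large-sample approximation for a balanced within-subject design, in which the standard error of the fixed effect is sqrt((slope_var + 4 * resid_var / J) / I). The residual variance of 1.0, the target width of 0.2, the function names, and the rule of picking the design with the fewest total observations are all assumptions introduced here, and the relative bias criterion of Figure 1(b) is not modeled.

```python
import math
from statistics import NormalDist

NORMAL = NormalDist()
Z_975 = NORMAL.inv_cdf(0.975)  # normal critical value for a 95% interval

# Grids of subjects (I) and trials (J) taken from the simulation design.
SUBJECTS = (10, 30, 50, 70, 100, 200, 400, 600, 800)
TRIALS = (10, 20, 30, 50, 70, 100, 150, 200, 250, 300)


def standard_error(n_subjects, n_trials, slope_var, resid_var):
    """Approximate SE of the within-subject effect in a balanced design."""
    return math.sqrt((slope_var + 4.0 * resid_var / n_trials) / n_subjects)


def approx_power(beta, se):
    """Two-sided power of a z test at alpha = 0.05 for true effect beta."""
    shift = beta / se
    return NORMAL.cdf(shift - Z_975) + NORMAL.cdf(-shift - Z_975)


def contour_grid(subjects, trials, beta, slope_var, resid_var=1.0):
    """Map each (I, J) cell to its (95% interval width, power)."""
    grid = {}
    for i in subjects:
        for j in trials:
            se = standard_error(i, j, slope_var, resid_var)
            grid[(i, j)] = (2.0 * Z_975 * se, approx_power(beta, se))
    return grid


def smallest_design(grid, min_power=0.8, max_width=0.2):
    """Return the (I, J) cell with the fewest observations meeting both criteria."""
    ok = [(i * j, i, j) for (i, j), (width, power) in grid.items()
          if power >= min_power and width <= max_width]
    return min(ok)[1:] if ok else None


if __name__ == "__main__":
    grid = contour_grid(SUBJECTS, TRIALS, beta=0.5, slope_var=0.09)
    for i in (30, 100, 400):
        width, power = grid[(i, 50)]
        shaded = "shaded" if power >= 0.8 else "unshaded"
        print(f"I={i:3d} J=50 width={width:.3f} power={power:.2f} ({shaded})")
    print("smallest design meeting both criteria:", smallest_design(grid))
```

Cells with power of at least 0.8 correspond to the shaded region, and the width values correspond to the color levels. A cell can be shaded and still have an interval too wide for the intended use, which is the case the combined criterion is meant to catch.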

In conclusion, if sample size planning is based solely on power analysis, the chosen sample size might not be large enough to obtain accurate estimates of effect sizes. We therefore adopted the rationale of considering both statistical power and AIPE during sample size planning. To this end, this article provides a standard procedure, based on a confidence interval width contour plot, for recommending the sample size and number of trials when using LMEMs. The plot visualizes the combined effect of the sample size and the number of trials per participant on the 95% width, power, and AIPE for random effects. Based on this tool and other empirical considerations, practitioners can make informed choices about how many participants to test and how many trials to administer to each.

Key words: linear mixed-effects models, multilevel models, power analysis, effect size, confidence interval width

LIU Yue, XU Lei, LIU Hongyun, HAN Yuting, YOU Xiaofeng, WAN Zhilin. (2024). Confidence interval width contours: Sample size planning for linear mixed-effects models. Acta Psychologica Sinica, 56(1), 124-138.

