Data aggregation adequacy testing in multilevel research: A critical literature review and preliminary solutions to key issues

doi:10.3724/SP.J.1042.2020.01392

Abstract

Shared unit property constructs are ubiquitous in multilevel organizational research, and they are most often measured by aggregating the ratings of several unit members to the unit level. Data aggregation adequacy testing (DAAT) is the statistical hurdle that such aggregated scores must clear before they can be treated as valid and representative. Well-established DAAT indicators include the within-group agreement index, r_WG, and the within-group reliability indices, ICC(1) and ICC(2). Nonetheless, several key issues remain open to debate, for instance the relative merits of the two families of indicators, the choice of null distribution and the data-screening decisions for r_WG, and appropriate cut-off values. To address these questions, the current research first conducted a content analysis of 166 studies applying DAAT procedures that were published since 2014 in 9 Chinese journals in management and psychology, together with 85 studies from the Journal of Applied Psychology for comparison. Common problems in the routine practice of DAAT were identified, and the following suggestions are proposed: (1) Disentangle and differentiate the roles of the DAAT indicators; specifically, r_WG should be used as the exclusive indicator of aggregation adequacy, whereas ICC(1) and ICC(2) should be treated as indices of validity and reliability, respectively. (2) Make prudent and justifiable decisions in choosing the null distribution when calculating r_WG, and in excluding groups with low within-group agreement. (3) Apply more reasonable and moderately flexible cut-off values instead of arbitrary, rough rules of thumb. Last but not least, researchers should always prioritize theoretical considerations in framework building and DAAT, and reduce their disproportionate dependence on statistical results.
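To make the three indicators named above concrete, the following is a minimal Python sketch, not the procedure used in the reviewed studies. It assumes the commonly cited definitions: r_WG = 1 - S²/σ²_E and its multi-item extension r_WG(J) (James, Demaree, & Wolf, 1984), and ICC(1) and ICC(2) derived from one-way ANOVA mean squares. All function names are hypothetical, the ICC helper assumes equal group sizes, and the skewed response probabilities in the usage part are illustrative assumptions only.

```python
from statistics import mean, variance


def uniform_null_variance(n_options):
    """Variance of a discrete uniform null over 1..n_options: (A^2 - 1) / 12."""
    return (n_options ** 2 - 1) / 12.0


def discrete_variance(probs):
    """Variance of a null distribution given probabilities for options 1..A."""
    options = range(1, len(probs) + 1)
    mu = sum(p * x for p, x in zip(probs, options))
    return sum(p * x * x for p, x in zip(probs, options)) - mu ** 2


def rwg(ratings, null_var):
    """Single-item r_WG for one group: 1 - observed variance / null variance."""
    return 1.0 - variance(ratings) / null_var


def rwg_j(item_ratings, null_var):
    """Multi-item r_WG(J); item_ratings holds one list of ratings per item."""
    j = len(item_ratings)
    ratio = mean(variance(item) for item in item_ratings) / null_var
    return j * (1.0 - ratio) / (j * (1.0 - ratio) + ratio)


def icc(groups):
    """Return (ICC(1), ICC(2)) from one-way ANOVA; assumes equal group sizes."""
    k = len(groups[0])
    if any(len(g) != k for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    n_groups = len(groups)
    grand = mean(x for g in groups for x in g)
    # Between-group and within-group mean squares.
    msb = k * sum((mean(g) - grand) ** 2 for g in groups) / (n_groups - 1)
    msw = sum((x - mean(g)) ** 2 for g in groups for x in g) / (n_groups * (k - 1))
    icc1 = (msb - msw) / (msb + (k - 1) * msw)
    icc2 = (msb - msw) / msb
    return icc1, icc2


if __name__ == "__main__":
    team = [4, 4, 5, 4, 5]  # hypothetical ratings on a 5-point scale
    print("r_WG, uniform null:", rwg(team, uniform_null_variance(5)))
    # Illustrative skewed null; these probabilities are an assumption.
    skewed = discrete_variance([0.05, 0.15, 0.20, 0.35, 0.25])
    print("r_WG, skewed null:", rwg(team, skewed))
    print("ICC(1), ICC(2):", icc([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

Running the sketch with the two null distributions shows why the choice matters: the same ratings yield a lower r_WG under the skewed null, because its expected variance is smaller than that of the uniform null.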

Key words: multilevel research, shared unit property, aggregation, within-group agreement, within-group reliability

ZHU Haiteng. Data aggregation adequacy testing in multilevel research: A critical literature review and preliminary solutions to key issues[J]. Advances in Psychological Science, 2020, 28(8): 1392-1408.

Figures/Tables 10

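Reported item	Reports by variable^a, n	Reports by variable^a, %	Reports by study^b, n	Reports by study^b, %
Mean	313^c/157	87.43/72.69	138/53	88.46/74.65
Median	92/76	25.70/35.19	41/29	26.28/40.85
Range	53/13	14.80/6.02	21/6	13.46/8.45
Number or proportion of groups reaching the cut-off value	32/4	8.94/1.85	12/3	7.69/4.23
Null distribution used in the calculation	8/31	2.23/14.35	3/15	1.92/21.13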

Source	Statistic	n	M	SD	Me	Range	Variables ≥0.7, n (%)	Variables ≥0.8, n (%)	Variables ≥0.9, n (%)
Chinese studies	r_WG mean	313	0.871	0.071	0.876	0.572~0.990	311 (99.36%)	265 (84.66%)	134 (42.81%)
	r_WG median	92	0.908	0.067	0.926	0.750~0.980	92 (100%)	84 (91.30%)	65 (70.65%)
JAP studies	r_WG mean	148	0.840	0.084	0.840	0.630~0.990	142 (95.95%)	102 (68.92%)	42 (28.38%)
	r_WG median	74	0.878	0.089	0.895	0.610~0.990	70 (94.59%)	61 (82.43%)	37 (50.00%)

Source	n	M	SD	Me	Range	≥0.12, n (%)	≥0.20, n (%)	≥0.30, n (%)	≥0.40, n (%)
Chinese studies	336^a	0.276	0.141	0.250	0.011~0.790	304 (90.48%)	231 (68.75%)	127 (37.80%)	61 (18.15%)
JAP studies	247	0.241	0.157	0.210	0.010~0.851	195 (78.95%)	132 (53.44%)	69 (27.94%)	39 (15.79%)
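
The percentage columns in the table above follow from the counts and the row totals (count / n × 100, rounded to two decimals). The short Python sketch below recomputes them as a consistency check; the counts, totals, and reported percentages are copied from the table, while the function names and the English row labels are hypothetical.

```python
def pct(count, n):
    """Percentage of n values that reach a threshold, rounded to two decimals."""
    return round(100.0 * count / n, 2)


# (row total n, [(count, reported percentage), ...]) copied from the table,
# for the thresholds >=0.12, >=0.20, >=0.30, >=0.40 in that order.
ROWS = {
    "Chinese studies": (336, [(304, 90.48), (231, 68.75), (127, 37.80), (61, 18.15)]),
    "JAP studies": (247, [(195, 78.95), (132, 53.44), (69, 27.94), (39, 15.79)]),
}


def mismatches(rows, tol=1e-6):
    """List every (source, count) whose recomputed percentage differs from the table."""
    bad = []
    for source, (n, cells) in rows.items():
        for count, reported in cells:
            if abs(pct(count, n) - reported) > tol:
                bad.append((source, count))
    return bad


if __name__ == "__main__":
    print(mismatches(ROWS))
```

An empty list means every reported percentage is consistent with its count and row total.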