Interpreting nonsignificant results: A quantitative investigation based on 500 Chinese psychological research

Advances in Psychological Science ›› 2021, Vol. 29 ›› Issue (3): 381-393. doi: 10.3724/SP.J.1042.2021.00381

Research Methods

WANG Jun¹, SONG Qiongya¹, XU Yuepei²,³, JIA Binbin⁴, LU Chunlei⁵, CHEN Xi⁶, DAI Zixu⁷, HUANG Zhiyue⁸, LI Zhenjiang⁹, LIN Jingxi¹⁰, LUO Wanying¹¹, SHI Sainan¹², ZHANG Yingying¹³, ZANG Yufeng¹⁴, ZUO Xi-Nian¹⁵, HU Chuanpeng¹⁶

¹Department of Psychology, Sun Yat-Sen University, Guangzhou 510006, China
²CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China
³Department of Psychology, University of Chinese Academy of Sciences, Beijing 100049, China
⁴School of Psychology, Shanghai University of Sport, Shanghai 200438, China
⁵College of Teacher Education, Zhejiang Normal University, Jinhua 321000, China
⁶Independent researcher, Shanghai 200122, China
⁷School of Psychology, South China Normal University, Guangzhou 510631, China
⁸Tisch School of the Arts, New York University, New York 11201, United States
⁹School of Education, Soochow University, Suzhou 215123, China
¹⁰Institute of Education Science, Heilongjiang University, Harbin 150080, China
¹¹School of Psychology and Cognitive Sciences, Peking University, Beijing 100871, China
¹²School of Psychology and Cognitive Sciences, East China Normal University, Shanghai 200063, China
¹³Faculty of Psychology, Southwest University, Chongqing 400715, China
¹⁴Center for Cognition and Brain Disorders, Hangzhou Normal University, Hangzhou 311121, China
¹⁵National Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing 100875, China
¹⁶Leibniz Institute for Resilience Research, 55131 Mainz, Germany

Received: 2020-07-14 Published: 2021-03-15 Online: 2021-01-26
Corresponding author: HU Chuanpeng, E-mail: hcp4715@hotmail.com

Abstract

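Summary:

Nonsignificant results (e.g., p > 0.05) are very common in psychological research and are easily misread as evidence for accepting the null hypothesis. This misreading may lead to erroneous inferences in matched-group studies or to overlooking real effects that are masked by nonsignificant results from small samples. However, no empirical study in China has yet investigated how prevalent nonsignificant results are or how they are interpreted. This study surveyed 500 empirical studies published in Chinese psychology journals, counted how often negative statements related to nonsignificant results appeared in their abstracts, judged and tallied the accuracy of the inferences based on those negative statements, and used Bayes factors to re-evaluate the nonsignificant results that reported t values. The results show that 36% of the abstracts mentioned nonsignificant results, containing 236 negative statements in total. Of these, 41% misinterpreted the nonsignificant results (e.g., as supporting the null hypothesis). A Bayes factor analysis of the studies that reported t values showed that only 5.1% of the nonsignificant results provided strong evidence for the null hypothesis (BF₀₁ > 10). Compared with a previous survey of international psychology journals (32% of abstracts contained negative statements; 72% of negative statements misinterpreted nonsignificant results), Chinese psychology journals report nonsignificant results at a higher rate and misinterpret them at a lower rate. Nevertheless, researchers in China still need to strengthen their understanding of nonsignificant results and to promote statistical methods suited to evaluating them.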

Abstract:

Background: The p-value is the most widely used statistical index for inference in science. A p-value greater than 0.05 (i.e., a nonsignificant result), however, cannot distinguish between two cases: absence of evidence and evidence of absence. Unfortunately, researchers in psychological science may not interpret p-values correctly, which leads to wrong inferences. For example, Aczel et al. (2018) surveyed 412 empirical studies published in Psychonomic Bulletin & Review, Journal of Experimental Psychology: General, and Psychological Science and found that about 72% of nonsignificant results were misinterpreted as evidence in favor of the null hypothesis. Misinterpreting nonsignificant results can have severe consequences. One is missing potentially meaningful effects. Another arises in matched-group clinical trials, where a misinterpreted nonsignificant result may lead to falsely "matched" groups and thus threaten the validity of interventions. So far, how nonsignificant results are interpreted in the Chinese psychological literature is unknown. Here we surveyed 500 empirical papers published in five mainstream Chinese psychological journals to address the following questions: (1) how often are nonsignificant results reported; (2) how do researchers interpret nonsignificant results in these published studies; (3) if researchers interpreted nonsignificant results as "evidence of absence," do the empirical data provide enough evidence for null effects?
Method: Following our pre-registration (https://osf.io/czx6f), we first randomly selected 500 empirical papers from all papers published in 2017 and 2018 in five mainstream Chinese psychological journals (Acta Psychologica Sinica, Psychological Science, Chinese Journal of Clinical Psychology, Psychological Development and Education, Psychological and Behavioral Studies). Second, we screened the abstracts of the selected articles to check whether they contained negative statements. For the studies whose abstracts contained negative statements, we searched for nonsignificant statistics in their results sections and checked whether the corresponding interpretations were correct. More specifically, all such statements were classified into four categories (Correct-frequentist; Incorrect-frequentist: whole population; Incorrect-frequentist: current sample; Difficult to judge). Finally, we calculated Bayes factors from the available t values and sample sizes associated with those nonsignificant results. The Bayes factors help estimate to what extent those results provided evidence for the absence of effects (i.e., the conclusion that researchers incorrectly drew from nonsignificant results).
Results: Our survey revealed that: (1) out of 500 empirical papers, 36% of the abstracts (n = 180) contained negative statements; (2) the selected studies contained 236 negative statements associated with nonsignificant statistics, and 41% of these 236 statements misinterpreted nonsignificant results, i.e., the authors inferred that the results provided evidence for the absence of effects; (3) Bayes factor analyses based on the available t values and sample sizes found that only 5.1% (n = 2) of the nonsignificant results provided strong evidence for the absence of effects (BF₀₁ > 10). Compared with the results of Aczel et al. (2018), empirical papers published in Chinese journals contained more negative statements (36% vs. 32%), and researchers made fewer misinterpretations of nonsignificant results (41% vs. 72%). It is worth noting, however, that a category of ambiguous interpretations of nonsignificant results exists in the Chinese context. More specifically, many statements corresponding to nonsignificant results took the form "there is no significant difference between condition A and condition B". Such a statement can be understood either as "the difference is not statistically significant", which is correct, or as "there is no difference", which is incorrect. The percentage of misinterpretations of nonsignificant results rose to 64% if we adopted the second reading of these statements, compared with 41% under the first reading.
Conclusion: Our results suggest that Chinese researchers need to improve their understanding of nonsignificant results and use more appropriate statistical methods to extract information from them. More precise wording should also be used in the Chinese context.

Key words: nonsignificant results, null-hypothesis significance testing, Bayes factors, meta-research
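
The Method paragraph describes computing Bayes factors from reported t values and sample sizes. A minimal sketch of one way to do this is given below, assuming the JZS (Cauchy) prior of Rouder et al. (2009), which the reference list cites. The abstract does not state which prior or scale the authors used, so the function name `jzs_bf01`, the default scale `r = √2/2`, and the integration settings are illustrative assumptions rather than the study's actual procedure.

```python
import math


def jzs_bf01(t, n1, n2=None, r=math.sqrt(2) / 2, steps=4000):
    """Bayes factor BF01 (null over alternative) for a t test.

    Hypothetical helper: follows the JZS formula in Rouder et al. (2009).
    The prior scale r is an assumption, not taken from the paper.
    """
    if n2 is None:
        # One-sample or paired design
        n_eff, nu = n1, n1 - 1
    else:
        # Independent two-sample design
        n_eff, nu = n1 * n2 / (n1 + n2), n1 + n2 - 2

    # Likelihood of t under H0 (up to a constant shared with H1)
    null_like = (1 + t**2 / nu) ** (-(nu + 1) / 2)

    def integrand(u):
        # Integrate over g on a log scale: g = exp(u), dg = g du
        g = math.exp(u)
        # Inverse-gamma(1/2, r^2/2) prior on g, which yields a Cauchy prior
        # with scale r on the standardized effect size
        prior = r / math.sqrt(2 * math.pi) * g ** -1.5 * math.exp(-r**2 / (2 * g))
        like = (1 + n_eff * g) ** -0.5 * (
            1 + t**2 / ((1 + n_eff * g) * nu)
        ) ** (-(nu + 1) / 2)
        return like * prior * g

    # Composite Simpson's rule on u in [-30, 30]; the integrand is
    # negligible outside this range
    lo, hi = -30.0, 30.0
    h = (hi - lo) / steps
    total = integrand(lo) + integrand(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lo + i * h)
    alt_like = total * h / 3

    return null_like / alt_like
```

Under this sketch, `jzs_bf01(t, n)` handles a one-sample or paired test and `jzs_bf01(t, n1, n2)` an independent-samples test. A returned value above 10 would correspond to the "strong evidence for the absence of effects" threshold (BF₀₁ > 10) used in the abstract.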

CLC number: B841

WANG Jun, SONG Qiongya, XU Yuepei, JIA Binbin, LU Chunlei, CHEN Xi, DAI Zixu, HUANG Zhiyue, LI Zhenjiang, LIN Jingxi, LUO Wanying, SHI Sainan, ZHANG Yingying, ZANG Yufeng, ZUO Xi-Nian, HU Chuanpeng. (2021). Interpreting nonsignificant results: A quantitative investigation based on 500 Chinese psychological research. Advances in Psychological Science, 29(3), 381-393.

References

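[1]	CHENG Kaiming, LI Si'e. (2019). P values in scientific research: Misunderstanding, manipulation, and improvement. The Journal of Quantitative & Technical Economics, (7), 117-136. doi: 10.13653/j.cnki.jqte.2019.07.007
[2]	CUI Yichen, WANG Pei, CUI Yajuan. (2019). Cognitive control strategies in impression formation under perceptual conflict: The case of stereotypical and counter-stereotypical information. Acta Psychologica Sinica, 51(10), 1157-1170. doi: 10.3724/SP.J.1041.2019.01157
[3]	HAO Li, LIU Leping, SHEN Yafei. (2016). Statistical significance: A misread P value. Statistics & Information Forum, 31(12), 3-10.
[4]	HU Chuanpeng, KONG Xiangzhen, Eric-Jan Wagenmakers, Alexander Ly, PENG Kaiping. (2018). The Bayes factor and its implementation in JASP. Advances in Psychological Science, 26(6), 951-965. doi: 10.3724/SP.J.1042.2018.00951
[5]	HU Chuanpeng, WANG Fei, GUO Jichengsi, SONG Mengdi, SUI Jie, PENG Kaiping. (2016). The reproducibility problem in psychological research: From crisis to opportunity. Advances in Psychological Science, 24(9), 1504-1518. doi: 10.3724/SP.J.1042.2016.01504
[6]	LU Chunlei, WANG Jun, SONG Qiongya, JIA Binbin, XU Yuepei, HU Chuanpeng. (2020). Methods for extracting information from nonsignificant results: Principles and implementation. Retrieved October 21, 2020, from www.chinaxiv.org/abs/202001.00113
[7]	LUO Dasen. (2017). An assessment of two root causes of the reproducibility crisis in psychology. Studies of Psychology and Behavior, 15(5), 577-586.
[8]	LYU Xiaokang. (2012). The disagreement between Fisher and Neyman-Pearson and the controversy over hypothesis testing in psychological statistics. Journal of Psychological Science, 35(6), 1502-1506. doi: 10.16719/j.cnki.1671-6981.2012.06.042
[9]	LYU Xiaokang. (2014). From tool to paradigm: A sociology-of-knowledge reflection on the hypothesis-testing controversy. Chinese Journal of Sociology, 34(6), 216-236. doi: 10.15992/j.cnki.31-1123/c.2014.06.011
[10]	LU Shuhua. (2009). Social statistics (4th ed.). Beijing: Peking University Press.
[11]	XIE Shushu, ZHANG Jijia, ZHU Jun. (2019). The categorical perception of color occurs in both cerebral hemispheres: Evidence from the Naxi and Han peoples. Acta Psychologica Sinica, 51(11), 1229-1243. doi: 10.3724/SP.J.1041.2019.01229
[12]	ZHANG Houcan, XU Jianping. (2015). Modern psychological and educational statistics (4th ed.). Beijing: Beijing Normal University Press.
[13]	ZHONG Xiaobo. (2016). The controversy over hypothesis testing: Clarifying and resolving the issues. Advances in Psychological Science, 24(10), 1670-1676. doi: 10.3724/SP.J.1042.2016.01670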
[14]	Aczel, B., Palfi, B., & Szaszi, B. (2017). Estimating the evidential value of significant results in psychological science. PloS One, 12(8), e0182651. doi: 10.1371/journal.pone.0182651 pmid: 28859154
[15]	Aczel, B., Palfi, B., Szollosi, A., Kovacs, M., Szaszi, B., Szecsi, P., … Wagenmakers, E. -J. (2018). Quantifying support for the null hypothesis in psychology: An empirical investigation. Advances in Methods and Practices in Psychological Science, 1(3), 357-366. doi: 10.1177/2515245918773742
[16]	Algermissen, J., & Mehler, D. M. (2018). May the power be with you: Are there highly powered studies in neuroscience, and how can we get more of them? Journal of Neurophysiology, 119(6), 2114-2117. doi: 10.1152/jn.00765.2017 pmid: 29465324
[17]	American Psychological Association. (2010). Publication Manual of the American Psychological Association. Washington DC: American Psychological Association.
[18]	Amrhein, V., Greenland, S., & McShane, B. (2019). Scientists rise up against statistical significance. Nature, 567, 305-307. doi: 10.1038/d41586-019-00857-9 pmid: 30894741
[19]	Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature, 533, 452-454. doi: 10.1038/533452a
[20]	Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., & Munafo, M. R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365-376. doi: 10.1038/nrn3475 pmid: 23571845
[21]	Cassidy, S. A., Dimova, R., Giguère, B., Spence, J. R., & Stanley, D. J. (2019). Failing grade: 89% of introduction-to-psychology textbooks that define or explain statistical significance do so incorrectly. Advances in Methods and Practices in Psychological Science, 2(3), 233-239. doi: 10.1177/2515245919858072
[22]	Chen, X., Lu, B., & Yan, C. -G. (2018). Reproducibility of R-fMRI metrics on the impact of different strategies for multiple comparison correction and sample sizes. Human Brain Mapping, 39(1), 300-318. doi: 10.1002/hbm.23843 pmid: 29024299
[23]	Chuard, P. J., Vrtílek, M., Head, M. L., & Jennions, M. D. (2019). Evidence that nonsignificant results are sometimes preferred: Reverse P-hacking or selective reporting? PLoS Biology, 17(1), e3000127. doi: 10.1371/journal.pbio.3000127 pmid: 30682013
[24]	Dienes, Z. (2014). Using Bayes to get the most out of non-significant results. Frontiers in Psychology, 5, 781. doi: 10.3389/fpsyg.2014.00781 pmid: 25120503
[25]	Dienes, Z. (2016). How Bayes factors change scientific practice. Journal of Mathematical Psychology, 72, 78-89. doi: 10.1016/j.jmp.2015.10.003
[26]	Edwards, W., Lindman, H., & Savage, L. J. (1963). Bayesian statistical inference for psychological research. Psychological Review, 70(3), 193-242. doi: 10.1037/h0044139
[27]	Fanelli, D. (2012). Negative results are disappearing from most disciplines and countries. Scientometrics, 90(3), 891-904. doi: 10.1007/s11192-011-0494-7
[28]	Fiedler, K., Kutzner, F., & Krueger, J. I. (2012). The long way from α-error control to validity proper. Perspectives on Psychological Science, 7(6), 661-669. doi: 10.1177/1745691612462587 pmid: 26168128
[29]	Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378-382. doi: 10.1037/h0031619
[30]	Franco, A., Malhotra, N., & Simonovits, G. (2014). Publication bias in the social sciences: Unlocking the file drawer. Science, 345(6203), 1502-1505. doi: 10.1126/science.1255484 pmid: 25170047
[31]	Gamer, M., Lemon, J., & Singh, I. F. P. (2019). irr: Various coefficients of interrater reliability and agreement (R package version 0.84.1) [Computer software]. Retrieved from https://CRAN.R-project.org/package=irr
[32]	Gigerenzer, G., Krauss, S., & Vitouch, O. (2004). The null ritual: What you always wanted to know about significance testing but were afraid to ask. In D. Kaplan (Ed.), The SAGE handbook of quantitative methodology for the social sciences (pp. 392-410). Thousand Oaks, CA: SAGE Publications, Inc.
[33]	Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology, 31(4), 337-350. doi: 10.1007/s10654-016-0149-3 pmid: 27209009
[34]	Gronau, Q. F., Ly, A., & Wagenmakers, E. -J. (2019). Informed Bayesian t-tests. The American Statistician, 1-14. doi: 10.1080/00031305.2018.1562983
[35]	Head, M. L., Holman, L., Lanfear, R., Kahn, A. T., & Jennions, M. D. (2015). The extent and consequences of p-hacking in science. PLoS Biology, 13(3). doi: 10.1371/journal.pbio.1002106 pmid: 25761097
[36]	Hoekstra, R., Monden, R., van Ravenzwaaij, D., & Wagenmakers, E. -J. (2018). Bayesian reanalysis of null results reported in medicine: Strong yet variable evidence for the absence of treatment effects. PloS One, 13(4), e0195474. doi: 10.1371/journal.pone.0195474 pmid: 29709011
[37]	Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124. doi: 10.1371/journal.pmed.0020124 pmid: 16060722
[38]	Jeffreys, H. (1961). Theory of probability. Oxford, UK: Oxford University Press.
[39]	Jia, X. -Z., Zhao, N., Barton, B., Burciu, R., Carriere, N., Cerasa, A., … Zang, Y. -F. (2018). Small effect size leads to reproducibility failure in resting-state fMRI studies. Retrieved October 21, 2020, from https://www.biorxiv.org/content/10.1101/285171v1
[40]	Kendall, M. G., & Gibbons, J. D. (1990). Rank correlation methods (5th ed.). London, England: Edward Arnold.
[41]	Klein, R. A., Ratliff, K. A., Vianello, M., Adams Jr, R. B., Bahník, Š., Bernstein, M. J., … Nosek, B. A. (2014). Investigating variation in replicability: A “many labs” replication project. Social Psychology, 45(3), 142-152. doi: 10.1027/1864-9335/a000178
[42]	Kruschke, J. K. (2011). Bayesian assessment of null values via parameter estimation and model comparison. Perspectives on Psychological Science, 6(3), 299-312. doi: 10.1177/1745691611406925
[43]	Kruschke, J. K., & Liddell, T. M. (2018). The Bayesian new statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective. Psychonomic Bulletin and Review, 25(1), 178-206. doi: 10.3758/s13423-016-1221-4 pmid: 28176294
[44]	Kühberger, A., Fritz, A., & Scherndl, T. (2014). Publication bias in psychology: A diagnosis based on the correlation between effect size and sample size. PloS One, 9(9), e105825. doi: 10.1371/journal.pone.0105825 pmid: 25192357
[45]	Lakens, D., McLatchie, N., Isager, P. M., Scheel, A. M., & Dienes, Z. (2018). Improving inferences about null effects with Bayes factors and equivalence tests. The Journals of Gerontology: Series B. Advance online publication. doi: 10.1093/geronb/gby065
[46]	Lakens, D., Scheel, A. M., & Isager, P. M. (2018). Equivalence testing for psychological research: A tutorial. Advances in Methods and Practices in Psychological Science, 1(2), 259-269. doi: 10.1177/2515245918770963
[47]	Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159-174. doi: 10.2307/2529310 pmid: 843571
[48]	Lee, M. D., & Wagenmakers, E. -J. (2014). Bayesian cognitive modeling: A practical course. Cambridge, England: Cambridge University Press.
[49]	Ly, A., Raj, A., Etz, A., Marsman, M., Gronau, Q. F., & Wagenmakers, E. -J. (2018). Bayesian reanalyses from summary statistics: A guide for academic consumers. Advances in Methods and Practices in Psychological Science, 1(3), 367-374. doi: 10.1177/2515245918779348
[50]	Ly, A., Verhagen, J., & Wagenmakers, E. -J. (2016a). An evaluation of alternative methods for testing hypotheses, from the perspective of Harold Jeffreys. Journal of Mathematical Psychology, 72, 43-55. doi: 10.1016/j.jmp.2016.01.003
[51]	Ly, A., Verhagen, J., & Wagenmakers, E. -J. (2016b). Harold Jeffreys’s default Bayes factor hypothesis tests: Explanation, extension, and application in psychology. Journal of Mathematical Psychology, 72, 19-32. doi: 10.1016/j.jmp.2015.06.004
[52]	Lyu, X. -K., Xu, Y. P., Zhao, X. -F., Zuo, X. -N., & Hu, C. -P. (2020). Beyond psychology: The prevalence of misinterpretation of p-value and confidence intervals across different fields. Journal of Pacific Rim Psychology, 14, e6. doi: 10.1017/prp.2019.28
[53]	Lyu, Z. Y., Peng, K. P., & Hu, C. -P. (2018). P-value, confidence intervals and statistical inference: A new dataset of misinterpretation. Frontiers in Psychology, 9, 868. doi: 10.3389/fpsyg.2018.00868 pmid: 29937743
[54]	McElreath, R. (2018). Statistical rethinking: A Bayesian course with examples in R and Stan. Virginia Beach, VA: Chapman and Hall/CRC Press.
[55]	Meehl, P. E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 34(2), 103-115. doi: 10.1086/288135
[56]	Miller, G. (2011). ESP paper rekindles discussion about statistics. Science, 331(6015), 272-273. doi: 10.1126/science.331.6015.272 pmid: 21252321
[57]	Morey, R. D., Rouder, J. N., & Jamil, T. (2015). BayesFactor: Computation of Bayes factors for common designs (Version 0.9.12-2) [Computer software]. Retrieved from https://cran.r-project.org/web/packages/BayesFactor/BayesFactor.pdf
[58]	Nickerson, R. S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5(2), 241-301. doi: 10.1037/1082-989X.5.2.241 pmid: 10937333
[59]	Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), 943-950. doi: 10.1126/science.aac4716
[60]	Rogers, J. L., Howard, K. I., & Vessey, J. T. (1993). Using significance tests to evaluate equivalence between two experimental groups. Psychological Bulletin, 113(3), 553. doi: 10.1037/0033-2909.113.3.553 pmid: 8316613
[61]	Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G. (2009). Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin and Review, 16(2), 225-237. doi: 10.3758/PBR.16.2.225 pmid: 19293088
[62]	Schäfer, T., & Schwarz, M. (2019). The meaningfulness of effect sizes in psychological research: Differences between sub-disciplines and the impact of potential biases. Frontiers in Psychology, 10, 813. doi: 10.3389/fpsyg.2019.00813
[63]	Schönbrodt, F. (2015). Grades of evidence - A cheat sheet [Web log post]. Retrieved from http://www.nicebread.de/grades-of-evidence-a-cheat-sheet/
[64]	Signorell, A. (2017). DescTools: Tools for descriptive statistics (Version 0.99.22) [Computer software]. Retrieved from https://cran.r-project.org/web/packages/DescTools/index.html
[65]	Stussi, Y., Pourtois, G., & Sander, D. (2018). Enhanced Pavlovian aversive conditioning to positive emotional stimuli. Journal of Experimental Psychology: General, 147(6), 905-923. doi: 10.1037/xge0000424
[66]	van Doorn, J., Ly, A., Marsman, M., & Wagenmakers, E. -J. (2018). Bayesian inference for Kendall’s rank correlation coefficient. The American Statistician, 72(4), 303-308. doi: 10.1080/00031305.2016.1264998
[67]	Wagenmakers, E. -J., Love, J., Marsman, M., Jamil, T., Ly, A., Verhagen, J., … Morey, R. D. (2018). Bayesian inference for psychology. Part II: Example applications with JASP. Psychonomic Bulletin and Review, 25(1), 58-76. doi: 10.3758/s13423-017-1323-7 pmid: 28685272
[68]	Wagenmakers, E. -J., Wetzels, R., Borsboom, D., & van der Maas, H. L. J. (2011). Why psychologists must change the way they analyze their data: The case of psi: Comment on Bem (2011). Journal of Personality and Social Psychology, 100(3), 426-432. doi: 10.1037/a0022790 pmid: 21280965
[69]	Wasserstein, R. L., & Lazar, N. A. (2016). The ASA’s statement on p-values: Context, process, and purpose. The American Statistician, 70(2), 129-133. doi: 10.1080/00031305.2016.1154108
[70]	Wetzels, R., Matzke, D., Lee, M. D., Rouder, J. N., Iverson, G. J., & Wagenmakers, E. -J. (2011). Statistical evidence in experimental psychology: An empirical comparison using 855 t tests. Perspectives on Psychological Science, 6(3), 291-298. doi: 10.1177/1745691611406923 pmid: 26168519
[71]	Ziliak, S. T., & McCloskey, D. N. (2008). The cult of statistical significance. Ann Arbor: University of Michigan Press.

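Category	Classification criterion	Example
Correct-frequentist	Interprets the nonsignificant result according to the logic of NHST, i.e., states only that the result fails to reject the null hypothesis or fails to support the alternative hypothesis.	The results show no evidence that the intervention group and the control group differ (significantly).
Incorrect-frequentist: whole population	Interprets the nonsignificant result as support for the null hypothesis at the level of the population from which the study sample was drawn.	The results show that the intervention has no effect.
Incorrect-frequentist: current sample	Interprets the nonsignificant result as support for the null hypothesis within the study sample.	The results show that there is no difference between the intervention group and the control group.
Bayes-factor-based interpretation	Uses a Bayes factor to support the null hypothesis rather than the alternative hypothesis.	BF₀₁ > 10, indicating strong evidence for the null hypothesis.
Difficult to judge	The wording of the negative statement makes it hard to assign a category unambiguously.	Except for fear, the greater the intensity of a basic expression, the better participants recognized it.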
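
The coding scheme in the table above lends itself to a simple tally. The sketch below is only an illustration of how a misinterpretation rate could be computed once each negative statement has a category, and how recoding ambiguously worded statements shifts that rate. The names `Reading` and `misinterpretation_rate` are hypothetical, and the ten coded statements are made-up toy data, not data from the study.

```python
from collections import Counter
from enum import Enum


class Reading(Enum):
    """Categories from the table (labels as used in the English abstract)."""
    CORRECT = "Correct-frequentist"
    INCORRECT_POPULATION = "Incorrect-frequentist: whole population"
    INCORRECT_SAMPLE = "Incorrect-frequentist: current sample"
    BAYES = "Bayes-factor-based interpretation"
    UNCLEAR = "Difficult to judge"


# The two categories that count as misinterpretations
INCORRECT = {Reading.INCORRECT_POPULATION, Reading.INCORRECT_SAMPLE}


def misinterpretation_rate(codes):
    """Share of coded statements that fall into an incorrect category."""
    counts = Counter(codes)
    return sum(counts[c] for c in INCORRECT) / len(codes)


# Toy data (hypothetical): ten coded statements, two of which use ambiguous
# wording such as "no significant difference".
lenient = (
    [Reading.CORRECT] * 6            # ambiguous wording read as "not significant"
    + [Reading.INCORRECT_POPULATION] * 3
    + [Reading.UNCLEAR]
)
strict = (
    [Reading.CORRECT] * 4
    + [Reading.INCORRECT_SAMPLE] * 2  # the two ambiguous ones read as "no difference"
    + [Reading.INCORRECT_POPULATION] * 3
    + [Reading.UNCLEAR]
)

lenient_rate = misinterpretation_rate(lenient)
strict_rate = misinterpretation_rate(strict)
```

With these toy codes the rate moves from 0.3 to 0.5 when the ambiguous statements are recoded, which mirrors in miniature the shift from 41% to 64% that the abstract reports for the real data.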
