ISSN 1671-3710
CN 11-4766/R
Sponsor: Institute of Psychology, Chinese Academy of Sciences
Publisher: Science Press

Advances in Psychological Science, 2021, Vol. 29, Issue 3: 381-393. doi: 10.3724/SP.J.1042.2021.00381

Research Methods

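Interpreting nonsignificant results: A quantitative investigation based on 500 Chinese psychological research

WANG Jun1, SONG Qiongya1, XU Yuepei2,3, JIA Binbin4, LU Chunlei5, CHEN Xi6, DAI Zixu7, HUANG Zhiyue8, LI Zhenjiang9, LIN Jingxi10, LUO Wanying11, SHI Sainan12, ZHANG Yingying13, ZANG Yufeng14, ZUO Xi-Nian15, HU Chuanpeng16

  1 Department of Psychology, Sun Yat-sen University, Guangzhou 510006, China
  2 CAS Key Laboratory of Behavioral Science (Institute of Psychology, Chinese Academy of Sciences), Beijing 100101, China
  3 Department of Psychology, University of Chinese Academy of Sciences, Beijing 100049, China
  4 School of Psychology, Shanghai University of Sport, Shanghai 200438, China
  5 College of Teacher Education, Zhejiang Normal University, Jinhua 321000, China
  6 Independent researcher, Shanghai 200122, China
  7 School of Psychology, South China Normal University, Guangzhou 510631, China
  8 Tisch School of the Arts, New York University, New York 11201, the United States
  9 School of Education, Soochow University, Suzhou 215123, China
  10 Institute of Education Science, Heilongjiang University, Harbin 150080, China
  11 School of Psychology and Cognitive Sciences, Peking University, Beijing 100871, China
  12 School of Psychology and Cognitive Sciences, East China Normal University, Shanghai 200063, China
  13 Faculty of Psychology, Southwest University, Chongqing 400715, China
  14 Center for Cognition and Brain Disorders, Hangzhou Normal University, Hangzhou 311121, China
  15 State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing 100875, China
  16 Leibniz Institute for Resilience Research, 55131 Mainz, Germany
  • Received: 2020-07-14  Published: 2021-03-15  Online: 2021-01-26
  • Corresponding author: HU Chuanpeng, E-mail: hcp4715@hotmail.com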

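Abstract (Chinese version):

Nonsignificant results (e.g., p > 0.05) are very common in psychological research and are easily misinterpreted as evidence for accepting the null hypothesis. This may lead to erroneous inferences in matched-group studies or to overlooking true effects masked by nonsignificant results from small samples. However, no empirical study in China has yet examined how common nonsignificant results are and how they are interpreted. This study surveyed 500 Chinese-language empirical psychology papers: we counted how often their abstracts contained negative statements related to nonsignificant results, judged and tallied the accuracy of the inferences based on these negative statements, and used Bayes factors to re-evaluate the nonsignificant results for which t values were reported. The results showed that 36% of the abstracts mentioned nonsignificant results, containing 236 negative statements in total. Of these, 41% misinterpreted the nonsignificant results (e.g., by treating them as support for the null hypothesis). Bayes factor analyses of the studies reporting t values showed that only 5.1% of the nonsignificant results provided strong evidence for the null hypothesis (BF01 > 10). Compared with an earlier survey of international psychology journals (32% of abstracts contained negative statements; 72% of negative statements misinterpreted nonsignificant results), Chinese psychology journals reported nonsignificant results more often and misinterpreted them less often. Nevertheless, researchers in China still need to deepen their understanding of nonsignificant results and to promote statistical methods suited to evaluating them.

Keywords: nonsignificant results, null-hypothesis significance testing, Bayes factors, meta-research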

Abstract:

Background: The p-value is the most widely used statistical index for inference in science. However, a p-value greater than 0.05 (i.e., a nonsignificant result) cannot distinguish between two cases: absence of evidence and evidence of absence. Unfortunately, researchers in psychological science may not interpret p-values correctly, which leads to incorrect inferences. For example, after surveying 412 empirical studies published in Psychonomic Bulletin & Review, Journal of Experimental Psychology: General, and Psychological Science, Aczel et al. (2018) found that about 72% of negative statements misinterpreted nonsignificant results as evidence in favor of the null hypothesis. Misinterpreting nonsignificant results can have serious consequences. One is missing potentially meaningful effects. Another arises in matched-group clinical trials, where a nonsignificant between-group difference may be taken as proof of equivalence, producing falsely “matched” groups and thus threatening the validity of interventions. To date, how nonsignificant results are interpreted in the Chinese psychological literature is unknown. Here we surveyed 500 empirical papers published in five mainstream Chinese psychological journals to address three questions: (1) How often are nonsignificant results reported? (2) How do researchers interpret nonsignificant results in these published studies? (3) When researchers interpret nonsignificant results as “evidence of absence,” do the empirical data provide enough evidence for null effects?
Method: Following our preregistration (https://osf.io/czx6f), we first randomly selected 500 empirical papers from all papers published in 2017 and 2018 in five mainstream Chinese psychological journals (Acta Psychologica Sinica, Psychological Science, Chinese Journal of Clinical Psychology, Psychological Development and Education, and Psychological and Behavioral Studies). Second, we screened the abstracts of these articles for negative statements. For studies whose abstracts contained negative statements, we located the corresponding nonsignificant statistics in the results and checked whether their interpretations were correct. Specifically, each statement was classified into one of four categories: Correct-frequentist; Incorrect-frequentist: whole population; Incorrect-frequentist: current sample; and Difficult to judge. Finally, we calculated Bayes factors from the reported t values and sample sizes associated with these nonsignificant results. The Bayes factors quantify the extent to which these results provide evidence for the absence of effects (i.e., the conclusion researchers incorrectly drew from nonsignificant results).
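The abstract does not state which prior was used for these Bayes factors. As a hedged sketch only, assuming the default JZS Bayes factor of Rouder et al. (2009) with a Cauchy prior of scale r = √2/2 on the standardized effect size (a common default, not confirmed as the authors' setting), BF01 can be computed from a reported t value and sample size(s) as below. The function names are hypothetical; the threshold BF01 > 10 is the one the abstract uses for strong evidence.

```python
import math


def log_bf10_jzs(t: float, n_eff: float, df: float,
                 r: float = math.sqrt(2) / 2, steps: int = 4000) -> float:
    """Natural log of the JZS Bayes factor BF10 for a t statistic.

    Assumption: delta ~ Cauchy(0, r), written as delta | g ~ Normal(0, g) with
    g ~ InverseGamma(1/2, r**2 / 2), following Rouder et al. (2009). The integral
    over g is evaluated with Simpson's rule on the log scale g = exp(u).
    """
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    u_min, u_max = -20.0, 20.0
    h = (u_max - u_min) / steps
    # Kernel of the t likelihood under H0 (delta = 0); shared constants cancel.
    log_h0 = -(df + 1) / 2 * math.log1p(t * t / df)
    log_terms = []
    for i in range(steps + 1):
        u = u_min + i * h
        g = math.exp(u)
        a = 1.0 + n_eff * g
        log_h1_given_g = -0.5 * math.log(a) - (df + 1) / 2 * math.log1p(t * t / (a * df))
        log_prior_g = (math.log(r) - 0.5 * math.log(2 * math.pi)
                       - 1.5 * u - r * r / (2 * g))
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        # "+ u" is the Jacobian of the substitution g = exp(u)
        log_terms.append(math.log(weight) + log_h1_given_g + log_prior_g + u - log_h0)
    m = max(log_terms)  # log-sum-exp keeps the sum numerically stable
    return m + math.log(sum(math.exp(x - m) for x in log_terms) * h / 3)


def bf01_one_sample(t: float, n: int, r: float = math.sqrt(2) / 2) -> float:
    """BF01 for a one-sample or paired-samples t test with n observations or pairs."""
    return math.exp(-log_bf10_jzs(t, n, n - 1, r))


def bf01_two_sample(t: float, n1: int, n2: int, r: float = math.sqrt(2) / 2) -> float:
    """BF01 for an independent-samples t test (effective n = n1 * n2 / (n1 + n2))."""
    return math.exp(-log_bf10_jzs(t, n1 * n2 / (n1 + n2), n1 + n2 - 2, r))


def strong_evidence_for_null(bf01: float, threshold: float = 10.0) -> bool:
    """Criterion for strong evidence for the null used in the abstract: BF01 > 10."""
    return bf01 > threshold


# Hypothetical inputs, not data from the paper: the same nonsignificant t value
# with a small and a large sample.
for t_value, n_a, n_b in [(0.5, 15, 15), (0.5, 500, 500)]:
    bf = bf01_two_sample(t_value, n_a, n_b)
    print(f"t = {t_value}, n1 = n2 = {n_a}: BF01 = {bf:.2f}, "
          f"strong = {strong_evidence_for_null(bf)}")
```

Both hypothetical results are nonsignificant, but the small-sample case should yield only weak evidence for the null, while the large-sample case should clear BF01 > 10. This is the distinction between absence of evidence and evidence of absence that a p-value alone cannot make.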
Results: Our survey revealed that: (1) of the 500 empirical papers, 36% (n = 180) contained negative statements in their abstracts; (2) these papers contained 236 negative statements associated with nonsignificant statistics, and 41% of these statements misinterpreted the nonsignificant results, i.e., the authors inferred that the results provided evidence for the absence of effects; (3) Bayes factor analyses based on the available t values and sample sizes showed that only 5.1% (n = 2) of the nonsignificant results provided strong evidence for the absence of effects (BF01 > 10). Compared with Aczel et al. (2018), we found that abstracts of papers published in Chinese journals contained negative statements more often (36% vs. 32%), and that researchers misinterpreted nonsignificant results less often (41% vs. 72%). It is worth noting, however, that the Chinese context gives rise to a category of ambiguous interpretations of nonsignificant results. Many statements describing nonsignificant results took the form “there is no significant difference between condition A and condition B”. These statements can be understood either as “the difference is not statistically significant”, which is correct, or as “there is no difference”, which is incorrect. Under the first reading, 41% of the negative statements misinterpret nonsignificant results; under the second reading, this share rises to 64%.
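The shift from 41% to 64% comes from how the ambiguous “no significant difference” statements are counted. The minimal sketch below uses made-up toy statements (not the paper's 236 coded statements) and a hypothetical `ambiguous_wording` flag to show how one coded dataset yields two misinterpretation rates. It assumes, as the abstract implies, that the denominator is all negative statements and that ambiguous statements are otherwise coded as correct.

```python
from __future__ import annotations

from dataclasses import dataclass

# Category labels as named in the Method section of the abstract.
CORRECT = "Correct-frequentist"
INCORRECT_POPULATION = "Incorrect-frequentist: whole population"
INCORRECT_SAMPLE = "Incorrect-frequentist: current sample"
DIFFICULT = "Difficult to judge"


@dataclass(frozen=True)
class NegativeStatement:
    category: str
    # Hypothetical flag: wording such as "there is no significant difference
    # between condition A and condition B".
    ambiguous_wording: bool = False


def misinterpretation_rate(statements: list[NegativeStatement], strict_reading: bool) -> float:
    """Share of all negative statements counted as misinterpretations.

    strict_reading=False: ambiguous wording read as "not statistically significant" (correct).
    strict_reading=True:  ambiguous wording read as "there is no difference" (incorrect).
    """
    incorrect = {INCORRECT_POPULATION, INCORRECT_SAMPLE}
    n_wrong = sum(
        1
        for s in statements
        if s.category in incorrect or (strict_reading and s.ambiguous_wording)
    )
    return n_wrong / len(statements)


# Toy data (10 made-up statements), not the paper's coded statements.
statements = (
    [NegativeStatement(INCORRECT_POPULATION)] * 3
    + [NegativeStatement(INCORRECT_SAMPLE)]
    + [NegativeStatement(CORRECT, ambiguous_wording=True)] * 2
    + [NegativeStatement(CORRECT)] * 3
    + [NegativeStatement(DIFFICULT)]
)
print(misinterpretation_rate(statements, strict_reading=False))  # 0.4
print(misinterpretation_rate(statements, strict_reading=True))   # 0.6
```

The rule for ambiguous statements is an illustrative assumption; the paper's actual coding scheme may treat them differently.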
Conclusion: Our results suggest that Chinese researchers need to improve their understanding of nonsignificant results and to use more appropriate statistical methods to extract information from them. In addition, more precise wording should be used when describing nonsignificant results in Chinese.

