ISSN 1671-3710
CN 11-4766/R
Sponsor: Institute of Psychology, Chinese Academy of Sciences
Publisher: Science Press

Advances in Psychological Science ›› 2022, Vol. 30 ›› Issue (8): 1667-1681. doi: 10.3724/SP.J.1042.2022.01667

• Review of Research Hotspots in Psychological Statistical Methods in China •

Methodological research in China on hypothesis testing and its related issues over the first 20 years of the new century

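WEN Zhonglin1, XIE Jinyan1, FANG Jie2, WANG Yifan1

  1. School of Psychology / Center for Studies of Psychological Application, South China Normal University, Guangzhou 510631
  2. Institute of New Development / Department of Applied Psychology, Guangdong University of Finance & Economics, Guangzhou 510320
  • Received: 2021-12-29 Publication date: 2022-08-15 Online release date: 2022-06-23
  • Corresponding author: WEN Zhonglin E-mail: wenzl@scnu.edu.cn
  • Funding:
    National Natural Science Foundation of China (32171091); National Social Science Fund of China (17BTJ035)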

Methodological research on hypothesis test and related issues in China’s mainland from 2001 to 2020

WEN Zhonglin1, XIE Jinyan1, FANG Jie2, WANG Yifan1

  1. School of Psychology & Center for Studies of Psychological Application, South China Normal University, Guangzhou 510631, China
  2. Institute of New Development & Department of Applied Psychology, Guangdong University of Finance & Economics, Guangzhou 510320, China
  • Received: 2021-12-29 Published: 2022-08-15 Online: 2022-06-23
  • Contact: WEN Zhonglin E-mail:wenzl@scnu.edu.cn

Abstract (translated from the Chinese):

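Over the first 20 years of the new century, methodological research on hypothesis testing in China falls into the following categories: the shortcomings of null hypothesis significance testing (NHST), problems in the use of p-values, the replicability of psychological research, effect size, statistical power, equivalence testing, and other research related to hypothesis testing. NHST has developed into a combined procedure. To ensure power and save costs, experimental studies need an a priori power analysis to estimate the required sample size, although in traditional statistics this is unnecessary for questionnaire studies with more than 160 participants. When the null hypothesis is rejected, the conclusion should be drawn in light of the effect size. When the null hypothesis is not rejected, post hoc power should be reported; if the effect size is medium or large but the power is not high enough, more participants may be added and the analysis repeated, but this process should be disclosed proactively, with the final actual p-value reported and the possible Type I error rate evaluated.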

Keywords: hypothesis testing, p-value, effect size, statistical power, equivalence test

Abstract:

Hypothesis testing is an important part of inferential statistics. Most reported statistical test results are based on null hypothesis significance testing (NHST). In the first two decades of the 21st century, studies on hypothesis testing and related issues in mainland China have covered the shortcomings of NHST, the use of p-values, the replicability of psychological research, effect size, the power of a statistical test, and equivalence testing, among other topics. This systematic review summarizes the main findings and offers suggestions.

NHST has a wide range of applications in a variety of fields, from mathematical statistics to psychology. In the past two decades, Chinese researchers have moved from learning about NHST, using it, misunderstanding it, understanding it, and questioning it, to continually proposing improvements. Despite some shortcomings, NHST still occupies an important position in scientific research. When reporting statistically significant results, it is recommended to give exact p-values so that the Type I error rate can be better evaluated. When what one wants to verify is equivalence (or a zero effect), a better approach is to set an equivalence bound and place the equivalence hypothesis in the position of the alternative hypothesis.
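The idea of placing equivalence in the alternative hypothesis can be illustrated with a two one-sided tests (TOST) sketch. This is only a minimal illustration under stated assumptions, not the procedure of any particular study reviewed here: it uses a large-sample normal approximation, symmetric bounds (-bound, +bound), and hypothetical names (`tost_p_value`, `diff`, `se`, `bound`) chosen for this example.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def tost_p_value(diff, se, bound):
    """Approximate TOST p-value for H0: |effect| >= bound vs. H1: |effect| < bound.

    diff  : estimated effect (e.g. a mean difference)   -- hypothetical input
    se    : standard error of that estimate (must be > 0)
    bound : equivalence bound (> 0), set in advance by the researcher
    Assumption: the estimate is approximately normal (large-sample z test).
    """
    if se <= 0 or bound <= 0:
        raise ValueError("se and bound must be positive")
    # One-sided test against the lower bound: H0 effect <= -bound
    p_lower = 1 - _STD_NORMAL.cdf((diff + bound) / se)
    # One-sided test against the upper bound: H0 effect >= +bound
    p_upper = _STD_NORMAL.cdf((diff - bound) / se)
    # Equivalence is claimed only if BOTH one-sided tests reject,
    # so the overall p-value is the larger of the two.
    return max(p_lower, p_upper)


# Small estimate, well inside the bounds: equivalence is supported.
print(tost_p_value(diff=0.1, se=0.1, bound=0.5) < 0.05)
# Estimate close to the upper bound: equivalence cannot be claimed.
print(tost_p_value(diff=0.4, se=0.1, bound=0.5) < 0.05)
```

With small samples, a t distribution with the appropriate degrees of freedom would normally replace the normal approximation used above.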

NHST has developed into a set of procedures, as follows. First, to ensure the power of a statistical test and save costs, one should conduct an a priori power analysis before sampling and calculate the required sample size. The only exception is questionnaire studies with more than 160 participants, which usually do not need an a priori power analysis in traditional statistical analysis. Second, one should collect and analyze the data, and report the NHST results and confidence intervals. Third, if the results are statistically significant (in which case only a Type I error is possible), one should calculate and report the effect size and draw conclusions based on its magnitude. Fourth, if the results are not statistically significant (in which case only a Type II error is possible), one should calculate the effect size and accept the null hypothesis if the effect size is small. When the effect size is medium or large, however, a post hoc power analysis is required. If the power is high, the null hypothesis is accepted; if the power is less than 80%, more participants can be added for further analysis. The process of increasing the sample size should be reported clearly, with the final p-value presented and the Type I error rate evaluated.
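The power analyses in the first and fourth steps can be sketched for the simplest case, a two-sided comparison of two independent group means with a standardized effect size d. This is a rough illustration under assumptions that are not from the article: it uses the normal approximation n = 2((z_{1-α/2} + z_{power}) / d)² per group rather than the exact noncentral t calculation, and the function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def n_per_group(d, alpha=0.05, power=0.80):
    """A priori analysis: approximate per-group n needed to detect effect size d.

    Assumption: two equal-sized independent groups, two-sided test,
    normal approximation (slightly underestimates the exact t-based n).
    """
    if d == 0:
        raise ValueError("d must be nonzero")
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


def approx_power(d, n, alpha=0.05):
    """Post hoc analysis: approximate power with n participants per group.

    Ignores the negligible probability of rejecting in the wrong direction.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(abs(d) * sqrt(n / 2) - z_alpha)


# Step 1: sample size for a medium effect (d = 0.5) at 80% power.
print(n_per_group(0.5))  # -> 63

# Step 4: with only 30 per group, power for d = 0.5 falls well below 80%,
# which is the situation in which adding participants is suggested.
print(approx_power(0.5, 30) < 0.80)  # -> True
```

Exact calculations based on the noncentral t distribution give a slightly larger required sample size, so this approximation is best read as a lower bound for planning.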

Furthermore, the reproducibility crisis in psychological research is partly attributable to NHST, but the reproducibility of scientific research must be strictly defined. Although a failure to replicate a study may result from inaccurate operations or improper methods, it may also be caused by a moderating effect. We cannot judge the scientific merit of a study simply by whether it is replicable.

Research on issues related to hypothesis testing has expanded in three major directions. First, the equivalence test has been extended to the evaluation of structural equation models. Second, power analysis has been extended to models beyond those of traditional statistics, such as mediation effect models and structural equation models. Third, effect size has likewise been extended to models beyond those of traditional statistics, and a new R²-type effect size based on variance decomposition has been proposed.

Key words: hypothesis testing, p-value, effect size, power of statistical test, equivalence test

CLC number: