ISSN 1671-3710
CN 11-4766/R

Advances in Psychological Science, 2022, Vol. 30, Issue (8): 1667-1681. doi: 10.3724/SP.J.1042.2022.01667

• Section of Research Methods •

Methodological research on hypothesis test and related issues in China’s mainland from 2001 to 2020

WEN Zhonglin1, XIE Jinyan1, FANG Jie2, WANG Yifan1

  1. School of Psychology & Center for Studies of Psychological Application, South China Normal University, Guangzhou 510631, China
  2. Institute of New Development & Department of Applied Psychology, Guangdong University of Finance & Economics, Guangzhou 510320, China
  • Received: 2021-12-29 Online: 2022-08-15 Published: 2022-06-23
  • Contact: WEN Zhonglin


Hypothesis testing is an important part of inferential statistics. Most reported statistical test results are based on the null hypothesis significance test (NHST). In the first two decades of the 21st century, studies on hypothesis testing and related issues in China’s mainland have covered the deficiencies of NHST, the use of P-values, the repeatability of psychological research, effect size, the power of a statistical test, and the equivalence test, among other topics. This systematic review summarizes the main findings and offers suggestions.

NHST is applied in a wide variety of fields, from mathematical statistics to psychology. Over the past two decades, researchers in China have moved from learning about, using, misunderstanding, understanding, and questioning NHST to continually proposing improvements to it. Despite some shortcomings, NHST still occupies an important position in scientific research. When reporting statistically significant results, it is recommended to give precise P-values so that the Type I error rate can be better evaluated. When what one wants to verify is equivalence (or a zero effect), a better approach is to set an equivalence bound and place the equivalence hypothesis in the position of the alternative hypothesis.
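To make the last recommendation concrete, the following is a minimal sketch of an equivalence test in the form of two one-sided tests (TOST), which is one common way to put the equivalence hypothesis in the position of the alternative hypothesis. It is an illustration, not a procedure taken from the reviewed studies: the function name `tost_z`, the use of a large-sample normal approximation, and the numeric inputs are all assumptions made for this example.

```python
from statistics import NormalDist


def tost_z(mean_diff: float, se: float, bound: float) -> float:
    """Two one-sided z-tests (TOST) for equivalence.

    Null hypothesis:        |delta| >= bound  (effect is not negligible)
    Alternative hypothesis: |delta| <  bound  (equivalence)

    mean_diff: observed difference (hypothetical input)
    se:        standard error of that difference (hypothetical input)
    bound:     equivalence bound chosen by the researcher (assumption)

    Returns the TOST P-value, i.e. the larger of the two one-sided P-values.
    Uses a normal approximation, so it is only a large-sample sketch.
    """
    if se <= 0 or bound <= 0:
        raise ValueError("se and bound must be positive")
    z = NormalDist()
    # Test 1: H0 delta <= -bound, rejected when the estimate is well above -bound.
    p_lower = 1.0 - z.cdf((mean_diff + bound) / se)
    # Test 2: H0 delta >= +bound, rejected when the estimate is well below +bound.
    p_upper = z.cdf((mean_diff - bound) / se)
    return max(p_lower, p_upper)


if __name__ == "__main__":
    # Observed difference near zero with a small standard error.
    p = tost_z(mean_diff=0.02, se=0.10, bound=0.50)
    print(f"TOST P-value: {p:.6f}")
```

Equivalence is claimed only when both one-sided tests reject, which is why the larger of the two P-values is compared with the significance level. A nonsignificant NHST result alone does not provide this evidence.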

NHST has developed into a set of procedures, as follows. First, to ensure adequate statistical power while saving costs, one should conduct an a priori power analysis before sampling and calculate the required sample size. The only exception is questionnaire studies with more than 160 participants, which usually do not need an a priori power analysis under traditional statistical analysis. Second, collect and analyze the data, and report the NHST results and confidence intervals. Third, if the results are statistically significant (in which case only a Type I error is possible), calculate and report the effect size and draw conclusions based on its magnitude. Fourth, if the results are not statistically significant (in which case only a Type II error is possible), calculate the effect size and accept the null hypothesis if the effect size is small. When the effect size is medium or large, however, a post hoc power analysis is required: if the power is high, the null hypothesis is accepted; if the power is below 80%, more participants may be added for further analysis. The process of increasing the sample size should be reported clearly, with the final P-value presented and the Type I error rate evaluated.
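The a priori power analysis in the first step can be sketched as follows for the simplest case, a two-sided comparison of two independent group means. This is an illustrative approximation rather than the authors' procedure: the function names, the default significance level of 0.05, the 80% power target, and the normal approximation (which gives slightly smaller sample sizes than an exact t-based calculation) are assumptions of this example.

```python
from math import ceil, sqrt
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group for a two-sided, two-sample test.

    d is the standardized mean difference (Cohen's d) expected a priori.
    Formula (normal approximation): n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2
    """
    if d <= 0:
        raise ValueError("d must be positive")
    z_alpha = _Z.inv_cdf(1.0 - alpha / 2.0)
    z_power = _Z.inv_cdf(power)
    return ceil(2.0 * ((z_alpha + z_power) / d) ** 2)


def approx_power(d: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power with n participants per group.

    Only the rejection region in the direction of the effect is counted;
    the opposite tail contributes a negligible amount for d > 0.
    """
    z_alpha = _Z.inv_cdf(1.0 - alpha / 2.0)
    return _Z.cdf(d * sqrt(n / 2.0) - z_alpha)


if __name__ == "__main__":
    n = n_per_group(d=0.5)  # medium effect by Cohen's convention
    print(f"required n per group: {n}")
    print(f"approximate power at that n: {approx_power(0.5, n):.3f}")
```

The same `approx_power` function can be reused in the fourth step: given the observed effect size and the achieved sample size, it indicates whether power falls below the 80% threshold at which adding participants would be considered.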

Furthermore, the reproducibility crisis in psychological research is partly attributable to NHST, but the reproducibility of scientific research must be strictly defined. Although a failure to replicate a study may result from inaccurate operations or improper methods, it may also be caused by a moderating effect. The scientific merit of a study cannot be judged simply by whether it is replicable.

Research on issues related to hypothesis testing has expanded in three major directions. First, the equivalence test has been extended to the evaluation of structural equation models. Second, power analysis has been extended to models beyond those of traditional statistics, such as mediation effect models and structural equation models. Third, effect size has likewise been extended to models beyond those of traditional statistics, and a new R²-type effect size based on variance decomposition has been proposed.

Key words: hypothesis testing, p-value, effect size, power of statistical test, equivalence test
