ISSN 1671-3710
CN 11-4766/R
Sponsor: Institute of Psychology, Chinese Academy of Sciences
Publisher: Science Press

Advances in Psychological Science ›› 2013, Vol. 21 ›› Issue (10): 1874-1882. doi: 10.3724/SP.J.1042.2013.01874

• Research Methods •

A Comparison of Multiple Testing Correction Methods in Genome-Wide Association Studies

HUANG Yangyue (黄杨岳); KONG Xiangzhen (孔祥祯); ZHEN Zonglei (甄宗雷); LIU Jia (刘嘉)

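  1. (State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing 100875)
  • Received: 2013-02-01 Published: 2013-10-15 Online: 2013-10-15
  • Corresponding author: ZHEN Zonglei (甄宗雷)
  • Funding:

    National Natural Science Foundation of China (30800295, 31230031, 91132703, and 31221003); National Basic Research Program of China (2010CB833903).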

The Comparison of Multiple Testing Corrections Methods in Genome-Wide Association Studies

HUANG Yangyue;KONG Xiangzhen;ZHEN Zonglei;LIU Jia   

  1. (State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing 100875, China)
  • Received: 2013-02-01 Online: 2013-10-15 Published: 2013-10-15
  • Contact: ZHEN Zonglei

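Abstract (translated from the Chinese): Genome-wide association studies (GWAS) directly examine the association between human behavioral abilities and genotypes, giving psychologists a new means of exploring the genetic basis of behavior at the whole-genome level. Because a GWAS involves association tests between a very large number of loci and behavior, multiple testing correction is required to control the overall false alarm rate. Although several correction methods are available, their suitability for GWAS has not been studied systematically, so the choice of a correction method in GWAS lacks a theoretical and empirical basis. The correction methods commonly used in GWAS are the Bonferroni correction, the Holm step-down procedure, and the permutation test, all based on the family-wise error rate (FWER) criterion, and the BH procedure, based on the false discovery rate (FDR) criterion. This paper describes the principles and procedures of these four methods in detail, proposes a method for simulating GWAS data, and quantitatively compares the methods on the simulated data. The results show that the three FWER-based methods differ very little from one another and control false alarms most strictly, but they detect significantly fewer truly associated loci than the FDR-based BH procedure. On independent data, the SNPs reported by the BH procedure have the highest explained rate for behavior; that is, relative to the other methods, the BH procedure strikes a better balance between false alarms and hits. Future studies may consider using the BH procedure to correct their results.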

Keywords: genome-wide association studies, multiple testing correction, FWER, FDR, simulation
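
The three p-value-based procedures named in the abstract can be illustrated with a minimal NumPy sketch. It follows the textbook definitions of the Bonferroni, Holm step-down, and BH (Benjamini-Hochberg) procedures and is not taken from the paper; the function names, the default level of 0.05, and the example p-values are assumptions made for illustration only.

```python
import numpy as np

def bonferroni(pvals, alpha=0.05):
    """Reject test i when p_i <= alpha / m (controls the FWER)."""
    p = np.asarray(pvals, dtype=float)
    return p <= alpha / p.size

def holm(pvals, alpha=0.05):
    """Holm step-down: compare the k-th smallest p-value (k = 1..m)
    with alpha / (m - k + 1) and stop at the first failure."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= alpha / (m - np.arange(m))
    # Number of leading passes before the first failure.
    n_reject = m if passed.all() else int(np.argmin(passed))
    reject = np.zeros(m, dtype=bool)
    reject[order[:n_reject]] = True
    return reject

def benjamini_hochberg(pvals, q=0.05):
    """BH step-up: find the largest k with p_(k) <= q * k / m and
    reject the k smallest p-values (controls the FDR)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = int(np.max(np.nonzero(below)[0]))
        reject[order[:k + 1]] = True
    return reject

# Illustrative p-values (hypothetical, not from the paper).
pvals = [0.2, 0.001, 0.035, 0.012, 0.014]
print(bonferroni(pvals).sum(), holm(pvals).sum(), benjamini_hochberg(pvals).sum())
```

On these five hypothetical p-values, Bonferroni rejects one test, Holm three, and BH four, which mirrors the ordering of strictness described in the abstract.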

Abstract: Genome-wide association studies (GWAS) can reveal the genetic basis of behavior. However, association analysis poses a massive multiple testing problem, because millions of single nucleotide polymorphisms (SNPs) are tested. It is therefore vital to reduce the risk of false positives with an appropriate multiple testing correction method. First, the family-wise error rate (FWER) and the false discovery rate (FDR), the two standard measures of Type I error in multiple testing, are introduced. Second, three FWER-based methods (Bonferroni, Holm step-down, and permutation) and one FDR-based method (BH) are discussed, from concept to implementation. Finally, a method for simulating GWAS data is proposed, and the four correction methods are evaluated on the simulated data. SNPs reported without multiple testing correction had both the highest average number of hits and the highest average number of false alarms. The FWER methods reported fewer false alarms, but their average hits were also fewer than those of the uncorrected analysis or the BH method. In contrast, the BH method balanced false alarms and hits well. Furthermore, a comprehensive index, the explained rate, was introduced to evaluate the methods quantitatively; the BH method had the highest explained rate. Researchers conducting future GWAS are advised to use the BH method for multiple testing correction.

Key words: Genome-Wide Association Studies, multiple testing corrections, FWER, FDR, simulation
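
The permutation approach and the hit / false alarm counts mentioned in the abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the simulation method proposed in the paper: SNPs are drawn independently (no linkage disequilibrium), effects are additive, the test statistic is the absolute Pearson correlation, and every function name and parameter value (`simulate_gwas`, `abs_correlations`, `permutation_max_threshold`, sample sizes, effect size) is hypothetical.

```python
import numpy as np

def simulate_gwas(n_subjects=300, n_snps=200, n_causal=5,
                  effect=0.5, maf=0.3, seed=0):
    """Toy data: independent SNPs coded 0/1/2, additive causal effects."""
    rng = np.random.default_rng(seed)
    genotypes = rng.binomial(2, maf, size=(n_subjects, n_snps)).astype(float)
    causal = np.zeros(n_snps, dtype=bool)
    causal[:n_causal] = True  # assumption: the first SNPs are the causal ones
    noise = rng.standard_normal(n_subjects)
    phenotype = effect * genotypes[:, causal].sum(axis=1) + noise
    return genotypes, phenotype, causal

def abs_correlations(genotypes, phenotype):
    """Absolute Pearson correlation of each SNP with the phenotype.
    Assumes no SNP column is constant (which would give a zero denominator)."""
    g = genotypes - genotypes.mean(axis=0)
    y = phenotype - phenotype.mean()
    denom = np.sqrt((g ** 2).sum(axis=0) * (y ** 2).sum())
    return np.abs(g.T @ y) / denom

def permutation_max_threshold(genotypes, phenotype, alpha=0.05,
                              n_perm=500, seed=1):
    """FWER threshold: the (1 - alpha) quantile of the maximum statistic
    over all SNPs, computed on phenotype-shuffled data."""
    rng = np.random.default_rng(seed)
    max_stats = np.empty(n_perm)
    for b in range(n_perm):
        shuffled = rng.permutation(phenotype)
        max_stats[b] = abs_correlations(genotypes, shuffled).max()
    return float(np.quantile(max_stats, 1 - alpha))

genotypes, phenotype, causal = simulate_gwas()
stats = abs_correlations(genotypes, phenotype)
threshold = permutation_max_threshold(genotypes, phenotype)
reported = stats > threshold
hits = int((reported & causal).sum())            # truly associated SNPs reported
false_alarms = int((reported & ~causal).sum())   # null SNPs reported
print(f"threshold={threshold:.3f} hits={hits} false_alarms={false_alarms}")
```

Shuffling the phenotype breaks any genotype-phenotype association while preserving the correlation structure among SNPs, which is why the maximum statistic over permutations yields a family-wise threshold without assuming independent tests.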