Lasso Regression: From Explanation to Prediction

doi:10.3724/SP.J.1042.2020.01777

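Abstract

Abstract (translated from the Chinese):

Traditional ordinary least squares regression focuses on accurately estimating the data set at hand, which easily leads to overfitting and undermines the replicability of a model's conclusions. As methodology has developed, newly emerging statistical tools can compensate for the limitations of traditional methods, and the shift from an excessive focus on interpreting regression coefficient values toward improving the predictive ability of research findings has become an increasingly important trend in psychology. By introducing a penalty term into model estimation, the Lasso method can achieve higher prediction accuracy and better model generalizability. It can also deal effectively with overfitting and multicollinearity, which helps in constructing and refining psychological theory.

Key words: regression, regularization, prediction, Lasso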

Abstract:

Regression analysis, a method for evaluating the relationships between variables, is widely used in psychological research. However, because it focuses heavily on explaining the sample data at hand, traditional ordinary least squares (OLS) regression has several drawbacks, such as overfitting and difficulty in handling multicollinearity, which may undermine the generalizability of the model. With the rapid development of methodological research, a shift from interpreting regression coefficients toward improving the predictive performance of the model has emerged and become increasingly important. Least absolute shrinkage and selection operator (Lasso) regression has emerged to compensate for the limitations of traditional methods. By introducing a penalty term into the model and shrinking the regression coefficients toward zero, Lasso regression can achieve higher prediction accuracy and better model generalizability at the cost of some estimation bias. In addition, Lasso regression can deal effectively with multicollinearity. It is therefore helpful for the construction and refinement of psychological theory.

Key words: regression, regularization, Lasso, prediction
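
The abstract mentions a penalty term without writing it out. As a sketch under stated assumptions, the usual Lasso objective for a linear model can be written as below. The notation is assumed here rather than taken from the article: n observations, p predictors, intercept \beta_0, coefficients \beta_j, and a tuning parameter \lambda \ge 0. The 1/(2n) scaling is one common convention and other texts scale the loss differently.

```latex
\hat{\beta}^{\text{lasso}}
  = \arg\min_{\beta_0,\,\beta}
    \left\{
      \frac{1}{2n} \sum_{i=1}^{n}
        \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^{2}
      + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
    \right\}
```

With \lambda = 0 this reduces to ordinary least squares. As \lambda grows, the absolute-value penalty pulls the coefficients toward zero and can set some of them exactly to zero, which is the source of both the estimation bias and the variable selection described in the abstract.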

Chinese Library Classification (CLC) number:

B841

ZHANG Lijin, WEI Xiayan, LU Jiaqi, PAN Junhao. (2020). Lasso regression: From explanation to prediction. Advances in Psychological Science (心理科学进展), 28(10), 1777-1788.

Figures/Tables: 6


	Coefficient estimate (p value)
Predictor	OLS	Lasso	Relaxed Lasso
age	-0.206(0.009)**	-(0.072)	-
famrel	0.36(0.001)**	-(0.699)	-
freetime	0.058(0.57)	-(0.913)	-
gout	-0.014(0.891)	-(0.981)	-
dalc	-0.108(0.448)	-(0.646)	-
walc	0.17(0.105)	-(0.294)	-
health	0.046(0.509)	-(0.899)	-
absences	0.042(0.001)**	-(0.089)	-
G1	0.164(0.003)**	0.057(0.005)**	0.153(0.007)**
G2	0.977(<0.001)***	0.903(<0.001)***	0.987(<0.001)***
R²	0.835	-	0.822
adjusted R²	0.831	-	0.821
Mean Square Error	3.446	-	3.723
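
The comparison in the table can be illustrated with a minimal sketch. It assumes synthetic data and hypothetical helper names (`lasso_cd`, `relaxed_refit`, `mse`), and it does not use the article's data set or reproduce its numbers. It fits OLS, the Lasso (by coordinate descent with soft-thresholding), and the simplest relaxed variant, an unpenalized refit on the predictors the Lasso keeps. The penalty value `lam=0.3` is an arbitrary choice for illustration. In practice it would be tuned, for example by cross-validation.

```python
import random

def soft_threshold(z, t):
    """Shrink z toward zero by t; anything within [-t, t] becomes exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def center(values):
    """Subtract the mean so that no intercept term is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def lasso_cd(cols, y, lam, n_iter=500):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.

    cols is a list of centered predictor columns and y is centered.
    With lam = 0 the iterations converge to the least squares solution.
    """
    n, p = len(y), len(cols)
    norms = [sum(v * v for v in col) / n for col in cols]
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, starting from b = 0
    for _ in range(n_iter):
        for j in range(p):
            xj = cols[j]
            # Covariance of x_j with the residual that ignores x_j's own term.
            rho = sum(xj[i] * resid[i] for i in range(n)) / n + norms[j] * beta[j]
            new = soft_threshold(rho, lam) / norms[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * xj[i]
                beta[j] = new
    return beta

def relaxed_refit(cols, y, beta_lasso):
    """Unpenalized refit on the predictors the Lasso kept (nonzero coefficients)."""
    kept = [j for j, b in enumerate(beta_lasso) if b != 0.0]
    refit = lasso_cd([cols[j] for j in kept], y, lam=0.0)
    beta = [0.0] * len(cols)
    for j, b in zip(kept, refit):
        beta[j] = b
    return beta

def mse(cols, y, beta):
    """In-sample mean squared error of the fitted values."""
    total = 0.0
    for i in range(len(y)):
        fitted = sum(b * col[i] for b, col in zip(beta, cols))
        total += (y[i] - fitted) ** 2
    return total / len(y)

# Hypothetical synthetic data: x2 is nearly collinear with x1, x3 is pure noise.
random.seed(0)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [a + 0.3 * random.gauss(0, 1) for a in x1]
x3 = [random.gauss(0, 1) for _ in range(n)]
y = center([2.0 * a + random.gauss(0, 1) for a in x1])
cols = [center(x1), center(x2), center(x3)]

beta_ols = lasso_cd(cols, y, lam=0.0)
beta_lasso = lasso_cd(cols, y, lam=0.3)
beta_relaxed = relaxed_refit(cols, y, beta_lasso)

for name, beta in [("OLS", beta_ols), ("Lasso", beta_lasso),
                   ("Relaxed Lasso", beta_relaxed)]:
    print(name, [round(b, 3) for b in beta], "MSE:", round(mse(cols, y, beta), 3))
```

By construction, OLS has the lowest in-sample error and the refit cannot fit the sample worse than the Lasso on the same predictors. This matches the pattern in the table, where OLS has a slightly smaller mean square error than the relaxed Lasso. The argument for penalized estimates rests on out-of-sample prediction, which this sketch does not evaluate.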
