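Large language models capable of distinguishing between single and repeated gambles: Understanding and intervening in risky choice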

doi:10.3724/SP.J.1041.2026.0416

Abstract

Summary:

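Theoretical research on risky decision making has relied mainly on reverse inference from behavioural outcomes and on self-report data. It lacks direct observation of the decision process, which has constrained both the explanation of underlying mechanisms and the development of effective behavioural interventions. Large language models (LLMs) offer a way to overcome these limitations. Across three studies, this paper systematically examines the potential of LLMs to simulate risky decision making: DeepSeek-R1 played single and repeated gambles and generated decision rationales, GPT-4o performed inductive thematic analysis (ITA) on those rationales, and the resulting technical pipeline for LLM-generated decision-strategy texts was then applied to decision intervention. The findings are as follows: (1) ChatGPT-3.5/4 can reproduce the typical human choice pattern of greater risk aversion in single gambles and greater risk seeking in repeated gambles; (2) LLMs can distinguish the logic of single and repeated gambles and correctly apply normative and descriptive theories to generate the corresponding strategies, which received high acceptance; (3) intervention texts that LLMs generate from different strategies can effectively influence people’s inherent risk preferences in medical, financial, content-creation and e-commerce marketing contexts. The research systematically verifies the ability of LLMs to simulate behavioural preferences and to understand decisions, and it establishes a new generative-AI-based paradigm for decision intervention, providing a theoretical and practical foundation for AI-assisted high-stakes decision making.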

Keywords: risky decision making, single/repeated gambles, large language models, decision strategy, intervention

Abstract:

Risky choice (RC) is a common and important form of decision making in daily life. Its theoretical development primarily follows two major theories: normative theory and descriptive theory. The paradigms of single- and repeated-play gambles can provide an effective framework for distinguishing between the theories. However, prior research lacks direct observations of the decision-making process, which can limit the deep understanding of individual behaviour and hinder the development of effective behavioural interventions. In recent years, large language models (LLMs) have demonstrated highly human-like characteristics by not only simulating human preferences in behavioural performance but also exhibiting similar reasoning pathways. This offers a promising solution to the aforementioned limitations. This study, which is grounded in the classic RC paradigms of single versus repeated gambles, investigates the capability of LLMs to simulate and understand risk preferences and decision-making processes. Specifically, this study explores the potential of LLMs’ understanding of decision strategies to generate intervention texts and evaluates their effectiveness in influencing human decisions.

This work comprises three studies. In Study 1, GPT-3.5 and GPT-4 were employed to simulate human responses to gambling decisions under nine probability conditions (with constant expected value), which generated a total of 3,600 responses across single and repeated gamble scenarios. In Study 2, LLM-generated strategies were constructed through a three-stage process (decision rationale extraction, strategy generation and quality evaluation). The human participants were required to complete decision-making tasks in two experiments: Experiment 1 replicated the medical/financial scenarios (N = 349, N_male = 174, M_age = 21.79) of Sun et al. (2014) in a 2 (context: medical vs. financial) × 2 (application frequency: single vs. repeated) within-subjects design, and Experiment 2 examined digital contexts with a 2 (context: content creation vs. e-commerce marketing) × 2 (frequency: single vs. repeated) mixed design (context as between subjects). Subsequently, DeepSeek-R1 was used to perform the same tasks and generate strategy texts through the three-stage process. Finally, the participants were instructed to evaluate their acceptance of the LLM-generated strategies. Study 3 extended the Study 2 methodology to determine whether the LLM-generated intervention texts could reverse the participants’ classic choice preference across the single versus repeated gamble scenarios. The Study 2 experimental contexts (Experiment 1: medical vs. financial, N = 460, N_male = 205, M_age = 21.80; Experiment 2: content creation vs. e-commerce marketing, N = 240, N_male = 106, M_age = 29.12) were mirrored in Study 3, in which strategically designed intervention texts were presented during the decision-making tasks to test their capacity to modify the participants’ inherent risk preference between the single and repeated gamble conditions and to evaluate the persuasive efficacy of LLM-generated strategies on human decision biases.

Study 1 shows that the LLMs (GPT-3.5 and GPT-4) can successfully replicate the typical human pattern of risk aversion in single-play scenarios and risk seeking in repeated-play scenarios, though both models demonstrated an overall stronger tendency toward risk seeking compared with the human participants. Study 2 demonstrates that the human participants preferred low-EV certain options in single-play contexts and high-EV risky options in repeated-play contexts in both experiments. The participants also showed high agreement with the strategies generated by the LLMs in different scenarios. Study 3 confirms that the LLM-generated intervention texts can significantly influence the participants’ choice tendency in all four scenarios, with strong intervention effects observed in the single-play contexts. The LLM intervention strategies are characterised by reliance on expected value computations (normative) when promoting RCs and emphasis on certainty and robustness (descriptive) when promoting safe choices.

In summary, this study demonstrates that (1) LLMs can effectively simulate context-dependent human preferences in RC, particularly the shift from risk aversion in single plays to risk seeking in repeated plays; (2) LLMs can distinguish between the logic underlying single and repeated gambles and apply normative and descriptive reasoning accordingly to externalise decision strategies; and (3) the decision strategies extracted from LLM-generated reasoning can be used to construct effective intervention texts that can alter human preferences in classic risk decision tasks, thereby validating the feasibility and effectiveness of an LLM-based cognitive intervention pathway. This study offers a new technological paradigm for AI-assisted decision intervention and expands the application boundary of LLMs to human cognitive process modelling and regulation.

Keywords: risk decision-making, single- vs. repeated-play gambles, large language models, decision strategy, intervention

CLC number:

B849: C91

ZHOU Lei, LI Litong, WANG Xu, OU Huafeng, HU Qianyu, LI Aimei, GU Chenyan. (2026). Large language models capable of distinguishing between single and repeated gambles: Understanding and intervening in risky choice. Acta Psychologica Sinica, 58(3), 416-436.

Figures/Tables: 30

References: 92

[1]	Achiam J., Adler S., Agarwal S., Ahmad L., Akkaya I., Aleman F. L., … McGrew B. (2023). GPT-4 technical report. arXiv preprint. https://doi.org/10.48550/arXiv.2303.08774
[2]	Aher G. V., Arriaga R. I., & Kalai A. T. (2023). Using large language models to simulate multiple humans and replicate human subject studies. In Proceedings of the 40th International Conference on Machine Learning (pp. 337-371). PMLR. https://proceedings.mlr.press/v202/aher23a.html
[3]	Altay S., Hacquin A. S., Chevallier C., & Mercier H. (2023). Information delivered by a chatbot has a positive impact on COVID-19 vaccines attitudes and intentions. Journal of Experimental Psychology: Applied, 29(1), 52-62. https://doi.org/10.1037/xap0000400
[4]	Anderson M. A. B., Cox D. J., & Dallery J. (2023). Effects of economic context and reward amount on delay and probability discounting. Journal of the Experimental Analysis of Behavior, 120(2), 204-213. https://doi.org/10.1002/jeab.868
[5]	Argyle L. P., Busby E. C., Fulda N., Gubler J., Rytting C., & Wingate D. (2023). Out of one, many: Using language models to simulate human samples. Political Analysis, 31(3), 337-351. https://doi.org/10.1017/pan.2023.2
[6]	Arora C., Sayeed A. I., Licorish S., Wang F., & Treude C. (2024). Optimizing large language model hyperparameters for code generation. arXiv preprint. https://doi.org/10.48550/arXiv.2408.10577
[7]	Barberis N., & Huang M. (2009). Preferences with frames: A new utility specification that allows for the framing of risks. Journal of Economic Dynamics and Control, 33(8), 1555-1576. https://doi.org/10.1016/j.jedc.2009.01.009
[8]	Benartzi S., & Thaler R. H. (1999). Risk aversion or myopia? Choices in repeated gambles and retirement investments. Management Science, 45(3), 364-381. https://doi.org/10.1287/mnsc.45.3.364
[9]	Binz M., & Schulz E. (2023). Using cognitive psychology to understand GPT-3. Proceedings of the National Academy of Sciences, 120(6), e2218523120. https://doi.org/10.1073/pnas.2218523120
[10]	Brandstätter E., Gigerenzer G., & Hertwig R. (2006). The priority heuristic: Making choices without trade-offs. Psychological Review, 113(2), 409-432. https://doi.org/10.1037/0033-295X.113.2.409
[11]	Brislin R. W. (1986). The wording and translation of research instruments. In W. J. Lonner & J. W. Berry (Eds.), Field methods in cross-cultural research (pp. 137-164). Sage Publications.
[12]	Brown T., Mann B., Ryder N., Subbiah M., Kaplan J. D., Dhariwal P., … Amodei D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877-1901. https://doi.org/10.48550/arXiv.2005.14165
[13]	Carvalho T., Negm H., & El-Geneidy A. (2024). A comparison of the results from artificial intelligence-based and human-based transport-related thematic analysis. Findings. https://doi.org/10.32866/001c.94401
[14]	Chen Y., Liu T. X., Shan Y., & Zhong S. (2023). The emergence of economic rationality of GPT. Proceedings of the National Academy of Sciences, 120(51), e2316205120. https://doi.org/10.1073/pnas.2316205120
[15]	Choi S., Kang H., Kim N., & Kim J. (2025). How does artificial intelligence improve human decision-making? Evidence from the AI-powered Go program. Strategic Management Journal, 46(6), 1523-1554. https://doi.org/10.1002/smj.3694
[16]	Christensen R. H. B. (2023). ordinal: Regression models for ordinal data (R package version 2023.12-4.1) [Computer software]. https://CRAN.R-project.org/package=ordinal
[17]	Coda-Forno J., Witte K., Jagadish A. K., Binz M., Akata Z., & Schulz E. (2023). Inducing anxiety in large language models can induce bias. arXiv preprint. https://doi.org/10.48550/arXiv.2304.11111
[18]	Dai S. C., Xiong A., & Ku L. W. (2023). LLM-in-the-loop: Leveraging large language model for thematic analysis. arXiv preprint. https://doi.org/10.48550/arXiv.2310.15100
[19]	de Kok T. (2025). ChatGPT for textual analysis? How to use generative LLMs in accounting research. Management Science, 71(9), 7888-7906. https://doi.org/10.1287/mnsc.2023.03253
[20]	de Varda A. G., Saponaro C., & Marelli M. (2025). High variability in LLMs’ analogical reasoning. Nature Human Behaviour, 9(7), 1339-1341. https://doi.org/10.1038/s41562-025-02224-3
[21]	DeepSeek-AI, Guo D., Yang D., Zhang H., Song J., Zhang R., … Zhang Z. (2025). DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint. https://doi.org/10.48550/arXiv.2501.12948
[22]	Deiana G., Dettori M., Arghittu A., Azara A., Gabutti G., & Castiglia P. (2023). Artificial intelligence and public health: Evaluating ChatGPT responses to vaccination myths and misconceptions. Vaccines, 11(7), 1217. https://doi.org/10.3390/vaccines11071217
[23]	Deiner M. S., Honcharov V., Li J., Mackey T. K., Porco T. C., & Sarkar U. (2024). Large language models can enable inductive thematic analysis of a social media corpus in a single prompt: Human validation study. JMIR Infodemiology, 4(1), e59641. https://doi.org/10.2196/59641
[24]	Demszky D., Yang D., Yeager D. S., Bryan C. J., Clapper M., Chandhok S., … Pennebaker J. W. (2023). Using large language models in psychology. Nature Reviews Psychology, 2(11), 688-701. https://doi.org/10.1038/s44159-023-00241-5
[25]	Dillion D., Tandon N., Gu Y., & Gray K. (2023). Can AI language models replace human participants? Trends in Cognitive Sciences, 27(7), 597-600. https://doi.org/10.1016/j.tics.2023.04.008
[26]	Ding Y., Zhang L. L., Zhang C., Xu Y., Shang N., Xu J., Yang F., & Yang M. (2024). LongRoPE: Extending LLM context window beyond 2 million tokens. arXiv preprint. https://doi.org/10.48550/arXiv.2402.13753
[27]	Faul F., Erdfelder E., Lang A. G., & Buchner A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175-191. https://doi.org/10.3758/BF03193146
[28]	Ferguson S. A., Aoyagui P. A., & Kuzminykh A. (2023). Something borrowed: Exploring the influence of AI-generated explanation text on the composition of human explanations. In Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems (pp. 1-7). ACM. https://doi.org/10.1145/3544549.3585727
[29]	Goli A., & Singh A. (2024). Frontiers: Can large language models capture human preferences? Marketing Science, 43(4), 709-722. https://doi.org/10.1287/mksc.2023.0306
[30]	Grossmann I., Feinberg M., Parker D. C., Christakis N. A., Tetlock P. E., & Cunningham W. A. (2023). AI and the transformation of social science research. Science, 380(6650), 1108-1109. https://doi.org/10.1126/science.adi1778
[31]	Gupta R., Nair K., Mishra M., Ibrahim B., & Bhardwaj S. (2024). Adoption and impacts of generative artificial intelligence: Theoretical underpinnings and research agenda. International Journal of Information Management Data Insights, 4(1), 100232. https://doi.org/10.1016/j.jjimei.2024.100232
[32]	Hagendorff T., Fabi S., & Kosinski M. (2023). Human-like intuitive behavior and reasoning biases emerged in large language models but disappeared in ChatGPT. Nature Computational Science, 3(10), 833-838. https://doi.org/10.1038/s43588-023-00527-x
[33]	Hebenstreit K., Praas R., Kiesewetter L. P., & Samwald M. (2024). A comparison of chain-of-thought reasoning strategies across datasets and models. PeerJ Computer Science, 10, e1999. https://doi.org/10.7717/peerj-cs.1999
[34]	Hertwig R., & Erev I. (2009). The description-experience gap in risky choice. Trends in Cognitive Sciences, 13(12), 517-523. https://doi.org/10.1016/j.tics.2009.09.004
[35]	Jiao L., Li C., Chen Z., Xu H., & Xu Y. (2025). When AI “possesses” personality: Roles of good and evil personalities influence moral judgment in large language models (in Chinese). Acta Psychologica Sinica, 57(6), 929-946. https://doi.org/10.3724/SP.J.1041.2025.0929
[36]	Jin H. J., & Han D. H. (2014). Interaction between message framing and consumers’ prior subjective knowledge regarding food safety issues. Food Policy, 44, 95-102. https://doi.org/10.1016/j.foodpol.2013.10.007
[37]	Jones E., & Steinhardt J. (2022). Capturing failures of large language models via human cognitive biases. Advances in Neural Information Processing Systems, 35, 11785-11799. https://doi.org/10.48550/arxiv.2202.12299
[38]	Kahneman D., & Tversky A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263-292. https://doi.org/10.2307/1914185
[39]	Karinshak E., Hu A., Kong K., Rao V., Wang J., Wang J., & Zeng Y. (2024). LLM-GLOBE: A benchmark evaluating the cultural values embedded in LLM output. arXiv preprint. https://doi.org/10.48550/arXiv.2411.06032
[40]	Karinshak E., Liu S. X., Park J. S., & Hancock J. T. (2023). Working with AI to persuade: Examining a large language model's ability to generate pro-vaccination messages. Proceedings of the ACM on Human-Computer Interaction, 7(CSCW1), 1-29. https://doi.org/10.1145/3579592
[41]	Katz A., Fleming G. C., & Main J. (2024). Thematic analysis with open-source generative AI and machine learning: A new method for inductive qualitative codebook development. arXiv preprint. https://doi.org/10.48550/arXiv.2410.03721
[42]	Kelton A. S., Pennington R. R., & Tuttle B. M. (2010). The effects of information presentation format on judgment and decision making: A review of the information systems research. Journal of Information Systems, 24(2), 79-105. https://doi.org/10.2308/jis.2010.24.2.79
[43]	Khalid M. T., & Witmer A. P. (2025). Prompt engineering for large language model-assisted inductive thematic analysis. arXiv preprint. https://doi.org/10.48550/arXiv.2503.22978
[44]	Kumar A., & Lim S. S. (2008). How do decision frames influence the stock investment choices of individual investors? Management Science, 54(6), 1052-1064. https://doi.org/10.1287/mnsc.1070.0845
[45]	Lehr S. A., Caliskan A., Liyanage S., & Banaji M. R. (2024). ChatGPT as research scientist: Probing GPT’s capabilities as a research librarian, research ethicist, data generator, and data predictor. Proceedings of the National Academy of Sciences, 121(35), e2404328121. https://doi.org/10.1073/pnas.2404328121
[46]	Lenth R. V. (2025). emmeans: Estimated marginal means, aka least-squares means (R package version 1.11.0) [Computer software]. https://doi.org/10.32614/CRAN.package.emmeans
[47]	Li S. (2004). A behavioral choice model when computational ability matters. Applied Intelligence, 20(2), 147-163. https://doi.org/10.1023/B:APIN.0000013337.01711.c7
[48]	Lin Z. (2023). Why and how to embrace AI such as ChatGPT in your academic life. Royal Society Open Science, 10(8), 230658. https://doi.org/10.1098/rsos.230658
[49]	Lin Z. (2024). How to write effective prompts for large language models. Nature Human Behaviour, 8(4), 611-615. https://doi.org/10.1038/s41562-024-01847-2
[50]	Lin Z. (2025). Techniques for supercharging academic writing with generative AI. Nature Biomedical Engineering, 9(4), 426-431. https://doi.org/10.1038/s41551-024-01185-8
[51]	Liu N., Zhou L., Li A. M., Hui Q. S., Zhou Y. R., & Zhang Y. Y. (2021). Neuroticism and risk-taking: The role of competition with a former winner or loser. Personality and Individual Differences, 179, 110917. https://doi.org/10.1016/j.paid.2021.110917
[52]	Liu S. X., Yang J. Z., & Chu H. R. (2019). Now or future? Analyzing the effects of message frame and format in motivating Chinese females to get HPV vaccines for their children. Patient Education and Counseling, 102(1), 61-67. https://doi.org/10.1016/j.pec.2018.09.005
[53]	Lopes L. L. (1996). When time is of the essence: Averaging, aspiration, and the short run. Organizational Behavior and Human Decision Processes, 65(3), 179-189. https://doi.org/10.1006/obhd.1996.0017
[54]	Lu J., Chen Y., & Fang Q. (2022). Promoting decision satisfaction: The effect of the decision target and strategy on process satisfaction. Journal of Business Research, 139, 1231-1239. https://doi.org/10.1016/j.jbusres.2021.10.056
[55]	Mei Q., Xie Y., Yuan W., & Jackson M. O. (2024). A Turing test of whether AI chatbots are behaviorally similar to humans. Proceedings of the National Academy of Sciences, 121(9), e2313925121. https://doi.org/10.1073/pnas.2313925121
[56]	Mischler G., Li Y. A., Bickel S., Mehta A. D., & Mesgarani N. (2024). Contextual feature extraction hierarchies converge in large language models and the brain. Nature Machine Intelligence, 6(10), 1467-1477. https://doi.org/10.1038/s42256-024-00925-4
[57]	Morreale A., Stoklasa J., Collan M., & Lo Nigro G. (2018). Uncertain outcome presentations bias decisions: Experimental evidence from Finland and Italy. Annals of Operations Research, 268(1-2), 259-272. https://doi.org/10.1007/s10479-016-2349-3
[58]	Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. https://doi.org/10.1126/science.aac4716
[59]	Park P. S. (2024). Diminished diversity-of-thought in a standard large language model. Behavior Research Methods, 56(6), 5754-5770. https://doi.org/10.3758/s13428-023-02307-x
[60]	Pascal B. (1670). Pensées (W. F. Trotter, Trans.). Retrieved Nov. 22, 2018, from https://sourcebooks.fordham.edu/mod/1660pascal-pensees.asp
[61]	Pavey L., & Churchill S. (2014). Promoting the avoidance of high-calorie snacks: Priming autonomy moderates message framing effects. PLoS One, 9(7), e103892. https://doi.org/10.1371/journal.pone.0103892
[62]	Pawel S., Consonni G., & Held L. (2023). Bayesian approaches to designing replication studies. Psychological Methods. Advance online publication. https://doi.org/10.1037/met0000604
[63]	Peng L., Guo Y., & Hu D. (2021). Information framing effect on public’s intention to receive the COVID-19 vaccination in China. Vaccines, 9(9), 995. https://doi.org/10.3390/vaccines9090995
[64]	Peters E., & Levin I. P. (2008). Dissecting the risky-choice framing effect: Numeracy as an individual-difference factor in weighting risky and riskless options. Judgment and Decision Making, 3(6), 435-448. https://doi.org/10.1017/s1930297500000012
[65]	Popovic N. F., Pachur T., & Gaissmaier W. (2019). The gap between medical and monetary choices under risk persists in decisions for others. Journal of Behavioral Decision Making, 32(4), 388-402. https://doi.org/10.1002/bdm.2121
[66]	Prescott M. R., Yeager S., Ham L., Saldana C. D. R., Serrano V., Narez J., … Montoya J. (2024). Comparing the efficacy and efficiency of human and generative AI: Qualitative thematic analyses. JMIR AI, 3(1), e54482. https://doi.org/10.2196/54482
[67]	Qin X., Huang M., & Ding J. (2024). AITurk: Using ChatGPT for social science research. PsyArXiv. https://doi.org/10.31234/osf.io/xkd23
[68]	Redelmeier D. A., & Tversky A. (1992). On the framing of multiple prospects. Psychological Science, 3(3), 191-193. https://doi.org/10.1111/j.1467-9280.1992.tb00025.x
[69]	Reeck C., Mullette-Gillman O. A., McLaurin R. E., & Huettel S. A. (2022). Beyond money: Risk preferences across both economic and non-economic contexts predict financial decisions. PLoS One, 17(12), e0279125. https://doi.org/10.1371/journal.pone.0279125
[70]	Salles A., Evers K., & Farisco M. (2020). Anthropomorphism in AI. AJOB Neuroscience, 11(2), 88-95. https://doi.org/10.1080/21507740.2020.1740350
[71]	Samuelson P. A. (1963). Risk and uncertainty: A fallacy of large numbers. Scientia, 98, 108-113.
[72]	Scarffe A., Coates A., Brand K., & Michalowski W. (2024). Decision threshold models in medical decision making: A scoping literature review. BMC Medical Informatics and Decision Making, 24(1), 273. https://doi.org/10.1186/s12911-024-02681-2
[73]	Shahid N., Rappon T., & Berta W. (2019). Applications of artificial neural networks in health care organizational decision-making: A scoping review. PLoS One, 14(2), e0212356. https://doi.org/10.1371/journal.pone.0212356
[74]	Simonsohn U. (2015). Small telescopes: Detectability and the evaluation of replication results. Psychological Science, 26(5), 559-569. https://doi.org/10.1177/0956797614567341
[75]	Strachan J. W. A., Albergo D., Borghini G., Pansardi O., Scaliti E., Gupta S., … Becchio C. (2024). Testing theory of mind in large language models and humans. Nature Human Behaviour, 8(7), 1285-1295. https://doi.org/10.1038/s41562-024-01882-z
[76]	Sun H. Y., Rao L. L., Zhou K., & Li S. (2014). Formulating an emergency plan based on expectation-maximization is one thing, but applying it to a single case is another. Journal of Risk Research, 17(7), 785-814. https://doi.org/10.1080/13669877.2013.816333
[77]	Suri G., Slater L. R., Ziaee A., & Nguyen M. (2024). Do large language models show decision heuristics similar to humans? A case study using GPT-3.5. Journal of Experimental Psychology: General, 153(4), 1066-1075. https://doi.org/10.1037/xge0001547
[78]	Tabachnick B. G., & Fidell L. S. (2007). Using multivariate statistics (5th ed.). Allyn & Bacon.
[79]	Thapa S., & Adhikari S. (2023). ChatGPT, Bard, and large language models for biomedical research: Opportunities and pitfalls. Annals of Biomedical Engineering, 51(12), 2647-2651. https://doi.org/10.1007/s10439-023-03284-0
[80]	Tversky A., & Bar-Hillel M. (1983). Risk: The long and the short. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9(4), 713-717. https://doi.org/10.1037/0278-7393.9.4.713
[81]	Von Neumann J., & Morgenstern O. (1947). Theory of games and economic behavior (2nd rev. ed.). Princeton University Press.
[82]	Wang Y., Zhang J., Wang F., Xu W., & Liu W. (2023). Do not think any virtue trivial, and thus neglect it: Serial mediating role of social mindfulness and perspective taking (in Chinese). Acta Psychologica Sinica, 55(4), 626-641. https://doi.org/10.3724/SP.J.1041.2023.00626
[83]	Webb T., Holyoak K. J., & Lu H. (2023). Emergent analogical reasoning in large language models. Nature Human Behaviour, 7(9), 1526-1541. https://doi.org/10.1038/s41562-023-01659-w
[84]	Weber E. U., Blais A. R., & Betz N. E. (2002). A domain-specific risk-attitude scale: Measuring risk perceptions and risk behaviors. Journal of Behavioral Decision Making, 15(4), 263-290. https://doi.org/10.1002/bdm.414
[85]	Wei J., Wang X., Schuurmans D., Bosma M., Ichter B., Xia F., … Zhou D. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35, 24824-24837. https://doi.org/10.48550/arxiv.2201.11903
[86]	Xia D., Li Y., He Y., Zhang T., Wang Y., & Gu J. (2019). Exploring the role of cultural individualism and collectivism on public acceptance of nuclear energy. Energy Policy, 132, 208-215. https://doi.org/10.1016/j.enpol.2019.05.014
[87]	Xia D., Song M., & Zhu T. (2025). A comparison of the persuasiveness of human and ChatGPT generated pro-vaccine messages for HPV. Frontiers in Public Health, 12, 1515871. https://doi.org/10.3389/fpubh.2024.1515871
[88]	Yuan Y., Jiao W., Wang W., Huang J. T., He P., Shi S., & Tu Z. (2023). GPT-4 is too smart to be safe: Stealthy chat with LLMs via cipher. arXiv preprint. https://doi.org/10.48550/arXiv.2308.06463
[89]	Zhang J., Li H. A., & Allenby G. M. (2024). Using text analysis in parallel mediation analysis. Marketing Science, 43(5), 953-970. https://doi.org/10.1287/mksc.2023.0045
[90]	Zhang Y., Huang F., Mo L., Liu X., & Zhu T. (2025). Suicidal ideation data augmentation and recognition technology based on large language models (in Chinese). Acta Psychologica Sinica, 57(6), 987-1000. https://doi.org/10.3724/SP.J.1041.2025.0987
[91]	Zhao F., Yu F., & Shang Y. (2024). A new method supporting qualitative data analysis through prompt generation for inductive coding. 2024 IEEE International Conference on Information Reuse and Integration for Data Science (IRI), 164-169. https://doi.org/10.1109/IRI62200.2024.00043
[92]	Zhao W. X., Zhou K., Li J., Tang T., Wang X., Hou Y., … Wen J. R. (2023). A survey of large language models. arXiv preprint. https://doi.org/10.48550/arXiv.2303.18223

Gambling task
Gain outcome		Loss outcome
Amount (yuan)	Probability (%)	Amount (yuan)	Probability (%)
+10000	10	−278	10
+5000	20	−313	20
+3333	30	−357	30
+2500	40	−417	40
+2000	50	−500	50
+1667	60	−625	60
+1429	70	−833	70
+1250	80	−1250	80
+1111	90	−2500	90
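
A quick arithmetic check on the gambling-task table, offered as a minimal sketch rather than as the authors' procedure. Multiplying each gain amount by its listed probability gives roughly 1,000 yuan in every row, which fits the constant-expected-value design mentioned in the abstract. The loss amounts give a near-constant product (about 250 yuan) only when weighted by the complementary probability, 100 − p; that pairing is an assumption made here for illustration and is not stated in the table. The names `ROWS`, `gain_part` and `loss_part` are hypothetical.

```python
# Rows copied from the gambling-task table:
# (gain amount, gain probability %, loss amount, listed loss probability %).
ROWS = [
    (10000, 10, -278, 10),
    (5000, 20, -313, 20),
    (3333, 30, -357, 30),
    (2500, 40, -417, 40),
    (2000, 50, -500, 50),
    (1667, 60, -625, 60),
    (1429, 70, -833, 70),
    (1250, 80, -1250, 80),
    (1111, 90, -2500, 90),
]


def gain_part(gain, p_gain):
    """Probability-weighted gain: amount times its listed probability."""
    return gain * p_gain / 100


def loss_part(loss, p_listed):
    """Loss weighted by the complementary probability (100 - p).

    ASSUMPTION: the table does not say the loss occurs with probability
    100 - p; this weighting is used only because it yields a constant product.
    """
    return loss * (100 - p_listed) / 100


for gain, p_gain, loss, p_loss in ROWS:
    g = gain_part(gain, p_gain)
    l = loss_part(loss, p_loss)
    print(f"p={p_gain:>2}%  gain part={g:8.1f}  loss part={l:8.1f}  sum={g + l:7.1f}")
```

Under that assumed pairing, every row sums to roughly the same value, so the rows differ in probability and outcome size but not in expected value.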

Predictor	Regression coefficient (β)	Standard error (SE)	95% CI	Wald χ²	Exp(β)	p
Intercept	1.867	0.098	[1.679, 2.064]	362.10	6.47	< 0.001
Number of plays (repeated = 1, single = 0)	0.597	0.159	[0.289, 0.911]	14.17	1.82	< 0.001
Model type (GPT-4 = 1, GPT-3.5 = 0)	−0.815	0.124	[−1.061, −0.574]	43.09	0.44	< 0.001
Number of plays × Model type	0.251	0.202	[−0.149, 0.646]	1.55	1.29	0.213
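
The Exp(β) column in the table above can be reproduced by exponentiating each coefficient. If the model is on a log-odds scale (an assumption here, since the table does not name the model), Exp(β) reads as an odds ratio. The sketch below only redoes that arithmetic; `COEFS` and `exp_beta` are hypothetical names, and the values are copied from the table.

```python
import math

# (coefficient β, reported Exp(β)) copied from the regression table.
COEFS = {
    "intercept": (1.867, 6.47),
    "plays (repeated vs. single)": (0.597, 1.82),
    "model (GPT-4 vs. GPT-3.5)": (-0.815, 0.44),
    "plays x model": (0.251, 1.29),
}


def exp_beta(beta):
    """Exponentiated coefficient; an odds ratio if β is a log-odds coefficient."""
    return math.exp(beta)


for name, (beta, reported) in COEFS.items():
    print(f"{name:28s} exp({beta:+.3f}) = {exp_beta(beta):.2f}  (reported {reported:.2f})")
```

On this reading, the coefficient for number of plays corresponds to odds about 1.8 times higher in the repeated-play condition than in the single-play condition for the reference model.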

Variable	Regression coefficient (β)	Standard error (SE)	z	95% CI	p
Number of plays (repeated = 1, single = 0)	1.967	0.175	11.245	[1.624, 2.309]	< 0.001
Context (financial = 1, medical = 0)	−1.828	0.242	−7.549	[−2.303, −1.354]	< 0.001
Number of plays × Context	0.766	0.265	2.886	[0.246, 1.286]	0.004
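
The z and 95% CI columns in the table above are consistent with the usual Wald formulas, z = β / SE and β ± 1.96 × SE. The sketch below assumes those formulas (the table does not state how the intervals were obtained) and recomputes both from the rounded β and SE values, so small differences in the third decimal are expected. `TABLE`, `Z_CRIT` and `wald` are hypothetical names.

```python
Z_CRIT = 1.96  # two-sided 95% critical value of the standard normal

# (β, SE, reported z, reported CI lower, reported CI upper) copied from the table.
TABLE = {
    "plays (repeated vs. single)": (1.967, 0.175, 11.245, 1.624, 2.309),
    "context (financial vs. medical)": (-1.828, 0.242, -7.549, -2.303, -1.354),
    "plays x context": (0.766, 0.265, 2.886, 0.246, 1.286),
}


def wald(beta, se):
    """Return the Wald z statistic and the 95% confidence limits for β."""
    z = beta / se
    return z, beta - Z_CRIT * se, beta + Z_CRIT * se


for name, (beta, se, z_rep, lo_rep, hi_rep) in TABLE.items():
    z, lo, hi = wald(beta, se)
    print(f"{name:32s} z={z:7.3f} (reported {z_rep:7.3f})  "
          f"CI=[{lo:.3f}, {hi:.3f}] (reported [{lo_rep:.3f}, {hi_rep:.3f}])")
```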