Application of the Large Language Model ChatGPT to the Nuclear Medicine Board Examination

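Background: Large language models (LLMs) are rapidly transforming medicine and nuclear medicine. Methods: We collected 100 board certification examination questions from 2023-2024 (ROC years 112-113) from the official website of the Society of Nuclear Medicine, Taiwan (R.O.C.) and tested them with ChatGPT. Results: ChatGPT-4o achieved an accuracy of 82% and ChatGPT-4o1 mini 69%, a statistically significant difference (p-value = 0.009322). On questions containing images, ChatGPT-4o had an accuracy of 64.29% (9/14), versus 84.88% (73/86) on text-only questions, but this difference was not statistically significant (p-value = 0.1247). ChatGPT-4o1 mini had an accuracy of 50% (7/14) on questions containing images and 72.09% (62/86) on text-only questions, again with no statistically significant difference (p-value = 0.1223). Conclusions: This study shows that LLMs already handle specialist nuclear medicine knowledge impressively well. However, medical practitioners who apply such technology must still thoroughly verify its correctness to avoid misuse.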

Keywords

chatbots; ChatGPT; large language models; nuclear medicine; board certification examination

Parallel Abstract

Background: Large language models (LLMs) are rapidly transforming the fields of medicine and nuclear medicine. Methods: In this study, we collected 100 Nuclear Medicine Board Examination questions from the website of the Society of Nuclear Medicine, Taiwan (R.O.C.), spanning the years 2023-2024. The questions were tested using ChatGPT. Results: ChatGPT-4o achieved an accuracy rate of 82%, while ChatGPT-4o1 mini achieved 69%, with a statistically significant difference between the two models (p-value = 0.009322). For questions containing images, ChatGPT-4o had an accuracy rate of 64.29% (9/14), while for text-only questions, its accuracy rate was 84.88% (73/86); however, the difference was not statistically significant (p-value = 0.1247). ChatGPT-4o1 mini achieved an accuracy rate of 50% (7/14) for questions containing images and 72.09% (62/86) for text-only questions, with no statistically significant difference (p-value = 0.1223). Conclusions: This study demonstrates that LLMs exhibit a remarkable command of nuclear medicine knowledge. However, medical professionals must thoroughly verify the accuracy of the output of such technologies to prevent misuse.
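The per-category accuracy rates above can be rechecked directly from the reported counts. The following is a minimal sketch, not the authors' analysis: the abstract does not name the statistical test used, so the two-sided Fisher exact test here is an assumption, and its p-values need not match the reported ones. The function names `accuracy_pct` and `fisher_exact_two_sided` are illustrative. The comparison between the two models is paired on the same 100 questions and would need per-question results that the abstract does not give, so it is not covered.

```python
from math import comb


def accuracy_pct(correct, total):
    """Accuracy as a percentage, e.g. correct answers out of all questions."""
    return 100.0 * correct / total


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    ASSUMPTION: the abstract does not state which test was used; Fisher's
    exact test is chosen here only because the image subgroup is small.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)

    def weight(x):
        # Number of tables with x in the top-left cell and the same margins.
        return comb(col1, x) * comb(n - col1, row1 - x)

    observed = weight(a)
    # Sum every table that is at most as likely as the observed one.
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / comb(n, row1)


# Counts taken from the abstract: (correct, total) for image and text-only items.
reported = {
    "ChatGPT-4o": {"image": (9, 14), "text": (73, 86)},
    "ChatGPT-4o1 mini": {"image": (7, 14), "text": (62, 86)},
}

for model, groups in reported.items():
    (ic, it), (tc, tt) = groups["image"], groups["text"]
    p = fisher_exact_two_sided(ic, it - ic, tc, tt - tc)
    print(f"{model}: image {accuracy_pct(ic, it):.2f}%, "
          f"text {accuracy_pct(tc, tt):.2f}%, Fisher p = {p:.4f}")
```

Working from integer table counts rather than floating-point probabilities keeps the "at most as likely as observed" comparison exact.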

Parallel Keywords

chatbots; ChatGPT; large language models; nuclear medicine; Nuclear Medicine Board Examination
