Background: Large language models (LLMs) are rapidly transforming medicine, including nuclear medicine. Methods: We collected 100 questions from the 2023-2024 (ROC years 112-113) Nuclear Medicine Board Examination, published on the official website of the Society of Nuclear Medicine, Taiwan (R.O.C.), and tested them with ChatGPT. Results: ChatGPT-4o achieved an accuracy of 82%, whereas ChatGPT-4o1 mini achieved 69%; the difference between the two models was statistically significant (p = 0.009322). On questions containing images, ChatGPT-4o had an accuracy of 64.29% (9/14), compared with 84.88% (73/86) on text-only questions; this difference was not statistically significant (p = 0.1247). ChatGPT-4o1 mini had an accuracy of 50% (7/14) on questions containing images and 72.09% (62/86) on text-only questions, again with no statistically significant difference (p = 0.1223). Conclusions: This study shows that LLMs already handle specialist nuclear medicine knowledge impressively well. However, medical professionals who apply such technologies must still thoroughly verify the accuracy of their output to avoid misuse.
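The subgroup figures above can be rechecked from the raw counts. The following is a minimal sketch, not the authors' analysis: the abstract does not name the statistical test used, so the two-sided Fisher's exact test here is an assumption, and its p-values need not match the reported ones. The helper names `accuracy_pct` and `fisher_exact_two_sided` are hypothetical.

```python
from math import comb


def accuracy_pct(correct: int, total: int) -> float:
    """Accuracy as a percentage, rounded to two decimals."""
    return round(100 * correct / total, 2)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumption: this test is chosen for illustration only; the abstract
    does not say which test produced the reported p-values.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell
        # with all margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Counts taken from the abstract: (correct, wrong) for image and text-only items.
counts = {
    "ChatGPT-4o": ((9, 5), (73, 13)),
    "ChatGPT-4o1 mini": ((7, 7), (62, 24)),
}

for model, ((img_ok, img_bad), (txt_ok, txt_bad)) in counts.items():
    img_acc = accuracy_pct(img_ok, img_ok + img_bad)
    txt_acc = accuracy_pct(txt_ok, txt_ok + txt_bad)
    p = fisher_exact_two_sided(img_ok, img_bad, txt_ok, txt_bad)
    print(f"{model}: image {img_acc}%, text-only {txt_acc}%, Fisher p = {p:.4f}")
```

The comparison between the two models is a paired one (both answered the same 100 questions), so rechecking it would need the per-question results, which the abstract does not give.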