Comparison of ChatGPT 3.5 and 4.0 on World Health Organization COVID-19 FAQs: A Rasch Analysis

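This study evaluates the consistency of COVID-19 information generated by ChatGPT 3.5 and 4.0 with that published by the World Health Organization (WHO). We collected 487 COVID-19-related questions from the WHO's official website and submitted them to ChatGPT 3.5 and 4.0 to generate answers. The generated responses were then compared with the official WHO answers. Two clinical experts rated each response from 1 to 5 on four attributes: accuracy, comprehensiveness, relevance, and clarity. The results were analyzed and visualized with the Rasch rating scale model. According to the two experts' ratings, ChatGPT 3.5 and 4.0 differed significantly in rating difficulty, indicating that ChatGPT 4.0 generates better answers than ChatGPT 3.5. Although the answers from ChatGPT 4.0 were of higher quality than those from ChatGPT 3.5, they still differed from the official WHO answers. Users who rely on ChatGPT should therefore also consult more reliable sources to reduce the risk of misinformation.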

Keywords

ChatGPT; World Health Organization; Rasch analysis; frequently asked questions; 95% confidence interval

Parallel Abstract

This study aimed to evaluate the consistency of COVID-19 information produced by ChatGPT versions 3.5 and 4.0 with official releases from the World Health Organization (WHO). For this purpose, 487 COVID-19-specific questions were sourced from the WHO's official website and posed to both versions of ChatGPT. The answers generated by ChatGPT were then cross-checked with the official responses from the WHO. Two clinical experts rated these answers on a scale of 1 to 5, assessing them based on four criteria: accuracy, comprehensiveness, relevance, and clarity. The Rasch rating scale model aided in the visual representation of the findings. The results, as interpreted by the two experts, revealed notable differences in the quality of answers between the two ChatGPT versions. Specifically, ChatGPT 4.0 outperformed version 3.5 in terms of answer generation capabilities, as evidenced by the significant statistical differences in their ratings. However, despite ChatGPT 4.0's superior performance, there were still inconsistencies between its answers and the WHO's official responses. The study concludes by advising users to cross-reference ChatGPT's information with more reliable sources to avoid potential misinformation risks.
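The abstract names the Rasch rating scale model but does not give its parameterization or the software used. As a minimal sketch, assuming the standard Andrich rating scale formulation with thresholds shared across criteria, the probability of each 1-to-5 rating could be computed as below. The function names and all numeric values are hypothetical and chosen only for illustration; they are not taken from the study.

```python
import math


def rsm_category_probs(theta, delta, taus):
    """Category probabilities under the Andrich rating scale model.

    theta: latent measure of the thing being rated (in logits)
    delta: difficulty (location) of the rated criterion (in logits)
    taus:  thresholds tau_1..tau_M shared across criteria (M = categories - 1)
    Returns a list of M + 1 probabilities, lowest category first.
    """
    # Cumulative sums of (theta - delta - tau_j); the lowest category has an empty sum of 0.
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + (theta - delta - tau))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_rating(theta, delta, taus, lowest=1):
    """Model-expected rating when the lowest category is scored `lowest`."""
    probs = rsm_category_probs(theta, delta, taus)
    return sum((lowest + k) * p for k, p in enumerate(probs))


# Hypothetical thresholds for a five-category (1 to 5) scale.
taus = [-1.5, -0.5, 0.5, 1.5]
probs = rsm_category_probs(theta=0.0, delta=0.0, taus=taus)
```

With four thresholds the function returns five probabilities that sum to one, and a larger `theta` relative to `delta` shifts probability toward the higher rating categories.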

Parallel Keywords

ChatGPT; WHO; Rasch analysis; FAQ; 95% confidence intervals
