Applying Artificial Intelligence to ICD-10 Disease Classification

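Disease classification currently depends on people reading large volumes of text as the basis for coding. A professional disease coder needs long, specialized training to carry out the complex work of ICD-10 (The International Statistical Classification of Diseases and Related Health Problems 10th Revision) classification, and even a trained coder needs considerable time to assign the correct codes to a single patient. This study uses 140,000 text records, such as discharge diagnoses and medical histories, with the aim of building an automatic ICD-10 coding system that can read and process the text written by physicians and output the corresponding ICD-10 codes. The data consist of discharge summaries from National Taiwan University Hospital from 2016 through July 2017. The F1-score reaches 85% for classification into the 21 ICD-10-CM categories and 65% for classification over all codes. These results show that applying deep learning to text data in the healthcare system merits further research.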
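As an illustration of how an F1-score over multi-label code assignments can be computed, the following is a minimal sketch. The abstract does not state which averaging scheme was used, so micro-averaging is an assumption here; the function name `micro_f1` and the sample codes are hypothetical and are not taken from the study.

```python
def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1 over per-record sets of codes.

    Assumption: each record carries a set of codes, and true positives,
    false positives, and false negatives are pooled across all records.
    """
    tp = fp = fn = 0
    for truth, pred in zip(true_sets, pred_sets):
        tp += len(truth & pred)   # codes predicted and correct
        fp += len(pred - truth)   # codes predicted but not in the gold set
        fn += len(truth - pred)   # gold codes that were missed
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        return 0.0
    # F1 = 2TP / (2TP + FP + FN), equivalent to the harmonic mean
    # of precision and recall.
    return 2 * tp / denominator


# Hypothetical gold and predicted codes for three discharge summaries.
gold = [{"I10", "E11.9"}, {"J18.9"}, {"N39.0", "I10"}]
pred = [{"I10"}, {"J18.9", "I10"}, {"N39.0", "I10"}]
score = micro_f1(gold, pred)
```

In this toy example one gold code is missed and one extra code is predicted, so precision and recall are both 4/5. A macro-averaged score, which averages per-code F1 values, would generally differ.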

Keywords

deep learning; text classification; natural language processing; ICD-10

Parallel Abstract

At present, disease classification relies mainly on people reading large amounts of text as the basis for coding. A professional disease coder requires long-term training to perform the complex task of ICD-10 (The International Statistical Classification of Diseases and Related Health Problems 10th Revision) classification, and even a trained coder needs considerable time to assign the correct ICD-10 codes to each patient. This study uses 140,000 labeled text records of different types, such as discharge diagnoses and medical histories. We aim to train an automatic ICD-10 coding system on these records that can read and process the text written by physicians and output the corresponding ICD-10 codes. The data consist of discharge notes from National Taiwan University Hospital from 2016 to July 2017. We obtain an F1-score of 0.85 for classification into the 21 ICD-10-CM categories and 0.65 for classification over all codes. These results show that deep learning on text data in the medical system merits further research.

Parallel Keywords

deep learning; text classification; natural language processing; ICD-10
