
基於語境化語言模型與規則庫之ICD-10編碼系統

Using Deep Contextualized Language Model and Rule Base for ICD-10 Coding System

Advisor: 賴飛羆

Abstract


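Background: Since 2016, the National Health Insurance Administration has required contracted medical institutions to use ICD-10-CM/PCS codes as the basis for reimbursement. Coding currently depends on professional disease classification coders, who spend considerable time and effort reading and analyzing medical records and diagnoses to ensure that each record is coded correctly.

Objective: This study aims to build an AI-assisted ICD-10 coding system from contextualized language models and a rule base, in order to improve the timeliness and accuracy of coding and, by assigning cases to the correct DRG, to secure the revenue that medical institutions are due.

Methods: We used discharge summaries from Far Eastern Memorial Hospital and applied data mining and natural language processing techniques (including Word2Vec, XLNet, BERT, and AttentionXML) in deep learning networks to perform automatic ICD-10 coding. We then refined the contextualized language model with a rule base built through discussions with practicing coders.

Results: Across the experiments, contextualized language models outperformed non-contextualized ones on the multi-label classification task. The BioBERT classification model combined with data mining and rule-base optimization performed best, reaching F1 scores of 0.77 on ICD-10-CM and 0.69 on ICD-10-PCS.

Conclusions: To validate the model and system, coders at Far Eastern Memorial Hospital took part in an experiment that compared coding time and consistency with and without AI-predicted codes, analyzed as paired samples. Providing predicted ICD codes raised the median of the coders' mean F1 scores from 0.832 to 0.922 (P < 0.05) but did not reduce their mean coding time (P = 0.64). These model changes and rule-base refinements will continue to be incorporated into our ICD-10 web service, which offers automatic coding and training to all ICD-10 users.

Keywords: natural language processing, deep learning, multi-label classification task, contextualized language model, rule base, International Classification of Diseases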

Parallel Abstract


Background: Since 2016, the National Health Insurance Administration of Taiwan has required all hospitals to use ICD-10-CM/PCS codes as the basis for insurance payment. In current practice, however, disease coders spend a great deal of time and effort interpreting and analyzing the content and diagnoses in medical records to ensure that each record is classified and coded correctly.

Objective: The goal of this research is to build an AI-assisted ICD-10 coding system from a contextualized language model and a rule base, in order to improve the timeliness and correctness of coding and to improve hospital cost control.

Methods: We used discharge summaries from Far Eastern Memorial Hospital (FEMH) as the data source and applied data mining and NLP techniques, including Word2Vec, XLNet, BERT, and AttentionXML, to implement ICD-10 auto-coding. We then optimized the original contextualized language model with a rule base established through discussions with the coders.

Results: In experiments on the FEMH dataset, contextualized language models outperformed non-contextualized language models on the multi-label classification task. With the BioBERT classification model and the rule base, our predictions achieved F1 scores of 0.77 on ICD-10-CM codes and 0.69 on ICD-10-PCS codes.

Conclusions: The ICD-10 coding system with our best BioBERT classification model raised the median of the coders' mean F1 scores from 0.832 to 0.922 (P < 0.05).

Keywords: Natural language processing, Deep learning, ICD-10, Contextualized language model, Rule base, Multi-label classification
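The abstract describes two steps that a small sketch may make more concrete: refining a classifier's multi-label predictions with a rule base, and scoring predicted code sets with F1. The Python below is only an illustration under stated assumptions, not the thesis's implementation. The function names, the 0.5 threshold, the single rule type (mutually exclusive code pairs), and the placeholder labels CODE_A to CODE_C are all hypothetical, and the abstract does not say whether its F1 scores are micro- or macro-averaged, so micro-averaging is assumed here.

```python
from typing import Dict, List, Set, Tuple


def apply_threshold(scores: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Turn per-code probabilities into a predicted code set (assumed cutoff)."""
    return {code for code, p in scores.items() if p >= threshold}


def apply_rules(
    codes: Set[str],
    scores: Dict[str, float],
    exclusive_pairs: List[Tuple[str, str]],
) -> Set[str]:
    """Hypothetical rule type: two codes that may not be reported together.

    When both codes of a pair are predicted, keep the higher-scoring one
    (the first code wins a tie).
    """
    result = set(codes)
    for a, b in exclusive_pairs:
        if a in result and b in result:
            loser = a if scores[a] < scores[b] else b
            result.discard(loser)
    return result


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 over a list of records, each a set of codes."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Placeholder labels, not real ICD-10 codes.
    scores = {"CODE_A": 0.9, "CODE_B": 0.6, "CODE_C": 0.2}
    gold = [{"CODE_A"}]

    raw = apply_threshold(scores)
    refined = apply_rules(raw, scores, [("CODE_A", "CODE_B")])

    print("raw:", sorted(raw), "F1 =", round(micro_f1(gold, [raw]), 3))
    print("refined:", sorted(refined), "F1 =", round(micro_f1(gold, [refined]), 3))
```

In this toy case the rule removes a false positive, so the score rises after post-processing. A real rule base built with coders would likely hold many rule types beyond exclusion pairs.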
