
Predicting Clinical Outcomes of the Patients in Intensive Care Units with Machine-Learning and Topic Model Techniques

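Abstract


Objectives: Predicting patients' vital signs has long been an important topic in intensive care unit (ICU) research. However, past analyses of electronic health record data have relied mostly on numerical data, and studies of textual data have rarely been published. Methods: Using numerical data and semi-structured data (patients' clinical notes, diagnosis data, and laboratory reports) collected in the MIMIC-III database, we applied the Latent Dirichlet Allocation (LDA) model, a natural language processing technique, together with machine learning methods to predict clinical outcomes for 38,597 adult ICU patients. Results: Integrating structured and semi-structured data improved prediction accuracy; the best AUROC reached 0.871 for long-term mortality and 0.922 for short-term mortality. Conclusions: The model built in this study clearly improves predictive performance and successfully identifies important variables. We hope these results help medical staff better understand patients' conditions and allow medical resources to be used more effectively.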

Parallel Abstract


Objectives: Predicting patients' vital signs remains a critical issue in intensive care unit (ICU) research. However, studies of electronic health record (EHR) data have mostly analyzed numerical data and have rarely examined semi-structured textual data. Methods: We used structured and semi-structured data (i.e., patients' diagnosis data and laboratory reports) from the MIMIC-III database. First, we processed the semi-structured data with the Latent Dirichlet Allocation (LDA) model, a topic model used in natural language processing. Then, we used machine learning methods to predict clinical outcomes for 38,597 adult ICU patients. Results: Combining the structured and semi-structured data of ICU patients improved the accuracy of mortality prediction. The highest AUROC was 0.871 for long-term mortality and 0.922 for short-term mortality. Conclusions: The constructed model successfully identified crucial variables for predicting patient mortality. Thus, when providing medical services, health care personnel may consider the critical variables associated with patients' hospitalization durations to ensure that patients receive optimal care.
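
The AUROC values reported above can be read as the probability that a model assigns a higher risk score to a randomly chosen patient who died than to a randomly chosen survivor. The following is a minimal sketch of that calculation, not the authors' evaluation code; the function name `auroc` and the toy labels and scores are hypothetical and serve only as an illustration.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts, over every (positive, negative) pair, how often the positive
    case receives the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = died, 0 = survived; scores are predicted risks.
died = [1, 0, 1, 0, 0, 1]
risk = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
print(auroc(died, risk))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which 0.871 and 0.922 should be interpreted. The pairwise loop is quadratic in the number of patients; for a cohort the size of the one in this study, a rank-based implementation would be the more practical choice.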

