

Leveraging Hierarchical Category Knowledge for Data-Imbalanced Diagnosis Text Understanding




Clinical notes are essential medical documents that record each patient's symptoms. Each record is typically annotated with medical diagnostic codes, which denote diagnoses and treatments. This paper focuses on predicting diagnostic codes from the description of the present illness in electronic health records by leveraging domain knowledge. We investigate various losses in a convolutional model that exploit the hierarchical category knowledge of diagnostic codes, allowing the model to share semantics across different labels under the same category. The proposed model not only incorporates external domain knowledge but also addresses data imbalance. Experiments on the MIMIC3 benchmark show that the proposed methods effectively utilize category knowledge and provide informative cues that improve performance on the top-ranked diagnostic codes, outperforming the prior state of the art. The investigation and discussion demonstrate the potential of integrating domain knowledge into current machine-learning-based models and suggest directions for future research.
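The idea of sharing semantics across labels under the same category can be illustrated with a small sketch. It is not the loss used in the paper, whose exact formulation is not given here. It assumes that a code's category is the part of an ICD-9-style code before the dot, that a category's probability is the maximum over its child codes, and that the two levels are combined with a hypothetical mixing factor `weight`. The names `category_of`, `bce`, and `hierarchical_loss` are invented for this illustration.

```python
import math


def category_of(code: str) -> str:
    """Assumed grouping rule: the part of an ICD-9-style code before the dot."""
    return code.split(".")[0]


def bce(p: float, y: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one predicted probability p and target y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def hierarchical_loss(probs: dict, gold: set, weight: float = 0.5) -> float:
    """Code-level loss plus a weighted category-level loss (illustrative only)."""
    # Code level: mean binary cross-entropy over all candidate codes.
    code_loss = sum(
        bce(p, 1.0 if code in gold else 0.0) for code, p in probs.items()
    ) / len(probs)

    # Category level: a category scores as high as its most probable child code.
    cat_prob = {}
    for code, p in probs.items():
        cat = category_of(code)
        cat_prob[cat] = max(cat_prob.get(cat, 0.0), p)

    gold_cats = {category_of(code) for code in gold}
    cat_loss = sum(
        bce(p, 1.0 if cat in gold_cats else 0.0) for cat, p in cat_prob.items()
    ) / len(cat_prob)

    return code_loss + weight * cat_loss


gold = {"428.0"}
# Wrong code, but in the same category as the gold code.
sibling_guess = {"428.0": 0.1, "428.1": 0.9, "250.00": 0.1}
# Wrong code in an unrelated category.
unrelated_guess = {"428.0": 0.1, "428.1": 0.1, "250.00": 0.9}

print(hierarchical_loss(sibling_guess, gold))
print(hierarchical_loss(unrelated_guess, gold))
```

Under these assumptions, a wrong prediction that stays within the correct category is penalized less than one in an unrelated category. A rare code can therefore still benefit from training signal that reaches its more frequent siblings, which is one way category knowledge could ease data imbalance.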


