應用整合式多階段分類模式於肝臟疾病的患者預測之研究

APPLICATION OF INTEGRATED MULTI-STAGE CLASSIFICATION MODEL TO THE PREDICTION OF LIVER DISEASE PATIENTS

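Abstract


Liver disease is a complex disease with many causes. Because the symptoms of liver patients are subtle, it is hard to diagnose at an early stage. Fat accumulated in liver cells is associated with major chronic diseases such as metabolic syndrome, cardiovascular disease, and type 2 diabetes. In the large body of literature on liver disease prediction, machine learning techniques have been widely used to build prediction models. However, liver disease has many risk factors, and the data exhibit a class imbalance problem. To build an effective prediction model, this study uses the Indian liver disease data as empirical data and constructs an integrated prediction framework. Within this framework, prediction models are built from three groups of techniques: the classifiers logistic regression (LR), support vector machine (SVM), multivariate adaptive regression splines (MARS), and eXtreme Gradient Boosting (XGBoost); the feature selection techniques of an embedded method (Lasso) and a filter method (Filter); and the class imbalance techniques of oversampling (Over) and synthetic data generation (SDG). The results of the proposed integrated models are then compared with those of the simple models. The empirical results show that, regardless of the data split ratio, the proposed integrated prediction models predict better than the simple prediction models. The best model shows that performing feature selection after applying a class imbalance technique effectively improves prediction performance, and that the proposed approach can effectively build a prediction model for liver disease.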

Parallel Abstract


Liver disease is a complex disease with many causes. Because the symptoms of liver patients are subtle, it is difficult to diagnose at an early stage. The accumulation of fat in liver cells may be related to important chronic diseases such as metabolic syndrome, cardiovascular disease, and type 2 diabetes. Machine learning techniques have been widely used in the literature to construct liver disease prediction models. However, many risk factors affect liver disease, and the data have a class imbalance problem. To construct a valid prediction model, this study uses the Indian liver disease data as empirical data to build an integrated prediction framework. The framework combines four classification techniques, logistic regression (LR), support vector machine (SVM), multivariate adaptive regression splines (MARS), and eXtreme Gradient Boosting (XGBoost); two feature selection techniques, an embedded method (Lasso) and a filter method; and two techniques for handling data imbalance, oversampling (Over) and synthetic data generation (SDG). The results of the proposed integrated model are compared with those of the simple model. The empirical results show that, regardless of the data split ratio, the proposed integrated prediction model gives better predictions than the simple prediction model. The best model also shows that feature selection performed after handling the data imbalance can effectively improve prediction performance, and that the proposed approach can effectively construct a prediction model for liver disease.
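The ordering highlighted in the abstract (handle the class imbalance first, then select features, then classify) can be illustrated with a minimal sketch. This is not the authors' implementation and does not use the Indian liver disease data: it runs on synthetic toy data, uses plain random oversampling as a stand-in for "Over", a correlation-based ranking as one possible filter method, and a hand-written logistic regression as the classifier. All function names, the toy data, and the parameter values are assumptions made for illustration; Lasso, SDG, SVM, MARS, and XGBoost are not shown.

```python
import math
import random
from collections import Counter


def make_toy_data(rng, n_major=150, n_minor=50, n_features=5):
    """Synthetic imbalanced data (assumption): only features 0 and 1 carry signal."""
    X, y = [], []
    for label, size in ((0, n_major), (1, n_minor)):
        for _ in range(size):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            if label == 1:
                row[0] += 2.0
                row[1] += 2.0
            X.append(row)
            y.append(label)
    return X, y


def split_rows(X, y, test_ratio, rng):
    """Shuffle row indices and hold out a fraction of them for testing."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    n_test = round(len(idx) * test_ratio)
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def random_oversample(X, y, rng):
    """Duplicate random minority rows until every class matches the largest one."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def filter_select(X, y, k):
    """Filter method: keep the k features with the largest |Pearson r| to the label."""
    n = len(X)
    mean_y = sum(y) / n
    var_y = sum((b - mean_y) ** 2 for b in y)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean_x = sum(col) / n
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(col, y))
        var_x = sum((a - mean_x) ** 2 for a in col)
        scores.append(abs(cov) / math.sqrt(var_x * var_y) if var_x and var_y else 0.0)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for row, label in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - label
            gw = [g + err * xi for g, xi in zip(gw, row)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def run_pipeline(seed=0, test_ratio=0.3, k=2):
    rng = random.Random(seed)
    X, y = make_toy_data(rng)
    X_tr, y_tr, X_te, y_te = split_rows(X, y, test_ratio, rng)
    X_bal, y_bal = random_oversample(X_tr, y_tr, rng)  # 1. balance training rows only
    keep = filter_select(X_bal, y_bal, k)              # 2. select features on balanced data

    def project(rows):
        return [[row[j] for j in keep] for row in rows]

    w, b = fit_logistic(project(X_bal), y_bal)         # 3. fit the classifier
    preds = [1 if sum(wi * xi for wi, xi in zip(w, row)) + b >= 0 else 0
             for row in project(X_te)]
    accuracy = sum(p == t for p, t in zip(preds, y_te)) / len(y_te)
    return {"class_counts": Counter(y_bal), "selected": keep, "accuracy": accuracy}


if __name__ == "__main__":
    print(run_pipeline())
```

In this sketch the oversampling is applied only to the training rows, so duplicated minority samples cannot leak into the held-out test set. Changing `test_ratio` gives a rough way to repeat the run under different data split ratios.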
