
Performance Evaluation for Machine Learning and Deep Learning Algorithms on Imbalanced Dataset: Case Study of Business Support System
(Original title: 機器學習與深度學習演算法於電信業務支援系統不平衡資料之效能評估)

Advisor: 林風
The full text will be available for download on 2028/08/17.

Abstract


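Data collected from the real world is usually imbalanced: the classes in the data are very unevenly distributed. Traditional classification algorithms lean too heavily towards learning the majority class (usually the less important class). In this thesis, we use anomaly prediction for a telecom business support system as an example to test the performance of machine learning and deep learning algorithms on imbalanced data. A telecom business support system usually maintains good stability, so system anomalies are rare events, which makes it harder for machine learning and deep learning algorithms to achieve good results on this highly imbalanced data. To address this problem, we propose Frequency-based Feature Creation, which generates new features to describe the distribution of one-hot-encoded features. In addition, we modify existing techniques to strengthen the influence of the minority class, such as Voting with Threshold and Classification Correction.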

Parallel Abstract


Data collected from real systems is usually imbalanced, i.e., the classification categories are not equally represented. Existing classification algorithms tend to be biased towards the majority class (potentially the less interesting class). In this thesis, we use anomaly prediction on a Business Support System (BSS) [1] of telecommunication service providers as a case study of the performance of machine learning [2, 3, 4, 5] and deep learning [2, 3, 4] algorithms on imbalanced datasets. Reliability and stability are treated as major requirements for a BSS [6]. In other words, anomalies are rare events in a BSS, and the distribution of BSS system log data is highly imbalanced. It is therefore more challenging for machine learning and deep learning algorithms to perform well on such data. To address this issue, we propose an approach, Frequency-based Feature Creation (FFC), that creates new features describing the distributions of the one-hot-encoded features. Furthermore, we enhance some existing techniques to amplify the effect of the minority class, e.g., Voting with Threshold (VT) and Classification Correction (CC).
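The abstract names Voting with Threshold but does not define it. The sketch below is one plausible reading, not the thesis's actual procedure: an ensemble labels a sample as an anomaly when at least a fixed number of classifiers vote for the minority class, instead of requiring a strict majority. The function names, the threshold value, and the vote data are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def vote_with_threshold(votes: Sequence[int], threshold: int) -> int:
    """Return 1 (anomaly) if at least `threshold` classifiers voted 1, else 0.

    Assumption: 1 marks the minority (anomaly) class, 0 the majority class.
    """
    return 1 if sum(votes) >= threshold else 0


def majority_vote(votes: Sequence[int]) -> int:
    """Plain majority voting, shown for comparison."""
    return 1 if sum(votes) * 2 > len(votes) else 0


# Hypothetical predictions from five classifiers for three samples.
predictions: List[List[int]] = [
    [0, 0, 0, 0, 0],  # no classifier flags an anomaly
    [1, 0, 0, 1, 0],  # two of five flag an anomaly
    [1, 1, 1, 0, 1],  # four of five flag an anomaly
]

vt_labels = [vote_with_threshold(v, threshold=2) for v in predictions]
mv_labels = [majority_vote(v) for v in predictions]

print(vt_labels)  # [0, 1, 1]
print(mv_labels)  # [0, 0, 1]
```

Under this reading, lowering the threshold below a strict majority lets a minority of agreeing classifiers flag the second sample, which trades more false alarms for fewer missed anomalies.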

References


[1] J. H. Chen, C. W. Huang, and C. C. Shih. The exploration of machine learning for abnormal prediction model of telecom business support system. In Asia-Pacific Network Operations and Management Symposium, 2017.
[2] K. Zhang, J. Xu, and M. R. Min. Automated IT system failure prediction: A deep learning approach. In IEEE International Conference on Big Data, 2017.
[3] T. Islam and D. Manivannan. Predicting application failure in cloud: A machine learning approach. In IEEE International Conference on Cognitive Computing, 2017.
[4] A. Rosa, Y. Chen, and W. Binder. Failure analysis and prediction for big-data systems. IEEE Transactions on Services Computing, 10:984–998, 2017.
[5] I. Karakurt, S. Ozer, T. Ulusinan, and M. C. Ganiz. A machine learning approach to database failure prediction. In International Conference on Computer Science and Engineering, 2017.
