  • Degree thesis

Build a PSO-SA Optimization Classifier for the Imbalance Sequence Data

Original title: 建構一個PSO-SA的最佳化分類器於不平衡序列資料的分類

Advisor: 蔡介元

Abstract


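Sequential pattern mining and sequence classification are well-known data mining techniques that can address many problems, such as tracking changes in customer behavior. Traditional methods, however, are poor at correctly classifying minority-class samples. Class imbalance arises frequently in everyday applications such as fraud detection, medical diagnosis, spam detection, and product monitoring and inspection, so this study develops an effective classification method for imbalanced sequence data. The AprioriAll algorithm is first used to find the sequential patterns of each class, based on the class to which each sequence belongs. The pairwise coupling method then splits the original multi-class sequence data into a set of binary-class datasets, one for each pair of classes. For each binary-class dataset, the proposed FMCIS method is used to build a classifier, which is likewise named FMCIS. Each classifier first produces two similarity values for a sequence and then uses these two values to construct the units required by the fuzzy preference relations. The fuzzy preference relations then integrate the unit values produced by all classifiers, and the final classification result is obtained according to the termination condition set in this study. To improve classification accuracy, a hybrid PSO-SA algorithm is proposed to adjust the weights of the sequential patterns in FMCIS and the weights of the classes in the fuzzy preference relations. The results show that the proposed classification model can effectively solve the classification problem for imbalanced sequence data, but the class weights in the fuzzy preference relations do not effectively improve overall classification accuracy.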
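The pairwise coupling and fuzzy preference aggregation steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not give the formulas, so the normalisation `sim_a / (sim_a + sim_b)` used to build each unit, the non-dominance rule (one commonly used form), and all function names are hypothetical.

```python
from itertools import combinations


def pairwise_datasets(sequences_by_class):
    """Yield (class_a, class_b, labelled_sequences) for every pair of classes."""
    for a, b in combinations(sorted(sequences_by_class), 2):
        data = [(s, a) for s in sequences_by_class[a]]
        data += [(s, b) for s in sequences_by_class[b]]
        yield a, b, data


def preference_matrix(classes, similarities):
    """Build r[a][b], the degree to which class a is preferred over class b.

    similarities[(a, b)] = (sim_a, sim_b) are the two similarity values that
    the a-vs-b binary classifier produced for one sequence.
    ASSUMPTION: the unit is the normalised similarity sim_a / (sim_a + sim_b).
    """
    r = {c: {d: 0.0 for d in classes} for c in classes}
    for (a, b), (sim_a, sim_b) in similarities.items():
        total = sim_a + sim_b
        r[a][b] = sim_a / total if total > 0 else 0.5
        r[b][a] = 1.0 - r[a][b]
    return r


def non_dominance(classes, r):
    """ASSUMPTION: ND(c) = 1 - max over d of max(r[d][c] - r[c][d], 0)."""
    nd = {}
    for c in classes:
        strict = [max(r[d][c] - r[c][d], 0.0) for d in classes if d != c]
        nd[c] = 1.0 - max(strict, default=0.0)
    return nd


def predict(classes, similarities):
    """Return the class with the maximal non-dominance value."""
    nd = non_dominance(classes, preference_matrix(classes, similarities))
    return max(classes, key=lambda c: nd[c])
```

With three classes, `pairwise_datasets` yields three binary datasets, and `predict` picks the class that no other class strictly dominates the most. The class weights that the thesis tunes with PSO-SA are not modelled in this sketch.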

Parallel Abstract


Sequential pattern mining and sequence classification are two popular data mining methods used to explore changes in customer behavior. However, traditional methods have poor ability to identify minority instances when the classes are imbalanced. In practice, class imbalance arises in many real-world applications, such as fraud detection, medical diagnosis, spam detection, and fault monitoring and inspection. This study therefore develops an effective method for the imbalanced sequence classification problem. The sequences are first divided into subsets according to their class labels, and the AprioriAll algorithm is applied to each subset to find its sequential patterns. Next, the pairwise coupling method combines every pair of sequence subsets to form a set of binary-class datasets. For every binary-class dataset, the Force multi-class imbalance sequence (FMCIS) method is developed to build a classifier. Each classifier first generates two similarity values for a sequence and then constructs the units of the fuzzy preference relations from these two values. The fuzzy preference relations combine the units, and a set of non-dominated values is computed from them. Finally, the class label of a sequence is predicted as the class with the maximal non-dominated value. To increase the prediction accuracy of the proposed classifier, a hybrid PSO-SA algorithm is developed to adjust the weight of each pattern in each classifier and the weight of each class in the fuzzy preference relations. The results show that the proposed classification model is useful for sequence classification with imbalanced data, especially at low support values. However, applying the optimized weights in the fuzzy preference relations does not perform as well as expected.
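To make the weight-tuning idea concrete, here is a minimal sketch of one possible PSO-SA hybrid. The abstract does not say how the two methods are combined, so everything here is an assumption: each iteration runs a standard particle swarm update and then perturbs the global best with a simulated annealing step accepted by the Metropolis rule. The function name `pso_sa`, all parameter values, and the toy objective in the usage note are hypothetical. In the thesis setting the objective would presumably be the classification error obtained with a given weight vector.

```python
import math
import random


def pso_sa(objective, dim, n_particles=20, iters=100, bounds=(0.0, 1.0),
           w=0.7, c1=1.5, c2=1.5, t0=1.0, cooling=0.95, seed=0):
    """Minimise `objective` over a box; returns (best_position, best_cost)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    best, best_cost = gbest[:], gbest_cost  # best point ever evaluated
    temp = t0
    for _ in range(iters):
        # PSO step: move every particle toward its own best and the swarm guide.
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            cost = objective(pos[i])
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], cost
            if cost < gbest_cost:
                gbest, gbest_cost = pos[i][:], cost
            if cost < best_cost:
                best, best_cost = pos[i][:], cost
        # SA step: perturb the swarm guide; noise shrinks as the system cools.
        sigma = 0.1 * (hi - lo) * temp / t0
        cand = [min(hi, max(lo, x + rng.gauss(0.0, sigma))) for x in gbest]
        cand_cost = objective(cand)
        delta = cand_cost - gbest_cost
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            gbest, gbest_cost = cand, cand_cost
        if cand_cost < best_cost:
            best, best_cost = cand[:], cand_cost
        temp *= cooling
    return best, best_cost
```

A toy call such as `pso_sa(lambda x: sum((v - 0.3) ** 2 for v in x), dim=3)` searches for a three-element weight vector in the unit box. Accepting an occasionally worse swarm guide is what lets the search leave a local optimum, while `best` keeps the best point seen so that the returned result never degrades.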

