Improving Neural Network Classification Using the Transposed Weighted Mahalanobis Distance

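Classification and prediction are among the main topics in data mining. Many classification methods have been applied to data classification problems, such as artificial neural networks (ANN), support vector machines (SVM), the Mahalanobis distance, and decision trees. Because of their strong pattern-recognition ability and high fault tolerance, neural networks are the method most often used for classification. Data preprocessing has a considerable influence on the quality of a neural network's classification results. Yet apart from normalization and standardization, no additional processing is usually applied to the data before classification; in other words, all attributes are treated as having exactly the same weight. As a result, when inappropriate attributes are used, the neural network may be misled by them and classify poorly. During preprocessing, attributes should therefore be weighted according to their influence on the classification result, so that the factors with a significant effect stand out and classification improves. The Mahalanobis distance is mainly used to measure the distance between sample points and is rarely used to compute the distance between attribute variables. Computing the Mahalanobis distance between attributes reveals which attributes carry more weight in the classification result and which have less influence. This study proposes a neural network classification model based on the transposed weighted Mahalanobis distance (TWMD-based Neural Network) to solve classification problems. Attribute weights are assigned using the concept that the larger the Mahalanobis distance, the lower the similarity, and the weighted data are then used to train the neural network. The results show that a neural network trained on attribute-weighted data classifies better than one trained on data without attribute weighting.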
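For reference, the standard Mahalanobis distance and the attribute-weighting step described above can be written as follows. This is only a notational sketch: the symbols $\Sigma$, $w_j$, $x_{ij}$, $n$, and $p$ are assumed here for illustration, and the abstract does not state the formula by which TWMD turns attribute-to-attribute distances into weights, so $w_j$ is left unspecified.

```latex
% Standard Mahalanobis distance between two vectors x and y,
% where \Sigma is the covariance matrix estimated from the data:
d_M(\mathbf{x}, \mathbf{y})
  = \sqrt{(\mathbf{x} - \mathbf{y})^{\top} \, \Sigma^{-1} \, (\mathbf{x} - \mathbf{y})}

% Attribute weighting applied before training
% (w_j is the weight of attribute j; assumed notation):
x'_{ij} = w_j \, x_{ij}, \qquad i = 1, \dots, n, \quad j = 1, \dots, p
```

In the usual setting, $\mathbf{x}$ and $\mathbf{y}$ are sample rows of the data matrix. In the transposed setting the abstract describes, the vectors being compared are attribute columns instead.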

Keywords

Mahalanobis distance; neural network; attribute weight

Parallel Abstract

In the field of data mining, classification and prediction are among the major issues. In recent research, many methods have been applied to classification problems, such as neural networks, support vector machines, the Mahalanobis distance, and decision trees. Because of their powerful pattern recognition and error tolerance, neural networks are commonly used for classification. Data preprocessing has a significant influence on the classification results of a neural network. However, apart from scaling and normalization, no additional processing is usually applied to the data, so all attributes are treated as having equal weight. If irrelevant attributes are used for classification, the neural network may be misled by them and perform poorly. Therefore, in data preprocessing, attributes should be given different weights according to their influence on the classification result. This highlights the attributes that significantly affect the outcome and can improve classification. The Mahalanobis distance is normally used to measure the distance between instances and is rarely used to calculate the distance between attributes. Calculating the Mahalanobis distances between attributes shows which attributes have a significant influence on the classification result. This research uses a neural network based on the Transposed Weighted Mahalanobis Distance (TWMD-based NN) to solve classification problems. Following the concept of similarity, the larger the Mahalanobis distance, the smaller the weight. Finally, the data processed with the attribute weights are used to train the neural network. The results show that the neural network trained on the attribute-weighted data classifies better than the one trained on the original data.

Parallel Keywords

Mahalanobis Distance; Neural Network; Feature Weight
