
階層化概念處理連續型資料分割 以進行資料挖掘

The Division of Continuous Data Attribute by Hierarchical Concepts: for Data Mining

Advisor: 楊燕珠

Abstract


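Many classification techniques have long been proposed in data mining research. Early work focused heavily on building decision trees, for example ID3, C4.5, and CN2, and the accuracy and efficiency of tree-design algorithms are well established. Other data mining techniques, such as neural networks and Naïve Bayes, have also contributed to classification in various ways.

Data analysis has to contend with ever larger datasets and ever more complex characteristics, so a more powerful and better suited analysis model is needed. To reduce the complexity of the data, this study proposes a hierarchical concept model that is easy to understand and easy to apply to large databases, and performs data mining based on that model.

This study uses the hierarchical concept to separate the training data into different blocks according to information value (covering both continuous and discrete attributes). The concept is not only applied to partitioning the data: we also find that applying it to Naïve Bayes yields a classification model, hierarchical Naïve Bayes, which makes predictions through the generated hierarchy.

The experiments combine all attributes to derive the final classification result. The results show that applying the proposed concept model to Naïve Bayes improves accuracy, and applying it to databases of different kinds demonstrates the robustness of the approach.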

Parallel Abstract


In data mining research, many classification methods have been proposed. Early research devoted a great deal of attention to decision tree construction, for example ID3, C4.5, and CN2, and tree-design algorithms have been shown to be accurate and efficient. Other data mining techniques, such as neural networks and Naïve Bayes, have also contributed a great deal to classification. Because data analysis must deal with growing volumes of data and increasingly complex features, a more powerful and appropriate analysis model is necessary. To reduce the complexity of data attributes, we propose a hierarchical concept model that is easy to understand and easy to use on large databases, and we perform data mining using this model. The training data is split into different parts according to information value (for both continuous-valued and discrete-valued attributes). The concept of hierarchy is applied not only to data splitting: we also found that applying it to Naïve Bayes produces a classification model, hierarchical Naïve Bayes, with which predictions can be made. The final classification results were obtained by combining all attributes in the experiments. The experimental results show that applying the concept model to Naïve Bayes improves accuracy, and experiments on databases from different domains demonstrate the robustness of the approach.
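The splitting step described above can be illustrated with a minimal sketch. It assumes that "information value" means entropy-based information gain and that a continuous attribute is cut at a single threshold; the abstract does not specify the actual criterion, so both are assumptions. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, information_gain) for the best binary cut
    of one continuous attribute. Assumed criterion: information gain."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_threshold, best_gain = None, 0.0
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no cut point exists between equal values
        threshold = (lo + hi) / 2  # candidate cut: midpoint of neighbours
        left = [c for v, c in pairs if v <= threshold]
        right = [c for v, c in pairs if v > threshold]
        # Weighted entropy of the two blocks produced by this cut.
        remainder = (len(left) * entropy(left)
                     + len(right) * entropy(right)) / len(pairs)
        gain = base - remainder
        if gain > best_gain:
            best_threshold, best_gain = threshold, gain
    return best_threshold, best_gain


# Hypothetical toy data: one continuous attribute and a binary class label.
ages = [22, 25, 30, 35, 41, 52]
buys = ["no", "no", "no", "yes", "yes", "yes"]
threshold, gain = best_split(ages, buys)
print(threshold, gain)
```

Applying `best_split` again to each resulting block would give nested intervals, which is one plausible reading of the hierarchy; a separate Naïve Bayes model could then be fitted per block.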

