Data mining is a process that automatically discovers useful information in large volumes of data, including alphanumeric, audio, image, and video data. Processing and analyzing such data effectively, and turning it into useful information, has become a major challenge across many fields. Consequently, developing sound data mining algorithms suited to big data has become a very active research topic in recent years.

The conventional real-coded genetic algorithm (RCGA) is easily trapped in local optima and has limited solution performance. To address these problems, this study combines design of experiments, elite preservation, and mutation mechanisms within the RCGA. Tests on 20 multimodal functions were used to identify the combination of these mechanisms that gives the improved RCGA with the best solution accuracy.

Finally, the improved RCGA selected by this comparison is integrated with the Naive Bayes classifier (NBC) to form the improved NBC proposed in this study, and its performance is validated on 12 data sets provided by UCI. The results show that the improved NBC achieves higher overall accuracy and a lower false negative rate than the conventional NBC.
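To make the RCGA ingredients named above concrete, the following is a minimal sketch of a real-coded genetic algorithm with elite preservation and mutation, applied to the Rastrigin function as one example of a multimodal benchmark. It is an illustration under stated assumptions, not the configuration selected in this study: the operators (binary tournament selection, arithmetic crossover, Gaussian mutation), the parameter values, the choice of test function, and the name `rcga_minimize` are all hypothetical, and the design-of-experiments step used to compare combinations is not reproduced.

```python
import math
import random


def rastrigin(x):
    """Rastrigin function: multimodal, with global minimum 0 at the origin."""
    return 10 * len(x) + sum(xi * xi - 10 * math.cos(2 * math.pi * xi) for xi in x)


def rcga_minimize(f, dim, bounds, pop_size=40, generations=200,
                  n_elite=2, mutation_rate=0.1, sigma=0.3, seed=0):
    """Minimize f over a box with a simple real-coded GA (illustrative only).

    All operator choices and default parameters here are assumptions made
    for this sketch, not values taken from the study.
    """
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(v):
        # Keep every gene inside the search box.
        return max(lo, min(hi, v))

    def tournament(scored):
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] <= b[0] else b[1]

    # Each individual is a list of real-valued genes (no binary encoding).
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation

    for _ in range(generations):
        scored = sorted(((f(ind), ind) for ind in pop), key=lambda t: t[0])
        history.append(scored[0][0])

        # Elite preservation: copy the best individuals unchanged, so the
        # best fitness can never get worse from one generation to the next.
        next_pop = [list(ind) for _, ind in scored[:n_elite]]

        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            # Arithmetic crossover: a random convex combination of the parents.
            w = rng.random()
            child = [clip(w * a + (1 - w) * b) for a, b in zip(p1, p2)]
            # Gaussian mutation: perturb each gene with a small probability,
            # which helps the search escape local optima.
            child = [clip(g + rng.gauss(0, sigma)) if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)

        pop = next_pop

    best = min(pop, key=f)
    return best, f(best), history
```

A call such as `rcga_minimize(rastrigin, dim=2, bounds=(-5.12, 5.12))` returns the best individual found, its function value, and the per-generation best-fitness history. Because the elites are carried over unchanged, that history is non-increasing, which is the property elite preservation is meant to guarantee. How close the result comes to the global minimum depends on the assumed parameters and the random seed.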