Classification is one of the most important techniques in data mining. By processing and classifying known data, we can discover the implicit rules in the data and later use those rules to make predictions about unknown data. Classification is widely applied in everyday life. In medicine, for example, it can be used to find implicit rules in the genetic features of patients; these rules can then be applied to other patients, which speeds up the medical workflow and gives doctors an additional basis for diagnosis. Data mining is therefore a very important technique and field of study, and with the arrival of Big Data we depend on it even more to analyze the meaningful information hidden in data.
In this thesis, we study the binary and multiclass classification methods of the AdaBoost (Adaptive Boosting) algorithm. The algorithm first assigns a weight to each training sample and then trains T weak classifiers in sequence, changing the sample weights between rounds: after each weak classifier is trained, the weight of every sample is updated according to whether it was classified correctly or incorrectly. When training is complete, the weak classifiers are combined into one strong classifier by a vote over their individual outputs, and this strong classifier is used to predict unknown data. We report experimental results of the AdaBoost algorithm on the colon cancer, breast cancer, 8OX, and Iris data sets.
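The weighting-and-voting procedure summarized in the abstract can be illustrated with a minimal sketch of binary AdaBoost. This is not the thesis's implementation: it assumes labels in {-1, +1}, uses one-feature threshold classifiers (decision stumps) as the weak learners, and applies the commonly used vote weight alpha = 0.5 * ln((1 - err) / err). The function names and the toy data are hypothetical and chosen only for illustration.

```python
import math


def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best stump.

    A stump predicts `sign` when row[feature] <= threshold, else `-sign`.
    """
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                preds = [sign if row[j] <= thr else -sign for row in X]
                # Weighted error: total weight of the misclassified samples.
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] <= thr else -sign


def adaboost_train(X, y, T):
    """Train T weak classifiers; return a list of (alpha, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n  # every sample starts with the same weight
    ensemble = []
    for _ in range(T):
        stump = train_stump(X, y, w)
        # Clamp the error so the logarithm below stays finite.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, stump))
        # Misclassified samples gain weight, correctly classified ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]  # renormalize to a distribution
    return ensemble


def adaboost_predict(ensemble, row):
    """Strong classifier: sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical toy data: one feature, labels that no single stump separates.
toy_X = [[float(i)] for i in range(10)]
toy_y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
toy_model = adaboost_train(toy_X, toy_y, T=3)
print([adaboost_predict(toy_model, row) for row in toy_X])
```

The exhaustive stump search keeps the sketch short but costs one pass over the data per candidate threshold, so it is only suitable for small data sets. A multiclass variant would need a different vote weight and weak learner interface and is not shown here.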