
結合資料增強與多特徵空間的自適應提升模型

AdaBoost with Data Augmentation in Multiple Feature Spaces

Advisor: 鄭卜壬

Abstract (translated from Chinese)


AdaBoost is a classic and highly successful boosting algorithm whose power has been validated over many years. It trains multiple weak classifiers sequentially, using the errors made in earlier rounds to help each new classifier avoid repeating the same mistakes, so that the trained weak classifiers cooperate and complement one another, together forming a stronger classifier. In recent years, however, a wide variety of new machine-learning ideas have been proposed. Although at first glance some of them seem unrelated to AdaBoost, we believe several can in fact be incorporated into the original algorithm and improve it. For example, to our knowledge, training in multiple feature spaces and the increasingly popular technique of data augmentation have not previously been studied in combination with AdaBoost. In this thesis we therefore propose an AdaBoost framework that uses multiple feature spaces and is assisted by data augmentation. We conduct experiments on several datasets covering different data types, such as images and text. The results show that using multiple feature spaces indeed benefits classification, and that data augmentation further helps AdaBoost achieve better results.

Abstract (author's English version)


AdaBoost is one of the most successful boosting algorithms, and its power has been demonstrated over many years. By training weak learners sequentially and using each learner's mistakes to help the next one avoid them, AdaBoost makes the learners work together and complement each other very well. However, as more and more machine-learning concepts have been proposed, we believe some of them can be incorporated into the original AdaBoost algorithm to make it even better. To the best of our knowledge, no previous work has studied the effect of using multiple feature spaces or data augmentation in the AdaBoost algorithm. Therefore, in this thesis we propose a modified AdaBoost structure that incorporates both of these concepts. In addition, we experiment with different types of data, such as text and images. The experimental results show that both methods can greatly enhance the performance of the original AdaBoost.

