
An Ensemble Classifier Model Using Trees of Linear Classifiers under Feature-Cost Constraints

A Salient Ensemble of Trees using Cascaded Linear Classifiers with Feature-Cost Constraints

Advisor: 陳銘憲

Abstract


In machine learning, both cost-sensitive feature selection and cost-sensitive model training are active research directions. Traditional feature selection aims to find a feature subset with maximal information content and minimal redundancy, but in many scenarios these features are not free to use. Moreover, some applications must guarantee that no test instance exceeds a cost budget once it enters the model; for example, real-time applications often have response-time limits. Researchers have proposed many models that jointly consider accuracy and feature cost, but prior work often assumes that feature costs are mutually independent, which is unrealistic in many cases. This thesis divides feature cost into individual cost and group cost: individual cost is the part independent of other features, such as the memory needed to store each feature, while group cost is charged once when any feature in a group is extracted. We propose a two-stage algorithm that integrates cost-sensitive feature selection with a model subject to feature-cost budget constraints. The proposed Group-cOst-sensitive rAndom foresT (GOAT) model performs feature selection while considering both individual and group costs. The selected feature subset is then used to train a classification model, the proposed Ensemble of Trees using cascaded lInear Classifiers (ETIC), which can predict any test instance while satisfying the feature-cost constraint. Each node in our trees is a linear classifier trained on multiple features, which yields a stronger decision boundary per node than a conventional decision tree. In experiments on datasets including wearable-device data and object-detection data, comparisons with previous methods show that the proposed GOAT feature selection method and ETIC classification model indeed achieve superior performance.
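The individual/group cost model described above can be made concrete with a small sketch. This is an illustrative toy, not the thesis implementation: each feature pays its own individual cost, while a group cost (e.g. powering up a shared sensor) is paid at most once per group, no matter how many of that group's features are extracted. All names and numbers below are hypothetical.

```python
# Sketch of the individual/group feature-cost model: individual costs add up
# per feature, but each group's cost is charged only once when any of its
# features is extracted.

def subset_cost(features, individual_cost, group_of, group_cost):
    """Total extraction cost of a feature subset under the two-part cost model."""
    total = sum(individual_cost[f] for f in features)   # per-feature costs
    charged_groups = {group_of[f] for f in features}    # each group pays once
    total += sum(group_cost[g] for g in charged_groups)
    return total

# Toy example: features 0 and 1 share one sensor group, feature 2 has its own.
individual_cost = {0: 1.0, 1: 2.0, 2: 5.0}
group_of = {0: "imu", 1: "imu", 2: "cam"}
group_cost = {"imu": 10.0, "cam": 3.0}

print(subset_cost({0, 1}, individual_cost, group_of, group_cost))  # 1+2+10 = 13.0
print(subset_cost({0, 2}, individual_cost, group_of, group_cost))  # 1+5+10+3 = 19.0
```

Note how extracting both features 0 and 1 pays the "imu" group cost only once; a cost model with fully independent costs would overestimate this subset's cost.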

Abstract (English)


In the machine learning field, both feature selection and cost-sensitive model training are widely studied. The traditional goal of feature selection is to find features with high information content and low redundancy, but in many situations the features are not free to use. Moreover, in some applications it is important to guarantee that the model never runs out of cost budget on any testing instance, e.g., a real-time application with a limited response time. Researchers have tried to model the trade-off between performance and cost, but they often assume that feature costs are independent of one another, which is not practical in reality. In this thesis, we model the feature cost as two categories: individual cost and group cost. The individual cost stands for the part that is independent of any other feature, such as memory; the group cost represents the part that is charged only once when any feature in that group is extracted. We propose a two-stage algorithm that incorporates both cost-sensitive feature selection and a model with a cost budget constraint. First, we propose a cost-sensitive feature selection algorithm, based on the idea of random forests, that considers both individual cost and group cost: the group-cost-sensitive random forest (GOAT) algorithm. After a proper feature subset is selected, the algorithm applies the derived features to building a salient ensemble of trees, each of which uses cascaded linear classifiers (ETIC). The ETIC model is trained to satisfy the feature-cost constraints. The proposed ETIC model applies multiple features in each node, which is more powerful than a traditional random forest that uses only one feature per node. In the experiments, we compare our algorithm against several baselines on real data, including user preference data and object detection data. When the group cost dominates, our GOAT-ETIC model gains a 10 to 30% improvement over the baseline algorithms.
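The budget guarantee described above, that prediction never exceeds the feature-cost budget on any test instance, can be sketched as follows. This is a minimal illustration under assumed interfaces, not the authors' ETIC implementation: each internal node is a linear classifier over a small feature subset, traversal accumulates extraction cost, and a node's features are extracted only if they fit in the remaining budget; otherwise the walk falls back to the current node's stored label. All class and field names are hypothetical.

```python
# Sketch of budget-constrained prediction with a tree of linear-classifier
# nodes: each node scores the instance with a linear classifier over a few
# features; traversal stops early (returning a fallback label) if extracting
# the next node's features would exceed the cost budget.

class Node:
    def __init__(self, feats, w, b, cost, left=None, right=None, label=0):
        self.feats = feats      # feature indices used by this node's classifier
        self.w, self.b = w, b   # linear classifier weights and bias
        self.cost = cost        # cost of extracting this node's features
        self.left, self.right = left, right
        self.label = label      # fallback / leaf prediction

def predict(node, x, budget):
    """Predict label of instance x without ever spending more than budget."""
    spent = 0.0
    while node is not None:
        if node.left is None and node.right is None:
            return node.label                 # reached a leaf
        if spent + node.cost > budget:
            return node.label                 # budget exhausted: fall back early
        spent += node.cost                    # pay for this node's features
        score = sum(node.w[i] * x[f] for i, f in enumerate(node.feats)) + node.b
        node = node.left if score <= 0 else node.right
    return 0

# Toy tree: one linear-classifier root (cost 4.0) with two leaves.
leaf0 = Node([], [], 0.0, 0.0, label=0)
leaf1 = Node([], [], 0.0, 0.0, label=1)
root = Node([0, 1], [1.0, -1.0], 0.0, 4.0, left=leaf0, right=leaf1, label=0)

print(predict(root, {0: 3.0, 1: 1.0}, budget=5.0))  # score 2 > 0 -> right leaf -> 1
print(predict(root, {0: 3.0, 1: 1.0}, budget=2.0))  # cost 4 > budget -> fallback 0
```

In the full model an ensemble would aggregate such trees, and training would choose each node's feature subset so that every root-to-leaf path respects the budget; the sketch only shows the per-instance guarantee at prediction time.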

