

Multiple Kernel Learning for Computer Vision Applications

Advisor: Chiou-Shann Fuh (傅楸善)
Co-advisor: Tyng-Luh Liu (劉庭祿)


Abstract


In this thesis, we aim to better address several important computer vision problems in which multiple feature representations of the data are available, exploiting the complementary information among those representations to improve performance. To this end, we propose novel multiple kernel learning (MKL) techniques that carry out feature combination and facilitate the underlying vision tasks. The thesis consists of three parts. The approaches developed in the three parts are self-contained, but they share a common theme: kernel matrices serve as the unified feature representation for data under different descriptors. Under this setting, feature fusion is carried out in the domain of kernel matrices, and the applications gain a direct connection to kernel machines, among the best off-the-shelf machine learning methodologies.

In the first part of the thesis, we aim to provide a unified and compact view of data with multiple feature representations. The motivation is that the feature representations of data under various descriptors are typically high dimensional and assume diverse forms. Transforming them into a Euclidean space of lower dimension therefore generally facilitates the underlying vision tasks, such as classification, recognition, and clustering. To this end, the proposed approach (termed MKL-DR) generalizes the framework of multiple kernel learning for dimensionality reduction, and distinguishes itself with three main contributions. First, it supports diverse image descriptors, so useful characteristics of various aspects of the underlying data can be described more precisely. Second, it extends a broad set of existing dimensionality reduction techniques to carry out multiple kernel learning, and consequently improves their effectiveness.
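The common theme above, representing each descriptor by a kernel matrix and fusing descriptors as a convex combination of those kernels, can be sketched in a few lines. This is an illustrative simplification, not the MKL-DR algorithm itself: the combination weights `beta` are fixed by hand rather than learned, and plain kernel PCA stands in for the graph-based dimensionality reduction objectives the thesis actually generalizes; all function names are ours.

```python
import numpy as np

def rbf_kernel(X, gamma):
    # Pairwise squared distances -> Gaussian (RBF) kernel matrix.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)

def ensemble_kernel(kernels, beta):
    # Convex combination K = sum_m beta_m K_m, with beta on the simplex.
    beta = np.asarray(beta, dtype=float)
    assert np.all(beta >= 0) and np.isclose(beta.sum(), 1.0)
    return sum(b * K for b, K in zip(beta, kernels))

def kernel_embedding(K, dim):
    # Kernel PCA: embed data described only by K into R^dim.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    w, V = np.linalg.eigh(H @ K @ H)             # eigenvalues ascending
    idx = np.argsort(w)[::-1][:dim]              # top-dim components
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

rng = np.random.default_rng(0)
X1 = rng.normal(size=(20, 5))   # descriptor 1 (e.g. color features)
X2 = rng.normal(size=(20, 8))   # descriptor 2 (e.g. texture features)
K = ensemble_kernel([rbf_kernel(X1, 0.1), rbf_kernel(X2, 0.1)], [0.5, 0.5])
Y = kernel_embedding(K, dim=2)
print(Y.shape)   # (20, 2)
```

Note that the two descriptors have different dimensionalities and would be awkward to concatenate directly; once each is reduced to a kernel matrix, fusion and embedding no longer depend on the descriptors' original forms.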
Third, by focusing on techniques pertaining to dimensionality reduction, the formulation introduces a new class of applications for the multiple kernel learning framework, extending it from supervised learning problems to unsupervised and semi-supervised ones.

In the second part, we propose local ensemble kernels for adaptive feature fusion. Because data often display large interclass and intraclass variations in complex vision tasks, the optimal feature combination for recognition or clustering may vary from class to class, and learning a single global feature fusion may not make the most of the multiple features. We instead suggest learning a set of local classifiers, each derived for one training sample and optimized, over both the feature combination and the classifier parameters, to give good classification performance for data falling within the corresponding local area. Since as many local classifiers as training samples must be learned in this setting, it is important to keep the computational cost feasible even for datasets of large size. Specifically, we cast the multiple, independent training processes of the local classifiers as a correlative multi-task learning problem, and design a new boosting algorithm to accomplish these tasks simultaneously and more efficiently. This framework not only significantly speeds up the learning of the local classifiers but also reduces the risk of overfitting.

In the last part of this thesis, we show how to integrate supervised MKL techniques for feature combination into unsupervised and semi-supervised clustering tasks. The intrinsic difficulty of this part results from the unsupervised nature of clustering: we have no labeled data to guide the search for the optimal ensemble kernel over a given convex set of base kernels. Our key idea for handling this difficulty is to cast supervised feature selection and the unsupervised clustering procedure as a joint optimization problem.
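The per-sample locality idea in the second part can be illustrated with a minimal sketch: one classifier per training sample, fitted with sample weights that decay with distance from that sample's neighborhood. This is not the thesis's multi-task boosting algorithm; it trains each local classifier independently as a weighted ridge regression on ±1 labels, and the names and the bandwidth `tau` are illustrative assumptions.

```python
import numpy as np

def local_classifier(X, y, anchor, tau=1.0, lam=1e-2):
    # Weighted ridge regression on +/-1 labels; each training sample is
    # weighted by a Gaussian of its distance to the anchor, so the fit
    # is optimized for the anchor's local area.
    w = np.exp(-np.sum((X - anchor) ** 2, axis=1) / (2 * tau ** 2))
    Xb = np.hstack([X, np.ones((len(X), 1))])        # append bias column
    A = Xb.T @ (w[:, None] * Xb) + lam * np.eye(Xb.shape[1])
    b = Xb.T @ (w * y)
    return np.linalg.solve(A, b)                     # weights incl. bias

def predict(theta, x):
    return np.sign(np.append(x, 1.0) @ theta)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (30, 2)), rng.normal(2, 1, (30, 2))])
y = np.r_[-np.ones(30), np.ones(30)]
# One local classifier per training sample, as in the thesis's setting.
thetas = [local_classifier(X, y, X[i]) for i in range(len(X))]
```

Training all the classifiers independently like this costs one full fit per training sample, which is exactly the expense the multi-task boosting formulation is designed to avoid; the sketch only illustrates the locality weighting.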
In addition, we develop a clustering algorithm that emphasizes detecting coherent structures in a complex dataset and supports cluster-dependent feature selection. Specifically, we associate each cluster with a boosting classifier derived from multiple kernel learning, and use this cluster-specific classifier to perform feature selection across the various descriptors so as to best separate data of the cluster from the rest. We integrate the multiple, correlative training tasks of the cluster-specific classifiers into the clustering procedure, and solve the joint optimization problem iteratively. Through the iterations, the cluster structure is gradually revealed by these classifiers, while their discriminant power to capture similar data is progressively improved owing to better data labeling.
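The joint optimization in this last part can be caricatured as an alternating procedure: given the current cluster labels, choose per-cluster kernel weights; given the per-cluster ensemble kernels, reassign points. The sketch below is a heavily simplified stand-in for the thesis's method: it scores each base kernel by kernel-target alignment with the cluster-vs-rest labeling instead of learning a boosting classifier, and all names are illustrative.

```python
import numpy as np

def rbf(X, gamma):
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)

def alignment(K, y):
    # Kernel-target alignment between K and the ideal kernel y y^T.
    yy = np.outer(y, y)
    return np.sum(K * yy) / (np.linalg.norm(K) * np.linalg.norm(yy))

def cluster_mkl(kernels, n_clusters, n_iter=10, seed=0):
    # Alternate between (a) per-cluster kernel weights chosen by
    # alignment with the cluster-vs-rest labeling and (b) reassigning
    # each point to the cluster whose ensemble kernel gives it the
    # highest mean similarity to the cluster's current members.
    n = kernels[0].shape[0]
    labels = np.random.default_rng(seed).integers(n_clusters, size=n)
    for _ in range(n_iter):
        betas = []
        for c in range(n_clusters):
            y = np.where(labels == c, 1.0, -1.0)
            a = np.array([max(alignment(K, y), 0.0) for K in kernels])
            betas.append(a / a.sum() if a.sum() > 0
                         else np.full(len(kernels), 1.0 / len(kernels)))
        Kcs = [sum(b * K for b, K in zip(betas[c], kernels))
               for c in range(n_clusters)]
        new = np.array([np.argmax([Kcs[c][i, labels == c].mean()
                                   if np.any(labels == c) else -np.inf
                                   for c in range(n_clusters)])
                        for i in range(n)])
        if np.array_equal(new, labels):
            break
        labels = new
    return labels, betas

rng = np.random.default_rng(2)
X1 = np.vstack([rng.normal(-3, 1, (15, 4)),
                rng.normal(3, 1, (15, 4))])   # informative descriptor
X2 = rng.normal(size=(30, 6))                 # uninformative descriptor
labels, betas = cluster_mkl([rbf(X1, 0.1), rbf(X2, 0.1)], n_clusters=2)
```

Because the weights `betas[c]` are learned per cluster, an uninformative descriptor can be down-weighted for one cluster without affecting the others, which is the cluster-dependent feature selection the abstract describes.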

