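Algorithm and Hardware Architecture Design and Implementation of an Image Content Analysis System Based on Supervised Machine Learning Methods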

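With the rapid growth of the semiconductor industry, more and more functions are being integrated into consumer electronics products, such as basic communication, high-speed network connectivity, high-resolution CMOS image sensors, high-capacity storage, and intelligent human-machine interfaces. These functions keep increasing the amount of multimedia data stored on consumer electronics products. To support effective retrieval of multimedia data, image data in particular, extracting the semantic information in these data in real time and putting it to effective use has become a pressing problem. Machine learning algorithms play an important role in semantic extraction for multimedia content analysis. Moreover, in embedded systems, neither the central processing unit (CPU) nor the application-specific integrated circuit (ASIC) used to date can simultaneously provide the flexibility and the performance that multimedia content analysis requires. Next-generation applications therefore need a new design methodology that delivers the flexibility and performance different users require. In this thesis, we propose hardware architectures for the Gaussian Mixture Model (GMM) and multi-class Support Vector Machine (SVM) machine learning algorithms to accelerate image semantic processing and concept feature extraction in multimedia content analysis. With concept feature extraction that aggregates information from local to global, image patches are analyzed by a machine learning algorithm such as GMM or multi-class SVM, which classifies the low-level features of each patch into pre-defined concept classes. By gathering the concept classification results of all patches in the image, the semantic concept feature of the image can be extracted to represent the whole image. This mapping bridges the gap between low-level features and human semantic perception. The intensive computation involved is a burden for resource-limited embedded systems, so we propose solutions to this problem.

The proposed GMM hardware architecture combines different levels of parallelism with the folding design technique to achieve good speed-up and flexibility, completing the computation required for one Gaussian distribution in every cycle. Because the number of Gaussian distributions used by each classifier in the GMM algorithm may differ, processing one Gaussian distribution at a time meets different users' needs more flexibly and efficiently. Based on thorough analysis, we propose a hardware architecture design methodology for the multi-class SVM, together with a prototype architecture that balances the trade-off between hardware cost and real-time analysis capability. The architecture is optimized with a reconfigurable structure for greater flexibility, offers three operating modes that users can choose among according to their needs, and makes more efficient use of memory. The flexibility provided includes (1) three kernel functions, (2) a wide range of parameter values, (3) adjustable bit precision, and (4) two operating speed modes. When the number of support vectors exceeds what the on-chip memory can hold, the results of our reload analysis can be used to improve memory usage and thereby support different reload scenarios.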

Keywords

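multimedia content analysis; semantic analysis; supervised machine learning algorithm; hardware architecture design; Gaussian mixture model; support vector machine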

Parallel Abstract

Owing to advances in semiconductor technology, a consumer electronics (CE) product with a large storage device may offer functions beyond basic communication, such as taking and storing photos. As a result, the amount of multimedia data stored on these products becomes very large. This data has to be accessed intelligently, so managing multimedia content becomes an urgent task. To enable efficient data management, the semantic information of the multimedia content has to be extracted for further manipulation, and machine learning algorithms play an important role in this area. In embedded systems for CE products, the architectures of the traditional CPU and ASIC cannot satisfy the requirements for flexibility and performance at the same time, so new design methodologies and solutions need to be explored for next-generation applications. In this thesis, hardware architectures for the Gaussian Mixture Model (GMM) and multi-class Support Vector Machine (SVM) machine learning algorithms are proposed to accelerate image semantic processing and concept feature extraction in multimedia content analysis. With the local-to-global concept feature extraction method, the low-level features of the image patches are analyzed by a machine learning algorithm, such as GMM or SVM, and the patches are thereby classified into the pre-defined concept classes. After the classification results of the patches from the whole image are gathered, the semantic concepts can be extracted to represent the image. This mapping process bridges the gap between the low-level feature representation and human perception. Since the computations involved in this process are intensive and burden resource-limited embedded systems, hardware acceleration schemes are proposed to address this problem.
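The local-to-global step described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the thesis implementation: the names `concept_feature` and `nearest_prototype` are hypothetical, the image-level feature is assumed to be a normalized histogram of patch labels, and a nearest-prototype rule stands in for the GMM or SVM patch classifier.

```python
from collections import Counter


def concept_feature(patch_features, classify_patch, num_concepts):
    """Aggregate per-patch concept labels into one image-level vector.

    Assumption: the image-level concept feature is the fraction of
    patches assigned to each concept class (a normalized histogram).
    """
    if not patch_features:
        raise ValueError("an image must contain at least one patch")
    counts = Counter(classify_patch(f) for f in patch_features)
    total = len(patch_features)
    return [counts.get(c, 0) / total for c in range(num_concepts)]


def nearest_prototype(prototypes):
    """Hypothetical stand-in for the GMM or SVM patch classifier.

    Returns a function that maps a low-level feature vector to the
    index of the closest prototype (squared Euclidean distance).
    """
    def classify(feature):
        dists = [
            sum((x - p) ** 2 for x, p in zip(feature, proto))
            for proto in prototypes
        ]
        return dists.index(min(dists))
    return classify


# Usage: four patches, three concept classes, two of which have prototypes.
classifier = nearest_prototype([[0.0, 0.0], [1.0, 1.0]])
patches = [[0.1, 0.0], [0.9, 1.0], [1.0, 0.8], [0.2, 0.1]]
image_feature = concept_feature(patches, classifier, num_concepts=3)
```

The patch classifier is passed in as a function so that any per-patch model can be swapped in without changing the aggregation step.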
The proposed GMM hardware architecture provides high speed-up and good flexibility by combining parallelism and the folding design technique at different levels. The system can complete the computations involved in one Gaussian in a single cycle. Since each classifier in the GMM algorithm models the data of one class and may use a different number of Gaussian distributions, it is more efficient to fold the hardware at the class level and process one Gaussian at a time. This gives the user more flexibility in setting the number of Gaussians per class and the number of classes. The proposed multi-class SVM hardware architecture is designed on the basis of thorough analyses to balance the trade-off between hardware cost and real-time processing demand. The design is further optimized with a reconfigurable structure that provides different operating modes to satisfy users' various demands and to make good use of the memories. The flexibility includes three kernel functions, a wide range of parameter values, adjustable bit precision with run-length encoding, and two operating speed modes. When the number of support vectors is too large to be stored, the proposed reload scheme can be adopted to handle this scenario. In short, the contributions of this thesis are a flexible, high-throughput GMM hardware architecture for image semantic processing and a multi-class SVM hardware architecture design methodology with an optimized reconfigurable prototype for real-time multimedia content analysis. Thorough analyses of how the reconfigurable SVM hardware architecture deals with different scenarios are also presented and discussed. The contents of this thesis can be regarded as a series of solutions for implementing hardware architectures of supervised machine learning algorithms, such as GMM and SVM, for multimedia content analysis in CE products.
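As a software analogue of processing one Gaussian at a time, the following Python sketch accumulates a class likelihood one Gaussian per loop iteration, so each class may use a different number of Gaussians. It is a minimal sketch under assumptions not stated in the abstract: diagonal covariances, log-domain accumulation, and maximum-likelihood class selection. All function names are hypothetical.

```python
import math


def gaussian_log_term(x, weight, mean, var):
    """log(weight * N(x; mean, diag(var))) for one Gaussian.

    Assumption: diagonal covariance, given as a list of variances.
    """
    log_det = sum(math.log(v) for v in var)
    maha = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return math.log(weight) - 0.5 * (
        len(x) * math.log(2.0 * math.pi) + log_det + maha
    )


def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def gmm_log_likelihood(x, gaussians):
    """Accumulate the mixture likelihood one Gaussian per iteration."""
    acc = -math.inf
    for weight, mean, var in gaussians:
        acc = log_add(acc, gaussian_log_term(x, weight, mean, var))
    return acc


def classify(x, class_models):
    """Return the index of the class whose GMM scores x highest."""
    scores = [gmm_log_likelihood(x, g) for g in class_models]
    return scores.index(max(scores))


# Usage: class 0 uses one Gaussian, class 1 uses two.
models = [
    [(1.0, [0.0, 0.0], [1.0, 1.0])],
    [(0.5, [3.0, 3.0], [1.0, 1.0]), (0.5, [5.0, 5.0], [1.0, 1.0])],
]
label_a = classify([0.1, -0.2], models)
label_b = classify([4.8, 5.1], models)
```

Because the inner loop handles a single Gaussian per step, the number of Gaussians per class and the number of classes are just list lengths, which mirrors the flexibility argument made above.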

Parallel Keywords

multimedia content analysis; semantic processing; supervised machine learning algorithm; hardware architecture; Gaussian mixture model; support vector machine

