Human Action and Facial Expression Recognition Based on Multi-Attribute Sparse Coding

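Finding a "good" signal representation, one that more accurately captures the structure and patterns of signals and the relationships among them, has long been a central research topic. In recent years, sparse coding has drawn increasing attention for its strength in extracting signal characteristics. By further taking group structure into account, sparse coding yields good coding results at the group level. However, an individual object or action usually carries many data attributes that describe its properties. In action recognition, for example, an action may appear under different viewing angles, poses, and illumination conditions, and conventional sparse coding cannot fully exploit these attributes to improve performance. In this thesis, we therefore propose multi-attribute sparse coding, which uses constraints derived from data attributes to improve the recognition of actions and facial expressions that have multiple attributes. For action recognition, we first perform over-segmentation-based background modeling and foreground segmentation to obtain the silhouettes of people performing actions. We then compute motion history images (MHIs) over multiple time intervals to represent how the action changes over time. An action with multiple attributes is described by a set of individual attribute matrices, which are then integrated into the l_1 optimization formulation of sparse coding. Through these attribute matrices, the bases selected to represent a sample are constrained to come from groups that share the same attributes, which yields a more effective representation. Notably, our method also applies when the attributes are known for only a small portion of the training data (partially labeled data). In addition, we extend multi-attribute sparse coding by combining it with facial action units (AUs) for facial expression recognition. Action units can be represented as individual attribute matrices that describe the group properties of facial expressions, and they can also serve as a constraint on basis selection, because the same facial expression should have very similar combinations of action units. Both the group constraints and the constraint on the similarity of action unit combinations are integrated into the l_1 optimization formulation of sparse coding to recognize facial expressions. Experiments on several public multi-view human action datasets and facial expression datasets demonstrate the effectiveness and robustness of our method.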
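The multi-interval motion history image step can be illustrated with a minimal sketch. It uses the standard MHI update rule (a pixel is set to `tau` when motion is detected and otherwise decays by one per frame). The function names, the use of silhouette differences as the motion mask, and the interval lengths are assumptions made for illustration; they are not taken from the thesis.

```python
import numpy as np

def motion_history_image(silhouettes, tau):
    """Standard MHI over a (T, H, W) stack of binary silhouettes, scaled to [0, 1].

    Assumption: motion is detected where the silhouette label changes
    between consecutive frames.
    """
    silhouettes = np.asarray(silhouettes).astype(bool)
    mhi = np.zeros(silhouettes.shape[1:], dtype=np.float64)
    prev = silhouettes[0]
    for frame in silhouettes[1:]:
        moving = frame ^ prev  # pixels whose silhouette label changed
        # Moving pixels are reset to tau; all others decay by 1 down to 0.
        mhi = np.where(moving, float(tau), np.maximum(mhi - 1.0, 0.0))
        prev = frame
    return mhi / float(tau)

def multi_interval_mhi(silhouettes, taus=(5, 10, 20)):
    """Concatenate flattened MHIs computed with several (hypothetical) durations."""
    return np.concatenate(
        [motion_history_image(silhouettes, tau).ravel() for tau in taus]
    )
```

A short duration keeps only the most recent motion, while a long duration retains the earlier trajectory, so the concatenated vector describes the action at several temporal scales.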

Keywords

multi-attribute sparse coding; action recognition; facial expression recognition; background subtraction

Parallel Abstract

Sparse coding has proven very effective at extracting global features from signals in a range of applications. Group sparse representations go further and produce sparse solutions at the group level by considering the group structure of the training images. However, distinct objects and action videos usually carry multiple data attributes, which are high-level descriptions of the properties of the objects or actions. In action recognition, for example, a video may vary in viewing angle, pose, and illumination. The group lasso cannot fully exploit such multi-attribute properties because it is not designed to handle multiple attributes. In this thesis, we propose a multi-attribute sparse representation method with group constraints for action recognition and facial expression recognition, two problems whose data carry multiple attributes. For action recognition, an over-segmentation-based background modeling and foreground detection approach first extracts silhouettes from the action videos. Motion history images are then computed over multiple time intervals to capture motion and pose information in human activities. An action with multiple attributes is represented by individual attribute matrices that describe the group properties of each action instance, and these matrices are incorporated into the l_1-minimization formulation. Together, the sparsity property and the group constraints make basis selection in sparse coding more effective in terms of accuracy. Notably, our approach also works when the attributes in the training data are only partially labeled. Furthermore, we integrate action unit (AU) information with multi-attribute sparse coding for facial expression recognition. AUs can be represented by an individual attribute mask that describes the group properties of each facial expression video, and they also serve as a constraint enforcing that the same facial expression should have very similar AUs. The group constraint makes basis selection in sparse coding more efficient, and the AU similarity constraint penalizes the selection of dictionary atoms that lie far from the target instance. These group constraints and the AU similarity constraint are incorporated into the l_1-minimization formulation to recognize facial expressions. Experiments on several public multi-view human action datasets and facial expression datasets demonstrate the effectiveness and robustness of the proposed method.
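To make the idea of attribute-constrained basis selection concrete, here is a minimal sketch. It is not the formulation used in the thesis, which the abstract does not spell out. It swaps in a plain weighted l_1 problem, min 0.5*||y - D x||^2 + lam * sum_i w_i*|x_i|, solved with ISTA (iterative soft thresholding). The weighting rule in `attribute_weights`, the `penalty` value, and the assumption that the query's attributes are known or hypothesized are all illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of the weighted l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def weighted_l1_ista(D, y, weights, lam=0.1, n_iter=500):
    """Minimize 0.5*||y - D x||^2 + lam * sum_i weights[i]*|x[i]| with ISTA."""
    D = np.asarray(D, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    # Step size 1/L, where L is the squared spectral norm of D.
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)
        x = soft_threshold(x - step * grad, step * lam * weights)
    return x

def attribute_weights(atom_attrs, query_attrs, penalty=10.0):
    """Hypothetical rule: weight 1 for atoms matching every attribute of the
    query, plus `penalty` for each mismatched attribute."""
    mismatches = (np.asarray(atom_attrs) != np.asarray(query_attrs)).sum(axis=1)
    return 1.0 + penalty * mismatches
```

With these weights, dictionary atoms whose attributes (for example, viewing angle) differ from the query's are expensive to select, so the nonzero coefficients concentrate on atoms from the matching attribute group.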

Parallel Keywords

multi-attribute sparse coding; human action recognition; human expression classification; background subtraction


