In recent years, skeleton-based human action recognition has developed rapidly, because action recognition can be applied to home care, intelligent surveillance, and human-computer interaction. Compared with RGB data, skeletal data is less sensitive to illumination changes and describes the body silhouette more reliably. In this thesis, we propose a multi-stream architecture for skeleton-based action recognition. First, the captured skeleton coordinates are normalized to address viewpoint variation. Second, the normalized 3D skeleton data are transformed into 2D image sequences using three proposed visualization approaches: skeleton visualization, part decomposition, and view composition. The images generated by the different approaches complement one another. Third, each type of image sequence is fed to its own dedicated 3D ConvNet, which uses both spatial and temporal features to classify the action. The final prediction is obtained by fusing the scores of the individual 3D ConvNets. Experiments on the NTU RGB+D dataset show that the proposed method achieves strong recognition performance.
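To make the first and last stages of this pipeline concrete, the following is a minimal NumPy sketch, not the thesis implementation. It assumes that normalization means translating the root joint of the first frame to the origin and rotating about the vertical axis so that the hip axis points along +x, and that score fusion is a weighted average of per-stream softmax scores; the abstract specifies neither. The function names, joint indices, and weights are hypothetical.

```python
import numpy as np


def normalize_skeleton(seq, root=0, left_hip=1, right_hip=2):
    """Assumed view normalization for a skeleton sequence of shape (T, J, 3).

    The joint indices are hypothetical placeholders, not NTU RGB+D indices.
    """
    seq = np.asarray(seq, dtype=float)
    # Move the root joint of the first frame to the origin.
    centered = seq - seq[0, root]
    # Hip axis in the first frame, used to estimate the body's yaw.
    hip = centered[0, right_hip] - centered[0, left_hip]
    theta = np.arctan2(hip[2], hip[0])
    c, s = np.cos(theta), np.sin(theta)
    # Rotation about the y-axis that maps the hip axis's x-z projection onto +x.
    rot = np.array([[c, 0.0, s],
                    [0.0, 1.0, 0.0],
                    [-s, 0.0, c]])
    return centered @ rot.T


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def fuse_scores(stream_logits, weights=None):
    """Assumed late fusion: weighted average of per-stream softmax scores.

    stream_logits is a list with one array per stream, each of shape
    (N, C) or (C,). Returns the fused scores and the predicted class index.
    """
    probs = np.stack([softmax(np.asarray(l, dtype=float)) for l in stream_logits])
    if weights is None:
        # Equal weighting of all streams when no weights are given.
        weights = np.full(len(probs), 1.0 / len(probs))
    weights = np.asarray(weights, dtype=float)
    # Weighted sum over the stream axis.
    fused = np.tensordot(weights, probs, axes=1)
    return fused, fused.argmax(axis=-1)
```

In this sketch, each of the three visualization streams (skeleton visualization, part decomposition, view composition) would contribute one entry to `stream_logits`, taken from the output of its own 3D ConvNet. The visualization step and the networks themselves are omitted because the abstract does not describe them in enough detail to reproduce.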