

Speaker Pose Detection, Recognition, and Tracking

Advisors: 陳世旺, 方瓊瑤

Abstract


This thesis proposes a speaker pose recognition system. Its original purpose is to supply speaker pose information to the automatic lecture recording system currently under development in our laboratory; combined with other sources of information, this pose information allows the speaker-cameraman subsystem of the recording system to frame shots automatically. The input to this study consists of depth images of the speaker provided by the KINECT sensor of the speaker-cameraman subsystem. Such images have the advantage of being unaffected by lighting conditions, which is important in lecture environments; they are also unaffected by the color and texture of the subject's clothing, and they avoid interference caused by differences in body size.

The recognition system is based on per-pixel depth comparison image features and uses the Random forest technique to classify the pixels of the human region in the input image into different body parts. Once the body parts have been assigned, a Minimum bounding box technique is used to enclose each body part and obtain its center point. A pose is then represented by the coordinates of the body-part centers. Next, the detected pose is matched against a set of pre-built Gaussian mixture models (GMM) of poses; the system assigns the detected pose to the pose class of the best-matching model. For subsequent depth images of the speaker, a Particle filter technique is used to track the pose continuously.

Two poses are recognized in this study: a raised-hand pose and a bent-arm pose. For the Random forest training, 400 human depth images and their corresponding body-part label images are used. Depth videos of speakers were also captured in an actual lecture hall to analyze the pose recognition rate. Experimental analysis shows a pose recognition rate of approximately 90%.
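The per-pixel depth comparison feature mentioned above can be sketched as follows. This is not code from the thesis, only a minimal illustration of the standard formulation (the difference between two probe depths at offsets normalized by the depth at the reference pixel, which makes the feature approximately depth-invariant); the function name, the offset convention, and the fixed background depth are all assumptions.

```python
import numpy as np

def depth_comparison_feature(depth, x, y, u, v, background=10.0):
    """Depth comparison feature for pixel (x, y) of a depth image.

    Probes the image at two offsets u and v, each scaled by the
    inverse of the depth at (x, y) so that the feature responds
    similarly to a body seen near or far from the sensor.
    Out-of-bounds probes read a large constant background depth.
    """
    d = depth[y, x]

    def probe(offset):
        px = int(x + offset[0] / d)
        py = int(y + offset[1] / d)
        if 0 <= py < depth.shape[0] and 0 <= px < depth.shape[1]:
            return depth[py, px]
        return background  # probe fell outside the image: treat as far background

    return probe(u) - probe(v)
```

In a Random forest, each split node would evaluate one such feature (with its own learned offsets u, v) and threshold the result; a pixel is routed down the tree until a leaf assigns it a body-part label.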

Abstract (English)


In this paper, a technique for recognizing the poses of a lecturer is presented. The recognized poses, together with other sources of information, will automatically direct a PTZ camera to capture appropriate videos of the lecturer. The videos are to be used in an automatic lecture recording system currently under development in our laboratory. The input data to our system are depth images provided by a KINECT sensor. For each input image, the pixels of the lecturer are first segmented into body parts. This is achieved using a Random forest based on the depth comparison image features of the pixels. The centers of the body parts are next determined using a Minimum bounding box technique. A pose of the lecturer is described in terms of these part centers. A detected pose is recognized by matching it against a set of pre-built Gaussian mixture model (GMM) pose models. Once a pose is recognized, it is tracked over the subsequent video sequence using a hybrid approach of motion tracking and Particle filtering. A large number of real depth image sequences were examined. The experimental results reveal the feasibility and reliability of the proposed pose recognition system.
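The GMM-based recognition step can be sketched as follows: one Gaussian mixture model is fitted per pose class on vectors of body-part center coordinates, and a detected pose is assigned to the class whose model gives it the highest likelihood. This is only an illustrative sketch, not the thesis's implementation; the synthetic training data, the 4-dimensional pose vectors (two part centers), and the class names mirroring the two poses in the thesis are all assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical training data: each pose vector concatenates the (x, y)
# coordinates of two body-part centers, one sample per training frame.
poses = {
    "hand_raised": rng.normal(loc=[0.5, 0.2, 0.5, 0.9], scale=0.05, size=(200, 4)),
    "arm_bent":    rng.normal(loc=[0.5, 0.2, 0.7, 0.5], scale=0.05, size=(200, 4)),
}

# Fit one GMM per pose class on that class's training vectors.
models = {name: GaussianMixture(n_components=2, random_state=0).fit(v)
          for name, v in poses.items()}

def recognize(pose_vec, models):
    """Label the pose with the class whose GMM assigns the observed
    part-center vector the highest average log-likelihood."""
    scores = {name: m.score(pose_vec.reshape(1, -1)) for name, m in models.items()}
    return max(scores, key=scores.get)
```

For example, a vector near the raised-hand cluster would be scored by both models and labeled with the higher-likelihood class; the same scoring loop extends directly to more pose classes.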
