  • Thesis

高效能頭部姿態估計與深度學習網路設計

Efficient Head Pose Estimation and Deep Learning Network Design

Advisor: 丁肇隆

Abstract


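In computer vision, human-related detection has always held an important place, and a "head pose estimation" model, which provides important information about the human face, is especially valuable. Consider advertisers who place on-screen ads: if they could determine with high accuracy the viewing angle and attention of a consumer's face relative to the screen, this would greatly help them evaluate advertising effectiveness.

Based on the literature we reviewed, head pose estimation in today's mature open-source projects is still at a stage where it is usable but not easy to use. Besides the difficulty of reaching the basic required accuracy, training takes a long time and dataset labels are complicated; some methods even require dozens of facial landmarks per face image, which places a heavy computational load on the GPU.

This study trains on an open-source 3D face dataset and designs a truly general-purpose model that uses no landmarks at all and requires only head pose angle data. To strengthen the network's ability to capture facial contours and features, we designed an attention mechanism for the deep learning network ("Vision Attention Mechanism"): a weight tensor that automatically learns which regions of the image are important. Our visualization heatmaps then reveal the key features this tensor has learned. The feature extraction network in this study, the "Double-layer Streaming Network", has only 7 convolutional layers on its own, and experiments show that it achieves higher performance than studies that apply conventional pre-trained models and weights. We also combined methods from the literature and improved the classification scheme into "Multi-Layer Classification" (多折分類法), making the algorithm closer to human reasoning and intelligence.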
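The abstract describes the attention mechanism only as a weight tensor that learns which image regions matter. As a minimal sketch, assuming a simple softmax-normalized spatial attention (not necessarily the thesis's actual design), each location of a feature map is scored against a hypothetical scoring vector `score_vec` (which a trained network would learn), the scores are normalized into a map that sums to 1, and that map both reweights the features and can be rendered directly as a heatmap.

```python
import math
from typing import List, Tuple

# features[row][col] is a C-dimensional feature vector at one spatial location.
FeatureMap = List[List[List[float]]]


def spatial_attention(features: FeatureMap,
                      score_vec: List[float]) -> Tuple[List[List[float]], List[float]]:
    """Softmax spatial attention over an H x W x C feature map.

    `score_vec` is a hypothetical scoring vector; in a trained network it would
    be a learned parameter. Returns (attention map, attention-pooled features).
    """
    height, width, channels = len(features), len(features[0]), len(score_vec)

    # 1. Score every location by a dot product with the scoring vector.
    scores = [[sum(f * s for f, s in zip(features[i][j], score_vec))
               for j in range(width)] for i in range(height)]

    # 2. Softmax over all H*W locations (subtract the max for numerical stability).
    peak = max(max(row) for row in scores)
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    attn = [[e / total for e in row] for row in exps]  # sums to 1; usable as a heatmap

    # 3. Weight each location's features by its attention value and sum them.
    pooled = [sum(attn[i][j] * features[i][j][c]
                  for i in range(height) for j in range(width))
              for c in range(channels)]
    return attn, pooled


if __name__ == "__main__":
    fmap = [[[1.0, 0.0], [0.0, 1.0]],
            [[0.0, 0.0], [2.0, 0.0]]]
    heatmap, pooled = spatial_attention(fmap, score_vec=[1.0, 0.0])
    for row in heatmap:
        print(["%.3f" % v for v in row])
```

Because the attention map is normalized over all locations, the printed grid can be read directly as a heatmap of where the model "looks"; the bottom-right cell, which scores highest, receives the largest weight.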

English Abstract


In computer vision, human body detection has always been significant, and a "Head Pose Estimation" model, which provides important information about the human face, is equally important. For advertisers who place on-screen advertisements, a model that estimates consumers' viewing direction with high accuracy would be very helpful in evaluating advertising effectiveness. In open-source projects, head pose estimation is still at a development stage where it is usable but not easy to use. Besides the difficulty of reaching baseline accuracy, training takes a long time and dataset labels are complex; some methods even require dozens of facial landmarks per image, which imposes a heavy computational load on the GPU.

In our study, an open-source 3D face dataset is used for training, and to keep the model truly general, only head pose angle labels are required. To strengthen facial feature extraction, we implemented a "Vision Attention Mechanism" in the deep learning network, which automatically learns the weights of important pixels in the image. In addition, the feature extractor, called the "Double-layer Streaming Network", has only seven convolutional layers. Experimental results show that it achieves higher efficiency than approaches that apply pre-trained models and weights. Inspired by the references, we also improved the classification algorithm into "Multi-Layer Classification", bringing the algorithm closer to human intelligence.
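The abstract does not specify how "Multi-Layer Classification" works. For orientation only, the sketch below shows the bin-classification-plus-expectation idea found in the cited literature (DEX [3] decodes a continuous value as the softmax-weighted expectation over class bins), applied here to head-pose angles at several bin resolutions whose decoded angles are then averaged. The -99 to 99 degree range, the bin counts, the averaging across levels, and all function names are assumptions for illustration, not the thesis's method.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bin_centers(lo: float, hi: float, n_bins: int) -> List[float]:
    """Centers of n_bins equal-width angle bins spanning [lo, hi] degrees."""
    width = (hi - lo) / n_bins
    return [lo + width * (k + 0.5) for k in range(n_bins)]


def expected_angle(logits: Sequence[float], lo: float = -99.0, hi: float = 99.0) -> float:
    """Decode one classification head: softmax over bins, then the expected angle.

    The default [-99, 99] degree range is an assumed value for illustration.
    """
    probs = softmax(logits)
    centers = bin_centers(lo, hi, len(logits))
    return sum(p * c for p, c in zip(probs, centers))


def multi_level_angle(levels: Sequence[Sequence[float]]) -> float:
    """Hypothetical multi-level decoding: average the expected angles of several
    heads that classify the same angle at different bin resolutions."""
    return sum(expected_angle(logits) for logits in levels) / len(levels)


if __name__ == "__main__":
    coarse = [0.0, 5.0, 0.0]   # 3 bins of 66 degrees; peaks at the middle bin
    fine = [0.0] * 66          # 66 bins of 3 degrees; uniform (no preference)
    print(round(expected_angle(coarse), 3), round(multi_level_angle([coarse, fine]), 3))
```

Taking the expectation rather than the arg-max bin turns a classification output into a continuous angle, so the prediction is not limited to bin centers.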

References


[1]. Xiangyu Zhu, Zhen Lei, Xiaoming Liu, Hailin Shi, and Stan Z. Li. (2016). Face alignment across large poses: A 3D solution. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[2]. Adrian Bulat and Georgios Tzimiropoulos. (2017). How far are we from solving the 2D & 3D face alignment problem? (and a dataset of 230,000 3D facial landmarks). Proceedings of the IEEE International Conference on Computer Vision.
[3]. Rasmus Rothe, Radu Timofte, and Luc Van Gool. (2015). DEX: Deep expectation of apparent age from a single image. Proceedings of the IEEE International Conference on Computer Vision Workshops.
[4]. Gabriele Fanelli, Juergen Gall, and Luc Van Gool. (2011). Real time head pose estimation with random regression forests. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011).
[5]. Erik Murphy-Chutorian and Mohan M. Trivedi. (2008). Head pose estimation in computer vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(4), 607-626.
