In computer vision, detection tasks involving the human body have long played a central role, and head pose estimation is especially important because it provides key information about the face. For advertisers running on-screen advertisements, for example, an accurate estimate of a viewer's viewing angle and attention toward the screen would greatly help in evaluating advertising effectiveness. According to the literature we reviewed, head pose estimation in mature open-source projects is still at a stage where it is usable but not easy to use. Baseline accuracy is hard to reach, training takes a long time, and dataset labels are complex; some methods even require dozens of facial feature points per image, which places a heavy computational load on the GPU. In this study, we train on an open-source 3D face dataset and design a highly general model that uses no feature points and requires only head pose angle data. To strengthen the network's capture of facial contours and features, we design a "Vision Attention Mechanism" within the deep learning network: a weight tensor that automatically learns which regions of the image are important. Visualizing this tensor as a heat map shows which features it has learned to emphasize. The feature extractor, called the "Double-layer Streaming Network", has only seven convolution layers, yet our experiments show that it achieves higher performance than approaches that apply conventional pre-trained models and weights. Drawing on methods from the literature, we also improve the classification algorithm into "Multi-Layer Classification", so that the algorithm behaves more like human intelligence.
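As a rough illustration of the idea of a weight tensor over image regions, the following minimal sketch applies a spatial attention map to a feature map and turns it into a heat map. It is not the network described in this study: the sigmoid gating, the tensor shapes, the hand-set logits (standing in for values that would normally be learned during training), and the function names `apply_spatial_attention` and `to_heatmap` are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function, mapping logits into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def apply_spatial_attention(features, attention_logits):
    """Weight a (C, H, W) feature map by an (H, W) attention map.

    Hypothetical helper: in a real network `attention_logits` would be a
    trainable parameter or the output of a small sub-network.
    """
    weights = sigmoid(attention_logits)               # (H, W), values in (0, 1)
    attended = features * weights[np.newaxis, :, :]   # broadcast over channels
    return attended, weights


def to_heatmap(weights):
    """Min-max normalize attention weights to [0, 1] for visualization."""
    lo, hi = weights.min(), weights.max()
    if hi - lo < 1e-12:
        return np.zeros_like(weights)
    return (weights - lo) / (hi - lo)


# Toy data: 8 channels on a 16x16 grid (assumed sizes, not from the study).
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 16, 16))

# Hand-set logits that emphasize the central region, as a stand-in for
# values learned by gradient descent.
attention_logits = np.zeros((16, 16))
attention_logits[4:12, 4:12] = 3.0

attended, weights = apply_spatial_attention(features, attention_logits)
heatmap = to_heatmap(weights)
```

In this sketch, positions with larger weights pass more of the feature signal through, and `heatmap` can be rendered over the input image to inspect which regions the weights emphasize.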