
輔以注意機制和臉部關鍵點偵測之深度卷積網路應用於戴有口罩人臉的頭部姿態估計

A Deep Convolutional Network for Head Pose Estimation of Humans Wearing Facial Masks Enhanced by Attention Mechanism and Landmark Detection

Advisor: 傅立成
Co-advisor: 蕭培墉 (Pei-Yung Hsiao)

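Abstract


In recent years, head pose estimation has attracted growing attention in computer vision, and deep convolutional networks have brought significant progress. The task is to predict the three-dimensional angles of a human face in an image or video. Accurate head pose information matters for many applications, such as monitoring the state of drivers and passengers to prevent traffic accidents, or determining the head state in a human-computer interaction system so that the corresponding command can be issued. Because of the COVID-19 pandemic, people must wear masks in public places, even in the enclosed space of a vehicle. Face occlusion remains challenging for previous head pose estimation work, so handling mask occlusion has become important.

This thesis proposes a solution for head pose estimation under facial mask occlusion. We design a deep convolutional network trained end to end and add an attention module to enhance the important information in local and global features. In addition, a feature interpolation regularization module and a multi-task learning strategy are used to refine the learned features and to learn additional information from a facial landmark detection task, improving the model's performance and robustness. Furthermore, because the original datasets contain few masked faces, we use data augmentation to generate masked-face data to assist training.

To validate the approach, we train the model on the public head pose estimation datasets 300W-LP and BIWI and evaluate it on the AFLW2000, BIWI, and MAFA test datasets. We first conduct ablation studies on the designed modules to show that the proposed method improves performance on the target task. We then compare numerically with other state-of-the-art methods; the experimental results show that our method achieves competitive results.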

Parallel Abstract


In recent years, accurate head pose estimation with deep convolutional networks has gained significant interest in computer vision. The aim of the task is to predict the three-dimensional orientation of human faces in images or videos. Accurate head pose information is essential for many applications, such as monitoring the status of passengers and drivers to prevent traffic accidents, and determining the state of a person's head so that a human-computer interaction system can issue the appropriate command. Because of the COVID-19 pandemic, people need to wear facial masks in almost all public places, sometimes even inside a vehicle, yet head pose estimation under face occlusion remains challenging for previous methods. Handling this situation has therefore become important.

In this thesis, we propose a head pose estimation method that is more robust when faces are covered by masks. We design a deep learning model trained end to end and incorporate an attention mechanism to enhance the critical information in local and global features. In addition, we introduce a feature interpolation regularization module and a multi-task learning strategy to optimize the feature embedding for head pose estimation and to learn additional information from the facial landmark detection task, improving performance and robustness. Furthermore, because the original datasets contain few samples with facial masks, we synthesize masked-face samples as data augmentation during training.

To validate the proposed approach, the model is trained on the public head pose estimation datasets BIWI and 300W-LP and tested on three datasets: AFLW2000, BIWI, and MAFA. We first evaluate the model in different configurations to determine whether each proposed component is effective. We then compare our work with previous competitive methods in extensive experiments, which show that the proposed method achieves promising results on these datasets.
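
The abstract does not give the form of the multi-task objective, so the following is only a minimal sketch of the general idea: a head pose loss combined with an auxiliary facial landmark loss. The use of mean absolute error for both terms, the three-angle pose vector, and the names `mean_abs_error`, `multi_task_loss`, and `landmark_weight` are all assumptions made for illustration, not the thesis's actual formulation.

```python
from typing import Sequence


def mean_abs_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between two equal-length, non-empty sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and the same length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def multi_task_loss(
    pose_pred: Sequence[float],
    pose_true: Sequence[float],
    landmarks_pred: Sequence[float],
    landmarks_true: Sequence[float],
    landmark_weight: float = 0.5,  # hypothetical weighting, not from the thesis
) -> float:
    """Weighted sum of a pose loss and an auxiliary landmark loss (assumed form)."""
    pose_loss = mean_abs_error(pose_pred, pose_true)
    landmark_loss = mean_abs_error(landmarks_pred, landmarks_true)
    return pose_loss + landmark_weight * landmark_loss


# Example: three pose angles in degrees and one flattened (x, y) landmark.
example_loss = multi_task_loss(
    pose_pred=[10.0, 0.0, -5.0],
    pose_true=[12.0, 1.0, -5.0],
    landmarks_pred=[0.5, 0.5],
    landmarks_true=[0.25, 0.75],
    landmark_weight=0.5,
)
```

In a setup like this, the landmark term acts as an auxiliary signal: setting `landmark_weight` to zero reduces the objective to pose estimation alone, which is one way an ablation over the multi-task strategy could be framed.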

