
Learning Face Recognition Unsupervisedly by Disentanglement and Self-Augmentation

Advisor: 邱維辰

Abstract


With the rapid growth of smart homes, healthcare, and home robots, developing a face recognition system that can recognize the faces of a specific small group of people (such as family members) and quickly adapt to environmental changes (for instance, variations in facial appearance caused by ambient lighting or camera placement) has become an important research topic in computer vision and robotics. In this thesis, we use a video of a particular group of people to simulate surveillance footage in a smart home and propose a novel approach to learning a face recognition model without supervision. The approach comprises two main components: (1) a triplet network, which extracts identity-related facial features and performs face recognition by clustering these features; and (2) an augmentation network, which, conditioned on the identity-related features, learns to synthesize additional face samples. In particular, we exploit the temporal and spatial characteristics of face appearances in the video to obtain training data for the triplet network, while the augmentation network learns to disentangle a face image into identity-related and identity-irrelevant features and can therefore generate face images of the same identity with varied appearance (such as different head poses, makeup, or hairstyles). By leveraging the richer face images produced by the augmentation network, we can further refine the triplet network to achieve more accurate and better-generalizing face recognition. Extensive and systematic experiments demonstrate not only that the proposed model can learn environment-specific face recognition in an unsupervised setting, but also that it adapts well to various changes in appearance and scene.
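The triplet-training idea described above can be pictured with a short sketch. This is a minimal illustration under assumed names (`tracks`, `embed_net`, 64x64 face crops), not the thesis implementation: faces within one detected face track are treated as the same person, a face from another track serves as the negative, and a standard triplet margin loss pulls same-identity embeddings together.

```python
# Minimal sketch (assumptions: `tracks` is a list of face tracks, each a list of
# 3x64x64 tensors; `embed_net` is a stand-in identity encoder). Not the thesis code.
import random
import torch
import torch.nn as nn

def sample_triplet(tracks):
    """Anchor and positive come from the same track (assumed same identity);
    the negative comes from a different track (the thesis further uses spatial
    co-occurrence in a frame to guarantee the two tracks are different people)."""
    t_a, t_n = random.sample(range(len(tracks)), 2)
    anchor, positive = random.sample(tracks[t_a], 2)
    negative = random.choice(tracks[t_n])
    return anchor, positive, negative

embed_net = nn.Sequential(                 # toy identity encoder
    nn.Flatten(), nn.Linear(3 * 64 * 64, 128))
criterion = nn.TripletMarginLoss(margin=0.2)

def triplet_step(tracks, optimizer):
    a, p, n = (x.unsqueeze(0) for x in sample_triplet(tracks))
    loss = criterion(embed_net(a), embed_net(p), embed_net(n))
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```

Clustering the resulting embeddings then yields the identity groups used for recognition in the specific environment.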

Parallel Abstract


With the growth of smart home, healthcare, and home robot applications, learning a face recognition system that is specific to a particular environment and capable of self-adapting to temporal changes in appearance (e.g., caused by illumination or camera position) is nowadays an important topic. In this thesis, given a video of a group of people, which simulates surveillance video in a smart home environment, we propose a novel approach that unsupervisedly learns a face recognition model based on two main components: (1) a triplet network that extracts identity-aware features from face images for performing face recognition by clustering, and (2) an augmentation network that is conditioned on the identity-aware features and aims at synthesizing more face samples. In particular, the training data for the triplet network are obtained by exploiting the spatiotemporal characteristics of face samples within the video, while the augmentation network learns to disentangle a face image into identity-aware and identity-irrelevant features and is thus able to generate new faces of the same identity but with varied appearance. By taking the richer training data produced by the augmentation network, the triplet network is further fine-tuned and achieves better performance in face recognition. Extensive experiments not only show the efficacy of our model in learning an environment-specific face recognition model unsupervisedly, but also verify its adaptability to various appearance changes.
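The disentangle-and-recombine idea behind the augmentation network can likewise be sketched in a few lines. The architecture below is an assumed toy version (linear encoder/decoder, 64x64 images, hypothetical class and variable names), not the actual network: the identity code produced by the triplet network is concatenated with an identity-irrelevant attribute code taken from another face, and a decoder synthesizes a new sample of the same identity.

```python
# Toy sketch of conditional disentanglement-based augmentation; sizes and names
# (AugmentationNet, id_dim, attr_dim) are illustrative assumptions only.
import torch
import torch.nn as nn

class AugmentationNet(nn.Module):
    def __init__(self, id_dim=128, attr_dim=64):
        super().__init__()
        self.attr_enc = nn.Sequential(          # identity-irrelevant encoder
            nn.Flatten(), nn.Linear(3 * 64 * 64, attr_dim))
        self.dec = nn.Sequential(               # decoder conditioned on identity code
            nn.Linear(id_dim + attr_dim, 3 * 64 * 64), nn.Tanh())

    def forward(self, z_id, face_for_attr):
        z_attr = self.attr_enc(face_for_attr)   # pose/lighting/hairstyle code
        out = self.dec(torch.cat([z_id, z_attr], dim=1))
        return out.view(-1, 3, 64, 64)

# Usage idea: keep z_id of person A, borrow the attribute code from any other
# face, and add the synthesized image to the pool used to fine-tune the
# triplet network.
# aug = AugmentationNet()
# new_face = aug(z_id_of_A, some_other_face)
```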

