  • Thesis

基於深度學習之人臉辨識與分群系統

Deep Learning Based Face Clustering and Recognition System

Advisor: Hsueh-Ming Hang (杭學鳴)

Abstract


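Face recognition and clustering have become increasingly widespread and popular over the past few years. As the number of uploaded pictures grows, grouping face photos by hand becomes ever more difficult, so some degree of automation is needed for this task. The goal of face clustering is to group similar face photos into the same class. The goal of this thesis is to design a face clustering system that can run on devices with only limited hardware resources (such as mobile phones and hard disks). We therefore need lightweight and efficient algorithms.
This thesis covers several deep learning based clustering algorithms, datasets, and evaluation metrics. For the basic system architecture, we examine FaceNet [1], a deep learning architecture proposed by Google, which is built from convolutional layers based on the Inception model of GoogLeNet. The overall clustering system consists of three major parts: face detection, feature representation, and the choice of similarity measure for face clustering. FaceNet is trained with a triplet loss function and outputs a 512-dimensional feature vector for each face; these feature vectors allow faces to be clustered effectively. In addition, we study several state-of-the-art face recognition methods, including FaceNet, ASPL [13], Deep Face [5], Re-Identification [4], MobileFaceNet [14], and Siamese Neural Network [6].
Most recently proposed face recognition architectures produce a face feature in the form of a low-dimensional vector used for comparison. The triplet loss function we adopt has been widely used for face recognition and clustering. These architectures are trained on datasets containing very large numbers of face photos (CASIA, Asian, etc.) and tested on the Labelled Faces in the Wild (LFW) dataset. The Chinese Whispers clustering algorithm outperforms most existing face clustering methods. The candidate algorithms are evaluated with the Adjusted Rand index (ARI). Finally, a support vector machine (SVM) is used to classify newly added face photos. We obtained notable results, which are presented in the simulation results in the body of the thesis. Faces captured in the input photos are not always frontal; they may be rotated, differ in illumination, or vary with age, all of which degrade clustering performance. In this thesis, we try several multi-stage or multi-iteration progressive procedures to gradually improve the correctness of the classified face datasets. In the future, we will explore data augmentation methods, study progressive training (including training the deep model with age variation), and try current deep learning clustering methods to make the architecture more stable and effective.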
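For readers unfamiliar with the Adjusted Rand index used for evaluation, the commonly used contingency-table definition is given here for reference. The symbols are notation introduced for this note and are not taken from the thesis: $n_{ij}$ is the number of faces placed in predicted cluster $i$ that belong to true identity $j$, $a_i = \sum_j n_{ij}$, $b_j = \sum_i n_{ij}$, and $n$ is the total number of faces.

```latex
\mathrm{ARI} =
\frac{\displaystyle \sum_{i,j}\binom{n_{ij}}{2}
      - \left[\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\right] \Big/ \binom{n}{2}}
     {\displaystyle \frac{1}{2}\left[\sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2}\right]
      - \left[\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\right] \Big/ \binom{n}{2}}
```

A value of 1 means the clustering matches the ground-truth identities exactly, while values near 0 indicate agreement no better than chance.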

Parallel Abstract


Face recognition and clustering have become increasingly widespread and popular in the past few years. As the number of uploaded pictures increases, manually examining pictures to cluster them by face becomes difficult. Hence, some degree of automation is needed. Face clustering groups similar faces together. The goal of this research is to design the system architecture best suited to clustering faces in pictures uploaded to a device with limited hardware capabilities (such as mobile phones and hard disks). For this reason, we need a light and efficient algorithm. In this study, several deep learning based clustering algorithms, datasets, and evaluation metrics have been considered. We start with FaceNet [1], a deep learning architecture proposed by Google that consists of convolutional layers based on GoogLeNet-inspired Inception models. The clustering problem is composed of three key components: face detection, feature representation, and the choice of similarity measure for grouping faces (clustering). FaceNet is trained with a triplet loss function and returns a 512-dimensional embedding vector for each face, so that faces of the same person can be identified effectively. In addition, several state-of-the-art methods have been studied, including FaceNet, ASPL [13], Deep Face [5], Re-Identification [4], MobileFaceNet [14], and Siamese Neural Network [6], along with several others [21][22][23][29]. Most of these methods represent a face as a low-dimensional embedding vector, although they differ in how they produce the embedding. Triplet loss remains the most popular choice of loss function [1][18] for face recognition and clustering on the feature space produced by a deep neural network.
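As an illustration of the triplet loss idea, the following minimal Python sketch computes the loss for a single (anchor, positive, negative) triplet of embeddings. It is not the thesis implementation: the function names, the plain-list representation, the toy 3-dimensional vectors, and the margin value of 0.2 are all assumptions made for this example.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss for one triplet of embeddings.

    The loss is zero once the anchor is closer to the positive (same
    person) than to the negative (different person) by at least `margin`.
    The margin default is an assumption for this sketch.
    """
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    return max(0.0, squared_l2(a, p) - squared_l2(a, n) + margin)


# Toy 3-dimensional embeddings (hypothetical; real embeddings are 512-dimensional).
anchor = [1.0, 0.0, 0.0]
same_person = [0.9, 0.1, 0.0]
other_person = [0.0, 1.0, 0.0]

easy = triplet_loss(anchor, same_person, other_person)   # well separated
hard = triplet_loss(anchor, other_person, same_person)   # roles swapped
print(easy, hard)
```

In the first call the triplet is already well separated, so it contributes no loss; in the second call the roles are swapped, so the loss is positive and training would push the embeddings apart.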
The architecture is trained on datasets comprising a huge number of images belonging to numerous classes (CASIA, Asian, etc.) and is tested on the Labelled Faces in the Wild (LFW) and Asian datasets. Among the existing face clustering methods, the Chinese Whispers clustering algorithm performs best. The candidate algorithms have been evaluated using the Adjusted Rand index (ARI). Newly added images are classified using a support vector machine (SVM). Notable results have been obtained, as shown by the simulations in this thesis. All possible user scenarios have been considered, and a suitable progressive system pipeline has been designed based on multiple simulations. Faces in the input images are not always frontal; they may be rotated, vary widely in illumination, or change considerably with the person's age, all of which degrade clustering performance. In this study, several multi-stage and multi-iteration simulations have been performed to gradually improve the correctness of the classified face datasets. Using the proposed system pipeline, high ARI scores have been achieved in different user scenarios. Simulation results show that the progressive training approach makes the system more user-centric and increases its overall robustness and accuracy.
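To make the clustering step more concrete, here is a minimal pure-Python sketch of Chinese Whispers label propagation over face embeddings. It is a sketch under stated assumptions rather than the pipeline used in the thesis: the distance threshold, the inverse-distance edge weights, the iteration count, and the toy 2-dimensional "embeddings" are all hypothetical choices for illustration.

```python
import random
from collections import defaultdict


def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def chinese_whispers(embeddings, threshold=1.0, iterations=20, seed=0):
    """Cluster embeddings; returns one cluster label per embedding.

    `threshold`, `iterations` and `seed` are assumed values for this sketch.
    """
    n = len(embeddings)
    # Build an undirected graph: link faces whose distance is below threshold.
    neighbors = defaultdict(list)
    for i in range(n):
        for j in range(i + 1, n):
            d = squared_l2(embeddings[i], embeddings[j])
            if d < threshold:
                weight = 1.0 / (d + 1e-6)  # closer faces get heavier edges
                neighbors[i].append((j, weight))
                neighbors[j].append((i, weight))

    labels = list(range(n))  # every face starts in its own cluster
    rng = random.Random(seed)
    order = list(range(n))
    for _ in range(iterations):
        rng.shuffle(order)  # visit nodes in random order each round
        for node in order:
            if not neighbors[node]:
                continue  # isolated faces keep their own label
            # Adopt the label with the largest total edge weight among neighbors.
            votes = defaultdict(float)
            for other, weight in neighbors[node]:
                votes[labels[other]] += weight
            labels[node] = max(votes, key=votes.get)
    return labels


# Two well-separated toy groups standing in for two identities.
faces = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
         [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
print(chinese_whispers(faces))
```

One reason this style of algorithm suits the setting described above is that the number of clusters does not have to be specified in advance, which matters when the number of people in a photo collection is unknown.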

References


[1] F. Schroff, D. Kalenichenko, and J. Philbin, “FaceNet: A unified embedding for face recognition and clustering,” in Computer Vision and Pattern Recognition, pp. 815–823, 2015.
[2] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, “Joint face detection and alignment using multitask cascaded convolutional networks,” IEEE Signal Processing Letters, 23(10), pp. 1499–1503, 2016.
[3] H. Li, Z. Lin, X. Shen, J. Brandt, and G. Hua, “A convolutional neural network cascade for face detection,” in CVPR, pp. 5325–5334, 2015.
[4] H. Fan, L. Zheng, and Y. Yang, “Unsupervised person re-identification: Clustering and fine-tuning,” arXiv preprint arXiv:1705.10444, 2017.
[5] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, “DeepFace: Closing the gap to human-level performance in face verification,” in Computer Vision and Pattern Recognition, 2014.
