Face-Enhanced Single Image Super-Resolution Based on Facial Priors and Perceptual Loss

Advisor: Chia-Wen Lin (林嘉文)

Abstract


In recent years, deep learning has achieved strong performance in image super-resolution (SR). This includes single-image super-resolution (SISR) for general scenes and face super-resolution for images of human faces. Face super-resolution is regarded as a special case of general SR, so the two tasks are usually addressed with different methods. In real-world scenarios, however, many images contain both human faces and other objects, such as photos taken in meeting rooms or uploaded to social networks like Facebook and Instagram. Although the faces are an important part of such images, the background and other objects can also matter. In other words, we want to enhance the quality of the faces while preserving the quality of the rest of the image.

Most SR methods treat faces and general objects as different tasks. Enhancing the faces therefore requires a face SR model, while the remaining regions require a general SR model. In practice, running two models on the same image is a poor solution because it roughly doubles the computational resources spent on a single goal. Since the two tasks are closely related, they can instead be handled by a single model.

In our work, we take facial prior knowledge into account and use a model pretrained on face-related tasks to compute a perceptual loss. We then combine both into a general single-image super-resolution method. The facial prior helps us refine the faces in the image. Computing the perceptual loss with FaceNet [1] is more effective and efficient than computing it with the VGG network [2]. Using an SISR method as the backbone model helps us extract features from the low-resolution input image. Results show that our method enhances the faces in the image while maintaining the quality of the other regions, and that it can process a single image in real time.
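A perceptual loss compares the super-resolved (SR) and ground-truth high-resolution (HR) images in the feature space of a frozen, pretrained network rather than pixel by pixel. The following is a minimal, framework-free Python sketch that illustrates the idea. It is not the thesis implementation, and it rests on several assumptions. Images are small grayscale nested lists. The objective is an L1 pixel term plus a weighted mean-squared feature term. `toy_embed` is a hypothetical stand-in for a pretrained FaceNet (or VGG) feature extractor. The weight `0.1` and all function names are illustrative and are not values taken from this work.

```python
import math
from typing import Callable, List

# Grayscale image as rows of pixel intensities in [0, 1] (assumed representation).
Image = List[List[float]]
# Hypothetical stand-in for a frozen pretrained network (e.g. FaceNet or VGG):
# it maps an image to a feature vector and is not updated during SR training.
Embedder = Callable[[Image], List[float]]


def pixel_loss(sr: Image, hr: Image) -> float:
    """Mean absolute (L1) error between SR and HR pixels; assumes equal shapes."""
    diffs = [
        abs(a - b)
        for row_sr, row_hr in zip(sr, hr)
        for a, b in zip(row_sr, row_hr)
    ]
    return sum(diffs) / len(diffs)


def perceptual_loss(sr: Image, hr: Image, embed: Embedder) -> float:
    """Mean squared distance between the feature vectors of SR and HR images."""
    f_sr, f_hr = embed(sr), embed(hr)
    return sum((a - b) ** 2 for a, b in zip(f_sr, f_hr)) / len(f_sr)


def total_loss(sr: Image, hr: Image, embed: Embedder, weight: float = 0.1) -> float:
    """Pixel loss plus a weighted perceptual term; `weight` is an assumed hyperparameter."""
    return pixel_loss(sr, hr) + weight * perceptual_loss(sr, hr, embed)


def toy_embed(img: Image) -> List[float]:
    """Toy embedder for illustration only: L2-normalised row means.

    FaceNet also L2-normalises its output embedding, but the real network is a deep CNN.
    """
    feats = [sum(row) / len(row) for row in img]
    norm = math.sqrt(sum(f * f for f in feats)) or 1.0  # avoid division by zero
    return [f / norm for f in feats]


if __name__ == "__main__":
    hr = [[0.2, 0.4], [0.6, 0.8]]
    sr = [[0.25, 0.35], [0.6, 0.9]]
    print("pixel:", pixel_loss(sr, hr))
    print("total:", total_loss(sr, hr, toy_embed))
```

In this framing, switching from a VGG-based to a FaceNet-based perceptual loss changes only the `embed` function passed in. A practical implementation would use a deep learning framework with automatic differentiation, so that gradients flow into the SR network while the embedder's weights stay frozen.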

References


[1] F. Schroff, D. Kalenichenko, and J. Philbin, “FaceNet: A unified embedding for face recognition and clustering,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2015, pp. 815–823.
[2] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in Proc. Int. Conf. Learn. Representations, 2015.
[3] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[4] Y. Chen, Y. Tai, X. Liu, C. Shen, and J. Yang, “FSRNet: End-to-end learning face super-resolution with facial priors,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018.
[5] C. Tian, Y. Xu, W. Zuo, B. Zhang, L. Fei, and C.-W. Lin, “Coarse-to-fine CNN for image super-resolution,” IEEE Trans. Multimedia, 2020.
