Deepfake Video Detection Based on Multiple Noise Modalities and a Two-Branch Prediction Network

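Deepfake refers to techniques that use deep learning to synthesize face images. Recently this technology has been widely abused to produce fake news, forged celebrity pornographic videos, and similar content, causing considerable harm and a crisis of trust in society, so the corresponding detection techniques are receiving growing attention. In face forgery detection, the quality of forgery generation methods keeps improving as deep learning advances, which makes it hard to find a general solution to the detection problem. To keep Deepfake detection from falling behind the progress of forgery generation, many researchers and companies have joined the effort against deep face forgery; for example, Facebook, Microsoft, and other companies jointly held the Deepfake Detection Challenge at the end of 2019, and related datasets continue to be released to help developers build better Deepfake detection tools. In this paper, we combine the analysis of several image noise modalities (high-pass filter DCT, Error Level Analysis, Photo Response Non-Uniformity) as training input, aiming at a more robust trained model and higher detection accuracy. We pair this with a two-branch prediction network that separates forgery artifacts of different components (manipulation artifacts, blending artifacts). Finally, by combining multiple losses, we make the distribution of feature vectors in the high-dimensional space match our expectations. In summary, compared with many previous approaches, our detection method not only performs better at judging whether an image is real or fake but can also predict where the forged regions (manipulation region, blending boundary) lie, which makes the trained model's detection results more interpretable.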

Keywords

Deep learning; deepfake detection; image noise analysis; forgery artifact separation; multi-task learning

Parallel Abstract

With the popularity of free and open-source tools (such as faceswap and DeepFaceLab), Deepfake videos are being abused to create harmful content such as fake news and fake celebrity pornography. Misuse of synthesis technology poses potential harm to today's society, so Deepfake video detection has attracted more and more attention. To prevent forgery detection from lagging far behind forgery generation techniques, Facebook, Microsoft, and other well-known companies held the Deepfake Detection Challenge at the end of 2019 to encourage the development of new methods to counter AI-generated forged videos. Google also published the DeepFakeDetection dataset to benefit the development of new detection algorithms. In this paper, we combine the analysis of several image noise modalities (high-pass filter DCT, Error Level Analysis, and Photo Response Non-Uniformity) as training input, with the aim of obtaining a more robust trained model. A two-branch prediction network is used to separate forgery artifacts of different components (manipulation artifacts and blending artifacts). Finally, through the combination of multiple losses, the distribution of the feature vectors in the high-dimensional space can be made to meet our expectations. In summary, our detection method performs better than many previous methods in detecting the authenticity of the input image, and it can also predict the location of the forged regions (manipulation regions and blending boundaries), which makes the detection results of the trained model more interpretable.
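The abstract names a high-pass filter DCT modality but does not say how it is computed. The following is a minimal sketch of one common way to obtain such a residual, assuming a grayscale image and a simple triangular low-frequency mask; the names `dct_matrix`, `high_pass_dct`, and the `cutoff` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Return the orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    i = np.arange(n).reshape(1, -1)  # sample index (columns)
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)     # DC row uses a different scale
    return mat


def high_pass_dct(gray: np.ndarray, cutoff: int = 8) -> np.ndarray:
    """High-frequency residual of a 2-D grayscale image.

    Assumption: coefficients whose row + column frequency index is
    below `cutoff` count as low frequency and are set to zero.
    """
    h, w = gray.shape
    ch, cw = dct_matrix(h), dct_matrix(w)
    coeffs = ch @ gray.astype(np.float64) @ cw.T  # forward 2-D DCT
    u = np.arange(h).reshape(-1, 1)
    v = np.arange(w).reshape(1, -1)
    coeffs[(u + v) < cutoff] = 0.0                # drop low frequencies
    return ch.T @ coeffs @ cw                     # inverse 2-D DCT
```

Under these assumptions, the residual of a face crop could be stacked with the other noise maps as additional input channels; how the modalities are actually fused is not described in the abstract.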

Parallel Keywords

Deep Learning; Deepfake Detection; Image Noise Analysis; Forgery Artifacts Separation; Multi-task Learning
