
A Replay Spoofing Detection System Based on Discriminative Autoencoders

Abstract


In this paper, we propose a neural network model based on a discriminative autoencoder to automatically detect replay attacks against speaker verification systems, that is, to determine whether the audio received by a speaker verification system is a genuine human voice or one played back through a recording device. In the speaker verification field, attacks that use artificially faked voices against a verification system are called spoofing attacks. Given that deep neural network models have been widely applied to speech processing problems, we expect such models to be applicable to this task as well. In the proposed discriminative autoencoder, the middle (code) layer of the model serves as a feature extractor, and we introduce a new loss function so that the code-layer features are clustered according to the data labels; the resulting features therefore carry information that discriminates genuine from spoofed speech. Finally, cosine similarity is used to measure how close an extracted feature is to genuine speech, yielding the detection result. We evaluated the system on the corpus provided by the 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof 2017). The proposed system performs well on the development set, achieving a relative improvement of about 42% over the official baseline method.

Parallel Abstract


In this paper, we propose a discriminative autoencoder (DcAE) neural network model for the replay spoofing detection task, where the system has to tell whether a given utterance comes directly from the mouth of a speaker or indirectly through a playback. The proposed DcAE model focuses on the midmost (code) layer, where a speech utterance is factorized into distinct components with respect to its true label (genuine or spoofed) and metadata (speaker, playback, and recording devices, etc.). Moreover, the concept of a modified hinge loss is introduced to formulate the cost function of the DcAE model, which ensures that utterances with the same speech type or meta information share similar identity codes (i-codes) and thus a higher similarity score computed from their i-codes. Tested on the development set provided by ASVspoof 2017, our system achieved a much better result, with up to 42% relative improvement in the equal error rate (EER) over the official baseline based on the standard GMM classifier.
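The scoring recipe described in the abstracts, pulling same-label i-codes together with a hinge-style loss and then scoring an utterance by its cosine similarity to genuine speech, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names, the margin value, and the genuine-centroid scoring are assumptions, and in the actual DcAE the i-codes come from a jointly trained deep encoder-decoder rather than being given directly.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two i-code vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def hinge_pair_loss(code_a, code_b, same_label, margin=0.5):
    """Hinge-style pairwise term on i-code similarity (margin is an assumed value):
    same-label pairs are pushed toward similarity 1, different-label pairs below the margin."""
    s = cosine_sim(code_a, code_b)
    if same_label:
        return max(0.0, 1.0 - s)
    return max(0.0, s - margin)

def detection_score(i_code, genuine_codes):
    """Score an utterance by cosine similarity to the mean genuine i-code;
    higher scores indicate genuine speech, lower scores suggest a replay."""
    centroid = np.mean(genuine_codes, axis=0)
    return cosine_sim(i_code, centroid)
```

In a full system this pairwise term would be combined with the autoencoder's reconstruction loss during training, and the detection score would be thresholded (e.g., at the EER operating point) to decide genuine versus spoofed.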
