
領域轉移下的非監督式異音偵測

Unsupervised Anomalous Sound Detection under Domain Shifts

Advisor: 張智星

Abstract


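Within machine learning, audio research has many applications, including noise reduction, vocal separation, and speech recognition. Anomalous sound detection is another audio processing task: its main goal is for a model to determine, quickly and correctly, whether a sound clip contains an anomaly. In recent years, as machine learning has been applied more widely, practitioners in more and more fields have considered adopting it to speed up their existing workflows or make them more accurate. With the world moving toward Industry 4.0, many traditional industries are also becoming interested in machine learning, and anomalous sound detection is well suited to factory use: it can assist operators, speed up operations, and improve product yield. Building on previous research on anomalous sound detection, this thesis investigates how to maintain or even improve accuracy under domain shift. We treat sound as a spectrogram, represent each recording as multiple images, extract features with a model that combines a CNN with a transformer, and design a multi-task learning model as the classifier that judges whether a sound clip is anomalous. We use the DCASE competition dataset, apply a state-of-the-art model for feature extraction, and handle domain shift through the design of the loss function and the optimization of the model architecture. Finally, a density-based method computes the anomaly score, which gives the final result. We compare our model with the official DCASE baseline and with the models of other participants. The official baseline has an average area under the ROC curve (AUC) of 63.36%. Our model improves significantly on the baseline, reaching an average AUC of 72%, and ranks within the top five when compared with the other participants' models.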
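The AUC figures quoted above can be read as the probability that a randomly chosen anomalous clip receives a higher anomaly score than a randomly chosen normal clip. The following is a minimal sketch of that computation in plain Python; the function name `roc_auc` and the example scores and labels are hypothetical and are not taken from the thesis or the DCASE evaluation code.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random anomalous clip outscores a random normal one.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]  # anomalous clips
    neg = [s for s, y in zip(scores, labels) if y == 0]  # normal clips
    if not pos or not neg:
        raise ValueError("need at least one clip of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical anomaly scores for six test clips (1 = anomalous, 0 = normal).
example_scores = [0.9, 0.8, 0.35, 0.4, 0.3, 0.1]
example_labels = [1, 1, 1, 0, 0, 0]
example_auc = roc_auc(example_scores, example_labels)
```

In this toy example eight of the nine anomalous/normal pairs are ordered correctly, so the AUC is 8/9. The pairwise loop is quadratic in the number of clips; a rank-based implementation would be preferable for large test sets.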

Parallel Abstract


Sound-related research in machine learning has many applications, such as noise reduction, vocal separation, and speech recognition. Anomalous sound detection is another topic in sound processing. Its main purpose is to let a model correctly identify, within a short time, whether a sound contains an anomaly. In recent years, as machine learning has been applied more and more widely, a growing number of users have considered introducing it into their own fields to speed up their existing workflows or improve their accuracy. The world is also moving toward Industry 4.0, and many traditional industries are gradually becoming interested in machine learning. Anomalous sound detection is well suited to use in factories: it can assist operators, speed up operations, and improve product yield. Building on previous research on anomalous sound detection, this thesis discusses how to maintain or even improve accuracy under domain shift. We treat sound as a spectrogram, represent it as a set of images, and use a model that combines a CNN with a transformer for feature extraction. A multi-task learning model serves as the classifier that judges whether a sound clip is anomalous. We use the DCASE competition dataset and apply a state-of-the-art model for feature extraction, and we handle domain shift through the design of the loss function and the optimization of the model architecture. Finally, a density-based method is used to calculate the anomaly score, which gives the final result. We compare our model with the official DCASE baseline and with the models of other participants. The average area under the ROC curve (AUC) of the official baseline is 63.36%. Compared with the baseline, the performance of the model in this study is significantly improved: the average AUC reaches 72%, which places it within the top five when compared with other participants' models.
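The abstract does not say which density-based method produces the anomaly score. As an illustration only, the sketch below uses the mean distance to the k nearest normal embeddings as a simple density proxy; this is a stand-in chosen for brevity, not the method used in the thesis. The function name `knn_anomaly_score`, the value of `k`, and the 2-D embeddings are all hypothetical.

```python
import math


def knn_anomaly_score(query, normal_embeddings, k=2):
    """Mean Euclidean distance from `query` to its k nearest normal embeddings.

    A large value means the query lies in a sparse region relative to the
    normal training data, so it is treated as more likely anomalous.
    """
    if k < 1 or k > len(normal_embeddings):
        raise ValueError("k must be between 1 and the number of normal embeddings")
    distances = sorted(math.dist(query, e) for e in normal_embeddings)
    return sum(distances[:k]) / k


# Hypothetical 2-D embeddings of normal machine sounds (real embeddings would
# come from the feature extractor and have many more dimensions).
normal = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]

score_near = knn_anomaly_score((0.05, 0.05), normal)  # inside the normal cluster
score_far = knn_anomaly_score((3.0, 3.0), normal)     # far from the cluster
```

Because only normal recordings are needed to build the reference set, a score of this kind fits the unsupervised setting described above: a clip far from every normal embedding gets a high score without any anomalous training examples.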

