探索基於生成對抗網路之新穎強健性技術於語音辨識的應用

Exploring Generative Adversarial Network Based Robustness Techniques for Automatic Speech Recognition

Advisor: Berlin Chen (陳伯琳)

Abstract


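In recent years, deep learning has achieved major breakthroughs in many fields and has excelled in a wide range of practical applications, including automatic speech recognition (ASR). Although mainstream ASR systems can already match human hearing on certain benchmark tasks, they are not as robust to environmental interference as humans are. In other words, despite substantial improvements in ASR systems, noise still degrades recognition accuracy to some degree. Common sources of environmental noise include background voices, trains, bus stops, car noise, and restaurant background chatter. Research on robustness techniques therefore plays an important role in the development of today's ASR systems. In view of this, this thesis investigates effective enhancement methods, based on generative adversarial networks (GANs), that operate on the modulation spectra of speech feature vector sequences. A series of experiments on the Aurora-4 corpus shows that the methods used in this study can improve speech recognition performance.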

Parallel Abstract


In recent years, deep learning technologies have achieved record-breaking results in a wide array of real-world applications, including automatic speech recognition (ASR). Even though mainstream ASR systems have already reached human-like performance on a few benchmark tasks, in practice they are not as robust to environmental distortions as humans are. In view of this, this thesis sets out to develop effective enhancement methods, based on generative adversarial networks (GANs), that operate in the modulation domain of speech feature vector sequences. A series of experiments conducted on the Aurora-4 database and task suggests the utility of the proposed methods.
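
For readers unfamiliar with the term, the modulation spectrum of a feature sequence is commonly obtained by taking a discrete Fourier transform along the time axis of each feature dimension. The sketch below illustrates only that transform and its inverse in NumPy. It is not the thesis's method: the function names, the toy random features, and the choice to keep the original phase during reconstruction are all assumptions made for illustration, and the GAN-based enhancement step is omitted because the abstract does not describe its architecture.

```python
import numpy as np


def modulation_spectrum(features):
    """Return magnitude and phase of the DFT taken along the time axis.

    features: array of shape (num_frames, num_dims), e.g. cepstral features.
    Hypothetical helper for illustration only.
    """
    features = np.asarray(features, dtype=float)
    spectrum = np.fft.rfft(features, axis=0)  # one spectrum per feature dimension
    return np.abs(spectrum), np.angle(spectrum)


def reconstruct(magnitude, phase, num_frames):
    """Map a (possibly enhanced) magnitude spectrum back to the time domain.

    Assumption: the original phase is reused unchanged.
    """
    return np.fft.irfft(magnitude * np.exp(1j * phase), n=num_frames, axis=0)


# Toy stand-in for a real utterance: 200 frames of 13-dimensional features.
rng = np.random.default_rng(0)
feats = rng.standard_normal((200, 13))

mag, phase = modulation_spectrum(feats)
# An enhancement model would modify `mag` here; this sketch leaves it untouched.
restored = reconstruct(mag, phase, num_frames=feats.shape[0])
```

With the magnitude left unmodified, `restored` matches `feats` up to floating-point error, which is a quick sanity check that the analysis and synthesis steps are consistent before any enhancement model is placed between them.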
