Applying a Feature-Granularity Training Strategy to Chinese Spoken Question Answering

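In spoken question answering (SQA), a simple and intuitive approach is to first convert the audio into a sequence of recognized text with automatic speech recognition (ASR), and then feed that text to an existing text-based question answering model to complete the task. However, this approach is usually affected by ASR recognition errors, so the question answering model performs worse than expected. To address this problem, this paper proposes a training strategy based on the granularity of the input features. Its goal is to reduce the performance loss caused by ASR errors, and it requires no additional model. We apply the proposed training strategy to a Chinese spoken machine reading comprehension (MRC) task to examine how ASR errors affect the task and how far the method mitigates them.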

Keywords

Spoken question answering; speech recognition; feature granularity; training strategy

Parallel Abstract

In spoken question answering, a segment of audio is usually converted into a textual representation by an automatic speech recognition (ASR) system and then passed to a text-based question answering model to generate the answer. However, ASR transcriptions usually contain many recognition errors, so a text-based question answering system may produce imperfect results. To narrow this performance gap, this study proposes a featured-granularity training strategy. We evaluate the proposed strategy on a spoken Chinese machine reading comprehension task. The experiments demonstrate the capability of the proposed strategy and also yield several valuable observations.

Parallel Keywords

Spoken Question Answering; Speech Recognition; Featured-granularity; Training Strategy
