
Spoken Document Summarization Using End-to-End Modeling Techniques (基於端對端模型化技術之語音文件摘要)

Abstract


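This thesis investigates the application of end-to-end extractive summarization methods to spoken document summarization and studies in depth how to improve summarization performance. To this end, we propose a neural-network-based summarization model that uses a hierarchical architecture and an attention mechanism to gain a deep understanding of a document's main theme, and that is trained with the help of reinforcement learning to select and rank representative sentences, according to that theme, to form the summary. In addition, to keep speech recognition errors from degrading the summary, we incorporate relevant acoustic features of the spoken document into model training and use subword vectors as input. Finally, we conduct a series of experiments and analyses on the Mandarin broadcast news corpus (MATBN). The experimental results support the hypotheses put forward in this thesis and show a significant improvement in summarization performance.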

Parallel Abstract


This thesis sets out to explore novel and effective end-to-end extractive methods for spoken document summarization. To this end, we propose a neural summarization approach that leverages a hierarchical modeling structure with an attention mechanism to understand a document deeply and, in turn, to select representative sentences as its summary. Meanwhile, to alleviate the negative effect of speech recognition errors, we make use of acoustic features and subword-level input representations in the proposed approach. Finally, we conduct a series of experiments on the Mandarin Broadcast News (MATBN) corpus. The experimental results confirm the utility of our approach, which improves on the performance of state-of-the-art methods.
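
The hierarchical, attention-based sentence selection described in the abstract can be illustrated with a small sketch. This is not the thesis's model: it assumes random word vectors in place of learned subword embeddings, uses plain dot-product attention pooling at the word and sentence levels, and omits the acoustic features and the reinforcement-learning training entirely. The names `attention_pool`, `extract_summary`, `word_query`, and `sent_query` are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_pool(vectors, query):
    # Weighted average of `vectors` (n, d); the weights come from
    # dot-product attention against a query vector of shape (d,).
    weights = softmax(vectors @ query)
    return weights @ vectors

def extract_summary(doc, word_query, sent_query, k=2):
    # doc: list of sentences, each an array of shape (num_words, d).
    # Level 1: pool word vectors into one vector per sentence.
    sent_vecs = np.stack([attention_pool(s, word_query) for s in doc])
    # Level 2: pool sentence vectors into one document vector.
    doc_vec = attention_pool(sent_vecs, sent_query)
    # Score each sentence by its similarity to the document vector.
    scores = sent_vecs @ doc_vec
    # Keep the k highest-scoring sentences, returned in document order.
    top = np.argsort(-scores)[:k]
    return sorted(top.tolist()), scores

# Toy document: 4 sentences of different lengths, 8-dimensional vectors.
# All values are random placeholders (assumption), not trained parameters.
rng = np.random.default_rng(0)
d = 8
doc = [rng.normal(size=(n, d)) for n in (5, 7, 4, 6)]
word_query = rng.normal(size=d)
sent_query = rng.normal(size=d)

selected, scores = extract_summary(doc, word_query, sent_query, k=2)
print("selected sentence indices:", selected)
print("sentence scores:", scores.round(3))
```

In a trained system the query vectors and embeddings would be learned, and the selection step would be optimized against a summary-quality reward rather than fixed to a top-k rule as it is here.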

