  • Thesis

Chinese title: 使用多種鑑別式模型以及特徵資訊於語音文件摘要之研究

Exploiting Various Discriminative Models and Information Cues for Spoken Document Summarization

Advisor: Berlin Chen (陳柏琳)

Abstract


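Many machine-learning summarization methods have been applied to spoken document summarization. They usually treat summarization as a two-class classification problem, selecting important sentences from a document to form the summary; however, imbalanced training data sometimes degrade the performance of these methods. On the other hand, a summarizer trained to improve classification accuracy does not necessarily produce better summaries. In view of this, the thesis first investigates summarizers trained under two different criteria, to mitigate the negative effects of these problems and to improve summarization performance. The first uses the pair-wise importance ordering of sentences in the training documents as the training signal; the second trains the summarizer by directly maximizing its summarization evaluation score. In addition, several training-sentence selection and feature selection methods are extensively studied and compared. Summarization experiments were conducted on Chinese broadcast news. Both training criteria yield better results than the baseline methods, whereas training-sentence selection and feature selection do not appear to improve summarization performance significantly.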
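The pair-wise ordering criterion described above can be illustrated with a small sketch. It assumes each training sentence is already a numeric feature vector with a graded importance label (both hypothetical here), builds ordered pairs only within a document, and fits a linear scorer with a RankSVM-style pairwise hinge loss by stochastic subgradient descent. This is a generic pairwise-ranking recipe, not necessarily the learner used in the thesis; the names `make_pairs`, `train_pairwise`, and `summarize` are illustrative.

```python
import random
from itertools import combinations


def make_pairs(doc):
    """Turn one document into (preferred, other) feature-vector pairs.

    `doc` is a list of (features, importance) tuples (a hypothetical layout).
    A pair is emitted only when two sentences differ in importance.
    """
    pairs = []
    for (xa, ya), (xb, yb) in combinations(doc, 2):
        if ya > yb:
            pairs.append((xa, xb))
        elif yb > ya:
            pairs.append((xb, xa))
    return pairs


def train_pairwise(docs, dim, epochs=50, lr=0.1, lam=0.01, seed=0):
    """Fit a linear scorer w with a pairwise hinge loss.

    Stochastic subgradient descent on
        lam/2 * ||w||^2 + max(0, 1 - w . (x_hi - x_lo))
    for each pair; pairs never mix sentences from different documents.
    """
    rng = random.Random(seed)
    pairs = [p for doc in docs for p in make_pairs(doc)]
    w = [0.0] * dim
    for _ in range(epochs):
        rng.shuffle(pairs)
        for x_hi, x_lo in pairs:
            diff = [a - b for a, b in zip(x_hi, x_lo)]
            margin = sum(wk * dk for wk, dk in zip(w, diff))
            active = margin < 1.0  # hinge term contributes only inside the margin
            for k in range(dim):
                grad = lam * w[k] - (diff[k] if active else 0.0)
                w[k] -= lr * grad
    return w


def summarize(sentences, w, k):
    """Return indices of the k highest-scoring sentences, in document order."""
    scores = [sum(wk * xk for wk, xk in zip(w, x)) for x in sentences]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Toy data: two documents, each sentence = ([feature_1, feature_2], importance).
    docs = [
        [([0.9, 0.1], 3), ([0.5, 0.8], 2), ([0.1, 0.4], 1)],
        [([0.7, 0.3], 2), ([0.2, 0.9], 0), ([0.8, 0.5], 3)],
    ]
    w = train_pairwise(docs, dim=2)
    print("weights:", w)
    print("summary:", summarize([[0.3, 0.2], [0.95, 0.1], [0.6, 0.7]], w, k=2))
```

Because the loss only looks at which sentence of a pair should rank higher, the learner never fits the skewed summary/non-summary label ratio directly, which is one intuition for why pairwise training may be less sensitive to the data imbalance mentioned in the abstract.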

Parallel Abstract


Many of the existing machine-learning approaches to speech summarization cast important-sentence selection as a two-class classification problem; however, the imbalanced-data problem sometimes yields a trained speech summarizer with unsatisfactory performance. On the other hand, training the summarizer to improve its classification accuracy does not always lead to better summarization evaluation scores. In view of these phenomena, this thesis investigates two different training criteria that alleviate their negative effects and boost the summarizer's performance. The first learns the summarizer's classification capability from the pair-wise ordering of sentences in a training document according to their degree of importance. The second trains the summarizer by directly maximizing the associated evaluation score. In addition, several methods for training-sentence selection and feature selection are extensively studied and compared. Experimental results on a broadcast news summarization task show that both training criteria improve performance over the baseline summarization system, whereas training-sentence and feature selection show mixed effectiveness.
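Directly maximizing the evaluation score is harder than maximizing accuracy, because measures such as ROUGE are not differentiable in the model weights. The sketch below shows the idea in its simplest gradient-free form: each candidate weight vector is scored by the mean unigram recall (a simplified ROUGE-1 with no stemming, stop-word removal, or multiple references) of the summaries it extracts, and the best one is kept. The grid search, the dictionary layout of each document, and all function names are assumptions for illustration; the thesis may use a different optimizer and evaluation setup.

```python
from collections import Counter
from itertools import product


def rouge1_recall(candidate, reference):
    """Clipped unigram recall: matched reference tokens / reference length.

    Simplified ROUGE-1 (no stemming, stop-word removal, or multiple references).
    """
    ref = Counter(reference)
    overlap = sum(min(count, ref[tok]) for tok, count in Counter(candidate).items())
    return overlap / max(sum(ref.values()), 1)


def extract(sentences, w, k):
    """Indices of the k sentences with the highest linear score w . x, in document order.

    sorted() is stable even with reverse=True, so ties go to the earlier sentence.
    """
    def score(i):
        return sum(wj * xj for wj, xj in zip(w, sentences[i]["x"]))
    ranked = sorted(range(len(sentences)), key=score, reverse=True)
    return sorted(ranked[:k])


def train_direct(docs, grid, k):
    """Return the weight vector in `grid` that maximizes mean ROUGE-1 recall on `docs`."""
    best_w, best_score = None, -1.0
    for w in grid:
        total = 0.0
        for doc in docs:
            chosen = extract(doc["sentences"], w, k)
            tokens = [t for i in chosen for t in doc["sentences"][i]["tokens"]]
            total += rouge1_recall(tokens, doc["reference"])
        mean = total / len(docs)
        if mean > best_score:  # strict '>' keeps the first of equally good vectors
            best_w, best_score = list(w), mean
    return best_w, best_score


if __name__ == "__main__":
    # Hypothetical toy document: two features per sentence plus its tokens.
    doc = {
        "sentences": [
            {"x": [1.0, 0.0], "tokens": ["rain", "hits", "city"]},
            {"x": [0.0, 1.0], "tokens": ["weather", "is", "nice"]},
            {"x": [1.0, 1.0], "tokens": ["city", "floods", "after", "rain"]},
        ],
        "reference": ["rain", "city", "floods"],
    }
    grid = list(product([-1.0, 0.0, 1.0], repeat=2))  # 9 candidate weight vectors
    print(train_direct([doc], grid, k=1))  # prints ([1.0, 1.0], 1.0)
```

Exhaustive grid search only scales to a handful of features; with more features a smarter search would be needed, but the quantity being optimized stays the same evaluation score that is reported, which is the point of this training criterion.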

