Spoken Document Summarization: Features, Models, and Applications

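Spoken document summarization is vulnerable to speech recognition errors, so conventional text summarization methods often fail to extract the important sentences of a spoken document. Compared with text documents, however, spoken documents offer a good deal of additional information for summarization: prosodic features, acoustic features, speaker roles, and emotion are all extra sentence-level features that can be exploited. This dissertation studies spoken document summarization along three dimensions: features, models, and applications. At the feature level, we investigate how different word-graph structures can represent recognition hypotheses, thereby mitigating the impact of recognition errors that arises when only the single best recognition hypothesis (1-best) is used. At the model level, we propose an unsupervised summarization model based on the Kullback-Leibler (KL) divergence measure; the model allows cues beyond the text itself to improve the accuracy of the divergence measure, which lessens the problems caused by recognition errors. For supervised summarization models, we propose three training criteria that address the negative effects of imbalanced training data. Building on these two kinds of models, we further propose a risk-aware summarization framework that combines supervised and unsupervised models, retaining their respective strengths while overcoming their individual limitations. We also introduce different loss functions to account for redundancy and coherence relationships between sentences and between the document and its sentences. At the application level, we explore how to integrate summarization techniques into information retrieval. All proposed methods are evaluated on a broadcast news corpus, and the experimental results show that they substantially improve on existing summarization methods.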

Keywords

speech summarization; divergence measure; imbalanced training data; risk-aware; information retrieval

Parallel Abstract

Speech summarization inevitably faces the problem of incorrect information caused by recognition errors. However, it also presents opportunities that do not exist for text summarization; for example, information cues from prosodic analysis, including speaker emotions, can help determine the importance and structure of spoken documents. In this dissertation, we discuss the problem of speech summarization from three aspects: features, models, and applications. For the feature aspect, we investigate various ways to robustly represent the recognition hypotheses of spoken documents beyond the top-scoring ones, to alleviate the negative effects caused by speech recognition errors. For the model aspect, we present an unsupervised Kullback-Leibler (KL) divergence based summarization method that can accommodate additional information cues to alleviate the problems caused by speech recognition errors. We also investigate three disparate training criteria for training a supervised summarizer in a preference-sensitive manner, to overcome the problem of imbalanced data in speech summarization. Building on these methods, we propose a risk-aware summarization framework that naturally combines supervised and unsupervised summarization models to inherit their individual merits and to overcome their inherent limitations. Various loss functions and modeling paradigms are introduced, providing a principled way to render the redundancy relationships among sentences and the coherence relationships between sentences and the whole document. For the application aspect, we demonstrate the possibility of integrating summarization techniques into information retrieval tasks. Experimental results on a broadcast news summarization task suggest that the proposed methods can give substantial improvements over conventional summarization methods.
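To make the idea of KL-divergence based sentence selection concrete, the following is a minimal sketch and not the dissertation's actual formulation. It assumes each sentence and the whole document are modeled as additively smoothed unigram distributions, and it ranks sentences by KL(document || sentence), so that sentences whose word distribution is closest to the document come first. The function names, the smoothing constant `alpha`, and the direction of the divergence are all assumptions made for illustration; the extra cues mentioned in the abstract (prosodic, acoustic, and so on) are not modeled here.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=0.1):
    """Additively smoothed unigram distribution over a fixed vocabulary.

    alpha is a hypothetical smoothing constant; it keeps every
    probability strictly positive so the KL divergence stays finite.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same keys."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def rank_sentences(sentences):
    """Return sentence indices ordered from lowest to highest
    KL(document || sentence), i.e. most document-like first.

    sentences is a list of token lists (e.g. 1-best recognition output).
    """
    doc_tokens = [w for sent in sentences for w in sent]
    vocab = sorted(set(doc_tokens))
    doc_model = unigram_model(doc_tokens, vocab)
    scored = [
        (kl_divergence(doc_model, unigram_model(sent, vocab)), idx)
        for idx, sent in enumerate(sentences)
    ]
    return [idx for _, idx in sorted(scored)]


if __name__ == "__main__":
    toy_document = [
        ["a", "b"],
        ["c"],
        ["a", "b", "a", "b", "c"],
    ]
    print(rank_sentences(toy_document))
```

In this toy document, the third sentence covers all three word types in roughly the document's proportions, so it is ranked first; a summary would then be built by taking sentences from the top of the ranking until a length budget is reached.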

Parallel Keywords

speech summarization; Kullback-Leibler divergence; imbalanced data; risk-aware; information retrieval

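Cited By

張云箐 (2007). Analysis of the least mean square algorithm and power spectral density difference values for noise cancellation [Master's thesis, National Taipei University of Technology]. Airiti Library. https://doi.org/10.6841/NTUT.2007.00019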
