Investigating the Use of Lexical and Semantic Information for

Huei-Ming Chu (朱惠銘)

Degree thesis

Investigating the Use of Lexical and Semantic Information for Automatic Spoken Document Segmentation and Organization

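Huei-Ming Chu (朱惠銘)

Advisor: Berlin Chen (陳柏琳)

National Taiwan Normal University / College of Science / Graduate Institute of Computer Science and Information Engineering / Master's degree (2004)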

Abstract (Chinese)

Not available

Keywords (Chinese)

Spoken document segmentation; spoken document organization; self-organizing map; topic mixture model map

Abstract (English)

Spoken document segmentation automatically sets the boundaries between the different small topics being mentioned in long streams of audio signals and divides the spoken documents into a set of cohesive paragraphs of sentences sharing a common central topic. Spoken document organization, in turn, aims at automatically analyzing the subject topics of the segmented short paragraphs of the spoken documents, clustering them into groups with topic labels, and organizing them into a hierarchical visual presentation that is easier for users to browse. Both tasks have gained growing attention in the past few years. In this thesis, we explored the use of the Hidden Markov Model (HMM) approach, which has been proven effective for speech recognition and information retrieval, in the context of spoken document segmentation. We not only exploited the lexical information inherent in the spoken document, such as the statistical features or the language model probabilities, but also considered the acoustic information, such as the pause distribution and the confidence measure, in identifying segment boundaries. Moreover, the semantic information conveyed in the spoken document was integrated into the HMM segmenter to model the state observation distributions more accurately. We also investigated two unsupervised, data-driven organization approaches for spoken document analysis: the Self-Organizing Map (SOM) and the Probabilistic Latent Semantic Analysis Map (ProbMap). For the ProbMap approach, a topical mixture model variant (TMMmap), which comes from an alternative perspective, was also studied. A series of experiments was conducted on the Topic Detection and Tracking (TDT) spoken document collections to analyze the performance of these approaches and compare the differences between them.
Finally, we attempted to incorporate the topic distributions as well as the topological constraints obtained from spoken document organization into the HMM segmenter. Very promising initial results were demonstrated.
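
As a rough illustration of the HMM-based segmentation idea summarized in the abstract, the Python sketch below treats each topic as a hidden state with a smoothed unigram language model, decodes the most likely topic sequence with the Viterbi algorithm, and places a boundary wherever the decoded topic changes. This is not the thesis's implementation: the function names (`train_unigram`, `segment`), the `stay_prob` parameter, the add-alpha smoothing, and the toy data are all assumptions made for illustration, and the acoustic cues (pauses, confidence measures) and semantic information mentioned in the abstract are omitted.

```python
import math
from collections import Counter


def train_unigram(sentences, vocab, alpha=1.0):
    """Add-alpha smoothed unigram log-probabilities for one topic (illustrative)."""
    counts = Counter(w for s in sentences for w in s)
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: math.log((counts[w] + alpha) / total) for w in vocab}


def segment(sentences, topic_models, stay_prob=0.8):
    """Viterbi decoding over topic states (needs at least two topics).

    Returns the decoded topic per sentence and the indices where a new
    segment starts. `stay_prob` is a hypothetical self-transition probability.
    """
    topics = list(topic_models)
    k = len(topics)
    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (k - 1))

    def emit(topic, sentence):
        # Log-likelihood of the sentence under the topic's unigram model;
        # out-of-vocabulary words are skipped.
        model = topic_models[topic]
        return sum(model[w] for w in sentence if w in model)

    def trans(prev, cur):
        return log_stay if prev == cur else log_switch

    # Uniform initial distribution over topics.
    score = {t: -math.log(k) + emit(t, sentences[0]) for t in topics}
    backpointers = []
    for sent in sentences[1:]:
        new_score, pointers = {}, {}
        for t in topics:
            best_prev = max(topics, key=lambda p: score[p] + trans(p, t))
            new_score[t] = score[best_prev] + trans(best_prev, t) + emit(t, sent)
            pointers[t] = best_prev
        score = new_score
        backpointers.append(pointers)

    # Trace back the best path from the best final state.
    state = max(topics, key=score.get)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()

    boundaries = [i for i in range(1, len(path)) if path[i] != path[i - 1]]
    return path, boundaries


# Toy data (invented for this example, not from the TDT collections).
vocab = {"rain", "storm", "wind", "vote", "party", "election"}
models = {
    "weather": train_unigram([["rain", "storm"], ["wind", "rain"]], vocab),
    "politics": train_unigram([["vote", "party"], ["election", "vote"]], vocab),
}
stream = [["rain", "wind"], ["storm", "rain"], ["vote", "election"], ["party", "vote"]]
path, boundaries = segment(stream, models)
print(path, boundaries)
```

On this toy stream the decoder keeps the first two sentences in one topic and the last two in another, so a single boundary falls before the third sentence. A larger `stay_prob` penalizes topic switches more heavily and therefore yields fewer, longer segments.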

Keywords (English)

Spoken Document Segmentation; Spoken Document Organization; Self-Organizing Map; Topical Mixture Model Map
