Event Detection, Evolution and Summarization of Streaming Texts

Advisor: Ming-Syan Chen (陳銘憲)

Abstract


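Because of its convenience, the Internet has become the dominant medium for disseminating information, and much of the information that affects people's daily lives is published or exchanged through it. That same convenience, however, produces a massive and continuously growing volume of online information, which makes searching for information harder for users. To manage such continuously generated document streams effectively, event detection and tracking, together with automatic summarization of event content, have become active research topics. This dissertation aims to provide an effective mechanism for managing streaming documents. We propose two event detection methods that automatically detect and track emerging news events. With the proposed aging theory, we model the life cycle of an event, which reduces the error rate of event detection. In addition, we propose a life model based on a hidden Markov model (HMM) to describe how an event's popularity changes over time; using the learned life models, we predict the popularity status of each event in real time and dynamically adjust the clustering threshold used in event detection. Experiments on the official benchmark corpus show that the proposed methods improve the performance of existing event detection methods. Furthermore, to help users understand how an event unfolds from beginning to end, we propose a method for summarizing event content. During summarization, we take the temporal order of the event into account, which further allows us to generate a story evolution graph for the event. Experimental results show that exploiting temporal information effectively improves summarization performance, and example cases show that the generated story evolution graphs capture the important developments within an event.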
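The aging idea described above can be illustrated with a minimal single-pass clustering sketch. Everything in it is an assumption for illustration, not the dissertation's formulation: documents are bag-of-words `Counter`s compared by cosine similarity; an event's energy grows by the similarity of each document it absorbs and drops by a fixed `decay` per time step; an event with zero energy is treated as dead; and a `bonus` term lowers the clustering threshold for energetic events. The names `AgingDetector`, `base_threshold`, `decay`, and `bonus` are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Event:
    centroid: Counter           # summed term counts of member documents
    energy: float = 1.0         # hypothetical initial energy of a new event
    docs: list = field(default_factory=list)

    def alive(self) -> bool:
        return self.energy > 0.0


class AgingDetector:
    """Hypothetical single-pass detector with an energy-based (aging) life cycle."""

    def __init__(self, base_threshold=0.3, decay=0.2, bonus=0.1):
        self.base_threshold = base_threshold  # threshold before any energy bonus
        self.decay = decay                    # energy lost per time step
        self.bonus = bonus                    # max threshold reduction for active events
        self.events: list[Event] = []

    def threshold_for(self, ev: Event) -> float:
        # Energetic events get a lower threshold (reduction capped by `bonus`),
        # so they absorb related documents more easily.
        return self.base_threshold - self.bonus * ev.energy / (1.0 + ev.energy)

    def tick(self) -> None:
        # Aging: every event loses a fixed amount of energy per time step.
        for ev in self.events:
            ev.energy = max(0.0, ev.energy - self.decay)

    def add(self, doc_id: str, text: str) -> int:
        """Assign a document to the best live event or start a new one; return its index."""
        vec = Counter(text.lower().split())
        best_idx, best_sim = -1, 0.0
        for i, ev in enumerate(self.events):
            if not ev.alive():
                continue  # dead events no longer attract documents
            sim = cosine(vec, ev.centroid)
            if sim >= self.threshold_for(ev) and sim > best_sim:
                best_idx, best_sim = i, sim
        if best_idx < 0:
            self.events.append(Event(centroid=vec, docs=[doc_id]))
            return len(self.events) - 1
        ev = self.events[best_idx]
        ev.docs.append(doc_id)
        ev.centroid.update(vec)   # Counter.update adds term counts
        ev.energy += best_sim     # nourishment: a matching document restores energy
        return best_idx
```

A short usage example:

```python
det = AgingDetector()
det.add("d1", "earthquake hits taiwan")              # 0 (new event)
det.add("d2", "taiwan earthquake death toll rises")  # 0 (joins event 0)
for _ in range(8):
    det.tick()                                       # event 0 ages out
det.add("d3", "taiwan earthquake aftershock")        # 1 (new event)
```

In this sketch, a document that arrives after its event has aged out starts a new event instead of joining the stale one, which is one way a life-cycle model can separate a new occurrence from an earlier event that has gone quiet.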

Parallel Abstract


The World Wide Web (WWW) has become a major information source for people from all walks of life. Although the WWW facilitates information distribution, the ever-increasing volume of Internet documents has made discovering information on the Internet a time-consuming task. To manage this massive volume of information efficiently, methods for detecting and summarizing events in text streams are critically needed. In this dissertation, we provide two adaptive methods for detecting sequential events in text streams. We first propose an aging theory to model the life cycle of events. We then present an event detection framework called LIPED, which uses HMM-based life profiles to predict the activeness status of events and adjust clustering thresholds adaptively. To help users comprehend the development of news topics easily, we also provide a unified mechanism to construct a topic evolution graph and summary from topic documents. Experimental results on the official TDT4 corpus show that the proposed event detection methods substantially improve the performance of existing well-known event detection approaches, and that the composed topic summaries and evolution graphs are highly representative.
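The idea of predicting an event's activeness with an HMM and feeding that prediction into the clustering threshold can be sketched with the normalized forward algorithm. This is a minimal sketch under hypothetical choices, none taken from the dissertation: three hidden states (`quiet`, `rising`, `peak`), per-window document counts bucketed into three observation symbols by `bucket`, hand-set `START`, `TRANS`, and `EMIT` probabilities (a real system would learn them from training data, for example with Baum-Welch), and a linear rule in `adaptive_threshold` that lowers the threshold as predicted activeness rises.

```python
from typing import Sequence

# Hypothetical 3-state life profile: 0 = quiet, 1 = rising, 2 = peak.
STATES = ("quiet", "rising", "peak")
START = [0.6, 0.3, 0.1]
TRANS = [            # TRANS[r][s] = P(next state s | current state r)
    [0.7, 0.2, 0.1],
    [0.2, 0.5, 0.3],
    [0.3, 0.2, 0.5],
]
EMIT = [             # EMIT[s][o] = P(observation o | state s)
    [0.8, 0.15, 0.05],
    [0.2, 0.6, 0.2],
    [0.05, 0.25, 0.7],
]


def bucket(doc_count: int) -> int:
    """Map a per-window document count to a symbol: 0 few, 1 some, 2 many (cut-offs assumed)."""
    if doc_count <= 1:
        return 0
    return 1 if doc_count <= 4 else 2


def forward_filter(obs: Sequence[int], start=START, trans=TRANS, emit=EMIT) -> list[float]:
    """Normalized forward algorithm: returns P(state_t | obs_1..t) at the last step t."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    z = sum(alpha)
    alpha = [a / z for a in alpha]
    for o in obs[1:]:
        # Propagate through the transition matrix, then weight by the emission.
        alpha = [
            emit[s][o] * sum(alpha[r] * trans[r][s] for r in range(n))
            for s in range(n)
        ]
        z = sum(alpha)
        alpha = [a / z for a in alpha]  # renormalize to avoid underflow
    return alpha


def predict_next(belief: list[float], trans=TRANS) -> list[float]:
    """One-step-ahead state distribution (belief row vector times TRANS)."""
    n = len(belief)
    return [sum(belief[r] * trans[r][s] for r in range(n)) for s in range(n)]


def adaptive_threshold(pred: list[float], base=0.3, gain=0.1) -> float:
    """Lower the clustering threshold when the event is expected to be active.

    Expected activeness is the mean state index scaled to [0, 1] (an assumption).
    """
    activeness = sum(i * p for i, p in enumerate(pred)) / (len(pred) - 1)
    return base - gain * activeness
```

A short usage example with made-up per-window counts for one event:

```python
counts = [0, 1, 3, 6, 8]
belief = forward_filter([bucket(c) for c in counts])
theta = adaptive_threshold(predict_next(belief))
```

With this input the filtered belief concentrates on `peak`, and the resulting threshold is lower than the one computed for a quiet sequence such as `[0, 0, 0]`, so in this sketch an active event accepts somewhat less similar documents than a dormant one.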
