Mining Storylines in the Sports Domain Using Conditional Random Fields

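As sports events have become mainstream, more and more sports have large fan bases. Many fans want to know what happened today to their favorite team or player, and many team managers want to know how to recruit, train, and trade players to get the best performance. News articles are an important source of this information, but with the rapid growth of the Internet, online sports news articles now number in the tens of thousands. We therefore propose a research framework that helps these readers quickly organize the important storylines in sports news, so that they can keep track of how each day's sports events develop. The framework has three parts. First, we preprocess the news articles and split them into paragraphs. Next, we exploit the specific event categories found in sports news articles and the keywords within each category, together with a conditional random fields model, to obtain each paragraph's degree of membership in each event category. Finally, we use the named entities that are frequently mentioned in sports news articles to capture the similarity between paragraphs: we construct a named entity tree to compute the distance between named entities, and the resulting storyline is a graph structure. Experimental results show that our method outperforms the SteinerTree method on both evaluation metrics, because it makes good use of characteristics unique to the sports domain and yields a more expressive storyline structure. The topic-specific storylines proposed in this study let fans and team management grasp and make use of the many daily sports events more quickly.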
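To make the idea of a named entity tree distance concrete, the following is a minimal sketch, not the thesis's actual construction. It assumes a hypothetical hierarchy (league, conference, team, player) stored as child-to-parent links and measures distance as the number of edges between two entities through their lowest common ancestor. The node names and the `tree_distance` function are illustrative assumptions.

```python
# Hypothetical entity hierarchy (child -> parent); names are placeholders.
PARENT = {
    "West": "League",
    "East": "League",
    "Team A": "West",
    "Team B": "West",
    "Team C": "East",
    "Player A1": "Team A",
    "Player B1": "Team B",
    "Player C1": "Team C",
}


def path_to_root(entity):
    """Return the list of nodes from `entity` up to the root."""
    path = [entity]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def tree_distance(a, b):
    """Count the edges between a and b via their lowest common ancestor."""
    steps_from_a = {node: i for i, node in enumerate(path_to_root(a))}
    for j, node in enumerate(path_to_root(b)):
        if node in steps_from_a:
            return steps_from_a[node] + j
    raise ValueError("entities are not in the same tree")
```

Under this assumed hierarchy, two players on teams in the same conference are closer than two players in different conferences, which is the kind of signal a paragraph similarity measure could use.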

Keywords

storyline mining; NBA news articles; conditional random fields model; event extraction; named entity tree

Parallel Abstract

In this thesis, we propose a framework for mining sports event storylines from a collection of news articles. The proposed framework has three phases. First, we preprocess the news articles by removing stop words, normalizing word variants, and extracting named entities. Next, we employ a conditional random fields model to label the event category of each word in a news paragraph and derive an event vector for each paragraph. Finally, we use the derived event vectors and the named entities to compute the similarity between news paragraphs, and then generate the storylines for a given query. The experimental results show that the proposed framework outperforms the compared method and generates storylines that are easier to understand. The proposed framework can help sports fans and team managers obtain valuable insights.
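The second and third phases can be illustrated with a small sketch under stated assumptions. It takes per-word event labels as given (in the thesis they come from a trained conditional random fields model, which is not shown here), turns them into an event vector of per-category fractions, and combines cosine similarity of event vectors with entity overlap. The category names, the use of Jaccard overlap as a stand-in for the named-entity-tree measure, and the weight `alpha` are all assumptions for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical event categories; "O" marks words outside any event.
CATEGORIES = ["GAME", "TRADE", "INJURY"]


def event_vector(word_labels):
    """Fraction of a paragraph's words labeled with each event category."""
    counts = Counter(word_labels)
    total = len(word_labels) or 1
    return [counts[c] / total for c in CATEGORIES]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Set overlap of named entities; a stand-in for a tree-based measure."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def paragraph_similarity(labels_a, ents_a, labels_b, ents_b, alpha=0.5):
    """Weighted mix of event-vector similarity and entity overlap."""
    ev = cosine(event_vector(labels_a), event_vector(labels_b))
    ne = jaccard(ents_a, ents_b)
    return alpha * ev + (1 - alpha) * ne
```

Paragraph pairs whose similarity exceeds some threshold could then be linked as edges of a storyline graph; the threshold, like `alpha`, would need to be tuned and is not specified in the abstract.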

Parallel Keywords

storyline mining; NBA news; conditional random fields model; event extraction; named entity tree

