Electronic news now comes in countless forms and volumes, and online news has become one of the most important ways people read the news. In this research, we propose a system that combines statistical and semantic methods to extract the important information from a single Chinese electronic news article and produce a summary that captures its key points.

We collected 200 articles from Yahoo News, 20 from each of 10 categories, as our target corpus. We use Yahoo's 斷章取義 API (http://tw.developer.yahoo.com/cas/) to preprocess the documents and extract terms. Our system then analyzes each article and assigns a content score to every sentence. The sentences of an article are ranked by these scores, the highest-ranked ones are selected up to the desired summary proportion (a compression rate set according to the reference summary), and the selected sentences are joined to form the final summary.

We also compare the accuracy of the summaries produced by our system with those of a machine learning (ML) approach in order to assess the system's reliability and accuracy. The results show that the system performs well in both the quality of the summaries it generates and the efficiency of summarization.
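The rank-and-select step described above can be illustrated with a minimal Python sketch. It is not the system's actual scoring method: the abstract does not specify the statistical and semantic features, so the score here is a stand-in (the average document-level frequency of a sentence's terms). The function name `summarize`, the `ratio` parameter, and the choice to output the selected sentences in their original order are all assumptions made for illustration. The per-sentence term lists are assumed to come from a term extraction step, such as the API mentioned above.

```python
import math
from collections import Counter


def summarize(sentences, terms_per_sentence, ratio=0.3):
    """Rank sentences by a stand-in score and keep the top `ratio` share.

    `sentences` is a list of sentence strings; `terms_per_sentence` is a
    parallel list of term lists (assumed output of a term extractor).
    """
    # Document-level term frequencies: a placeholder statistical feature,
    # not the scoring used by the system described in the abstract.
    doc_tf = Counter(t for terms in terms_per_sentence for t in terms)

    def score(terms):
        # Average document frequency of the sentence's terms; 0 if empty.
        return sum(doc_tf[t] for t in terms) / len(terms) if terms else 0.0

    scores = [score(terms) for terms in terms_per_sentence]

    # Number of sentences to keep for the requested ratio (at least one).
    k = max(1, math.ceil(len(sentences) * ratio))

    # Indices of the k highest-scoring sentences; ties go to earlier ones.
    top = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))[:k]

    # Assumption: present the selected sentences in document order.
    return [sentences[i] for i in sorted(top)]
```

For example, with `ratio=0.5` and four sentences, the two highest-scoring sentences are kept. Swapping in a different `score` function changes the ranking without affecting the selection logic.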