Automatic Summary Generation System for Chinese News

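With the rapid growth of the Internet, browsing news websites and reading news online has become one of the main online activities for many people. However, a large volume of news is produced every day, and this has already led to information overload. Readers usually choose to read only the news that is important or interesting to them, and at most glance at the headlines of the rest. The news that is skimmed over may contain information readers want, yet it may go unread because its headline was poorly written. Archiving articles from different news websites and automatically condensing lengthy articles into concise summaries can save readers a great deal of reading time. This thesis proposes a method that automatically collects Chinese news and produces summaries of it. The method first extracts the title, category, and body text of news from websites. It then applies Chinese word segmentation, using a self-defined lexical database as the reference, to split the text into words and sentences. Next, it uses weighting techniques from information retrieval to identify the proper nouns and keywords in each article and, taking the sentence as the unit, computes a weight for each sentence. The words in the article title are then used as indicators to obtain a significance factor for each sentence. Finally, the two values are summed to give a new sentence weight, which is used to extract the important sentences: sentences are selected by weight and arranged in their original order in the article to produce the automatic summary of the Chinese news article.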
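The segmentation step can be illustrated with a small sketch. The abstract states only that segmentation relies on a self-defined lexical database; it does not name the matching algorithm. The code below therefore assumes forward maximum matching as a stand-in, and the names `segment`, `lexicon`, and `max_len` are hypothetical.

```python
def segment(text, lexicon, max_len=4):
    """Split text into words by forward maximum matching (assumed technique).

    lexicon: a set of known words, standing in for the self-defined
             lexical database mentioned in the abstract.
    max_len: longest word length to try (an illustrative assumption).
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink toward one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted, so unknown characters
            # still advance the scan instead of stalling it.
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


lexicon = {"中文", "新聞", "自動", "摘要", "自動摘要"}
print(segment("中文新聞自動摘要", lexicon))
```

With this lexicon the longer entry 自動摘要 is preferred over 自動 followed by 摘要, which is the behavior that gives maximum matching its name. A real system would need a much larger lexicon and some handling of ambiguous splits.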

Keywords

Automatic summarization; Chinese word segmentation; online news; information retrieval; big data

Parallel Abstract

With the rapid development of the Internet, browsing news media websites and reading news online have become a main online activity for many people. However, a massive amount of news is released every day, which causes information overload. Readers generally read only the news that is important or of interest to them, and at most read the titles of other news. The news that readers pass over at first glance might contain information they want to know; however, the titles might not be well written, and so the articles are not read. If articles from different news media are saved and brief summaries are automatically extracted from them, readers can save a large amount of reading time. This paper puts forward a method that automatically collects Chinese news and generates summaries of it. The steps are as follows. First, the news title, category, and content are captured from websites, and a Chinese word segmentation technique segments the text against a self-defined lexical database. Next, a weighting technique from information retrieval is used to find proper nouns and keywords, and the weight of each sentence is calculated with the sentence as the unit. Then a significance factor is found for each sentence by using the words of the article title as an index. Finally, the two values are summed to obtain a new sentence weight, which is used to extract the key sentences. The sentences are selected according to their weights and arranged in their order in the article, and an abstract of the Chinese news is generated automatically.
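The sentence-weighting and extraction steps can be sketched as follows. The abstract does not give the exact formulas, so everything numeric here is an assumption: TF-IDF stands in for the information retrieval weighting, the significance factor is taken as the fraction of a sentence's words that also appear in the title, and both terms are normalized by sentence length. The names `score_sentences`, `summarize`, `doc_freq`, `n_docs`, and `k` are hypothetical.

```python
import math
from collections import Counter


def score_sentences(sentences, title_terms, doc_freq, n_docs):
    """Return one combined weight per sentence (each sentence is a token list).

    doc_freq: number of corpus documents containing each term (assumed input).
    n_docs:   corpus size used for the smoothed IDF (assumed input).
    """
    # Term frequency over the whole article.
    tf = Counter(tok for sent in sentences for tok in sent)

    def tfidf(tok):
        # Smoothed IDF, one common variant; an assumption, not the thesis formula.
        return tf[tok] * math.log((1 + n_docs) / (1 + doc_freq.get(tok, 0)))

    scores = []
    for sent in sentences:
        if not sent:
            scores.append(0.0)
            continue
        term_weight = sum(tfidf(t) for t in sent) / len(sent)
        # Assumed significance factor: share of words that occur in the title.
        title_factor = sum(1 for t in sent if t in title_terms) / len(sent)
        scores.append(term_weight + title_factor)
    return scores


def summarize(sentences, title_terms, doc_freq, n_docs, k=2):
    """Pick the k highest-weighted sentences, then restore article order."""
    scores = score_sentences(sentences, title_terms, doc_freq, n_docs)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Selecting by weight and then sorting the chosen indices keeps the summary in the order of the original article, which matches the final step described above. How the two terms should be scaled against each other is left open by the abstract, so the plain sum here is only one possible choice.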

Parallel Keywords

Automatic Abstract; Chinese Word Segmentation; Network News; Information Retrieval; Big Data

