Internet applications have become increasingly widespread in recent years, and a growing body of information has been digitized so that it can be distributed online. Digitization has also sharply increased the volume of available information, so acquiring information is no longer as difficult for users as it once was. Under these conditions, the important problem is how to filter out unneeded information so that users can find what they actually need.

Traditional news summarization handles each news document individually, mostly with single-document summarization techniques. Automatic news classification has matured, and most news portals assign every article to a category, yet they do not summarize articles differently according to category. As a result, such summaries may keep readers from quickly finding the news they care about, or cause them to miss related news.

This research therefore proposes a classification-oriented news summarization method. The method builds summaries by combining information retrieval concepts, TF*IDF term weighting, K-means clustering, and document summarization techniques. Within each news category, the top 10% of terms by weight are treated as keywords and given additional weight. In addition, because the words in a headline are usually among the key points of an article, and because the first sentence of the first paragraph and the last sentence of the last paragraph often serve as the topic sentence and the conclusion, the weights of these parts are adjusted as well. The goal is that, after reading a classification-oriented summary, readers can quickly grasp the main points of a news article and judge whether it is the information they need.
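The weighting scheme outlined in this abstract can be illustrated with a minimal sketch. It is not the implementation used in this research. All function names are hypothetical, and the boost factors (`keyword_boost`, `title_boost`, `position_boost`) are assumed values, since the abstract does not state how large the adjustments are. The smoothed IDF formula and the whitespace-oriented tokenizer are likewise assumptions, and the K-means clustering step is omitted. The body is treated as a flat list of sentences, so its first and last sentences stand in for the first sentence of the first paragraph and the last sentence of the last paragraph.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokenizer (assumption: space-delimited text)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tfidf_weights(docs):
    """Return one {term: TF*IDF} dict per document (smoothed IDF, an assumption)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    weights = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        weights.append({
            t: (c / total) * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
            for t, c in counts.items()
        })
    return weights


def category_keywords(docs, labels, category, top_fraction=0.10):
    """Terms in the top fraction of summed TF*IDF weight within one category."""
    totals = Counter()
    for weights, label in zip(tfidf_weights(docs), labels):
        if label == category:
            totals.update(weights)  # adds the float weights term by term
    k = max(1, round(len(totals) * top_fraction))
    return {term for term, _ in totals.most_common(k)}


def summarize(title, body, term_weights, keywords, n_sentences=2,
              keyword_boost=2.0, title_boost=1.5, position_boost=1.5):
    """Pick the highest-scoring sentences and return them in original order."""
    sentences = split_sentences(body)
    title_terms = set(tokenize(title))
    scored = []
    for i, sentence in enumerate(sentences):
        score = 0.0
        for term in tokenize(sentence):
            w = term_weights.get(term, 0.0)
            if term in keywords:       # category keyword (top 10% by weight)
                w *= keyword_boost
            if term in title_terms:    # term also appears in the headline
                w *= title_boost
            score += w
        if i == 0 or i == len(sentences) - 1:  # opening or closing sentence
            score *= position_boost
        scored.append((score, i))
    top = sorted(scored, reverse=True)[:n_sentences]
    return [sentences[i] for _, i in sorted(top, key=lambda pair: pair[1])]
```

In this sketch the boosts are multiplicative, so a term that is both a category keyword and a headline word is raised by both factors. An additive scheme would be an equally plausible reading of the abstract.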