Mining Correlated Events in Chinese News Using the Haar Wavelet Transform

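Mining Chinese documents through keyword queries requires users to have a concrete idea of the content they are looking for before they can supply suitable keywords. In addition, when documents that may be related share no common keyword, the relationship between them is hard to notice. In this thesis, we use the historical data of events to mine correlated events in Chinese documents. We define an “event” as a run of consecutive Chinese characters sufficient to express one concept, and the “historical data” of an event as the sequence of the number of times that event occurred in each consecutive unit of time over some past period. Because the Haar wavelet transform preserves the waveform of a sequence, we split each event's historical data sequence, according to a time-interval size given in advance, into small fixed-length segments, and then represent these segments in Haar wavelet form using the mean and the difference. This lets us use the similarity of wavelet waveforms to find events that may be correlated. We propose three event-mining methods: popular event mining, cause-and-effect event mining, and specific-interval event mining, and our experiments uncover different correlated events in Chinese news.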

Keywords

correlation analysis; data mining; wavelet transform

Parallel Abstract

Keyword search over Chinese documents requires the user to have a concrete idea of what they are looking for in order to choose appropriate keywords. Moreover, when potentially related articles share no common keyword, their correlation is difficult to detect. In this thesis, we use the historical data of events to mine correlated events in Chinese articles. We define an “event” as a sequence of consecutive Chinese characters that sufficiently expresses one concept, and the “historical data” of an event as the sequence of its occurrence counts in each of a series of consecutive time units over some past period. “Related events” are events whose historical data share a similar trend, such as “opening up Japanese car imports” and “car sales”. Because the Haar wavelet transform preserves the shape of a sequence, we cut each historical data sequence into fixed-length segments according to a given time-interval size, and then represent each segment in Haar wavelet form using means and differences. We can then use the similarity of the wavelet waveforms to find events that may be related. We propose three event-mining methods: popular event mining, cause-and-effect event mining, and seasonal event mining. In experiments, these methods uncover various correlated events in Chinese news.
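To make the segment-and-transform step concrete, the following is a minimal Python sketch, not the thesis's actual implementation. It assumes segment widths that are powers of two, uses the pairwise mean and half-difference form of the Haar transform, and compares two segments by the Euclidean distance between their difference coefficients only, so that events with different overall volumes but the same shape count as similar. The function names and the choice of distance are assumptions made for illustration.

```python
from math import sqrt


def haar_transform(segment):
    """Haar decomposition by repeated pairwise mean and half-difference.

    Returns [overall mean, coarsest difference, ..., finest differences].
    The segment length must be a power of two (an assumption of this sketch).
    """
    n = len(segment)
    if n == 0 or n & (n - 1):
        raise ValueError("segment length must be a power of two")
    means = [float(x) for x in segment]
    details = []
    while len(means) > 1:
        pairs = list(zip(means[0::2], means[1::2]))
        level_diffs = [(a - b) / 2 for a, b in pairs]
        means = [(a + b) / 2 for a, b in pairs]
        # Coarser levels go in front of the finer ones computed earlier.
        details = level_diffs + details
    return means + details


def segment_series(series, width):
    """Cut a count series into consecutive fixed-length segments.

    A trailing remainder shorter than `width` is dropped.
    """
    return [series[i:i + width] for i in range(0, len(series) - width + 1, width)]


def shape_distance(seg_a, seg_b):
    """Euclidean distance between difference coefficients (mean ignored).

    Ignoring the mean is a design choice of this sketch, not a claim
    about the method used in the thesis.
    """
    da = haar_transform(seg_a)[1:]
    db = haar_transform(seg_b)[1:]
    return sqrt(sum((x - y) ** 2 for x, y in zip(da, db)))


# Toy weekly counts, invented purely for illustration.
event_a = [9, 7, 3, 5]
event_b = [19, 17, 13, 15]   # same shape as event_a, higher volume
event_c = [3, 5, 9, 7]       # opposite trend

print(haar_transform(event_a))
print(shape_distance(event_a, event_b), shape_distance(event_a, event_c))
```

In this sketch a smaller distance means more similar waveforms. A real system would likely also need to normalize counts and handle series whose length is not a multiple of the segment width.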

Parallel Keywords

association rules; data mining; Haar Wavelet Transformation


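Cited By

劉宣榮 (2010). A Keyword Historical Data Query System for Republic of China Patents [Master's thesis, Asia University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0118-1511201215465542

廖益緯 (2010). A Keyword Historical Data Query System: The Case of PubMed Literature [Master's thesis, Asia University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0118-1511201215464438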
