Finding Sequences with Similar Variations in Astronomical Observation Records Using a Hierarchical Weighted Suffix Tree

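With advances in technology and falling storage costs, observation data from the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) can now be stored in large volume and in detail. Yet astronomers, who generally rely on manual effort for data preprocessing and analysis, now need more time than before to pick out celestial objects whose brightness varies in similar ways. This shows that traditional methods cannot cope with today's large and complex data. This thesis therefore pursues the following goals:

1. Build an automated data preprocessing system: Because the observation records of celestial objects are voluminous and heterogeneous, the parts that are actually needed must first be selected, and problems such as incorrectly recorded observation times and excessive noise must be resolved. We built an automated data preprocessing mechanism for this purpose, to support subsequent applications.

2. Introduce association rule algorithms: In astronomy, classifying celestial objects by their similar or dissimilar features is an essential step. We combine the idea of a concept hierarchy with the weighted suffix tree so that objects with similar variations gather on the same branch. Finally, we provide users with a variety of search functions to support subsequent analysis.

Running these automated programs simplifies the data to be analyzed, reduces the manual effort spent on data processing, and clearly improves efficiency, giving researchers an effective way to handle large volumes of observation data in the future.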
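The abstract does not say how the preprocessing step is implemented. As a rough illustration only, the sketch below shows one way incorrectly recorded observation times and noisy readings might be handled. The function name `clean_light_curve`, the `(time, magnitude)` record layout, and the median-absolute-deviation cutoff `k` are all assumptions made for this example, not details taken from the thesis.

```python
import math
from statistics import median


def clean_light_curve(records, k=3.0):
    """Return a time-sorted, de-duplicated, outlier-filtered list of
    (time, magnitude) pairs. Hypothetical helper, not the thesis's code."""
    # Drop records whose time or magnitude is missing or not finite.
    valid = [
        (t, m)
        for t, m in records
        if t is not None and m is not None
        and math.isfinite(t) and math.isfinite(m)
    ]
    # Sort by time; the sort is stable, so for duplicate timestamps the
    # record that appeared first in the input is the one that is kept.
    valid.sort(key=lambda r: r[0])
    deduped = []
    for t, m in valid:
        if not deduped or t != deduped[-1][0]:
            deduped.append((t, m))
    if len(deduped) < 3:
        return deduped
    # Assumed noise filter: reject points far from the median magnitude,
    # measured in units of the scaled median absolute deviation (MAD).
    mags = [m for _, m in deduped]
    med = median(mags)
    mad = median(abs(m - med) for m in mags)
    if mad == 0:
        return deduped
    limit = k * 1.4826 * mad
    return [(t, m) for t, m in deduped if abs(m - med) <= limit]
```

A global MAD cutoff suits only objects whose brightness changes are small compared with the noise spikes. For strongly variable objects a windowed filter would be the more cautious choice, since a global cutoff could discard real variation.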

Keywords

Pan-STARRS; association rules; data mining; concept hierarchy; weighted suffix tree

Parallel Abstract

Astronomical researchers have long registered and maintained observation data manually for various analysis processes. With the ongoing construction of observatories for the Pan-STARRS project, however, the volume of observation data has exploded, and manually processing such large amounts of data every day has become impractical. Responding to this challenge requires a large-scale information management system as well as an efficient methodology for data analysis. This project has the following goals:

1. Construct an automatic data preparation system: Because of the movements of the Earth and of astronomical objects, a complete set of observation records requires gathering data from observatories worldwide. Because observations are limited by factors such as hardware, weather, time, and temperature, heterogeneous data sources must also be calibrated and cleaned before they are integrated. Given the rapidly growing data size, data preparation has to be automatic and efficient. We implement this preparation system so that it is accessible over a computer network and performs the necessary calibration or transformation based on features of historical data. The cleaned data can then be integrated for further analysis and research.

2. Develop astronomical time-series pattern mining and association rule mining methodologies: Discovering similarities between astronomical objects, and classifying those objects accordingly, is an important step in much astronomical research. We integrate a concept hierarchy with a weighted suffix tree so that objects with similar variation trends gather in the same branch of the tree structure. We also implement several functions that help users search for what interests them.

With these automated programs, the observation data can be simplified. This not only reduces the workload of data analysis but also improves its efficiency, giving researchers a better way to handle large data sets in the future.
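The idea of combining a concept hierarchy with a weighted suffix tree can be sketched as follows. This is a simplified illustration under several assumptions, not the structure built in the thesis. It uses an uncompressed suffix trie with a depth limit, takes each node's weight to be the number of suffixes passing through it, and uses a hypothetical two-level hierarchy (`GENERALIZE`) with made-up thresholds `small` and `large`.

```python
# Hypothetical two-level concept hierarchy: fine symbol -> coarse symbol.
GENERALIZE = {
    "up_small": "up", "up_big": "up",
    "down_small": "down", "down_big": "down",
    "flat": "flat",
}


def to_symbols(values, small=0.05, large=0.5):
    """Encode successive differences of a series as fine-level symbols."""
    symbols = []
    for prev, cur in zip(values, values[1:]):
        delta = cur - prev
        if abs(delta) < small:
            symbols.append("flat")
        elif delta > 0:
            symbols.append("up_big" if delta >= large else "up_small")
        else:
            symbols.append("down_big" if -delta >= large else "down_small")
    return symbols


class Node:
    def __init__(self):
        self.children = {}
        self.weight = 0       # number of suffixes passing through this node
        self.objects = set()  # IDs of objects whose suffixes pass through


class WeightedSuffixTrie:
    def __init__(self, max_depth=8):
        self.root = Node()
        self.max_depth = max_depth

    def insert(self, obj_id, symbols):
        # Insert every suffix, truncated to max_depth symbols.
        for start in range(len(symbols)):
            node = self.root
            for sym in symbols[start:start + self.max_depth]:
                node = node.children.setdefault(sym, Node())
                node.weight += 1
                node.objects.add(obj_id)

    def find(self, pattern):
        """Return (weight, object IDs) for the branch spelling `pattern`."""
        node = self.root
        for sym in pattern:
            node = node.children.get(sym)
            if node is None:
                return 0, set()
        return node.weight, set(node.objects)


def build(curves, coarse=True):
    """Build a trie at the coarse or fine level of the hierarchy."""
    trie = WeightedSuffixTrie()
    for obj_id, values in curves.items():
        symbols = to_symbols(values)
        if coarse:
            symbols = [GENERALIZE[s] for s in symbols]
        trie.insert(obj_id, symbols)
    return trie
```

In this sketch, two objects that rise twice and then fall share one branch at the coarse level even when their step sizes differ, while the fine level keeps them apart. Searching at the coarse level first and then narrowing to the fine level is one plausible way to support the search functions mentioned above.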

Parallel Keywords

Pan-STARRS; Weighted Suffix Tree; Concept Hierarchy; Data Mining; Association Rule

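Cited By

劉書宏 (2014). A Distributed Astronomical Sequence Data Indexing System Using a Weighted Suffix Tree [Master's thesis, National Central University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0031-0412201512024112

蔡昀翰 (2015). A Hadoop-Based Distributed Weighted Suffix Tree and Astronomical Time-Series Data Analysis System [Master's thesis, National Central University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0031-0412201512075875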
