
The Study of Motif and Sequential Patterns Mining Based on Hadoop: A Case Study of Appliances Usage Time Series in Taiwan

Advisor: 曹承礎

Abstract


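With rising environmental awareness and the pursuit of energy-saving and carbon-reduction goals, power companies have a growing need for electricity data mining. As smart meters become more widespread, electricity time series data are also growing rapidly, and practitioners now face the difficulties of big data analysis and complex computation. The most widely used solution for big data today is Hadoop, an open-source big data processing platform that handles massive datasets through its distributed programming model, MapReduce, and its distributed file system, HDFS.

Motif mining is an important research topic in time series mining. A motif is a subsequence that recurs within a time series. Motif mining lets us find meaningful subsequences and treat each one as an event. A time series can then be converted into an event sequence, and traditional association rule mining can uncover users' hidden electricity usage behavior rules. Such rules are a valuable reference for decisions on energy-saving and carbon-reduction policy.

To overcome the limitations of traditional motif algorithms on big data, this study proposes two new Hadoop-based motif mining algorithms, PrefixMotif and MR_PrefixMotif. PrefixMotif is an improved version of the well-known sequential pattern mining algorithm PrefixSpan. Experimental results show that on very large datasets, PrefixMotif is more than 80 times faster than Time Series Projection, a method commonly used in motif mining research, and uses less memory. Its distributed counterpart, MR_PrefixMotif, runs on the Hadoop platform and improves further as nodes are added, giving it an overwhelming performance advantage over traditional methods.

Finally, this study implements a distributed version of the well-known sequential pattern mining algorithm I-PrefixSpan, proposing the Hadoop-based MR_I-PrefixSpan algorithm for the pattern mining stage and thereby completing the overall electricity time series mining workflow. The mining workflow, motif results, and pattern results can serve as a reference for subsequent research on electricity usage mining.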
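Since PrefixMotif is described as an improvement of PrefixSpan, a small example of the underlying PrefixSpan idea may help: grow a frequent prefix one event at a time, then recurse on the projected database of suffixes that follow it. The sketch below is a minimal, simplified PrefixSpan in which every sequence element is a single event label (for instance, a motif identifier). It is not the thesis's PrefixMotif or MR_I-PrefixSpan; the function name `prefixspan`, the `min_support` parameter, and the sample sequences are illustrative assumptions.

```python
from collections import Counter
from typing import List, Tuple

Sequence = List[str]          # one sequence of single-event elements
Pattern = Tuple[str, ...]     # a sequential pattern, e.g. ("A", "C")


def prefixspan(db: List[Sequence], min_support: int) -> List[Tuple[Pattern, int]]:
    """Return (pattern, support) for every frequent sequential pattern in db.

    Simplified PrefixSpan (hypothetical helper): each element is a single
    event, so only sequence extensions are needed (no itemset extensions).
    Support is the number of input sequences containing the pattern.
    """
    results: List[Tuple[Pattern, int]] = []

    def mine(prefix: Pattern, projected: List[Sequence]) -> None:
        # Count each event at most once per suffix, so a count equals the
        # number of original sequences in which prefix + (event,) occurs.
        counts = Counter()
        for suffix in projected:
            counts.update(set(suffix))
        for event, support in sorted(counts.items()):
            if support < min_support:
                continue
            new_prefix = prefix + (event,)
            results.append((new_prefix, support))
            # Projected database: the part of each suffix that follows
            # the first occurrence of event.
            new_projected = [s[s.index(event) + 1:] for s in projected if event in s]
            mine(new_prefix, new_projected)

    mine((), db)
    return results


if __name__ == "__main__":
    # Hypothetical daily event sequences, e.g. motif IDs in time order.
    days = [["A", "B", "C"], ["A", "C"], ["A", "B", "C", "B"], ["B", "C"]]
    for pattern, support in prefixspan(days, min_support=3):
        print(pattern, support)
    # ('A',) 3, ('A', 'C') 3, ('B',) 3, ('B', 'C') 3, ('C',) 4
```

Projecting on the first occurrence of each event is what keeps PrefixSpan from generating candidates explicitly: each recursive call only scans the suffixes that can still extend the current prefix.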

English Abstract


With rising environmental awareness, power companies have an increasing demand for electricity data mining. In addition, the growing popularity of smart meters generates large volumes of electricity time series data, confronting researchers with the analysis of large-scale data sets and heavy computation. Hadoop, which provides fault-tolerant parallel analysis based on the MapReduce programming model, is a good choice for solving this problem and achieving the goal of electricity data mining. Motif mining is an important research topic in time series mining. In a time series, a motif is a recurring subsequence, and by mining motifs we can discover significant events. Traditional single-processor motif algorithms are inadequate for mining motifs from large-scale time series datasets. Therefore, this study proposes two novel motif mining algorithms based on the Hadoop platform, PrefixMotif and MR_PrefixMotif. Experiments show that on big data, PrefixMotif outperforms the traditional motif mining algorithm Time Series Projection, and the distributed MR_PrefixMotif in turn outperforms the single-processor PrefixMotif. MR_PrefixMotif is a novel parallel and distributed algorithm optimized for motif mining on large-scale time series datasets, and it offers electricity data mining researchers superior motif mining performance.
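To illustrate why counting recurring subsequences fits the MapReduce model, the following pure-Python sketch simulates the map, shuffle, and reduce phases in a single process. Readings are discretized into symbols, the map phase emits every length-`w` window as a key, the shuffle groups identical windows, and the reduce phase keeps windows that recur at least `min_count` times. This is not MR_PrefixMotif and does not use the Hadoop API. The three-level discretization, its cut points `low` and `high`, and all function names are hypothetical. It also counts overlapping windows, whereas motif definitions usually exclude such trivial matches.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Occurrence = Tuple[str, int]  # (meter id, window start index)


def discretize(values: List[float], low: float, high: float) -> str:
    """Map each reading to 'L', 'M', or 'H' using two hypothetical cut points."""
    return "".join("L" if v < low else "M" if v < high else "H" for v in values)


def map_phase(meter_id: str, symbols: str, w: int) -> Iterator[Tuple[str, Occurrence]]:
    # Emit every length-w window as the key, with its location as the value.
    for i in range(len(symbols) - w + 1):
        yield symbols[i:i + w], (meter_id, i)


def shuffle(pairs: Iterable[Tuple[str, Occurrence]]) -> Dict[str, List[Occurrence]]:
    # Group values by key, as the framework does between map and reduce.
    groups: Dict[str, List[Occurrence]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, occurrences: List[Occurrence],
                 min_count: int) -> Optional[Tuple[str, int]]:
    # Report the window as a motif if it recurs at least min_count times.
    if len(occurrences) >= min_count:
        return key, len(occurrences)
    return None


def find_motifs(series: Dict[str, List[float]], w: int, min_count: int,
                low: float = 0.5, high: float = 2.0) -> Dict[str, int]:
    mapped = chain.from_iterable(
        map_phase(meter_id, discretize(values, low, high), w)
        for meter_id, values in series.items()
    )
    motifs: Dict[str, int] = {}
    for key, occurrences in shuffle(mapped).items():
        reduced = reduce_phase(key, occurrences, min_count)
        if reduced is not None:
            motifs[reduced[0]] = reduced[1]
    return motifs


if __name__ == "__main__":
    # Hypothetical readings from two meters.
    readings = {
        "meter-1": [0.2, 0.3, 1.2, 2.5, 0.2, 0.3, 1.1, 2.4],
        "meter-2": [0.1, 1.0, 2.2, 0.4],
    }
    print(find_motifs(readings, w=3, min_count=2))
    # {'LLM': 2, 'LMH': 3, 'MHL': 2}
```

Because each map call touches only one meter's series and each reduce call touches only one window key, both phases can be spread across nodes, which is the property a Hadoop-based motif miner relies on when it scales with the number of nodes.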

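Cited By


康心柔 (2018). 台灣住宅部門冷氣用電行為分群探勘研究 [A clustering-based study of air-conditioner electricity usage behavior in Taiwan's residential sector] [Master's thesis, National Taiwan University]. Airiti Library. https://doi.org/10.6342/NTU201701658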

延伸閱讀