
適用於巨量資料分析的約略集合規則歸納法

A Novel Rough Set-based Rule Induction for Big Data Analytics

Advisor: 陳靜枝

Abstract


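Rough set-based rule induction is a scientific method suited to uncertain and incomplete data. By analyzing and reasoning over the data, it uncovers implicit knowledge and reveals latent rules without requiring additional statistical assumptions. The technique has attracted considerable attention in recent years and has been applied widely and successfully in many fields. However, enterprises and practitioners now face the impact of big data: when a system is built to process the transaction data generated by business operations, the data grow and accumulate massively within a short time, and both the volume and the rate of growth exceed what existing analysis tools can handle. Moreover, viewed in terms of dataset dimensions, it is not only the objects that increase rapidly within a short period; the attribute dimension shows the same trend. In response, this study proposes an incremental rough set-based rule induction method for big data analytics. The model addresses both dimensions, the addition of objects and the addition of attributes, and exploits the properties of incremental algorithms to update rules efficiently and save a large amount of computation time. Using data from a well-known television shopping channel in Taiwan as a case study, the results show that the proposed incremental rule induction method can update rules effectively within a short time as new data arrive, and that its efficiency, classification accuracy, and coverage are all superior to those of the traditional method. These results indicate that incremental rule induction can serve as a solution for enterprises analyzing big data, and that the rules it produces can serve as important indicators for decision support and strategy evaluation.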

Parallel Abstract


Rough set-based rule induction generates decision rules from a database and has mechanisms to handle noise and uncertainty in data. These meaningful decision rules facilitate managerial decision-making. However, databases that run the day-to-day operations of a business must process data quickly, and large volumes of data are continually updated within a short period of time. The infrastructure required to analyze such large amounts of data must support deeper analysis, deal with extreme data volumes, allow faster response times, and automate decisions based on analytical models. This study proposes a rough set-based rule induction approach that considers both incremental objects and incremental attributes. It addresses the big data issue for rule induction when data are incrementally added to the dataset. The method eliminates the need to re-compute the entire dataset when the database is updated; as a result, large amounts of computation time and memory space are saved. The proposed model is composed of five main steps: case determination, reduct generation, significance calculation, rule induction, and rule tuning. A case study of a home shopping company is used to show the validity and efficiency of the method. The results show that the proposed model considerably reduces the computing time for inducing decision rules while maintaining the quality of the rules. Because this topic has rarely been addressed in previous studies, it is believed that this study will form a basis for solving many similar problems in big data analytics.
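To make concrete the idea of updating rules without re-computing the entire dataset, the following is a minimal Python sketch. It is not the thesis's five-step model. It only illustrates, under simplifying assumptions, how equivalence classes over a fixed set of condition attributes can be maintained as objects arrive one at a time, so that certain rules (those from classes with a single decision value) are always available. The class name `IncrementalPartition`, its methods, and the attribute names `channel`, `price_band`, and `purchased` are hypothetical and chosen for illustration. Attribute addition, reducts, and rule tuning are not covered.

```python
from collections import defaultdict


class IncrementalPartition:
    """Equivalence classes of objects under a fixed set of condition attributes.

    Hypothetical illustration only; not the method proposed in the thesis.
    """

    def __init__(self, condition_attrs, decision_attr):
        self.condition_attrs = list(condition_attrs)
        self.decision_attr = decision_attr
        # Maps a tuple of condition values to {decision value: object count}.
        self.classes = defaultdict(lambda: defaultdict(int))

    def add_object(self, obj):
        """Insert one object; only its own equivalence class is touched."""
        key = tuple(obj[a] for a in self.condition_attrs)
        self.classes[key][obj[self.decision_attr]] += 1

    def certain_rules(self):
        """Return rules from consistent classes (exactly one decision value)."""
        rules = {}
        for key, decisions in self.classes.items():
            if len(decisions) == 1:
                rules[key] = next(iter(decisions))
        return rules


part = IncrementalPartition(["channel", "price_band"], "purchased")
part.add_object({"channel": "tv", "price_band": "low", "purchased": "yes"})
part.add_object({"channel": "tv", "price_band": "high", "purchased": "no"})
rules_before = part.certain_rules()

# A new object that contradicts an existing class makes that class
# inconsistent, so its certain rule disappears; other rules are unaffected.
part.add_object({"channel": "tv", "price_band": "low", "purchased": "no"})
rules_after = part.certain_rules()
```

In this sketch each insertion costs time proportional to the number of condition attributes, independent of how many objects are already stored, which is the kind of saving an incremental scheme aims for.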


