Case Analysis and Comparison of Data Mining with RapidMiner and Python

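In recent years, with the rapid development of machine learning and artificial intelligence, industries of all kinds have begun to emphasize the analysis and application of data. Most of this data, however, is large and disorganized when it is collected, so the desired information cannot be mined directly from the raw data; raw data must therefore be preprocessed before any data mining analysis. In this paper, we introduce RapidMiner, a data mining analysis software package, and compare its performance with Python programs written to perform the same analyses. Besides introducing the basic analysis workflow of RapidMiner, the paper presents four cases (house price prediction, sonar analysis, banana classification, and Titanic survival prediction) to test machine learning models including linear regression, decision trees, and support vector machines. The main reason for adopting RapidMiner is that it provides a simple and convenient data mining tool that lets researchers without an information technology background carry out analyses through a graphical interface. Finally, we compare its results with those of the Python programs under the same parameter settings.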

Keywords

support vector machine; decision tree; linear regression; machine learning; data mining; RapidMiner

Parallel Abstract

In recent years, companies across industry have increasingly focused on data analysis because of the rapid development of machine learning and artificial intelligence. However, large volumes of raw data are collected each day, and the raw data often contains too much information to be analyzed sensibly. This is especially true of computer-based research, which can produce large amounts of data. Raw data processing is therefore required in most surveys and experiments. At the level of individual records, data must be processed because there may be several reasons why a value is aberrant. In this paper, we introduce a data mining analysis software package, RapidMiner, and compare its performance with programs written in Python. In addition to presenting the basic operation of RapidMiner, we work through four cases (house price prediction, sonar classification, banana classification, and Titanic survival prediction) using three machine learning models: linear regression, decision tree, and support vector machine. The main reason for using RapidMiner is its graphical interface, which allows researchers without programming experience to carry out analyses simply and conveniently. We also compare its performance with that of Python programs under the same parameter settings.

Parallel Keywords

support vector machine; decision tree; linear regression; machine learning; data mining; RapidMiner
