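In recent years, the rapid development of machine learning and artificial intelligence has led every industry to pay attention to the analysis and application of data. However, most data are large and disorganized when collected, so the desired information cannot be mined directly from the raw data; the raw data must therefore be preprocessed before data mining and analysis. In this paper, we introduce RapidMiner, a data mining and analysis software package, and write Python programs with the same analysis functions to compare performance. Besides introducing the basic analysis workflow of RapidMiner, the paper presents four cases (house price prediction, sonar analysis, banana classification, and Titanic survival rate prediction) to test machine learning models such as linear regression, decision trees, and support vector machines. The main reason for adopting RapidMiner is that it provides a simple and convenient data mining tool that lets researchers without an information technology background carry out analyses through a graphical interface. Finally, we compare its results with those of the Python programs under the same parameter settings.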
In recent years, the rapid development of machine learning and artificial intelligence technology has led companies across industry to focus increasingly on data analysis. However, large volumes of raw data are collected each day, and the raw data often contain too much information to analyze sensibly. This is especially true of computer-based research, which can produce large amounts of data. Raw data processing is required in most surveys and experiments. At the level of individual records, data also need to be processed, because a value may be aberrant for several reasons. In this paper, we introduce a big data mining and analysis software package, RapidMiner, and compare its performance with equivalent programs written in Python. In addition to presenting the basic operation of RapidMiner, we work through four cases (house price prediction, sonar classification, banana classification, and Titanic survival rate prediction) using three machine learning models: linear regression, decision tree, and support vector machine. The main reason for using RapidMiner is its graphical interface, which allows researchers without programming experience to carry out analyses simply and conveniently. We also compare its performance with that of the Python programs under the same parameter settings.