A Study Evaluating the Effectiveness of Data Mining Techniques for Adding Value to Databases

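In the age of information technology, data serve an organization as an important source of information. When a database is imperfect, with missing or insufficient data, results drawn from it may yield biased or misleading solutions. Imputing missing values and adding value through functional mapping have therefore become major steps in data mining. Given a target database and an auxiliary database, functional mapping can integrate them into one large database, which is the value-added database. The purpose of this study is to evaluate the structure and correctness of the data after value has been added. Imputation and value-adding models are built with different data mining techniques according to data type: regression analysis and neural networks for continuous data; logistic regression, neural networks, C5.0, and CART for categorical data. The imputed and value-added databases are evaluated using RMSE, accuracy, and the Kappa statistic. The results show that regression analysis gives the best estimates for continuous data, while C5.0 gives the better results for most categorical data. Imputation and functional mapping add value to the database and substantially increase the amount of data and information it holds. The evaluation indicates that adding value to the database is effective and of considerable benefit to subsequent data mining.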
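To illustrate regression-based imputation of a continuous field, here is a minimal sketch. It is not the study's implementation: the field names `age` and `income`, the sample records, and the helper functions are all hypothetical, and a single-predictor least-squares fit stands in for whatever regression model the study actually used.

```python
from statistics import mean

def fit_simple_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def impute_missing(rows, x_key, y_key):
    """Return copies of rows in which a missing y_key (None) is
    replaced by the regression prediction from x_key."""
    # Fit only on records where the target field is observed.
    complete = [r for r in rows if r[y_key] is not None]
    intercept, slope = fit_simple_regression(
        [r[x_key] for r in complete],
        [r[y_key] for r in complete],
    )
    filled = []
    for r in rows:
        r = dict(r)  # copy so the input records stay untouched
        if r[y_key] is None:
            r[y_key] = intercept + slope * r[x_key]
        filled.append(r)
    return filled

# Hypothetical records: the last one is missing "income".
records = [
    {"age": 20, "income": 30.0},
    {"age": 30, "income": 40.0},
    {"age": 40, "income": 50.0},
    {"age": 50, "income": None},
]
completed = impute_missing(records, "age", "income")
```

The same pattern would extend to a categorical field by swapping the regression fit for a classifier, which is the role the abstract assigns to logistic regression, C5.0, and CART.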

Keywords

Data mining; missing values; imputation; functional mapping; database value-adding; C5.0; CART; BPNN

Parallel Abstract

Data play a vital role as a source of information for organizations, especially in the age of information technology. When a database is imperfect, with missing or insufficient data, the results obtained from it may provide biased or misleading solutions. Imputing missing data and applying functional mapping to a database have therefore come to be regarded as major steps in data mining. Given a target database and an auxiliary database, functional mapping can combine them into one large, value-added database. The purpose of this research is to evaluate the structure and correctness of the data after the database has been value-added. Different data mining methods are used to construct the imputation and value-adding models, according to the type of data. When the missing data are continuous, regression models and neural networks are used to build predictive models. When the missing data are categorical, logistic regression, neural networks, C5.0, and CART are used. RMSE, accuracy rate, and the Kappa statistic are used to assess the imputed and value-added databases. The results show that the regression model provides the best estimates for continuous data, while C5.0 performs better for most categorical data. Imputation and functional mapping add value to the database and substantially increase the amount of data and information it contains. The assessment indicates that adding value to the database is effective and benefits subsequent data mining.
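The three evaluation measures named above can be sketched in a few lines. This is only a sketch under stated assumptions: the abstract does not say which Kappa variant was used, so Cohen's kappa is assumed here, and the function names and sample labels are hypothetical.

```python
from collections import Counter
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error, for imputed continuous values."""
    n = len(actual)
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def accuracy(actual, predicted):
    """Share of imputed categorical values that match the true ones."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def cohen_kappa(actual, predicted):
    """Cohen's kappa: agreement corrected for chance agreement.
    Undefined (division by zero) when chance agreement equals 1."""
    n = len(actual)
    p_observed = accuracy(actual, predicted)
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # Chance agreement from the marginal class frequencies.
    p_expected = sum(
        actual_counts[c] * predicted_counts[c] for c in actual_counts
    ) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical true and imputed class labels.
true_labels = ["a", "a", "b", "b"]
imputed_labels = ["a", "b", "b", "b"]
acc = accuracy(true_labels, imputed_labels)
kappa = cohen_kappa(true_labels, imputed_labels)
```

Kappa is reported alongside accuracy because accuracy alone can look high when one class dominates. Kappa subtracts the agreement expected by chance from the class frequencies.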

Parallel Keywords

Data mining; Missing data; Imputation; Functional mapping; Value-added database; C5.0; CART; BPNN
