A Study of Methods for Estimating Null Values in Relational Database Systems

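Good decisions must rest on correct information. When data in a relational database system contain null values, the information derived from them is often wrong, which leads to wrong decisions; the result can cost an enterprise its competitive lead or even cause harm and losses. To maintain data correctness, estimating null values accurately has therefore become an important research topic. The method proposed in this thesis first clusters the data in the relational database. Working cluster by cluster, it examines the relationship between the attributes that influence the attribute to be estimated and that attribute itself, and uses the fuzzy correlation coefficient and the fuzzy coefficient of determination to obtain, for each cluster, the coefficient of determination between each influencing attribute and the estimated attribute. From these coefficients it derives the relative variation values among the attributes of each cluster, and from those values the estimation rule of each cluster. The record to be estimated is then compared with the center of every cluster, assigned to the cluster whose center is nearest, and its value is estimated with that cluster's estimation rule. Because the computation is organized by cluster, the time spent on comparison is reduced, and the results are more accurate than those of existing estimation methods.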

Keywords

Relational database systems; null values; fuzzy correlation coefficient; fuzzy coefficient of determination

Parallel Abstract

Good decisions need to be based on correct information. A database system will not operate properly if any attribute contains null values, and the resulting errors often cause enterprises to lose their competitive edge or even to suffer losses. Estimating null values in relational database systems is therefore an important research topic. This paper proposes a method to estimate null values in relational databases. We first use a clustering algorithm to cluster the data, and then use the fuzzy correlation coefficient and the fuzzy coefficient of determination to calculate the correlation between attributes. From these we derive, for each cluster, the relative variation values among attributes, and use these values to derive an estimation rule for each cluster. The to-be-estimated data are then compared with the center of each cluster to find the closest cluster in terms of distance; the data are assigned to that cluster, and the estimated value is obtained with that cluster's estimation rule. Because the estimate is calculated cluster by cluster, the time spent on comparison processing is reduced, and the calculated result is more accurate than those of existing estimation approaches.

Parallel Keywords

Relational database systems; null values; fuzzy correlation coefficient; fuzzy coefficient of determination
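
The pipeline described in the abstract (cluster the data, derive one estimation rule per cluster, assign the incomplete record to the nearest cluster center, apply that cluster's rule) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method of the paper: the abstract does not give the formulas, so the sketch swaps in the ordinary (crisp) coefficient of determination for the fuzzy one, and uses a made-up rule that scales the cluster's mean target value by a weighted ratio of each attribute to its cluster mean. The function names `r_squared`, `fit_cluster`, and `estimate`, the pre-grouped toy data, and the choice of Euclidean distance are all hypothetical.

```python
import math

def r_squared(xs, ys):
    """Crisp coefficient of determination between two numeric lists.
    Stand-in for the fuzzy coefficient of determination (assumption)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return 0.0 if sxx == 0 or syy == 0 else (sxy * sxy) / (sxx * syy)

def fit_cluster(rows):
    """rows: list of (features, target) pairs that already belong to one cluster.
    Returns the cluster center, the mean target, and per-attribute weights."""
    feats = [f for f, _ in rows]
    targets = [t for _, t in rows]
    dims = len(feats[0])
    center = tuple(sum(f[j] for f in feats) / len(feats) for j in range(dims))
    r2 = [r_squared([f[j] for f in feats], targets) for j in range(dims)]
    total = sum(r2)
    # Weight each attribute by its share of the total r^2 (hypothetical rule).
    weights = [v / total for v in r2] if total > 0 else [1.0 / dims] * dims
    return {"center": center,
            "target_mean": sum(targets) / len(targets),
            "weights": weights}

def estimate(features, clusters):
    """Assign `features` to the nearest cluster center and apply that
    cluster's rule. Assumes every attribute mean in the center is nonzero."""
    k = min(range(len(clusters)),
            key=lambda i: math.dist(features, clusters[i]["center"]))
    c = clusters[k]
    scale = sum(w * (x / m)
                for w, x, m in zip(c["weights"], features, c["center"]))
    return k, c["target_mean"] * scale

# Toy data, already split into two clusters (the clustering step is assumed done).
cluster_a = [((1.0, 10.0), 100.0), ((2.0, 20.0), 200.0), ((3.0, 30.0), 300.0)]
cluster_b = [((10.0, 1.0), 50.0), ((12.0, 2.0), 60.0), ((14.0, 3.0), 70.0)]
clusters = [fit_cluster(cluster_a), fit_cluster(cluster_b)]

# A record whose target attribute is null.
index, value = estimate((2.5, 25.0), clusters)
```

Here the incomplete record is compared only against the two cluster centers rather than against every stored tuple, which is the source of the reduced comparison time the abstract mentions.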
