An Efficient Method for Computing ε-Joins on Uncertain Data

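The join is one of the most important operations in a database. Its main purpose is to find identical or similar objects across two databases; sensor databases, location-based services, and face recognition systems are all applications of it. A sensor database, for example, holds a very large volume of sensed data, and a user who wants to find regions with the same temperature or humidity relies on a join, which finds objects with matching attributes in two tables. Location-based services likewise hold a great deal of geographic information, and answering a user's request in a short time is essential. If a user wants to know which restaurants are nearby, all objects whose attribute is "restaurant" must first be retrieved and then checked against the user's location; only after this cross-comparison can every result satisfying the join condition be found. Beyond the sheer volume of data, uncertainty in the data also affects the computation: in a sensor network, for instance, changes in the external environment make the sensed data uncertain. A basic join is already costly to compute, and uncertain data raises that cost further. The problem is to account for the uncertainty while still finding every object pair that may satisfy the join condition. This paper combines the join operation with uncertain data and proposes a method that performs the join quickly. The experiments compare the computation time of the proposed method with that of the traditional join. The results show that the proposed method substantially reduces computation time relative to the traditional join, and that it finds the join results quickly and efficiently even for large and high-dimensional data.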
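To make the cross-comparison described above concrete, here is a minimal sketch of a naive nested-loop ε-join. The abstract does not say how uncertainty is modeled, so the sketch assumes that each uncertain object is an axis-aligned box (a list of `(low, high)` bounds per dimension) inside which the true value lies. The function names `min_dist`, `max_dist`, and `epsilon_join` are hypothetical. This is the kind of exhaustive baseline the paper aims to improve on, not the proposed method itself.

```python
from itertools import product
from math import sqrt


def min_dist(a, b):
    """Smallest possible Euclidean distance between two uncertainty boxes."""
    total = 0.0
    for (alo, ahi), (blo, bhi) in zip(a, b):
        # Gap along this dimension; zero when the intervals overlap.
        gap = max(blo - ahi, alo - bhi, 0.0)
        total += gap * gap
    return sqrt(total)


def max_dist(a, b):
    """Largest possible Euclidean distance between two uncertainty boxes."""
    total = 0.0
    for (alo, ahi), (blo, bhi) in zip(a, b):
        # Farthest pair of endpoints along this dimension.
        span = max(ahi - blo, bhi - alo)
        total += span * span
    return sqrt(total)


def epsilon_join(r, s, eps):
    """Naive epsilon-join: compare every object in r with every object in s.

    Returns two lists of index pairs (i, j):
      certain  - pairs within eps wherever the true values lie in their boxes
      possible - pairs that may be within eps, depending on the true values
    Assumption: boxes are an illustrative uncertainty model, not the paper's.
    """
    certain, possible = [], []
    for (i, a), (j, b) in product(enumerate(r), enumerate(s)):
        if max_dist(a, b) <= eps:
            certain.append((i, j))
        elif min_dist(a, b) <= eps:
            possible.append((i, j))
    return certain, possible
```

Every pair is examined, so the cost grows with the product of the two data set sizes. An index such as the R-tree named in the keywords is one common way to avoid examining pairs whose boxes are clearly farther apart than ε.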

Keywords

Uncertain data; ε-join; computation time; R-tree

Parallel Abstract

The join is a very important operation in relational databases. Its main purpose is to find pairs of objects from two relations that have the same or similar attributes. Join applications are common in everyday life, for example sensor databases, location-based services (LBS), and face recognition systems. Each of these applications has its own database, and the data sets are very large: a sensor database holds many sensed readings, and an LBS database holds much location information. The join has to check all object combinations across the two databases. Besides data volume, another factor increases the cost of the join: data uncertainty. In a sensor network, the environment and the wireless network make the data uncertain. A traditional join already requires a lot of computation, and data uncertainty makes it more difficult to compute. Joining uncertain data is therefore an important problem: the join must handle uncertainty while keeping the computational cost down. This paper proposes a join method that reduces the join computation on uncertain and high-dimensional data. Finally, experiments show the improvement and demonstrate the benefit of the algorithm.

Parallel Keywords

Uncertain data; epsilon join; computation time; R-tree
