A Method for Efficiently Indexing Uncertain Data

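Conventional data management assumes that the information obtained is absolutely correct. In some situations and application environments, however, the information obtained is not exact but is an estimate or a statistic (e.g., mean, variance). This thesis assumes that the information of each uncertain data object is a probability distribution (a Gaussian distribution in this thesis). Such data are called uncertain data, and this thesis considers how to manage them effectively. When querying uncertain data, objects must be judged, filtered, and computed using their probability distributions. Especially when the number of objects is large, finding the objects that satisfy a given condition requires examining every object, which is time-consuming and inefficient. To improve system performance, the data can be indexed. We propose a segment-based approach for uncertain data that offers better performance and supports several kinds of queries. The index structures proposed in this thesis are US-Trees (Uncertain Segment-Trees) and US+-Trees, which manage uncertain data effectively and support several query types, including point queries, range queries, and top-k queries. In addition to describing how to maintain US-Trees (US+-Trees) and the query methods, the experimental part compares their performance with MV-Trees in terms of number of I/Os and accuracy. The experimental results show that US-Trees (US+-Trees) are easy to build and manage and can effectively support multiple kinds of queries.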
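The unindexed baseline described above, in which every object is examined, can be illustrated with a minimal sketch. It assumes one-dimensional objects with Gaussian distributions and a range query that returns objects whose probability of lying in the interval meets a threshold. The names `UncertainObject`, `range_probability`, `linear_scan_range_query`, and the `threshold` parameter are hypothetical and are not taken from the thesis; this is not the US-Tree algorithm.

```python
import math
from dataclasses import dataclass


@dataclass
class UncertainObject:
    # Hypothetical record: an object whose value follows N(mean, std^2).
    oid: int
    mean: float
    std: float


def gaussian_cdf(x: float, mean: float, std: float) -> float:
    # Cumulative distribution function of a Gaussian, via the error function.
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def range_probability(obj: UncertainObject, low: float, high: float) -> float:
    # Probability that the object's value falls inside [low, high].
    return gaussian_cdf(high, obj.mean, obj.std) - gaussian_cdf(low, obj.mean, obj.std)


def linear_scan_range_query(objects, low, high, threshold):
    # Assumed query semantics: keep objects with P(low <= X <= high) >= threshold.
    # Every object is examined, which is the cost an index aims to avoid.
    results = []
    for obj in objects:
        p = range_probability(obj, low, high)
        if p >= threshold:
            results.append((obj.oid, p))
    return results
```

For example, `linear_scan_range_query(objects, -1.0, 1.0, 0.5)` visits each element of `objects` once, so its cost grows linearly with the number of objects regardless of how few qualify.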

Keywords

Uncertain data; indexing; accuracy; tree structures; query processing; number of I/Os

Parallel Abstract

Conventional data management assumes that the information obtained is absolutely correct. In some environments, however, the information obtained is not exact but is an estimate or a statistic (e.g., mean, variance). We call such data uncertain data, and we assume that each object follows a Gaussian distribution. In this thesis, we consider how to manage uncertain data effectively. Searching for specific objects in an uncertain dataset is time-consuming and inefficient, especially when the dataset is large. We use a segment-based method (US-Tree) to index uncertain data, and the method supports many kinds of queries. When a query is not supported by an index, the system must fall back to a linear search, which performs poorly. We compare the performance of the proposed index structure with the MV-Tree method. In the experiments, our method is more efficient and supports many kinds of queries.

Parallel Keywords

Uncertain data; indexing; accuracy; tree structures; query processing; number of I/Os
