一個有效處理多個連續性Top-K查詢的方法

近來, 資料串流的查詢與分析已廣泛的使用在各種應用中。大部份這些關於資料串流的應用會應用到連續性top-k查詢。至於什麼是top-k查詢呢?就是系統根據一個使用者定義的集合函式,如：max、min、或sum函式，去找出目前k個排名最高的答案。以前的研究，大部份的演算專注在一次性top-k查詢上。一次性top-k查詢意指此查詢只要執行一次即可，但這些演算法是不適用在處理連續性的top-k查詢上。因為如果目前的結果沒發生改變，多執行的演算法次數都是浪費系統運算與溝通資源。而雖然也有演算法提出處理連續性top-k查詢，但它們缺少資訊共享的機制以減低過重的系統負荷，當連續性top-k查詢的量很大的時候。在這論文中，我們提出了有效處理多個連續性top-k查詢的方法。舉例來說，假設有m個伺服器，每一個伺服器有同樣的網頁內容，那麼我們會保留些已被這些伺服器傳送過來的網頁id，並且使得它們的點擊數一定落在一個限定的範圍內。我們稱此種結構為Ranked List Table。那麼系統可以嘗試著先運用此結構算出每一網頁的總點擊率而得到目前最受歡迎的網頁。我們亦證明了運用此結構得出的結果一定是正確的。此外，我們提出了一個啟發式的伺服器間的存取策略，以至於系統能以更省系統資源的方式盡快的得到答案。我們的實驗結果顯示了我的提出的方法是實際且有效率的。

關鍵字

Top-K查詢

並列摘要

Recently, querying and analyzing data streams are widely applied in various applications. Most of these applications issue continuous top-k queries over multiple data streams. A top-k query is one that finds the k highest ranked answers to a user defined aggregate function such as max, min, or sum functions. In previous works, most methods focus on one-time top-k query algorithms. One-time top-k queries mean that the top-k query is only executed once. These one-time top-k query algorithms are not appropriate for processing continuous top-k queries. Many operations can be executed for nothing if the answer remains unchanged. There are algorithms for processing the continuous top-k query, but they lack of the mechanisms for sharing information to reduce the heavy bandwidth load for processing multiple continuous top-k queries. In this thesis, we propose a method for efficiently processing multiple continuous top-k queries. For example, assume that there are m servers, each with identical copies of the web content. We keep the pages that have been sent from the servers. The hit of each page is limited to a specific range. These pages and ranges are maintained in the structure named Ranked List Table in the coordinator so that the coordinator can try to decide the most popular pages by summing up the range of each page. Besides, we prove the correctness of the top-k results in the thesis when the coordinator calculates the top-k results by using the structure. However, the top-k result may not be decided by using the structure. We also propose a heuristic access order strategy to schedule the order of accessing servers so that the coordinator can get the accurate hits of the pages to decide the top-k result as early as possible. Our preliminary experimental results show that our approach is practical and efficient.

並列關鍵字

Top-K Query

參考文獻

[3] R. Fagin. Combining fuzzy information from multiple systems. In J. Comput. System Sci., pages 58:83-99, 1999.

[4] S. Nepal and M. V. Ramakrishna. Query processing issues in image (multimedia) databases. In ICDE, pages 22–29, 1999.

[6] R. Fagin, A. Lotem , and M.Naor. Optimal aggregation algorithms for middleware. In Symposium on Principles of Database Systems, 2001.

[7] S. Chaudhuri and L. Gravano. Evaluating top-k selection queries. In VLDB’99, pages 397–410, 1999.

[8] N. Bruno, L. Gravano, and A. Marian. Evaluating top-k queries over web-accessible databases. In ICDE, 2002.

延伸閱讀

張玉圖（2016）。針對路徑群組的一個有效率的天際線查詢演算法〔碩士論文，逢甲大學〕。華藝線上圖書館。https://doi.org/10.6341/fcu.M0219529
賴妤函（2011）。運用高效率多變量分類器於多目標分類問題〔碩士論文，國立臺中科技大學〕。華藝線上圖書館。https://doi.org/10.6826/NUTC.2011.00063
溫福星（2011）。多層次研究中迴歸係數差異與多重中介效果的比較：一次估完多層次架構的好處。αβγ量化研究學刊，3(1)，1-19。https://www.airitilibrary.com/Article/Detail?DocID=P20110516001-201106-201109140005-201109140005-1-19
Memon, N., Vighio, M. S., Nizamani, S. Z., Memon, N. A., Memon, A. R., & Shaikh, U. R. (2015). Analysis of Query Processing and Optimization. Bahria University Journal of Information & Communication Technologies, 8(1), 14-20. https://www.airitilibrary.com/Article/Detail?DocID=19994974-201504-201609020001-201609020001-14-20
Koh, J. L., & Chen, A. L. P. (2002). An Approach to Querying Multiple Object Databases. Journal of Information Science and Engineering, 18(2), 281-310. https://doi.org/10.6688/JISE.2002.18.2.5

國際替代計量

一個有效處理多個連續性Top-K查詢的方法

主題瀏覽