透過您的圖書館登入
IP:18.116.62.45
  • 學位論文

資料串流環境下查詢方案的最佳化策略

An Optimization Strategy for Efficient Query Execution over Streaming Sources

指導教授 : 陳良弼

摘要


隨著網際網路的發達,網路的安全也日益重要!因應網路的犯罪及病毒的流竄,各網際網路服務提供者開始監控網路的資料串流,為了能夠找出網路中不正常的使用者位址,必須對通過網路伺服器或是網路交換器的各個串流的資料做連接。在已知的技術中,多個資料串流的連接有下列幾種查詢計劃,分別是(1)二元連接,(2)多元連接,和(3)綜合連接。在不一樣的網路環境中,例如流速以及連接率,對不一樣的連接方式會產生不一樣的處理時間,為了因應網路的龐大資料量,我們必須根據不一樣的流速及連接率找到處理時間最少的查詢計畫,也稱為查詢最佳化。根據資料串流的數目,我們產生所有的查詢計畫。窮舉法對所有產生的查詢計畫計算出各自的費用,而後找出最佳計畫。根據實驗,雖然窮舉法總是可以為我們找到最佳的查詢計畫,但是最佳化所需的時間以及記憶體最多只允許我們處理6∼7個資料串流。窮舉法導致時間及記憶體的龐大。因此,我們必須減少產生出的查詢計畫,因此我們提出貪婪法。在產生查詢計畫的步驟上,對在加入資料串流後所產生的所有查詢計畫,根據各個還未加入資料串流的資訊先行計算出所有計畫的期望費用,我們只對有最小期望費用的計畫產生新的查詢計畫,因此我們每次比較所有已經產生出計畫的期望費用,我們將此費用訂為臨界值,比較此臨界值與所有未完成計畫的期望費用,若有任何查詢計畫的期望費用小於此臨界值,此臨界值的查詢計畫將是我們最終的近似最佳計畫。根據實驗,雖然我們無法保證貪婪法能夠每次都找到最佳計畫,但是我們找到的計畫都相當接近最佳計畫,而且可以應用於20個資料串流以上的環境。因此,我們提出的方法可以在時間及空間上有效的解決多資料串流連接的問題。

並列摘要


Continuous queries over data streams, particularly the joins of streams, have gained popularity as the scope of their applications has increased in the past years. Applications range from network monitoring to sensor processing for environmental monitoring or inventory tracking. The cost of evaluating such queries over streaming sources may vary according to the order in which the joins of streams are processed. In order to lower the cost of executing a query, the query optimizer needs to generate an execution plan that better fits the current conditions of the environment. Existing optimizers try to resolve the above problem by finding a better probing order for multi-way join operators or choosing a better sequence for the binary join operators. However, there are cases where the performance of a hybrid plan (query plan containing both types of operators) exceeds the performance of query plans composed of a single multi-way operator or trees binary join operators. We address the problem of finding a low-cost execution plan in order to execute continuous multi-way join queries over infinite data streams. The search space encompasses plans consisting of a single multi-way operator, plans composed of binary join operators, and hybrid plans. We propose heuristics with a partial cost-based optimization technique to address the three main components of a query optimizer, namely the search space, the cost model and the search strategy. The cost model is used to evaluate all feasible query plans and the heuristics are used to prune the candidates that cannot lead us to good plans. In our work, we evaluate the performance of the proposed approach by comparing the time needed to produce the low-cost query execution plan and quality of our result with the optimum solution and the single multi-way operator with probing order. The result shows that our methodology can find a better plan for the current environment and that is close to the optimal query plan.

參考文獻


[1] A. Arasu, B. Babcock, S. Babu, J. McAlister, and J. Widom. Characterizing Memory Requirements for Queries over Continuous Data Streams. In Proc. of the ACM PODS Symp. on Principles of Database Systems, 2002, pp. 221-232.
[2] A. M. Ayad and J. Naughton. Static Optimization of Conjunctive Queries with Sliding Windows over Infinite Streams. In Proc. of the ACM SIGMOD Int. Conf. on Management of Data, 2004.
[3] B. Babcock, S. Babu, M. Datar, R. Motwani, J. Widom. Models and Issues in Data Stream Systems. In Proc. of the ACM PODS Symp. on Principles of Database Systems, 2002.
[4] S. Babu, and J. Widom. Continuous Queries over Data Streams. SIGMOD Record, 2001, vol. 30.
[5] S. Babu and J. Widom. Exploiting k-Constraints to Reduce Memory Overhead in Continuous Queries over Data Streams. ACM TODS Transactions on Database Systems, 2004, vol. 29.

延伸閱讀