透過您的圖書館登入
IP:18.191.135.224
  • 學位論文

批次資源調配演算法

Resource Partition and Placement Strategy for Distributed Database Systems

指導教授 : 劉邦鋒
共同指導教授 : 吳真貞(Jan-Jan Wu)

摘要


分散式資料庫集合了計算機網路中多個邏輯相關資料庫。以 BigTable 為例,其為一高可用性、高效率,且可擴展之分散式資料存儲系統。BigTable 將資料存儲為鍵值對,其中鍵由行、欄與時間戳記組成,值則為一字串。欄族 (column family) 為一群必須存儲於同一檔案的欄。若資料庫需要取得欄族中的某些欄,則必須將整個欄族的欄全部取出。 本論文探討將欄分割為數個欄族以優化一系列的查詢。於只考慮處理查詢時取得欄族的代價,且已知每個查詢需要取得之欄位,以及所有欄之代價的情況下,求出最低總代價方案。當需要取得欄族任一欄位時,必須將整個欄族全部取回,因此這是個有趣且有挑戰性的問題。 以下為本論文之研究貢獻。我們為欄位分割問題建構數學模型,並明確定義目標函數 與花費度量。我們證明最小化花費之欄族分割問題為 NP-complete,即便是分割為 2 個欄族或欄位花費皆相同亦如此。我們也證明最大化花費之欄族分割問題為 NP-complete。相對地,對於在分割為 2 個欄族,欄位花費皆相同,且分割必須為完美分割 (perfect split) 的限制下,最小化花費問題存在多項式時間演算法。我們在實驗結果中驗證了對於此問題所提出的多項式時間演算法之效率。

關鍵字

資料庫 查詢優化

並列摘要


A distributed database is a collection of multiple, logically interrelated databases distributed over a computer network. For exmaple, BigTable [10] is a distributed data storage system with high availability, performance, and scalability. BigTable stores data in a key-value pair, where the key consists of its column, row, and timestamp and the value is a string. A column family is a collection of columns that must be stored in a file, and the system must retrieve all columns in a column family if any column in that column family is needed. The goal of this paper is to optimize a series of queries by partition the columns into several column families. Each query needs to access certain columns, and we would like to reduce the cost. We assume that that the cost of every column is given, and we consider only the cost in fetching column families while answering a query. Since retrieving any column within a column family means fetching the entire column family, this is an interesting yet challenging optimization problem. The contributions of this paper are summarized as follows. We formulate a mathematical formulation of the column partition problem in which the object function and cost measurement are clearly defined. We also prove that minimizing the expense of a column partition is NP-complete, even when The number of partition is 2, or the cost of each column is the same. We also show maximizing the expense of a column partition is also NP-complete. However, a polynomial time algorithm exists for minimizing the expense of a column 2-partition with uniform column cost and perfect split constraint. Experiment results that verify the performance of our proposed algorithm for minimizing the expense of a column 2-partition with uniform column cost and perfect split constraint.

並列關鍵字

database query optimization

參考文獻


optimization in distributed mediator systems. SIGMOD Record, 25(2):137--146, June 1996.
[2] Gunnar Andersson and Lars Engebretsen. Better approximation algorithms for set splitting
[3] Apache Software Foundation. Cassandra. http://cassandra.apache.org.
[4] Apache Software Foundation. HBase. http://hbase.apache.org.
[5] Luitpold Babel, Hans Kellerer, and Vladimir Kotov. The k-partitioning problem. Mathemat-

延伸閱讀


國際替代計量