Applying Mixed Partitioning to Distributed Database Allocation

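Rapid data growth poses a major challenge for enterprises, and distributed databases are an effective solution for storing ever-increasing volumes of data. With the arrival of the big data era, tables hold more and more columns and records. To shorten the time needed to query and analyze big data, the columns and records in a database must be quickly retrievable, which makes appropriate database design and allocation increasingly important. Keywords such as "data partitioning" and "data allocation" appear frequently in database research, which reflects that academia is actively developing partitioning-oriented database design approaches. In practice, the Data Allocation Problem (DAP) is to place related data in the same database, for example frequently accessed attributes or frequently used query conditions, so as to reduce the table merge operations and response time incurred by queries or data modifications. Current research focuses on partitioning data vertically and horizontally and allocating it across an enterprise's large distributed database. In distributed database design, whichever partitioning method is adopted, the main design considerations still depend on the needs of the enterprise. This study aims to partition both table columns and records effectively in order to reduce the response time of analytical queries. It therefore combines vertical and horizontal partitioning into a mixed approach and proposes a two-step data partitioning model based on mixed data partitioning, the Vertical Horizontal Partitioning (VHP) method. In the distributed insert experiment, the distributed insert method shortened insert time by 31%. In the distributed query experiment, mixed partitioning could not place all the records of every query transaction in the same table, but more than half of the query transactions still had their records perfectly allocated, and the average query time fell by 12.1%.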
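To illustrate the mixed partitioning idea described above, the following minimal Python sketch splits a table vertically into column groups and then splits each group horizontally across sites. It is not the VHP method itself, whose grouping and allocation rules are not given in this abstract; the sample table, the column groups, and the `site_of` rule (record id modulo 2) are all hypothetical.

```python
from collections import defaultdict


def vertical_partition(rows, key, column_groups):
    """Split rows into column groups; every fragment keeps the key column."""
    return {
        name: [{key: row[key], **{c: row[c] for c in columns}} for row in rows]
        for name, columns in column_groups.items()
    }


def horizontal_partition(rows, site_of):
    """Assign each row to the site returned by the site_of function."""
    sites = defaultdict(list)
    for row in rows:
        sites[site_of(row)].append(row)
    return dict(sites)


def mixed_partition(rows, key, column_groups, site_of):
    """Step 1: vertical split. Step 2: horizontal split of each fragment."""
    fragments = vertical_partition(rows, key, column_groups)
    return {
        name: horizontal_partition(fragment, site_of)
        for name, fragment in fragments.items()
    }


# Hypothetical sample table and partitioning rules (assumptions for illustration).
orders = [
    {"id": 1, "customer": "A", "amount": 120, "region": "north", "note": "gift"},
    {"id": 2, "customer": "B", "amount": 80, "region": "south", "note": ""},
    {"id": 3, "customer": "A", "amount": 45, "region": "north", "note": "rush"},
    {"id": 4, "customer": "C", "amount": 300, "region": "east", "note": ""},
]
column_groups = {
    "billing": ["customer", "amount"],   # columns assumed to be queried together
    "shipping": ["region", "note"],
}
layout = mixed_partition(orders, "id", column_groups, lambda row: row["id"] % 2)

for fragment, sites in layout.items():
    for site, rows in sorted(sites.items()):
        print(fragment, site, rows)
```

Because every vertical fragment keeps the `id` key, the original rows can be rebuilt by merging fragments on that key, and a query that touches only one column group at one site never needs that merge.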

Keywords

Distributed Database; Big Data; Mixed Partitioning; Data Allocation

Parallel Abstract

In recent years, enterprises have faced a great challenge because the amount of data in their databases is growing dramatically, and the distributed database is an effective solution for storing increasing amounts of data. As the columns and records in tables grow accordingly, shortening the time needed to retrieve particular columns and records, and thereby accelerating analysis, has become an important issue. "Data allocation" and "data partitioning" are important keywords in the database domain, which reflects that the academic community is actively developing partitioning-oriented database design. In practice, the Data Allocation Problem (DAP) is to place related data in the same database, for example the data touched by a frequently used query pattern, in order to shorten the table merge operations and response time of data queries and modifications. Current studies focus on vertical and horizontal data partitioning. In the design of distributed databases, whichever partitioning method is adopted, the main design considerations depend on the needs of the enterprise. This study aims to reduce the response time of analytical queries by partitioning table columns and records effectively. To that end, we propose VHP, a two-step data partitioning model based on mixed data partitioning that combines vertical and horizontal partitioning, and we test its feasibility through experiments. The distributed insert method reduced insert time by 31%. In the distributed query experiments, more than half of the query transactions achieved a perfect allocation of their records, and the average query time was reduced by 12.1%.

Parallel Keywords

Distributed Database; Big Data; Mixed Partitioning; Data Allocation
