
A Study of Efficient Geographic Data Allocation in Cloud Environments (雲端環境中有效率之地理資料配置研究)

An Efficient Geometry Data Allocation Algorithm in Cloud Environments

Advisor: Wen-Chih Peng (彭文志)

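Abstract


Today, more and more location-based services are being created. These services typically require a large amount of geographic data computation, so their workload is very high. We therefore want to use cloud computing techniques to distribute this workload. However, even when a location-based service runs in a cloud environment, its workload is still not spread across machines, because geographic data is not uniformly distributed. To use as many machines as possible for computation, we propose several geographic data allocation methods. Intuitively, we can cut the map into equal-sized regions, like laying tiles, and assign each region to a different machine; when a piece of geographic data needs to be stored, we find the region it belongs to and store it on the machine corresponding to that region. Unfortunately, geographic data within the same region is spatially close, so processing a range query over such data still incurs a very large workload. To solve this problem, we propose a new geographic data allocation method, "Reversed K-means", which spreads geographic data that is close together across different machines. A range query can then use more machines, because the data it needs is stored on many machines, which improves the performance of geographic data computation. To evaluate the proposed method, we measure the number of machines used and the computation time required to perform geographic data computations. The experimental results show that our method uses more machines than existing methods and requires the least time of all methods.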

Parallel Abstract


The number of location-based services continues to grow. These services typically spend a large amount of effort on geometry data computation, so their workload is generally high. Cloud computing techniques make it possible to distribute this workload across a number of computing nodes. However, the workload is usually not balanced across computing nodes if the data is not well distributed. To make the best use of the computing nodes, we propose a data distribution technique for geometry computation processing. Intuitively, one can simply divide geometry data into tiles and store the data in each tile on one computing node. Unfortunately, because the data in a tile is spatially close, processing a geometry computation over spatially close data still incurs a huge workload. To address this issue, we propose a new data distribution approach, Reversed K-means, which distributes spatially close geometry data across different computing nodes. In this way, more computing nodes can take part in a geometry computation, which improves performance. To evaluate the proposed algorithm, we measure the utilization of computing nodes and the response time when performing geometry computations. The experimental results show that the utilization of computing nodes is higher than with existing methods and that the response time is the shortest of all methods.
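
The contrast drawn in the abstract between tile-based placement and dispersed placement can be illustrated with a small sketch. This is not the Reversed K-means algorithm, which the abstract does not specify; the dispersed scheme below is a simple round-robin stand-in, and every name and parameter (`tile_machine`, `dispersed_machine`, `machines_hit`, the 10x10 grid, four machines) is an assumption made for illustration only.

```python
from typing import List, Set, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def tile_machine(p: Point, tile_size: float, tiles_per_row: int, num_machines: int) -> int:
    """Tile-based placement: a point goes to the machine that owns its tile."""
    col = int(p[0] // tile_size)
    row = int(p[1] // tile_size)
    return (row * tiles_per_row + col) % num_machines


def dispersed_machine(index: int, num_machines: int) -> int:
    """Hypothetical stand-in for dispersal (NOT Reversed K-means):
    neighbouring points in scan order are sent to different machines."""
    return index % num_machines


def machines_hit(points: List[Point], assignment: List[int], query: Box) -> Set[int]:
    """Return the set of machines holding at least one point inside the query box."""
    x0, y0, x1, y1 = query
    return {m for p, m in zip(points, assignment)
            if x0 <= p[0] <= x1 and y0 <= p[1] <= y1}


# Toy data: a 10x10 grid of points in row-major (scan) order.
points = [(x + 0.5, y + 0.5) for y in range(10) for x in range(10)]
num_machines = 4

# Four 5x5 tiles, one tile per machine.
tiled = [tile_machine(p, 5.0, 2, num_machines) for p in points]
# Round-robin over scan order, so nearby points land on different machines.
dispersed = [dispersed_machine(i, num_machines) for i in range(len(points))]

# A range query that falls entirely inside one tile.
query = (0.0, 0.0, 4.0, 4.0)
tiled_hit = machines_hit(points, tiled, query)
dispersed_hit = machines_hit(points, dispersed, query)

print("tile-based placement uses machines:", sorted(tiled_hit))
print("dispersed placement uses machines:", sorted(dispersed_hit))
```

In this toy setup the query is served by a single machine under tile-based placement and by all four machines under the dispersed stand-in, which is the effect the abstract attributes to spreading spatially close data across nodes.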

