透過您的圖書館登入
IP:18.217.15.58
  • 學位論文

以架構自覺之社群圖管理達到有效率地更新及取得資料

Architecture Aware Social Graph Management in Datacenters for Updating and Fetching Data Efficiently

指導教授 : 周承復

摘要


對於線上社群網路系統而言,如何有效率的管理及查詢大量的圖架構資料是一個日漸重要的問題,特別是要支援大量更新及取得資料的使用者要求。為了解決這個問題,系統會將社群圖資料切割分散到各個機器內,以便平行處理使用者的要求;而在這個狀況之下,處理大量要求的所需總時間以及處理單個要求的回應時間都被處理最慢的機器所限制住。然而,在過去的研究中大多都致力於降低在處理要求時所需的跨機器傳輸量,而非分散負載量以達到消除瓶頸及分攤時間成本。另外在雲端運算的幫助下,提供服務者可低成本地以動態租用不同等級之機器來逐漸升級硬體環境,所以我們在分散負載量時也應該注意到異質性的硬體環境。 在本篇論文中,我們提出理論性的分析以及提出我們的系統設計,其中包含三個在運行時提供動態調整的啟發式的方法,用以降低系統處理大量使用者要求所需的時間。首先,我們提出圖分割的方法以決定使用者的主資料及要求將由哪一台機器負責。第二,我們提出副本策略輔助系統做較佳的副本決定。最後,我們提出遠端讀取方針來輔助系統從多個遠端讀取候選機器中選取較佳的讀取目標。我們也運用了模擬器來評估我們的方法,而結果顯示我們的方法顯著地降低了系統處理大量要求的所需時間。

並列摘要


There is an increasing need for the systems of Online Social Networks (OSNs) to manage and query large volumes of graph-structured data efficiently, especially to support the massive {em update} and {em fetch} queries raised by users. To cope with this, the system partitions the social graph data across a set of machines and processes the user queries in parallel; therefore, the total time spent for processing all queries and the response time of queries are bounded by the machine which takes longest time. However, although there is much work on reducing the total inter-machine traffic during processing queries, up to date there is little work on distributing the loads to eliminate the bottlenecks and amortize the time cost. Further, since it is cost-effective to scale-up the hardware environment by reserving different classes of machines dynamically with the aid of cloud computing, we should also give consideration to the resulting heterogeneous environment when distributing the loads. In this paper, we provide theoretic analysis and propose the design of social graph data management system including three heuristic techniques to minimize the low total time spent for processing massive user queries at runtime. First, we propose a graph partitioning method to make the decision that which machine is in charge of the data and queries of each user. Second, we propose a replication strategy that dictate how replication decisions should be made. Finally, we propose a remote read policy that choose one from all remote replicas to read. We evaluate our method on simulator, and show that our system reduces the time spent significantly.

參考文獻


data center network architecture. In Proceedings of the ACM SIGCOMM
[2] Facebook reports first quarter 2013 results. http://investor.fb.com/releasedetail.
Report UCB/EECS-2009-28, EECS Department, University of California, Berkeley,
[5] M. R. Garey, D. S. Johnson, and L. Stockmeyer. Some simplified np-complete problems.
[6] Konstantin Andreev and Harald Racke. Balanced graph partitioning. In Proceedings

延伸閱讀