Resource Placement and Scheduling in Distributed Systems

Distributed systems are a typical solution for data-intensive applications that need to pool large amounts of computational power to handle enormous data sets. To improve the overall performance of such systems, two important groups of resource management problems must be addressed. The first is how to place resources at proper locations in the network to achieve load balance. The second is how to schedule requests for shared resources so as to reduce the overhead caused by requests that contend for the same resources.

In the first problem group, we investigate I/O server placement and data replica placement. Parallel I/O techniques can help relieve the serious performance bottleneck caused by I/O. However, switch-based clusters of workstations/PCs and distributed systems typically adopt general topologies to allow the construction of scalable systems with incremental expansion capability. These general topologies lack many of the attractive mathematical properties of regular topologies, which makes optimizing parallel I/O performance on general networks a difficult task. We therefore optimize server placement for parallel I/O in switch-based clusters to balance the workload among the I/O servers. In addition, data replication is a typical strategy for improving access performance and data availability in distributed systems with data-intensive applications, especially in Data Grids. Existing work usually focuses on the infrastructure for data replication and on the mechanisms for creating and deleting replicas, but the important problem of choosing suitable locations for replicas has not been fully studied. We therefore also address the replica placement problem in Data Grids.

In the second problem group, we discuss parallel I/O scheduling and multicast scheduling. The lack of global information about I/O traffic between computing nodes and I/O servers imposes new challenges in optimizing parallel I/O for distributed systems. We therefore develop two distributed algorithms for parallel I/O scheduling with non-uniform data sizes. Moreover, multicast is an important communication pattern with applications in collective communication operations, and the bandwidth limitation of the links in the routing tree of a general topology makes multicast scheduling critical. We thus propose an agent-based multicast algorithm that guarantees contention-free multicast by exploiting the properties of the routing tree of a general network.

The major contributions of this dissertation are summarized as follows. First, for I/O server placement, we formulate the problem as a weighted bipartite matching with the goal of balancing the workload on the I/O servers, and we propose an efficient algorithm that finds an optimal solution. To minimize link contention among subclusters connected in a general topology, we devise a tree-based heuristic algorithm to assign servers among the subclusters. Our simulation results demonstrate that our best algorithm is near-optimal in some cases. Second, for replica placement in a Data Grid, we propose a placement algorithm that finds optimal locations for replicas so that the workload among the replicas is balanced, and we also propose an algorithm that determines the minimum number of replicas when the maximum workload capacity of each replica is given. Third, for parallel I/O scheduling, we propose distributed scheduling algorithms, and our experimental results indicate that they yield parallel performance within 6% of the centralized solutions. We also compare our algorithms with a distributed Highest Degree First method, which divides non-uniform data transfers into units of fixed-size blocks; the experimental results show that our algorithms require less scheduling and data transfer time. Finally, for multicast scheduling in general networks, our experimental results demonstrate that our agent-based algorithm outperforms the most efficient algorithm reported in the existing literature.
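As a rough illustration of the parallel I/O scheduling problem described above, the following sketch groups transfers between computing nodes and I/O servers into contention-free steps, giving priority to transfers whose endpoints have the most pending work. It is not the dissertation's method: it is a simplified, centralized greedy heuristic in the spirit of highest-degree-first scheduling, and it assumes that every transfer is a single fixed-size block. The function name `greedy_io_schedule` and the node and server labels are hypothetical.

```python
from collections import defaultdict


def greedy_io_schedule(transfers):
    """Group unit-size transfers (client, server) into contention-free steps.

    Within a step, each client and each server takes part in at most one
    transfer, so every step is a matching in the bipartite transfer graph.
    Assumption: all transfers have the same size (one fixed-size block).
    """
    remaining = list(transfers)
    steps = []
    while remaining:
        # Degree = number of still-pending transfers touching an endpoint.
        degree = defaultdict(int)
        for client, server in remaining:
            degree[("c", client)] += 1
            degree[("s", server)] += 1
        # Consider transfers with the busiest endpoint first.
        ordered = sorted(
            remaining,
            key=lambda t: max(degree[("c", t[0])], degree[("s", t[1])]),
            reverse=True,
        )
        busy_clients, busy_servers, step = set(), set(), []
        for client, server in ordered:
            if client not in busy_clients and server not in busy_servers:
                step.append((client, server))
                busy_clients.add(client)
                busy_servers.add(server)
        steps.append(step)
        # Remove exactly the transfers that were scheduled in this step.
        for transfer in step:
            remaining.remove(transfer)
    return steps


if __name__ == "__main__":
    demo = [("c0", "s0"), ("c0", "s1"), ("c1", "s0"), ("c2", "s0"), ("c2", "s1")]
    for number, step in enumerate(greedy_io_schedule(demo), start=1):
        print(number, step)
```

The number of steps can never be smaller than the largest number of transfers that share one node or one server, which is why the heuristic serves the busiest endpoints first. Handling non-uniform data sizes without splitting them into fixed-size blocks, and doing so without global traffic information, is the part this sketch leaves out.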

Keywords

resource; placement; scheduling; distributed system; replica placement; I/O server; parallel I/O; multicast; I/O scheduling; up-down routing; Grid

