  • Degree thesis

Original title: 針對多執行緒程式的增強任務選擇策略與改進NUMA系統節點間負載平衡的研究

Improving the Inter-node Load Balancing with Enhanced Task Selection Policies for Multi-threaded Applications on NUMA Systems

Advisor: Mei-Ling Chiang (姜美玲)
The full text will be available for download on 2024/07/19.

Abstract


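NUMA multi-core systems divide system resources into multiple nodes and offer the high performance and scalability of parallel computing on multi-core systems. However, when load is imbalanced among processor cores, the operating system's load balancing mechanism is triggered to migrate processes to other cores. Because a process may be migrated across nodes, remote memory accesses can occur after the migration. To maintain load balance while reducing remote memory access, previous research proposed the kernel-level Memory-aware Load Balancing (kMLB) mechanism to enhance inter-node load balancing in the Linux kernel. This mechanism modifies the memory-management functions and data structures of the Linux kernel to record, for each task, how many of the memory pages it uses are allocated on each node. It uses this memory usage information to estimate each task's maximum possible amount of remote memory access, and it proposes several task selection policies that choose for migration the task expected to reduce remote memory access the most.

In this study, we consider inter-node load balancing for multi-threaded processes. The Linux kernel organizes the threads that belong to the same process into a group, and these threads share the same memory space. At run time, however, the threads may be assigned to different nodes, which causes varying degrees of remote memory access. Besides finding that the Most Benefit task selection policy proposed in the previous research is also suitable for multi-threaded processes, we propose a new task selection policy. It does not rely on the kMLB mechanism; instead, it considers how all threads in a task's thread group are distributed across the nodes and selects the task whose thread group is most dispersed.

On the other hand, although selecting the most suitable task for inter-node migration reduces remote memory access, it also increases the cost of selection, because every movable task must be examined before a decision can be made. We therefore apply several optimizations that skip estimations that are superfluous for multi-threaded processes, making the task selection procedure more efficient. Experiments with the PARSEC 3.0 benchmark suite show that, compared with the original Linux kernel, our modified Linux kernel improves system performance by up to 11.1%.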
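To make the idea of a memory-aware selection policy concrete, the following is a minimal user-space sketch in Python, not the kernel implementation. It assumes that per-task page counts per node are available (the kind of information kMLB is described as recording) and that the benefit of a migration can be approximated as the drop in the number of pages that would be remote after the move. The names `Task`, `pages_per_node`, `remote_pages`, `migration_benefit`, and `select_most_benefit` are hypothetical, and the exact estimate used by the thesis may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Task:
    name: str
    # Hypothetical stand-in for kMLB's bookkeeping: node id -> resident pages.
    pages_per_node: Dict[int, int]


def remote_pages(task: Task, node: int) -> int:
    """Pages the task could only reach remotely if it ran on `node`."""
    return sum(count for n, count in task.pages_per_node.items() if n != node)


def migration_benefit(task: Task, src: int, dst: int) -> int:
    """Assumed estimate: drop in remote pages when moving from `src` to `dst`."""
    return remote_pages(task, src) - remote_pages(task, dst)


def select_most_benefit(tasks: List[Task], src: int, dst: int) -> Optional[Task]:
    """Pick the movable task with the largest estimated benefit."""
    if not tasks:
        return None
    return max(tasks, key=lambda t: migration_benefit(t, src, dst))


if __name__ == "__main__":
    runqueue = [
        Task("A", {0: 100, 1: 10}),  # mostly local to node 0, costly to move
        Task("B", {0: 20, 1: 80}),   # mostly resident on node 1 already
        Task("C", {0: 50, 1: 50}),   # indifferent to the move
    ]
    chosen = select_most_benefit(runqueue, src=0, dst=1)
    print(chosen.name if chosen else "no movable task")
```

Note that the sketch scans every movable task before deciding, which is the selection cost the abstract refers to.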

Parallel Abstract


NUMA multi-core systems divide system resources into several nodes and are therefore more scalable. When load imbalance occurs among cores, the load balancing mechanism of the kernel scheduler is triggered to migrate processes between cores, possibly across NUMA nodes. After an inter-node migration, remote memory accesses may occur and degrade system performance. To maintain load balance while reducing remote memory access, previous research proposed the kernel-level Memory-aware Load Balancing (kMLB) mechanism to enhance the inter-node load balancing of the Linux kernel. It tracks the number of memory pages occupied by each task on each NUMA node and devises several task selection policies. These policies use this information to select the task whose inter-node migration is expected to reduce remote memory access the most.

In this study, we focus on inter-node load balancing for multi-threaded processes. In the Linux kernel, the threads of a multi-threaded process form a thread group and share the same memory space. However, these threads may be scheduled to run on different NUMA nodes, which may incur different amounts of remote memory access. We find that the previously proposed Most Benefit policy, which uses the kMLB mechanism, is also appropriate for multi-threaded processes. In addition, we propose a new task selection policy that does not require the kMLB mechanism. For each movable task, it considers how the threads of the task's thread group are distributed over the NUMA nodes, and it selects the task whose thread group has the least exclusive (that is, the most dispersed) thread distribution. Migrating such a task is expected to have the least influence on the data mapping and thread mapping of its thread group.

On the other hand, although selecting suitable tasks for inter-node migration can reduce remote memory access, the load balancer has to evaluate each movable task in the runqueue, which incurs a certain overhead. We therefore apply several methods to skip evaluations that are superfluous for multi-threaded processes and make the selection procedure more efficient. Experimental results with the widely used PARSEC 3.0 Benchmark Suite show that our modified Linux kernel, using various task selection policies, achieves up to 11.1% performance improvement over the unmodified Linux kernel.
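The thread-distribution policy can be illustrated with a small user-space sketch in Python. The abstract does not define how exclusivity is measured, so this sketch assumes one simple metric: the largest share of a thread group's threads that sit on a single node (1.0 means all threads are on one node; lower values mean a more dispersed group). The names `Task`, `group_thread_nodes`, `exclusivity`, and `select_least_exclusive` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Task:
    name: str
    # Hypothetical input: the node id on which each thread of this task's
    # thread group currently runs (one entry per thread).
    group_thread_nodes: Sequence[int]


def exclusivity(task: Task) -> float:
    """Assumed metric: largest fraction of the group's threads on one node."""
    counts = Counter(task.group_thread_nodes)
    return max(counts.values()) / len(task.group_thread_nodes)


def select_least_exclusive(tasks: List[Task]) -> Optional[Task]:
    """Pick the movable task whose thread group is most dispersed."""
    candidates = [t for t in tasks if t.group_thread_nodes]
    if not candidates:
        return None
    return min(candidates, key=exclusivity)


if __name__ == "__main__":
    runqueue = [
        Task("t1", [0, 0, 0, 0]),  # whole group on node 0
        Task("t2", [0, 0, 1, 1]),  # group split evenly over two nodes
        Task("t3", [0, 0, 0, 1]),  # group mostly on node 0
    ]
    chosen = select_least_exclusive(runqueue)
    print(chosen.name if chosen else "no movable task")
```

Unlike a page-count-based policy, this sketch needs only the placement of threads, which matches the claim that the new policy does not depend on kMLB's memory bookkeeping.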

References


[1] C. Lameter, “An Overview of Non-Uniform Memory Access,”
Communications of the ACM, Vol. 56, Issue 9, pp. 59-65, September 2013.
[2] M. L. Chiang, W. L. Su, S. W. Tu, and Z. W. Lin, “Memory-Aware Kernel
Mechanism and Policies for Improving Inter-Node Load Balancing on
NUMA Systems,” accepted by Software: Practice and Experience.
