
一套於多核心系統上具可攜性且有效率的使用者分配資源機制

A Portable and Efficient User Dispatching Mechanism for Multicore Systems

Advisor: 薛智文

Abstract


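On multicore systems, using multiple threads is a common way to improve performance. Nevertheless, in many simple applications, increasing the number of threads actually degrades performance, contrary to expectation. Users usually attribute this to the overhead of creating and terminating threads. In our observation, however, the most significant factor is how threads are dispatched. In this thesis, we therefore discuss thread-related problems and propose a novel User Dispatching Mechanism (UDispatch) to address them. Because threads run in user space, they cannot directly control system resources in kernel space. We therefore use a virtual device as a bridge between the two spaces, which avoids modifying the kernel to add system calls and allows efficient, portable communication with the operating system. In addition, we provide a UDispatch application programming interface (API) that users can call directly in application source code. Although the API helps users, there are cases where users are unwilling or unable to modify the source code, so the benefits of UDispatch cannot be realized. We therefore also propose a command-line interface, the User Dispatching Mechanism Loader (UDLoader), which lets users operate UDispatch without modifying program code. We evaluate UDispatch and UDLoader on two multimedia applications: a skip-line binary compression application and an H.264/AVC decoder. The experimental results show that the skip-line binary compression application improves by 171.8% and 111.6% on a 4-core machine and an 8-core machine, respectively, and the H.264/AVC decoder improves by 20.1% on a 4-core machine.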

Parallel Abstract (English)


In multicore environments, using multiple threads is a common and useful approach to improving application performance. Nevertheless, even in many simple applications, performance may degrade as the number of threads increases. Users usually attribute this phenomenon to the overhead of creating or terminating threads. In our observation, however, the more significant factor is the dispatching of threads. We discuss the problems of using threads and present a novel User Dispatching Mechanism (UDispatch) that provides controllability in user space to improve application performance. Since user threads cannot directly control system resources, a virtual device is adopted between user space and the operating system for portability and efficiency, instead of adding new system calls through kernel modification. We provide an application programming interface (API) that lets users manipulate UDispatch by modifying application source code. To avoid source code modification, a command-line UDispatch Loader (UDLoader) is also provided to help users bind threads to specific cores directly. We apply UDispatch to two multithreaded multimedia applications. The results show that a skip-line application speeds up by up to 171.8% and 111.6% on a 4-core machine and an 8-core machine, respectively, and an optimized H.264/AVC decoder speeds up by up to 20.1% on a 4-core machine.
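The idea of binding threads to specific cores from user space can be illustrated with a minimal sketch. This is not the UDispatch API or its virtual-device interface, neither of which is described in this record. It is a stand-in that assumes a Linux host and uses Python's standard `os.sched_setaffinity`, which on Linux applies to the calling thread when given PID 0. The helper names `available_cores`, `pin_current_thread`, and `run_pinned` are hypothetical.

```python
import os
import threading


def available_cores():
    """Return the set of cores this process may run on."""
    if hasattr(os, "sched_getaffinity"):
        return set(os.sched_getaffinity(0))
    # Fallback for platforms without affinity support (assumption: no pinning there).
    return set(range(os.cpu_count() or 1))


def pin_current_thread(core_id):
    """Restrict the calling thread to one core.

    Returns the affinity set observed afterwards, or None when the
    platform offers no affinity call (the thread is then left unpinned).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, {core_id})  # PID 0: the calling thread on Linux
    return set(os.sched_getaffinity(0))


def run_pinned(worker, cores):
    """Run worker(core) in one thread per core, each pinned to its core.

    Returns {core: (observed_affinity, worker_result)}.
    """
    results = {}

    def body(core):
        observed = pin_current_thread(core)
        results[core] = (observed, worker(core))

    threads = [threading.Thread(target=body, args=(c,)) for c in cores]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    chosen = sorted(available_cores())[:2]
    for core, (affinity, value) in sorted(run_pinned(lambda c: c * c, chosen).items()):
        print(f"core {core}: affinity={affinity}, result={value}")
```

Each worker pins itself before doing any work, so the operating system scheduler can no longer migrate it between cores. The abstract's point is that this kind of placement decision, rather than thread creation cost, dominates performance in the applications studied.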

Keywords

Threading; Scheduling; Dispatching; Anomaly; Multicore; System Call; Virtual Device; Loader

