Efficient Rendering and Cache Replacement Mechanisms for Mobile Tile-Based Graphics Processors

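Abstract: With the rapid growth of 3D applications on mobile devices, mobile graphics processing units (GPUs) have been widely adopted in these devices to handle the enormous computational load of real-time rendering. However, resources in such mobile systems are often severely limited, including battery power and memory bandwidth. Tile-based rendering is a technique widely used in current mobile GPUs. Moreover, tile-based rendering parallelizes well across multiple GPUs while retaining its low bandwidth requirement. Consequently, as 3D applications become increasingly detailed, multi-core tile-based GPUs are gradually becoming mainstream in high-end mobile devices. In recent years, hierarchical tiling has been proposed so that tile-based rendering systems can use memory bandwidth more efficiently. This thesis designs, for hierarchical tiling systems, an efficient hierarchical rendering order and tile rendering approaches that exploit the high reusability of such systems. Based on this property, it also designs an efficient cache replacement policy called the Reusability-Aware Cache Replacement policy (RACR). Experimental results show that with the proposed hierarchical rendering order and RACR under sequential tile rendering, the list cache miss rate is reduced by at least 7% on average. When the two proposed hierarchical rendering orders and two tile rendering approaches are combined with RACR, the cache miss rate is reduced by at least 11%. In addition, the bandwidth required for external memory accesses is reduced by at least 30%, and performance improves by at least 15%. Applying the proposed design to the primitive cache likewise reduces external memory access demand and access time, improving performance by at least 21%. These results show that this work effectively exploits the characteristics of hierarchical tiling systems to further reduce bandwidth requirements in multi-GPU systems. It may thereby also reduce the time cost and power consumption of external memory accesses.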
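The abstract does not describe how RACR chooses a victim. The following is therefore only a hypothetical toy model and not the thesis's implementation.

It assumes the following:

- A 4 × 4 grid of fine tiles is grouped into 2 × 2 coarse tiles.
- Rendering each fine tile reads its parent coarse-tile primitive list and then its own list.
- A single core walks the tiles. The thesis itself targets multi-core GPUs sharing a cache.

The "reuse" policy approximates reusability as the number of not-yet-rendered subtiles that will still read a list, which is known in advance from the tile hierarchy. It evicts the entry with the fewest remaining uses, with least-recently-used order breaking ties. The names `raster_order`, `hierarchical_order`, `build_trace`, and `simulate` are illustrative only.

```python
from collections import OrderedDict


def raster_order(n=4):
    """Fine tiles as (coarse_id, fine_id) in row-major order over an n x n grid.

    Assumption: each coarse tile covers 2 x 2 fine tiles.
    """
    order = []
    for y in range(n):
        for x in range(n):
            coarse = (y // 2) * (n // 2) + (x // 2)
            fine = (y % 2) * 2 + (x % 2)
            order.append((coarse, fine))
    return order


def hierarchical_order(n=4):
    """Hypothetical hierarchy-aware order: finish all fine tiles of a coarse tile first."""
    return sorted(raster_order(n))


def build_trace(order):
    """Convert a rendering order into a list-cache access trace (assumed model)."""
    trace = []
    for coarse, fine in order:
        trace.append(("coarse", coarse))      # shared primitive list of the parent tile
        trace.append(("fine", coarse, fine))  # primitive list of the fine tile itself
    return trace


def simulate(trace, capacity, policy):
    """Count misses of a fully associative list cache under 'lru' or 'reuse'."""
    if policy not in ("lru", "reuse"):
        raise ValueError(f"unknown policy: {policy}")
    # Remaining uses per list; for coarse lists this equals the number of
    # subtiles still to be rendered, so it is known before rendering starts.
    remaining = {}
    for key in trace:
        remaining[key] = remaining.get(key, 0) + 1

    cache = OrderedDict()  # iteration order: least to most recently used
    misses = 0
    for key in trace:
        remaining[key] -= 1
        if key in cache:
            cache.move_to_end(key)  # hit: mark as most recently used
            continue
        misses += 1
        if len(cache) >= capacity:
            if policy == "lru":
                victim = next(iter(cache))
            else:
                # Fewest remaining uses; min() keeps the least recent entry on ties.
                victim = min(cache, key=lambda k: remaining[k])
            del cache[victim]
        cache[key] = True
    return misses


for name, order in (("raster", raster_order()), ("hierarchical", hierarchical_order())):
    trace = build_trace(order)
    for policy in ("lru", "reuse"):
        print(name, policy, simulate(trace, capacity=3, policy=policy))
# raster lru 24 / raster reuse 20 / hierarchical lru 20 / hierarchical reuse 20
```

The trace has 32 accesses, 20 of which are compulsory misses. With a three-entry cache, raster order under LRU re-fetches the coarse lists. Either the hierarchy-aware order or the reuse-aware policy alone avoids those re-fetches. Real gains depend on cache size, associativity, core count, and scene content, none of which this sketch models.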

Keywords

Rendering approach; tile; hierarchy

Parallel Abstract (English)

Abstract: The rapid growth of 3D applications has led consumer electronics devices to adopt mobile GPUs to handle these tremendous workloads. Because these systems are resource-limited, tile-based rendering has become a popular technique in them. Furthermore, tiled systems offer very high parallelism, since each tile can be rendered independently. Therefore, multi-core GPUs have become prevalent for delivering a superior user experience in 3D applications. In tiled systems, a technique called hierarchical tiling is adopted to reduce bandwidth further. This tiling technique exhibits very high locality on multi-core GPUs, and that locality has not yet been exploited. This paper therefore proposes reusability-aware rendering and cache mechanisms to exploit it. The intended advantages are a lower miss rate in the shared cache, lower bandwidth requirements, and possibly improved performance. The results show that the proposed mechanisms reduce the list cache miss rate from 46.33% to 39% on average when reusability-aware rendering sequencing is used. The rate falls further to 34.07% on average when the reusability-aware cache replacement policy is also used. With both mechanisms adopted, bandwidth requirements are reduced by about 30.5%. When the proposed rendering sequence is incorporated into the primitive cache, the miss rate is reduced from 51.01% to 41.58% on average. It falls further to 37.31% on average when the reusability-aware cache replacement policy is also adopted. With both mechanisms adopted, bandwidth requirements are reduced by about 30.12%.
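The first abstract states miss-rate reductions of "at least 7%" and "at least 11%", while this abstract gives the underlying average rates. The short calculation below uses only the rates reported here. It suggests that the percentages in the first abstract are absolute reductions in percentage points, for example 46.33% - 39% = 7.33 points, and that the relative reductions are larger. The helper name `reductions` is illustrative.

```python
# Average miss rates (%) as reported in the abstract:
# (baseline, with rendering sequencing, with sequencing + RACR)
REPORTED = {
    "list cache": (46.33, 39.00, 34.07),
    "primitive cache": (51.01, 41.58, 37.31),
}


def reductions(baseline, improved):
    """Return (absolute drop in percentage points, relative drop in percent)."""
    points = baseline - improved
    return points, 100.0 * points / baseline


for cache, (base, seq, seq_racr) in REPORTED.items():
    for label, rate in (("sequencing", seq), ("sequencing + RACR", seq_racr)):
        points, relative = reductions(base, rate)
        print(f"{cache:<15} {label:<17} -{points:.2f} pts ({relative:.1f}% relative)")
```

For the list cache, this gives drops of 7.33 and 12.26 points, which are about 15.8% and 26.5% relative. For the primitive cache, it gives 9.43 and 13.70 points, about 18.5% and 26.9% relative.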

Parallel Keywords (English)

Rendering Sequence; approach; tile
