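Chinese Abstract With the rapid growth of 3D applications on mobile devices, mobile graphics processing units (GPUs) have been widely adopted in such devices to handle the enormous computational load of real-time rendering. However, resources in these mobile systems are often severely limited, including battery power and memory bandwidth. Tile-based rendering is a technique widely used in current mobile processors. Moreover, tile-based rendering parallelizes well across multiple GPUs while retaining its low bandwidth requirement. Consequently, as 3D applications become increasingly detailed, multi-core tile-based GPUs are gradually becoming mainstream in high-end mobile devices. To use memory bandwidth more effectively, hierarchical tiling has recently been proposed for tile-based rendering systems. This thesis designs, for hierarchical tiling systems, an efficient hierarchical rendering order and tile rendering methods that exploit the high reusability of hierarchical tiling, and, based on this property, an efficient cache replacement policy called the Reusability-aware cache replacement policy (RACR). Experimental results show that, with the proposed hierarchical rendering order and RACR under sequential tile rendering, the list cache miss rate is reduced by at least 7% on average; with the two proposed hierarchical rendering orders and two tile rendering methods under RACR, the miss rate is reduced by at least 11%. In addition, external memory access demand and access time are reduced: bandwidth drops by at least 30% and performance improves by at least 15%. When the proposed design is applied to the primitive cache, the reduced external memory access demand and access time improve performance by at least 21%. These results show that this work effectively exploits the characteristics of hierarchical tiling to further reduce bandwidth requirements in multi-GPU systems, and may thereby also reduce the time and energy cost of external memory accesses.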
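The abstract does not specify how RACR selects a victim. A minimal sketch, assuming (hypothetically) that each cached list or primitive entry carries a count of tiles still expected to reuse it, is to evict the entry with the fewest remaining reuses and break ties by least recent use. The class `ReuseAwareCache`, the `remaining_uses` parameter, the `run` helper, and the toy trace are illustrative assumptions, not the thesis's actual design.

```python
from collections import OrderedDict


class ReuseAwareCache:
    """Hypothetical reusability-aware cache: evicts the entry with the fewest
    remaining expected reuses; ties go to the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        # key -> remaining expected reuses; iteration order = oldest access first
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, key, remaining_uses):
        if key in self.lines:
            self.hits += 1
            self.lines.move_to_end(key)  # mark as most recently used
        else:
            self.misses += 1
            if len(self.lines) >= self.capacity:
                # min() returns the first minimum in iteration order,
                # so ties are broken in LRU order
                victim = min(self.lines, key=self.lines.__getitem__)
                del self.lines[victim]
        # record how many more times this entry is expected to be reused
        self.lines[key] = remaining_uses


def run(trace, capacity, reuse_aware=True):
    """Replay (key, remaining_uses) pairs; with reuse_aware=False all counts
    are equal, so the policy degenerates to plain LRU."""
    cache = ReuseAwareCache(capacity)
    for key, remaining in trace:
        cache.access(key, remaining if reuse_aware else 0)
    return cache.hits, cache.misses


# Toy trace: "A" stands for a parent-tile list shared by several child tiles;
# "B", "C", "D" are each used once.
trace = [("A", 2), ("B", 0), ("C", 0), ("A", 1), ("D", 0), ("A", 0)]
print(run(trace, capacity=2, reuse_aware=True))   # (2, 4)
print(run(trace, capacity=2, reuse_aware=False))  # (1, 5)
```

On this toy trace the shared entry `A` survives the one-off entries under the reuse-aware policy but is evicted under LRU, which is the kind of locality the abstract attributes to hierarchical tiling.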
Abstract The rapid growth of 3D applications has led consumer electronics devices to be equipped with mobile GPUs to handle these heavy workloads. Since these systems are resource-limited, tile-based rendering has become a popular technique in them. Furthermore, tiled rendering offers very high parallelism, because each tile can be rendered independently. Therefore, multi-core GPUs have become prevalent for delivering a superior user experience in 3D applications. In tiled systems, a technique called hierarchical tiling is adopted to further reduce bandwidth. On multi-core GPUs, hierarchical tiling exhibits very high locality that has not yet been exploited. This paper therefore proposes a reusability-aware rendering and cache mechanism to exploit this locality. The intended advantages are a reduced miss rate in the shared cache, lower bandwidth requirements, and possibly improved performance. The results show that reusability-aware rendering sequencing reduces the average list cache miss rate from 46.33% to 39%, and adding the reusability-aware cache replacement policy reduces it further to 34.07% on average. With both mechanisms adopted, the bandwidth requirement is reduced by about 30.5%. When the proposed rendering sequence is incorporated with the primitive cache, the average miss rate falls from 51.01% to 41.58%, and further to 37.31% when the reusability-aware cache replacement policy is also adopted. With both mechanisms adopted in this configuration, the bandwidth requirement is reduced by about 30.12%.
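The average miss rates reported above can be converted into absolute (percentage-point) and relative reductions with a few lines of arithmetic. The helper name `reductions` is illustrative only; the input numbers are taken directly from the abstract.

```python
def reductions(base, *improved):
    """Absolute (percentage-point) and relative (%) drop from a baseline miss rate."""
    return [(round(base - m, 2), round(100 * (base - m) / base, 1)) for m in improved]


# Average miss rates (%) from the abstract: baseline, + rendering sequence, + RACR
list_cache = reductions(46.33, 39.00, 34.07)
primitive_cache = reductions(51.01, 41.58, 37.31)
print("list cache:", list_cache)            # [(7.33, 15.8), (12.26, 26.5)]
print("primitive cache:", primitive_cache)  # [(9.43, 18.5), (13.7, 26.9)]
```

Read as percentage points, the list cache drops of 7.33 and 12.26 appear consistent with the "at least 7%" and "at least 11%" figures in the Chinese abstract; read as relative reductions, they correspond to roughly 16% and 26%.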