
Low-Power Graphics Processor for Mobile Multimedia Applications

Low Power Graphics Processing Units with Programmable Texture Unit and Universal Rasterizer for Mobile Multimedia Applications

Advisor: 簡韶逸 (Shao-Yi Chien)

Abstract


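Handheld mobile devices have grown rapidly in recent years, and many multimedia applications, such as music and video playback and image processing, have been integrated into devices such as mobile phones, personal digital assistants (PDAs), and portable media players (PMPs). In addition, visually rich graphical user interfaces (GUIs) and 3D games are regarded as the drivers of the next wave of growth for handheld devices. In contemporary graphics processors, the evolution of programmable shaders has greatly increased the controllability and programmability of graphics hardware. As a result, graphics hardware is increasingly used for applications beyond traditional 3D computer graphics; this use is called GPGPU (general-purpose computing on graphics processing units). Consequently, more and more hardware accelerators for various audio and video standards, along with graphics processors, are being embedded in handheld mobile platforms. From a system perspective, integrating the video accelerator with the graphics processor increases hardware utilization, reduces chip area and cost, and lowers power consumption, which is a very important factor for next-generation multimedia handheld devices. This thesis proposes three techniques to achieve low power and high hardware efficiency: the Universal Rasterizer, the Programmable Filter Unit, and Mipmapping Texture Compression.

First, we propose the Universal Rasterizer. To reduce hardware complexity, we propose an efficient tile-based traversal algorithm and a hardware-sharing architecture. Test results show that the proposed architecture, when integrated into a 3D graphics system, meets the requirements for real-time and efficient processing.

Next, we propose the Programmable Filter Unit. This programmable unit provides a new data-streaming path, so that more applications outside 3D graphics, such as video compression or image processing, can be implemented efficiently on the graphics processor. As case studies, we implemented motion compensation, the most demanding part of video decoding, and video segmentation. Overall system performance improved by 28.4% and 60% in these two applications, respectively.

Finally, we present the Mipmapping Texture Compression algorithm. According to our analysis, this compression reduces bandwidth by 80%. Combining the alpha and color components also improves texture compression quality and efficiency.

All three techniques are integrated into a low-power 3D graphics processor. The processor supports multimedia stream processing and is realized as a system-on-chip platform. The prototype chip is fabricated in UMC 90 nm technology with an area of 5 × 5 mm². Its throughput is 200 million vertices, 400 million pixels, and 1,600 million texels per second, equivalent to 1.1 billion floating-point operations per second.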
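The idea that motion compensation maps onto a texture filtering unit can be illustrated with a minimal sketch. Fetching a block at a fractional motion vector is a filtered read of the reference frame, much like a filtered texture fetch. The function names below are hypothetical, and bilinear interpolation is an assumption chosen for brevity; it is not the datapath described in the thesis, and real codecs use longer filters (H.264/AVC, for example, uses a 6-tap filter for luma half-sample positions).

```python
import math


def bilinear_sample(frame, x, y):
    """Read `frame` (a list of rows) at fractional (x, y), clamping at borders."""
    height, width = len(frame), len(frame[0])
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0  # fractional offsets inside the 2x2 footprint

    def at(row, col):
        # Clamp-to-edge addressing, as a texture unit would do.
        return frame[min(max(row, 0), height - 1)][min(max(col, 0), width - 1)]

    top = (1 - fx) * at(y0, x0) + fx * at(y0, x0 + 1)
    bottom = (1 - fx) * at(y0 + 1, x0) + fx * at(y0 + 1, x0 + 1)
    return (1 - fy) * top + fy * bottom


def motion_compensate(ref, block_x, block_y, size, mv_x, mv_y):
    """Predict a size x size block by sampling `ref` displaced by (mv_x, mv_y)."""
    return [
        [bilinear_sample(ref, block_x + c + mv_x, block_y + r + mv_y)
         for c in range(size)]
        for r in range(size)
    ]


ref = [[0, 10], [20, 30]]
print(motion_compensate(ref, 0, 0, 1, 0.5, 0.5))
```

With a half-pixel motion vector in both directions, the single predicted sample is the average of the four neighbouring reference pixels, which is the same arithmetic a bilinear texture fetch performs.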

Keywords

Graphics processing unit

Parallel Abstract


In the current graphics pipeline, programmable vertex, pixel, and geometry shaders give programmers greater flexibility across rendering applications. Programmable graphics processing units (GPUs) support not only high-quality rendering algorithms but also a large number of general-purpose computations mapped onto the graphics hardware; such computations are called general-purpose computations on GPUs (GPGPU). This concept is particularly beneficial for mobile systems. With advanced GPGPU techniques, a unified mobile multimedia subsystem can be built by processing different types of content on the GPU, which reduces the cost of the entire system through high hardware utilization and efficiency. However, a mobile device is by definition battery-powered and small enough to be portable, so its system must use as little energy as possible. In this thesis, we present three units suited to mobile GPUs: the Universal Rasterizer, the Programmable Filtering Unit (PFU), and High-Quality Mipmapping Texture Compression with Alpha Map (MTC).

First, a Universal Rasterizer that performs tile-scan triangle traversal with edge equations is proposed for low complexity. The associated efficient tiled triangle traversal algorithm is also introduced. Results show that it minimizes the processing time of triangle traversal and ensures no reiteration during traversal. In addition, the improved hardware architecture realizes the efficiency of the traversal and rasterization algorithms. With extensive hardware sharing and digital signal processing techniques such as pipelining and scheduling, it meets the real-time requirements of graphics applications.

Second, we propose the Programmable Filtering Unit (PFU), a newly developed programmable unit for media-processing applications implemented on the stream-processing architecture of GPUs. The PFU is located in the texture unit of a GPU, and it efficiently executes several types of filtering operations by directly accessing the multi-bank texture cache and specially designed datapaths. Simulation results show that, compared with conventional texture units, the PFU reduces the processing time required for H.264/AVC motion compensation and video segmentation by 28.4% and 60%, respectively.

Third, we present a high-quality mipmapping texture compression (MTC) system with an alpha map. This approach reduces texture-access memory traffic by 80% to 90%. By exploiting the similarity between the alpha channel and the luminance channel, the two channels are encoded together efficiently with linear prediction in Differential mode, while Split mode handles textures whose alpha and luminance channels are not strongly related. A layer-overlapping technique is also proposed to further reduce the texture memory bandwidth of MTC. Simulation results on a graphics platform show that MTC provides high image quality, low bandwidth, and a lower cache miss rate for textures.

The three proposed units are integrated into a low-power graphics processing unit for mobile multimedia applications, which is implemented in this thesis. The prototype chip is fabricated in UMC 90 nm technology, and the chip size is 5 × 5 mm². The design operates at 200 MHz, and the worst-case power consumption is 26 mW. The processing capability of the chip is 200 Mvertices/s of geometry transform, 400 Mpixels/s, and 1.6 Gtexels/s of texture filtering, or 11 GFLOPS with the PFU.
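Tile-scan triangle traversal with edge equations can be sketched as follows. This is a minimal illustration, not the algorithm or hardware of the thesis: the function names, the 4×4 tile size, integer vertex coordinates, sampling at pixel centers, and the absence of a fill rule for shared edges are all assumptions. Each tile is tested once against the three edge equations and skipped if every corner sample lies outside one edge, so whole tiles are rejected cheaply and no pixel is visited twice.

```python
def edge(a, b, p):
    """Edge equation: positive when p lies to the left of the directed edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize_tiled(v0, v1, v2, width, height, tile=4):
    """Return covered pixels of a triangle, scanning the bounding box tile by tile."""
    area = edge(v0, v1, v2)
    if area == 0:
        return []  # degenerate triangle covers nothing
    if area < 0:
        v1, v2 = v2, v1  # enforce one winding so "inside" means all edges >= 0
    edges = [(v0, v1), (v1, v2), (v2, v0)]

    xs = [v[0] for v in (v0, v1, v2)]
    ys = [v[1] for v in (v0, v1, v2)]
    x_lo, x_hi = max(int(min(xs)), 0), min(int(max(xs)), width - 1)
    y_lo, y_hi = max(int(min(ys)), 0), min(int(max(ys)), height - 1)

    covered = []
    for ty in range(y_lo - y_lo % tile, y_hi + 1, tile):
        for tx in range(x_lo - x_lo % tile, x_hi + 1, tile):
            # Pixel-center samples at the four corners of this tile.
            corners = [(tx + dx + 0.5, ty + dy + 0.5)
                       for dx in (0, tile - 1) for dy in (0, tile - 1)]
            # Edge equations are linear, so if all corners are outside
            # one edge, every sample in the tile is outside it too.
            if any(all(edge(a, b, c) < 0 for c in corners) for a, b in edges):
                continue
            for y in range(ty, min(ty + tile, height)):
                for x in range(tx, min(tx + tile, width)):
                    p = (x + 0.5, y + 0.5)
                    if all(edge(a, b, p) >= 0 for a, b in edges):
                        covered.append((x, y))
    return covered


pixels = rasterize_tiled((0, 0), (8, 0), (0, 8), 8, 8)
print(len(pixels), len(set(pixels)))
```

In this example the lower-right 4×4 tile is rejected by a single corner test, and the two counts printed are equal because the tiles are disjoint.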
