A Highly Parallel Design for Irregular LDPC Decoding on GPGPUs (original title: 高度平行化之非規律性LDPC解碼器在通用式圖形處理器上之設計)

Advisor: 賴伯承

Abstract

LDPC decoding combines a complex decoding scheme with dynamic execution behavior, so attaining high decoding performance requires a powerful yet flexible computation platform. GPGPUs are many-core throughput processors that support massively parallel computation, but their performance is often limited by data bandwidth that is insufficient to serve the memory requests of a large number of processing cores. This thesis focuses on designing a high-performance LDPC decoder for modern GPGPUs. For the conventional node-based LDPC decoding scheme, it proposes a novel data management method that improves decoding efficiency. It further introduces a novel edge-based decoding scheme, in which data can be laid out more simply while matching the memory access efficiency of the node-based design. Through extensive analysis and measurement, the thesis examines the design considerations and trade-offs of the two parallel schemes and presents a complete design flow and performance comparison for each. Experiments show that the parallel decoder running on a Tesla C2050 GPGPU achieves up to a 126.47x speedup over a single-core decoder running on a high-end CPU, with a maximum decoding throughput of 111.43 Mbps.

Keywords

LDPC, GPU

