In this paper, we present unified hardware architectures for the complete sets of transforms in the H.264 and VC-1 video codecs. The H.264 unified architecture is designed with two proposed methods: a four-step operation-sharing synthesis applied to the matrix multiplication of each individual transform, and an operation-node sharing method applied across the different transforms. It covers the 2-D 4x4 forward/inverse transforms, the 2-D 4x4/2x2 Hadamard transforms, and the 1-D 8x8 forward/inverse transforms, and uses 31 adder/subtractors, 7 adders, 6 subtractors, 38 shifters, and 4 sets of multiplexers. The VC-1 unified architecture covers the 2-D 4x4 forward/inverse transforms, the 1-D 4x4 forward/inverse transforms, and the 1-D 8x8 forward/inverse transforms, and uses 45 adder/subtractors, 51 adders, 14 subtractors, 124 shifters, and 8 sets of multiplexers.

The H.264 unified architecture processes 16 inputs and 8 outputs in parallel for the 2-D 4x4 forward/inverse integer transforms, and 8 inputs and 8 outputs in parallel for the 1-D 8x8 forward/inverse integer transforms. The VC-1 unified architecture processes 16 inputs and 8 outputs in parallel for the 2-D 4x4 forward/inverse integer transforms, 8 inputs and 8 outputs in parallel for the 1-D 8x8 forward/inverse integer transforms, and 4 inputs and 4 outputs in parallel for the 1-D 4x4 forward/inverse integer transforms. No register array is needed for the transpose operations of the 2-D 4x4 forward/inverse transforms and the 2-D 4x4/2x2 Hadamard transforms in H.264, or of the 2-D 4x4 forward/inverse transforms in VC-1. For one macroblock in 4:2:0 format, the H.264 unified architecture delivers an output throughput of 8 pixels per cycle and completes the 1-D 8x8 forward/inverse and 2-D 4x4 forward/inverse transforms in 48 clock cycles. The VC-1 unified architecture, also at 8 pixels per cycle, completes the 1-D 8x8 forward/inverse and 2-D 4x4 forward/inverse transforms in 48 clock cycles.
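As a rough illustration of why such transforms map onto adders, subtractors, and shifters rather than multipliers, the following Python sketch computes the standard H.264 4x4 forward core transform with a butterfly and checks it against the plain matrix product. It is a software sketch only: the function names are hypothetical, and the butterfly shown is the well-known textbook decomposition, not the four-step operation-sharing synthesis proposed here. The last function is a simple arithmetic check of the 48-cycle figure, assuming a 4:2:0 macroblock holds 16x16 luma samples plus two 8x8 chroma blocks and ignoring pipeline latency.

```python
# Illustrative sketch; all names are hypothetical and not taken from the paper.

# Standard H.264 4x4 forward core transform matrix.
CF = [[1,  1,  1,  1],
      [2,  1, -1, -2],
      [1, -1, -1,  1],
      [1, -2,  2, -1]]


def fwd4_1d(x):
    """1-D 4-point forward transform using only adds, subtracts and shifts."""
    s0, s1 = x[0] + x[3], x[1] + x[2]   # sums
    d0, d1 = x[0] - x[3], x[1] - x[2]   # differences
    return [s0 + s1, (d0 << 1) + d1, s0 - s1, d0 - (d1 << 1)]


def fwd4x4_2d(block):
    """2-D transform CF * X * CF^T as a row pass followed by a column pass."""
    rows = [fwd4_1d(r) for r in block]                # X * CF^T
    cols = [fwd4_1d(list(c)) for c in zip(*rows)]     # CF applied to each column
    return [list(r) for r in zip(*cols)]              # transpose back


def matmul(a, b):
    """Plain integer matrix product, used only as a reference."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def reference_2d(block):
    """Reference result computed with explicit multiplications."""
    cf_t = [list(r) for r in zip(*CF)]
    return matmul(matmul(CF, block), cf_t)


def cycles_per_macroblock_420(samples_per_cycle):
    """Cycles to output one 4:2:0 macroblock (assumed 256 luma + 128 chroma)."""
    samples = 16 * 16 + 2 * 8 * 8
    return samples // samples_per_cycle
```

Under these assumptions, `fwd4x4_2d` and `reference_2d` agree for any integer 4x4 block, and `cycles_per_macroblock_420(8)` gives 384 / 8 = 48, which matches the cycle count stated above.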