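Perception-Oriented Video Encoder: Hardware Architecture Design of a Human Visual Perception Analysis Engine and Its Application to an H.264 Video Encoder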

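Existing multimedia technology has deeply influenced everyday life. Because of limits on computational complexity and transmission bandwidth, high-quality image and video processing still has room to improve. The latest video compression standard, H.264/AVC, can deliver compression ratios from tens to hundreds, but the final receiver of video is the human eye, and conventional standards use only the peak signal-to-noise ratio (PSNR) as the quality metric for compressed video, with little regard for the characteristics of the human visual system (HVS). As a result, bit allocation in video compression is not optimized for human perception. Allocating bit rate effectively across different video content within a limited bandwidth is therefore important. With appropriate allocation, in which important regions of a frame receive more bits and unimportant regions receive fewer, the bit rate can be reduced effectively while the quality of the compressed video improves. The key is how to take the perceptual characteristics of the HVS into account and exploit them. However, a system that models human visual perception and applies it in an encoder to reduce the bit rate usually requires enormous computational complexity and system bandwidth, so it often cannot achieve real-time processing. We therefore propose an algorithm for a human perception evaluation engine that strengthens the bit allocation of a video encoder, and we further propose an efficient hardware architecture for it. The main goal of this thesis is to model the perceptual characteristics of the HVS and to propose a perception evaluation engine that analyzes the content of the video frame currently being processed and decides how much bit rate that data should receive.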

Keywords

H.264 video coding; human visual perception system; attention model; perceptual model; contrast sensitivity function; contrast

Parallel Abstract

Multimedia technology now shapes everyday life. Because computational complexity and transmission bandwidth are limited, the processing of high-quality video still needs improvement. The newest video compression standard, H.264/AVC, offers compression ratios from tens to hundreds. The final receiver of video information is the human eye, yet the traditional standard uses only the peak signal-to-noise ratio (PSNR) as the quality index for a compressed video bit stream. PSNR does not consider the properties of the human visual system (HVS), so the bit allocation of a video bit stream is usually not optimized for human perception. Allocating the bit rate effectively across different video content within a limited bandwidth is therefore important. With proper allocation, such as more bits for the important areas of a frame and fewer bits for the unimportant areas, the bit rate can be reduced. In other words, the compressed video shows better perceptual quality than other compressed video at the same bit rate. The key to such bit allocation is accounting for human eye perception in the HVS. However, a system that models the properties of human eye perception and reduces the bit rate of the video bit stream at the same perceptual quality often requires huge computational complexity and system bandwidth, so it cannot satisfy the real-time requirements of video encoding systems. We therefore propose a bio-inspired human eye perception evaluation algorithm that improves the bit allocation of video encoders, and we further propose an efficient hardware architecture for it. The main target of this thesis is to model the properties of the HVS: the perception evaluation engine must analyze the content of the current video frame and determine the bit allocation for that data.
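
The limitation of PSNR noted above can be made concrete with a minimal Python sketch. It is not taken from the thesis; the helper names `psnr` and `with_error` and the 4x4 test frame are assumptions chosen only for illustration. It uses the standard definition of PSNR from the mean squared error and shows that the same pixel error yields the same PSNR wherever it falls in the frame, so the metric by itself cannot tell a perceptually important region from an unimportant one.

```python
import math


def psnr(reference, distorted, peak=255):
    """Return PSNR in dB between two frames given as lists of pixel rows."""
    squared_error = 0
    count = 0
    for ref_row, dist_row in zip(reference, distorted):
        for r, d in zip(ref_row, dist_row):
            squared_error += (r - d) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / mse)


def with_error(frame, row, col, delta):
    """Hypothetical helper: copy the frame and add `delta` to one pixel."""
    copy = [list(r) for r in frame]
    copy[row][col] += delta
    return copy


# Assumed toy data: a flat 4x4 luma frame.
reference = [[100] * 4 for _ in range(4)]

# The same error magnitude placed at two different locations.
error_in_center = with_error(reference, 1, 1, 16)
error_in_corner = with_error(reference, 3, 3, 16)

psnr_center = psnr(reference, error_in_center)
psnr_corner = psnr(reference, error_in_corner)

# PSNR depends only on the error energy, not on where the error is.
print(psnr_center == psnr_corner)
```

A perception-driven encoder of the kind described here would treat the two cases differently, for example by spending more bits where a viewer is likely to look; how the thesis's engine makes that decision is not specified in this abstract.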

Parallel Keywords

Video encoder; H.264; Inter; Intra; Human visual system; Attention model; Perceptual model; SSIM; JND; Motion; Contrast sensitivity function; Contrast
