Improvements to Predictors, Scan Orders, and Entropy Coding Techniques for Intra-Frame Coding

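As internet bandwidth has grown, the way people interact online has evolved from rudimentary, text-only bulletin board systems (BBS) to social platforms that support images and video, such as Instagram, X, YouTube, and TikTok. According to a 2017 forecast report by the network equipment vendor Cisco Systems, Inc., video would account for 82% of global internet traffic by 2022, which shows how necessary and important video compression is. A video is, at bottom, a large number of pictures played back consecutively in the time domain, and intra-frame coding closely resembles the compression of a single still image. This thesis takes the JPEG compression standard, defined by the Joint Photographic Experts Group, as its baseline and improves on it. The original zigzag scan is replaced with an adaptive scan order determined jointly by three factors: the gradient of the DC coefficients, an unbiased zigzag weight matrix, and the probability that the coefficient at the same coordinate in neighboring blocks is zero. The aim is to reduce the number of zeros handled during run-length encoding. The zero-run lengths and nonzero values, which were originally Huffman-coded together, are each given a suitable context model and then coded separately with adaptive arithmetic coding. Experimental results show that, on the classic Kodak test set, the proposed method reduces the bits per pixel spent on AC coefficients by 8.92% relative to JPEG. We also build on CAVLC, the entropy coder in the H.264 video compression standard: the intra-prediction residuals, which were still run-length encoded, are instead context-modeled directly on their original values. The core idea of CAVLC, a length that changes gradually from bottom-right to top-left, is retained; while coding in forward order, the frequency table is adapted according to the maximum value on the previously coded diagonal, which resolves the problem that a CAVLC suffix, once changed, is not reversible. Experimental data show that the proposed method reduces bits per pixel by 7 to 8% on test videos at QCIF, CIF, and FHD resolutions. Finally, we explain the edge predictor (MED), originally applied to lossless image compression, from a different point of view and propose several improvements that take the prediction errors of neighboring values into account on top of the MED prediction. When the encoder has sufficient buffer space, we also compare the sum of squared errors of a macroblock with and without horizontal or vertical flipping, select the smaller one, and spend one to two extra bits to record whether flipping was applied. Experiments report the entropy obtained with each predictor at different macroblock sizes; the best predictor reduces entropy by about 3% compared with MED.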
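For orientation, the baseline that the adaptive scan order replaces can be sketched as follows. This is a minimal illustration of the fixed JPEG-style zigzag scan followed by zero-run/value pairing of the AC coefficients, not the adaptive order proposed in the thesis. The function names `zigzag_order` and `run_length_pairs` are hypothetical, and the sketch simplifies JPEG by omitting the special handling of runs longer than 15 zeros.

```python
def zigzag_order(n=8):
    """Return the (row, col) visiting order of the fixed zigzag scan."""
    # Visit anti-diagonals in order of row + col. On odd diagonals the row
    # index increases; on even diagonals the column index increases.
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def run_length_pairs(block, order):
    """Turn the AC coefficients into (zero_run, value) pairs.

    Simplified: an "EOB" marker stands for all trailing zeros, and long
    zero runs are not split as they would be in real JPEG.
    """
    ac = [block[r][c] for r, c in order[1:]]  # order[0] is the DC term
    pairs, run = [], 0
    for v in ac:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    if run > 0:
        pairs.append("EOB")
    return pairs


# Toy quantized block: one DC value and three nonzero AC values.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0], block[1][1] = 50, 3, -2, 1
pairs = run_length_pairs(block, zigzag_order(8))
print(pairs)
```

A scan order that visits likely-nonzero positions earlier shortens the zero runs inside the pairs and moves the end-of-block marker forward, which is the effect the adaptive order is meant to achieve.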

Keywords

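video compression; intra-prediction residual coding; image compression; lossless image compression; entropy coding; context-based adaptive arithmetic coding; run-length encoding; edge predictor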

Parallel Abstract

With the growth of internet bandwidth, human interaction online has evolved from primitive bulletin board systems (BBS) that supported only plain text to social media platforms that support images and video, such as Instagram, X, YouTube, and TikTok. According to a 2017 forecast report by Cisco Systems, Inc., video would account for 82% of global internet traffic by 2022, which underscores the necessity and importance of video compression. A video is, at its core, a sequence of still images played back continuously in the time domain, and among video compression techniques, intra-frame coding closely resembles single-image compression. This thesis improves on the JPEG compression standard, established by the Joint Photographic Experts Group, by replacing the original zigzag scan pattern with an adaptive scan order based on the gradient of the DC coefficients, an unbiased zigzag weight matrix, and the probability that the coefficients at the same coordinate in neighboring blocks are zero. The purpose is to reduce the number of zeros encountered during run-length encoding. In addition, the zero-run lengths and nonzero values, which were previously encoded together using Huffman coding, are each assigned an appropriate context model and then coded separately with adaptive arithmetic coding. Experimental results show that the proposed method reduces the bits per pixel of the AC coefficients by 8.92% compared with JPEG on the classic Kodak test dataset. Furthermore, building on CAVLC, the entropy coder in the H.264 video compression standard, we context-model the intra-frame prediction residuals directly on their original values, rather than run-length encoding them as before, while retaining CAVLC's core idea of a suffix length that changes gradually from bottom-right to top-left. While encoding in forward order, the frequency table is adjusted adaptively based on the maximum value of the previously encoded diagonal, which resolves the problem that a CAVLC suffix, once changed, is not reversible. Experimental data demonstrate a reduction in bits per pixel of 7 to 8% on test videos at various resolutions, including QCIF, CIF, and FHD. Finally, we reinterpret the Median Edge Detector (MED) predictor, originally applied to lossless image compression, from a different perspective and propose several improvements that take into account the prediction errors of neighboring values on top of the MED prediction. When there is sufficient buffer space at the encoder, the sum of squared differences (SSD) of each macroblock is evaluated both with and without horizontal or vertical flipping; the variant with the smaller value is selected, and one to two extra bits record whether flipping was applied. Experiments report the entropy obtained with the various predictors at different macroblock sizes, with the best predictor reducing entropy by approximately 3% compared with MED.
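As background for the predictor experiments, the following is a minimal sketch of the standard MED predictor as it is commonly defined for lossless image coding, together with an empirical entropy measure for the residuals. It does not include the improvements proposed in the thesis. The function names, the choice to treat out-of-image neighbors as 0, and the toy image are assumptions made only for illustration.

```python
import math
from collections import Counter


def med_predict(a, b, c):
    """Standard MED prediction from left (a), above (b), above-left (c)."""
    if c >= max(a, b):
        return min(a, b)
    if c <= min(a, b):
        return max(a, b)
    return a + b - c  # smooth region: planar prediction


def med_residuals(img):
    """Prediction residuals in raster order (assumed border value: 0)."""
    h, w = len(img), len(img[0])
    res = []
    for y in range(h):
        for x in range(w):
            a = img[y][x - 1] if x > 0 else 0
            b = img[y - 1][x] if y > 0 else 0
            c = img[y - 1][x - 1] if x > 0 and y > 0 else 0
            res.append(img[y][x] - med_predict(a, b, c))
    return res


def entropy(symbols):
    """Zeroth-order empirical entropy in bits per symbol."""
    n = len(symbols)
    return -sum(k / n * math.log2(k / n) for k in Counter(symbols).values())


# Hypothetical 2x2 image, used only to show the calls.
residuals = med_residuals([[1, 2], [3, 4]])
print(residuals, entropy(residuals))
```

Comparing `entropy(residuals)` across predictors on the same image is one simple way to rank them, in the spirit of the entropy comparison reported in the abstract.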

Parallel Keywords

video compression; intra-predicted residue encoding; image compression; lossless image compression; entropy coding; context-based adaptive arithmetic coding; run-length encoding; median edge detection predictor

References

[1] M. Nowell, Cisco VNI Forecast update, Cisco, [Online]. Available: https://www.ieee802.org/3/ad_hoc/bwa2/public/calls/19_0624/nowell_bwa_01_190624.pdf. [Accessed: Mar. 21, 2024].

[2] T. Zhang and S. Mao, "An Overview of Emerging Video Coding Standards," GetMobile: Mobile Computing and Communications, vol. 22, no. 4, pp. 13-20, 2019.

[3] I. M. Pu, Fundamental Data Compression, Oxford, MA, USA: Butterworth-Heinemann, 2006.

[4] K. Sayood, Introduction to Data Compression, Boston, MA, USA: Elsevier, 2006.

[5] I. E. G. Richardson, Video codec design: developing image and video compression systems. John Wiley & Sons, 2007.
