資料壓縮用Tunstall符號剖析樹之改良研究

資料壓縮方法分為有損與無損兩種，而大部分無損壓縮法遇雜訊皆有嚴重錯誤擴散情形，例如Huffman編碼、算術編碼、Ziv-Lempel編碼等。只有Tunstall編碼因採用固定長度字碼，又未使用動態字典，有較高的容錯能力。可是Tunstall編碼的壓縮率並不高，故本文目的在於保持Tunstall容錯能力下，希望能提昇其壓縮率。由於傳統Tunstall符號剖析樹的建構法較無彈性，每次挑最大機率節點作擴展時，必長滿所有符號節點，容易浪費珍貴字碼於低機率符號節點上，或有多餘的字碼未指派。本文共提出A、B、C三種Tunstall符號剖析樹的改良方案，A、B兩方案以傳統Tunstall符號剖析樹為基礎，A方案經過新增，B方案經過刪除符號剖析樹節點，使字碼可以充分指派於高機率節點上；C方案則改以無窮延伸符號剖析樹為基礎，除了可以充分指派字碼外，指派字碼時亦盡可挑選符號剖析樹中機率最高的節點，以長出最佳符號剖析樹。最後，將上述三種改良方案與傳統Tunstall符號剖析樹比較其資料壓縮率表現。根據實驗結果顯示，在一般資料集上C方案擁有較佳達6%的壓縮提昇表現，且消耗時間也不會比A、B兩方案差。

關鍵字

資料壓縮； Tunstall編碼；符號剖析樹；容錯能力

並列摘要

There are lossy and lossless techniques for data compression. Most lossless compression techniques are vulnerable to noise. When bit flips occur due to noise, serious error propagation will be seen in such lossless compression techniques as Huffman coding, arithmetic coding, and Ziv-Lempel coding. By using fixed-length codewords and not using an adaptive dictionary, Tunstall coding stands out as an error-resilient lossless compression technique. However the compression ratio of Tunstall coding is not good. Therefore the aim of this work is to enhance the compression ratio of Tunstall coding without compromising its error-resilience. Traditional Tunstall coding grows a parse tree by iterative expansion of the maximum probability leaf. On each node expansion, it adopts the exhaustive branching of all symbol leaves, which will cause some precious codewords unassigned or waste of codewords on low probability nodes. In this work, three revised versions A, B, C of Tunstall coding are proposed. Versions A and B are based on the traditional Tunstall parse tree and aim to improve assignment of the complete codewords to high probability nodes via node insertion and deletion respectively. Version C is based on an infinitely extended parse tree and aims to grow the optimal parse tree by assigning the complete codewords only to top probability nodes in the infinite tree. For evaluation, the performances of the three revised versions are compared with that of the traditional Tunstall coding. The experiment shows that on common datasets, version C has better performance than versions A and B and can have up to 6% increase of compression ratio over traditional Tuntsall Coding. Furthermore, version C is not worse than version A and B in terms of compression time.

並列關鍵字

Data Compression ； Tunstall Coding ； Parse Tree ； Error-Resilience

參考文獻

[1] Abrahams, J., “Code and parse trees for lossless source encoding,” Compression and Complexity of Sequences, 1997, pp.145-171.

[2] Algra, T., “Fast and efficient variable-to-fixed length coding algorithm,” Electronics Letters, 28(15), 1992, pp.1399-1401.

[3] Briffa, J., “Investigation of the error performance of Tunstall coding,” B.Eng. Final Year Dissertation, Faculty of Engineering, University of Malta, 1997.

[4] Gzip for Windows, Available at:

[5] Hashempour, H., Schiano, L. and Lombardi, F., “Error-resilient test data compression using Tunstall codes,” 19th IEEE International Symposium, 2004, pp.316-323.

國際替代計量

資料壓縮用Tunstall符號剖析樹之改良研究

全文下載

主題瀏覽