唯一可解一對一編碼之分析與實作

在這篇論文中，我們提出可藉由在連續的一對一字碼(One-to-One Code)之間安插特定字元(Unique Word)來分開一對一字碼，使得連續的一對一字碼符合唯一可解(Uniquely Decodable)性質，藉此產生一種新的編碼方式，我們稱為「唯一可解一對一碼」(Uniquely Decodable One-to-One Code)。我們接著分析唯一可解一對一碼的組合特性與壓縮能力，並由此推導出給定長度n的唯一可解一對一碼的個數公式。經由此公式，我們即可計算給定統計特性之信息源的唯一可解一對一碼的最佳平均碼長。我們另外推導出三個平均碼長的上限公式，與兩個平均碼長的極限值的上限公式，據此，我們證明當特定字元(Unique Word)的長度與信息源字元的整合個數皆趨近無限大時，唯一可解一對一碼的平均碼長的極限值可以達到信息源的理論極限熵值(Entropy)。論文接著提供在不同條件下的快速編碼與快速解碼的演算法。我們的數值實驗結果顯示，相對於赫夫曼碼(Huffman Code)，在相同的解碼複雜度的前提下，唯一可解一對一碼具有可與其相比的低壓縮率。相對於藍柏吉夫碼(Lembel-Ziv Code)，唯一可解一對一碼則具有較佳的平均碼長，此數值結果說明了唯一可解一對一碼在實際應用的可行性。

關鍵字

一對一碼；資料壓縮

並列摘要

In this thesis, we consider the so-called uniquely decodable one-to-one code (UDOOC) that is formed by inserting a “comma” indicator, termed the unique word (UW), between consecutive one-to-one codewords for separation. Along this research direction, we first investigate the general combinatorial properties of UDOOCs, in particular the enumeration of UDOOC codewords for any (finite) codeword length. Based on the obtained formula on the number of length-n codewords for a given UW, the average codeword length of the optimal UDOOC for a given source statistics can be computed. Upper bounds on the average codeword length of UDOOCs are next established. The analysis on bounds of the average codeword length then leads to two asymptotic bounds on ultimate per-letter average codeword length, one of which is achievable and hence tight for a certain source statistics and UW, and the other of which proves the achievability of source entropy rate of UDOOCs when both the block size of source letters for UDOOC compression and UW length go to infinity. Efficient encoding and decoding algorithms for UDOOCs are subsequently given. Numerical results show that when grouping three English letters as a block, the UDOOCs with UW = 0001, 0000, 000001 and 000000 can respectively reach the compression rates of 3.531, 4.089, 4.115 and 4.709 bits per English letter (with the lengths of UWs included), where the source stream to be compressed is the book titled Alice’s Adventures in Wonderland. In comparison with the first-order Huffman code, the second-order Huffman code, the third-order Huffman code and the Lembel-Ziv code, which respectively achieve the compression of 3.940, 3.585, 3.226 and 6.028 bits per single English letter, the proposed UDOOCs can potentially in comparable compression rate to the Huffman code under similar decoding complexity and yield a better average codeword length than that of the Lempel-Ziv code, thereby confirming the practicability of the simple idea of separating OOC codewords by UWs.

並列關鍵字

data compression ； one-to-one code ； source coding ； uniquely decodable

參考文獻

[1] T. M. Cover and J. A. Thomas, Elements of Information Theory, New York, NY: John

Wiley & Sons, 1991.

[2] Sam E. Ganis, Notes on the Fibonacci Sequence, Amer. Math. Monthly, 1959, pp. 129-

[3] A. D. WYNER, An Upper Bound on the Entropy Series, Inf. Control, vol. 20, pp. 176-181,

[4] S. K. Leung-Yan-Cheong and T. M. Cover, Some equivalences between Shannon entropy

國際替代計量

唯一可解一對一編碼之分析與實作

全文下載

主題瀏覽