In this thesis, we employ dynamic significance tree quantization (DSTQ) to quantize the modified discrete cosine transform (MDCT) coefficients of an Advanced Audio Coding (AAC) encoder, performing bit allocation during the same encoding pass. We compare DSTQ with other well-known coding methods: set partitioning in hierarchical trees (SPIHT), combined significance tree quantization (CSTQ), MPEG-1 Layer 3 (MP3), and the open-source audio codec Ogg Vorbis. Each significance-tree method is evaluated both with and without a psychoacoustic model in the encoder. The experimental database contains 12 categories of audio signals: male voice, female voice, guitar, pipa (Chinese lute), yangqin (hammered dulcimer), suona, piano, symphony, erhu, jazz, saxophone, and dizi (bamboo flute). The encoding bit rates are 32, 48, 64, and 96 kbps. The experimental results show that, at all of these bit rates, audio encoded with any of the significance-tree models retains quality close to that of the original. Among them, DSTQ with the psychoacoustic model (psy-DSTQ-512) performs best in our simulations under the perceptual evaluation of audio quality (PEAQ) measure.
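As a rough illustration of the MDCT coefficients that the tree quantizers operate on, the sketch below computes a direct MDCT of a single frame from the textbook definition. It is not the AAC filterbank used in the thesis: the function name `mdct`, the sine window, and the O(N²) evaluation are assumptions made for readability, and a real encoder would use a fast algorithm with overlapping frames and block switching.

```python
import math


def mdct(frame):
    """Direct MDCT of one frame: 2N input samples -> N coefficients.

    Illustrative only: uses a sine window (an assumption) and the
    textbook formula
        X[k] = sum_n w[n] x[n] cos(pi/N * (n + 1/2 + N/2) * (k + 1/2)).
    """
    two_n = len(frame)
    if two_n == 0 or two_n % 2:
        raise ValueError("frame length must be a positive even number")
    half = two_n // 2  # N, the number of output coefficients

    # Apply the (assumed) sine window to the input samples.
    windowed = [
        x * math.sin(math.pi * (n + 0.5) / two_n)
        for n, x in enumerate(frame)
    ]

    coeffs = []
    for k in range(half):
        acc = 0.0
        for n, x in enumerate(windowed):
            acc += x * math.cos(
                math.pi / half * (n + 0.5 + half / 2) * (k + 0.5)
            )
        coeffs.append(acc)
    return coeffs
```

Under these assumptions, a frame of 1024 samples yields 512 coefficients. A significance-tree quantizer would then code those coefficients bit plane by bit plane until the bit budget for the frame is spent.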