  • Degree thesis

Audio Coding Using Dynamic Significance Tree

Advisor: 簡福榮

Abstract


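This thesis uses dynamic significance tree quantization (DSTQ) to quantize the modified discrete cosine transform (MDCT) coefficients of the Advanced Audio Coding (AAC) encoder, performing bit allocation as part of the same process. Alongside DSTQ, we also use set partitioning in hierarchical trees (SPIHT), combined significance tree quantization (CSTQ), MPEG-1 Layer 3 (MP3), and the open audio coding software Ogg Vorbis to encode MDCT coefficients both with and without psychoacoustic processing. The experimental database contains 12 categories of audio: male voice, female voice, guitar, pipa (lute), yangqin (dulcimer), suona, piano, symphony, erhu, jazz, saxophone, and flute. The experimental bit rates are 32 kbps, 48 kbps, 64 kbps, and 96 kbps. The results show that, at these bit rates, audio encoded by quantizing the MDCT coefficients with any of the significance tree models retains quality quite close to the original. Among the tree models, DSTQ with the psychoacoustic model (psy-DSTQ-512) performs best in the perceptual audio quality measurement simulations.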

Parallel Abstract


In this thesis, we employ dynamic significance tree quantization (DSTQ) to quantize the modified discrete cosine transform (MDCT) coefficients of audio signals while performing bit allocation during the encoding process. We compare DSTQ with other well-known encoding methods, including set partitioning in hierarchical trees (SPIHT), combined significance tree quantization (CSTQ), MPEG-1 Layer 3 (MP3), and the open-source audio encoding software Ogg Vorbis. For each significance tree method, we also investigate the encoder both with and without a psychoacoustic model. The experimental database contains 12 categories of audio signals: male voices, female voices, guitar, lute, dulcimer, suona, piano, symphony, erhu, jazz, saxophone, and flute. The encoded bit rates are 32 kbps, 48 kbps, 64 kbps, and 96 kbps. The experimental results show that all of the significance tree models maintain nearly transparent audio quality at these bit rates. Among them, DSTQ with the psychoacoustic model (psy-DSTQ-512) performs best in our simulations based on the perceptual evaluation of audio quality (PEAQ) measure.
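
The abstract does not describe how DSTQ works internally, so the following is only a minimal sketch of two generic building blocks that significance tree coders such as SPIHT rely on: a direct MDCT of one frame, and the bit-plane significance test that compares coefficient magnitudes against a power-of-two threshold. The function names (`mdct`, `top_bit_plane`, `significance_map`, `set_is_significant`) are hypothetical, no window function is applied, and none of this is taken from the thesis itself.

```python
import math


def mdct(frame):
    """Direct O(N^2) MDCT: maps a frame of 2N samples to N coefficients.

    Assumption: textbook definition with no window applied.
    """
    two_n = len(frame)
    n_half = two_n // 2
    return [
        sum(
            frame[n] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(two_n)
        )
        for k in range(n_half)
    ]


def top_bit_plane(coeffs):
    """Index p of the highest bit plane, so that 2**p <= max |c| < 2**(p + 1)."""
    peak = max(abs(c) for c in coeffs)
    if peak == 0:
        raise ValueError("all coefficients are zero; no bit plane to code")
    return math.floor(math.log2(peak))


def significance_map(coeffs, plane):
    """Mark each coefficient whose magnitude reaches the threshold 2**plane."""
    threshold = 2.0 ** plane
    return [abs(c) >= threshold for c in coeffs]


def set_is_significant(coeffs, indices, plane):
    """A set (for example, a subtree) is significant if any member is."""
    threshold = 2.0 ** plane
    return any(abs(coeffs[i]) >= threshold for i in indices)


if __name__ == "__main__":
    coeffs = [0.5, -9.0, 3.0, 17.0]  # illustrative values, not thesis data
    plane = top_bit_plane(coeffs)
    print(plane, significance_map(coeffs, plane))
    print(significance_map(coeffs, plane - 1))
```

In an embedded coder of this family, the threshold is halved after each pass, so coefficients are refined from the most significant bit plane downward and the bitstream can be truncated once a target bit rate is reached. How DSTQ organizes its trees dynamically is beyond what this sketch shows.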
