Synthesizing 3D Audio at a Virtual Listening Point from Microphone Array Signals

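The goal of this thesis is to synthesize 3D audio at a virtual listening point for which no original microphone recording exists. To this end, we place a microphone array in the space to record the source signals. 3D audio synthesis consists of two main steps. The first estimates the individual source signals from the mixed recordings, usually by means of blind source separation (BSS). The second synthesizes the 3D audio at a virtual listening point within a chosen reverberant space. The 3D spatial impression is obtained by filtering the separated signals with the head-related transfer function (HRTF) and with the acoustic transfer function (ATF) that represents the room reverberation at that point. In this thesis, we use frequency-domain independent component analysis (FD-ICA) and a least-squares optimization approach to separate the mixed signals, and we evaluate the demixing matrices using the signal-to-interference ratio (SIR). In the reconstruction stage, we first compute the set of acoustic transfer functions of the reverberant space (the ATF-pool), then select the corresponding ATFs from the ATF-pool to filter the separated signals, and finally synthesize two-channel 3D audio with the appropriate HRTFs. For a virtual listening point that does not coincide with an HRTF or ATF measurement point, the corresponding HRTF and ATF are obtained by interpolation from the existing HRTF set and the ATF-pool, respectively. Finally, we demonstrate synthesized audio with a 3D effect at arbitrary virtual listening points and in the selected reverberant environments.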
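As a rough illustration of the SIR evaluation mentioned above, the sketch below computes a per-output SIR for a single frequency bin. It rests on assumptions that are not stated in the abstract: the mixing matrix `H` is known (as in a simulation), the sources are uncorrelated with unit power, the permutation ambiguity is already resolved so that output i targets source i, and the function name `sir_db` is hypothetical.

```python
import numpy as np

def sir_db(W, H):
    """Per-output SIR in dB for demixing matrix W and mixing matrix H (one bin).

    Assumes unit-power, mutually uncorrelated sources and that output i
    is meant to recover source i (permutation already resolved).
    """
    G = np.asarray(W) @ np.asarray(H)        # global (mixing then demixing) response
    power = np.abs(G) ** 2                   # power gain from source j to output i
    target = np.diag(power)                  # gain of the wanted source in each output
    interference = power.sum(axis=1) - target  # leakage from all other sources
    return 10.0 * np.log10(target / interference)

# Toy 2x2 example: no demixing versus a slightly imperfect inverse.
H = np.array([[1.0, 0.5], [0.5, 1.0]])
sir_unprocessed = sir_db(np.eye(2), H)
W = np.linalg.inv(H) + 0.01 * np.array([[0.0, 1.0], [1.0, 0.0]])
sir_separated = sir_db(W, H)
```

A perfect inverse would leave zero interference and an infinite SIR, so the example perturbs the inverse slightly to keep the ratio finite.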

Keywords

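microphone array; 3D audio synthesis; blind source separation; head-related transfer function; acoustic transfer function; virtual listening point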

Parallel Abstract

The goal of 3D virtual listening point audio synthesis is to reconstruct 3D audio at a virtual point where no recording microphone was placed. To this end, the source music is recorded by a microphone array consisting of several microphones arranged in a designed spatial pattern. The 3D acoustic signal synthesis can be divided into two key steps. The first step is to estimate the individual source signals from the mixed, recorded signals. This step is usually accomplished with the blind source separation (BSS) technique. The second step is to synthesize a 3D acoustic signal at a virtual listening point in a chosen reverberant room environment. The 3D impression of the acoustic signal is obtained by filtering the separated signals from step one with the head-related transfer function (HRTF) and the acoustic transfer function (ATF), which represents the room acoustic effect. In this study, we adopt frequency-domain independent component analysis (FD-ICA) and a least-squares optimization approach to separate the mixture signals. We investigate the effectiveness of the BSS methods by evaluating their demixing matrices using the signal-to-interference ratio (SIR) metric. In the reconstruction process, we first calculate the ATFs of the reverberant room to form an ATF-pool. Then, the separated signals are filtered with the appropriate ATFs drawn from the ATF-pool. Finally, the two-channel 3D audio is synthesized with appropriately chosen HRTFs. This procedure raises several problems. For example, for an off-grid virtual listening point, the HRTF and ATF are interpolated from the existing HRTF library and the ATF-pool, respectively. Finally, the synthesized 3D acoustic signals are demonstrated at arbitrary virtual listening points and in selected room reverberation environments.
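The second step can be sketched in the time domain as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: ATFs and HRTFs are represented by their impulse responses, off-grid responses are obtained by plain linear interpolation between two measured neighbours (the abstract does not specify the interpolation scheme), and the names `interpolate_ir` and `render_binaural` are hypothetical.

```python
import numpy as np

def interpolate_ir(ir_a, ir_b, weight):
    """Linearly interpolate two equal-length impulse responses (weight in [0, 1]).

    Assumption: simple sample-wise interpolation; weight=0 returns ir_a.
    """
    ir_a = np.asarray(ir_a, dtype=float)
    ir_b = np.asarray(ir_b, dtype=float)
    return (1.0 - weight) * ir_a + weight * ir_b

def render_binaural(sources, atfs, hrirs_left, hrirs_right):
    """Filter each separated source with its ATF, then with the left and
    right HRIRs, and sum the results into a (2, n_samples) array.

    Assumes equal lengths across sources, across ATFs, and across HRIRs.
    """
    left = 0.0
    right = 0.0
    for s, a, hl, hr in zip(sources, atfs, hrirs_left, hrirs_right):
        reverberant = np.convolve(s, a)            # add the room response
        left = left + np.convolve(reverberant, hl)   # left-ear spatial cue
        right = right + np.convolve(reverberant, hr)  # right-ear spatial cue
    return np.stack([left, right])

# Toy example with two sources and very short responses.
sources = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
atfs = [np.array([1.0, 0.5]), np.array([1.0, 0.0])]
hrirs_left = [np.array([1.0]), np.array([0.5])]
hrirs_right = [np.array([0.5]), np.array([1.0])]
binaural = render_binaural(sources, atfs, hrirs_left, hrirs_right)
off_grid = interpolate_ir([1.0, 0.0], [0.0, 1.0], 0.25)
```

Sample-wise interpolation of impulse responses is the simplest possible choice and can smear arrival times; it stands in here for whatever interpolation the full text describes.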

Parallel Keywords

microphone array; 3D acoustic signal synthesis; blind source separation (BSS); head-related transfer function (HRTF); acoustic transfer function (ATF); virtual listening point
