
壓縮感知及其於影片與音訊處理之應用

Compressive Sensing and its Applications in Video and Audio Processing

Advisor: 貝蘇章 (Soo-Chang Pei)

Abstract


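The classical sampling theorem states that if a signal is band-limited and the sampling frequency is at least twice the highest frequency in the signal, the samples retain all the information in the original signal, which can then be reconstructed exactly. If the sampling frequency is too low, the spectra overlap and cause distortion, an effect known as aliasing. Human speech lies roughly between 300 Hz and 3400 Hz, so telephony typically samples at about 8000 Hz. The human ear can hear frequencies from 20 Hz to 20000 Hz, so music files are usually sampled at 44100 Hz. Video is sampled at far higher rates, often on the order of MHz. The resulting data volume is very large and usually has to be compressed separately for storage or transmission.

Compressive sensing is a new theory of signal sampling and reconstruction developed in recent years, proposed by Emmanuel Candes, David Donoho, Terence Tao, and others. Its core idea is that if a signal is sparse in some orthogonal basis, it can be sampled at a rate far below the Nyquist rate and still be reconstructed accurately. With conventional sampling, a signal of 500 values requires at least 500 measurements for exact recovery, in the same way that 500 equations are needed to solve for 500 unknowns. Compressive sensing instead assumes the signal is sparse, meaning that only a few entries are nonzero. This sparsity lets us recover the signal completely from only 100 measurements, that is, solve for 500 unknowns with 100 equations. Far less data has to be recorded, which achieves compression.

Compressive sensing already has many applications, such as denoising, image inpainting, image restoration, object detection, face recognition, and radar pulse sampling. In recent years it has also been used in a mobile phone camera chip to reduce the energy needed to record a photo, and it is used for scan sampling in magnetic resonance imaging (MRI).

Building on compressive sensing theory, a fast low-rank approximation algorithm, GoDec, was developed. Low-rank approximation has traditionally relied on singular value decomposition (SVD), whose time complexity is relatively high. The fast low-rank approximation greatly reduces the running time, with results no worse than those of SVD. This thesis first introduces compressive sensing and some existing applications, and discusses the recovery of high-dimensional data. It then studies what low-rank approximation means for time-series data such as video, and explores applications of the algorithm to video and audio.

Combining fast low-rank approximation with some auxiliary techniques yields a video rain removal algorithm. The algorithm not only removes rain from video effectively but also improves overall brightness and hue. Beyond rain, the method can also be applied to haze and snow removal. Because it does not follow the detect-rain, remove, and fill-gaps steps of conventional algorithms, it takes far less time than other existing methods. The proposed algorithm has low complexity and modest memory requirements, and both its removal of rain, snow, and haze and its color restoration are better than other known methods. With further optimization, real-time processing in hardware could enable more applications.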
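
The "500 unknowns from 100 equations" claim can be illustrated with a small numerical sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the Gaussian measurement matrix, the sparsity level of 5 nonzeros, the signal being sparse directly in the sample domain, and the use of orthogonal matching pursuit (OMP) as the recovery method. The abstract does not say which reconstruction algorithm the thesis uses.

```python
import numpy as np

def omp(A, y, max_atoms, tol=1e-10):
    """Orthogonal matching pursuit: greedy sparse solution of y = A @ x.

    Illustrative recovery routine (an assumption, not the thesis's method).
    """
    support = []
    coef = np.zeros(0)
    residual = y.copy()
    y_norm = np.linalg.norm(y)
    while len(support) < max_atoms and np.linalg.norm(residual) > tol * y_norm:
        # Pick the column most correlated with the current residual.
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j in support:
            break
        support.append(j)
        # Re-fit all selected columns by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    if support:
        x[support] = coef
    return x

rng = np.random.default_rng(0)
n, m, k = 500, 100, 5          # unknowns, measurements, nonzeros (k is assumed)

# Sparse signal: k nonzero entries at random positions.
x_true = np.zeros(n)
positions = rng.choice(n, size=k, replace=False)
x_true[positions] = rng.choice([-1.0, 1.0], size=k) * rng.uniform(1.0, 2.0, size=k)

# 100 random linear measurements of the 500-sample signal.
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x_true

x_hat = omp(A, y, max_atoms=25)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print(f"relative reconstruction error: {rel_err:.2e}")
```

Without the sparsity assumption the system `y = A @ x` has infinitely many solutions. The greedy search succeeds here only because very few columns of `A` are needed to explain `y`.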

Parallel Abstract


The Nyquist-Shannon sampling theorem states that when a signal is band-limited and the sampling frequency is at least twice the highest frequency in the signal, the sampled signal preserves enough information to reconstruct the original signal. If the sampling frequency is not high enough, the spectra of the sampled signal overlap and cause aliasing. The frequency range of human speech is roughly 300 Hz to 3400 Hz, so the standard sampling frequency for telephone communication is 8000 Hz. The hearing range of the human ear is 20 Hz to 20000 Hz, so 44100 Hz is a common sampling frequency for digital audio files. For video, the sampling frequency is usually several MHz. The resulting data volume is large, so compression is generally needed for transmission or storage.

Compressive sensing is a recently developed theory of signal sampling and reconstruction, proposed by Emmanuel Candes, David Donoho, and Terence Tao. Its core concept is that when a signal is sparse in a specific orthogonal basis, we can sample it at a rate much lower than the Nyquist rate and still reconstruct it precisely. With traditional sampling, a signal of length 500 needs at least 500 measurements to be reconstructed, just as we need 500 equations to solve a linear system with 500 unknowns. Compressive sensing assumes that the signal is sparse in a specific domain, i.e., only a few positions are nonzero. Thanks to this sparsity, we can reconstruct the whole signal from only 100 measurements, as if solving a linear system with 500 unknowns using only 100 equations. Compression is therefore achieved, since the amount of data is drastically reduced.

There are many applications based on compressive sensing, such as denoising, image inpainting, image restoration, object detection, face recognition, and incoherent sampling of radar impulses.
Compressive sensing has also been applied in a mobile phone camera sensor and in MRI.

Based on compressive sensing, a fast low-rank approximation algorithm, GoDec, was developed. Singular value decomposition (SVD) has traditionally been used to obtain a low-rank approximation, but its time complexity is high, and GoDec outperforms SVD in running time. In this thesis, basic concepts and some existing applications of compressive sensing are introduced, and compressive sensing for high-dimensional data restoration is discussed. I then demonstrate the physical meaning of the low-rank approximation of time-series data and show some applications of this algorithm to video and audio files.

By combining this fast low-rank approximation with some existing techniques, I propose a new video rain removal algorithm. The algorithm not only removes the rain but also improves hue and brightness, and it can be further applied to haze or snow removal. Existing rain removal algorithms usually consist of rain detection, rain removal, and pixel interpolation steps. My algorithm does not follow these steps, so its computation time is much shorter. The time complexity of the proposed method is low and it does not require much memory, and both its removal of rain, snow, and haze and its color restoration are better than existing methods. Real-time processing in hardware is possible with further optimization.
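
To make the idea of a low-rank approximation of time-series data concrete, here is a minimal sketch of a GoDec-style decomposition on a synthetic "video". It alternates between a low-rank fit and keeping the largest residual entries as a sparse part. Several things are assumptions made for illustration: the helper names, the synthetic scene, the choice of rank 1, and the use of a plain randomized range finder with power iterations in place of the bilateral random projections of the published GoDec algorithm. This is not the thesis's rain removal pipeline, which combines the approximation with further techniques not described in the abstract.

```python
import numpy as np

def low_rank_step(M, rank, rng, power=2):
    """Rank-`rank` approximation of M via a randomized range finder."""
    Y = M @ rng.standard_normal((M.shape[1], rank))
    for _ in range(power):
        # Power iterations sharpen the estimate of the dominant subspace.
        Y = M @ (M.T @ Y)
    Q, _ = np.linalg.qr(Y)
    return Q @ (Q.T @ M)

def sparse_step(R, card):
    """Keep only the `card` largest-magnitude entries of R."""
    S = np.zeros(R.shape)
    if card > 0:
        keep = np.argpartition(np.abs(R).ravel(), -card)[-card:]
        S.flat[keep] = R.flat[keep]
    return S

def godec_sketch(X, rank, card, n_iter=20, seed=0):
    """Alternate low-rank and sparse updates so that X is close to L + S."""
    rng = np.random.default_rng(seed)
    S = np.zeros(X.shape)
    L = np.zeros(X.shape)
    for _ in range(n_iter):
        L = low_rank_step(X - S, rank, rng)
        S = sparse_step(X - L, card)
    return L, S

def make_video(n_pixels=400, n_frames=60, rain_fraction=0.03, seed=0):
    """Synthetic data: each column is one frame flattened to a pixel vector."""
    rng = np.random.default_rng(seed)
    scene = rng.uniform(0.2, 0.8, size=n_pixels)             # static scene
    brightness = 1.0 + 0.1 * np.sin(np.linspace(0.0, np.pi, n_frames))
    background = np.outer(scene, brightness)                 # rank 1 by design
    rain = np.zeros((n_pixels, n_frames))
    mask = rng.random((n_pixels, n_frames)) < rain_fraction  # sparse streaks
    rain[mask] = rng.uniform(0.8, 1.5, size=int(mask.sum()))
    return background, rain

background, rain = make_video()
card = int(np.count_nonzero(rain))   # known here; a guess in real use
L, S = godec_sketch(background + rain, rank=1, card=card)
rel_err = np.linalg.norm(L - background) / np.linalg.norm(background)
print(f"relative background error: {rel_err:.2e}")
```

In this toy setting the low-rank part `L` plays the role of the static background that is shared by all frames, while the sparse part `S` collects the short-lived bright pixels. Real footage would need a rank and a cardinality chosen by the user.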

