針對稀疏重建應用之暖啟動演算法與多功能平台設計

稀疏性是一種自然界的重要特質。數十年來，自然訊號的稀疏性在如資料壓縮等的眾多領域中，扮演著重要的角色，甚至人類的大腦亦建基於這樣的特性。近年來，壓縮感知和稀疏表示法在訊號處理及電腦視覺社群中受到極大的關注。壓縮感知證明，如果目標訊號是稀疏的，即使我們以低於Nyquist-Shannon 理論所提出的速率進行採樣，仍可以精準地回復該訊號，並且於訊號感知的過程中同時進行壓縮。稀疏表示法將訊號表示為字典中元素的稀疏線性組合，已在多種電腦視覺應用中展現出優異的效能。此兩項技術建基於稀疏重建演算法之上，稀疏重建演算法是在假設解答或目標訊號為稀疏的情況下，對一欠定線性方程組進行求解。稀疏重建的主要難題在於其高複雜度的運算，使得我們難以將其使用在需要即時處理或有嚴格功率限制的應用之中。在此篇論文中，為了解決前述的問題，我們提出了一個以暖啟動同倫為基礎的演算法(WarmL1)，以及相對應的稀疏訊號重建平台。WarmL1提供一種通用的方法，可在多種動態系統變化之下快速重建稀疏訊號。即使系統僅遭輕微更動，從頭開始解起仍需要耗費大量計算資源。因此，在這樣的情況下，WarmL1採用前一輪的結果做為對新問題求解的起點，而非由基準面開始重新進行計算。根據實驗結果，非暖啟動同倫演算法對於測量值的需求量、速度、重建精準度與對雜訊的抵抗能力等指標上，在現有的非暖啟動演算法中有著最平衡的表現。此外，在此研究中，我們也納入數項針對實際應用的演算法最佳化方式。我們已進行了許多的數值實驗，並提出許多應用，其結果顯示WarmL1與相關演算法相比，可以加速3.2倍到37.5倍，或在以同樣的運算成本下，其l2錯誤僅為其它演算法的5100分之1。藉由專用積體電路(Application-specific Integrated Circuit)的實作方式，我們提出的平台滿足了即時且低能耗的應用需求。此外，考量到平台的多功能性，此平台同時支援壓縮感知及稀疏表示法的應用，及非暖啟動和暖啟動重建演算法。最大的設計挑戰在於對頻寬與儲存空間的高度需求，以及運算上的高複雜度。為了解決頻寬及儲存空間的問題，我們提出一個含快速狀態跳躍轉移技術的線上矩陣生成機制。此方法只需儲存種子值，而感知矩陣則由這些種子值及給定的狀態轉換隨機生成。因此，對儲存空間的需求由1 Gb降為10 Kb。另外，快速狀態跳躍轉移技術實現了對感知矩陣的高速隨機存取。為了處理高度複雜的計算，我們提出使用19個核心的多核心設計，達到256重的平行度，且每個核心皆可停用，來進一步降低能耗。而為了將速度再提升，我們也提出了特別設計的隨機定位條狀矩陣。接著，我們設計了一個基於Cholesky矩陣分解的高效率線性代數引擎，並提出精心設計的資料流、有效的緩衝機制，以及就地矩陣更新技術。我們所提出的稀疏重建平台，整體運作功率僅242.4毫瓦。在車輛追蹤應用上，達到每秒128.7幀的速度；在監控影片重建任務上，達到每秒28.1幀的速度。因此滿足了即時、低功率與多功能的應用需求。

關鍵字

稀疏重建；暖啟動；壓縮感知；稀疏表示法；多核心設計

並列摘要

Sparsity is a property of the natural world. For decades, the sparsity of natural signals has played an important role in various fields such as data compression. The powerfulness of sparsity is also demonstrated in the human brains. In recent years, compressed sensing and sparse representation have attracted substantial attention in the signal processing and computer vision communities. Compressed sensing demonstrates that a sparse signal can be recovered precisely using a sampling rate lower than that proposed in Nyquist-Shannon theory; additionally, the compression is conducted as the signal is sensed. Sparse representation models a signal as a sparse linear combination of atoms from a dictionary, achieving state-of-the-art performance for several computer vision routines. These two techniques are built on sparse reconstruction algorithms. Sparse reconstruction algorithms solve an underdetermined system of linear equations based on the assumption that the answer or the target signal is sparse. The main challenge of sparse reconstruction is the high computational complexity, which hinders its integration into applications that require real-time processing or that have stringent power constraints. In this thesis, to address the issues mentioned previously, we propose a warm-start homotopy-based algorithm (WarmL1) and a corresponding platform for sparse signal reconstruction. WarmL1 provides a general method for rapidly reconstructing a sparse signal under several dynamic system modifications. Although the system is altered slightly, resolving the new problem from a base level requires significant computational resources. WarmL1 solves the new problem by adopting the previous result as the starting point rather than recalculating a new solution from the base level. According to the experimental results, the non-warm-start homotopy-based algorithm provides the most balanced performance regarding the need for measurements, speed, and reconstruction accuracy with noiseless and noisy measurements among existent non-warm-start algorithms. In this study, we also included several algorithm optimization methods for practical implementation. Extensive numerical experiments and applications were performed and proposed. The results show that WarmL1 can achieve 3.2x to 37.5x acceleration or up to 1/5100 l2-error at the same computational costs compared to that achieved in related works. The platform with the Application-Specific Integrated Circuit (ASIC) implementation enables real-time and low-power sparse reconstruction applications to be realized. Considering the versatility, the platform supports both compressed sensing and sparse representation applications, and both non-warm-start and warm-start reconstructions. The primary challenges are the high bandwidth demands, storage requirements, and computational complexity. To address the bandwidth and storage requirements, we proposed an online matrix generation scheme with fast state-skipping technique. In this scheme, only the seeds are stored, and the entries in the sensing matrix are randomly generated using the seeds and given state transitions. Thus, the storage requirements are reduced from 1 Gb to 10 Kb. Moreover, the fast state-skipping technique enables rapid random access to the sensing matrix. To address the high computational complexity, we proposed a multi-core design with 19 cores to provide a parallelism of up to 256. Additionally, each core can be disabled to reduce the power further. We also proposed a specifically designed randomly positioned stripe-based matrix to boost the speed further. Subsequently, we developed an efficient Cholesky-factorization-based linear algebra engine. A carefully designed data flow, efficient buffering mechanism, and in-place matrix updating technique were proposed. The proposed sparse reconstruction platform achieved 128.7 fps for the vehicle tracking application and 28.1 fps for the surveillance video reconstruction task, with a power of 242.4 mW. Thus, the real-time, low-power, and versatility requirements were satisfied.

並列關鍵字

Sparse reconstruction ； Warm start ； Compressed sensing ； Sparse representation ； Multi-core design

參考文獻

K. Kelly, and R. Baraniuk, “An architecture for compressive imaging,”

in Image Processing, IEEE International Conference on, pp. 1273–

1276, oct. 2006.

[3] B. Olshausen and D. Field, “Sparse coding of sensory inputs,” Current

opinion in neurobiology, vol. 14, pp. 481–487, aug 2004.

國際替代計量

針對稀疏重建應用之暖啟動演算法與多功能平台設計

全文下載

主題瀏覽