  • Thesis

高硬體效能之高度視差範圍雙眼匹配系統之架構與演算法設計

Hardware-efficient Algorithm and Architecture Design for Extremely Large Label Counts Stereo Matching

Advisor: Liang-Gee Chen (陳良基)
The full text will be available for download on 2026/02/03.

Abstract


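Accurate 3D information is a key enabler for many computer vision applications, including autonomous driving, robotics, and augmented reality. As image resolution rises, the disparity range of the depth map must rise with it. Previous work, however, has mostly been designed around existing benchmark datasets and has paid little attention to the growth in computational complexity and memory size that a larger disparity range brings. Most hardware architectures are unsuitable for large disparity ranges: as the range grows, computational complexity and on-chip memory (SRAM) size tend to become unaffordable. This thesis presents the algorithm and architecture design of a depth estimation system that can handle large disparity ranges. The system consists of a hardware- and memory-efficient disparity estimation module based on belief propagation (BP) and a constant-memory depth map refinement module.

The first part proposes a hardware- and memory-efficient architecture for the BP-based disparity estimation module. BP-based disparity estimation is a common choice for implementation because of its regularity and the quality of its results. We observe, however, that most existing architectures cannot be extended to larger disparity ranges, both because of their large memory requirements and because they are caught in a trade-off between complexity and speed. This part offers two techniques, one for each problem. First, by studying the data characteristics of the BP algorithm, we design a memory-efficient data transfer scheme. Second, we replace the original large tree-structured comparator with one built from shareable units, which greatly reduces hardware complexity while retaining low latency and strikes the best balance among existing architectures. The resulting architecture reduces the required memory by 67.8%. At a disparity range of 512, it also saves 86.2% of the logic gates without affecting quality. Experiments show that the proposed disparity estimation architecture is better suited to large disparity ranges.

The second part proposes a constant-memory hardware architecture for depth map refinement, referred to as CMWMF, that supports extremely large disparity ranges. This module addresses a problem shared by existing depth map refinement engines: as the disparity range grows with image resolution, the computational complexity and memory size of the refinement module grow with it. By exploiting the observation that depth is continuous across most of a natural image, this thesis proposes a hardware architecture that effectively lowers the memory requirement. We also want the architecture to support several different algorithms. It therefore combines two techniques: a constant-memory hardware architecture, and simultaneous support for three algorithms, namely the weighted mode filter, the weighted median filter, and the weighted average filter. First, by keeping only the most representative data, we avoid storing excessive data while limiting the effect on the result. Second, we address the high computational complexity that makes the weighted median filter hard to implement in hardware, and we unify the three algorithms into a similar data flow so that one architecture supports all three. An index-checking mechanism locates and processes out-of-order weight statistics. Combining these techniques yields a constant-memory depth map refinement architecture that supports three different filters. The architecture reduces the memory requirement by 92.4% at the cost of an almost imperceptible drop in quality. Results on existing benchmarks, including KITTI and Middlebury, and on depth maps captured with real depth cameras show that the proposed method greatly reduces the requirements of the algorithm while retaining enough information that quality is not significantly affected.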
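To make the comparator cost concrete, the following is a minimal Python sketch of the textbook min-sum message update used in BP-based stereo matching. It is not the architecture proposed in the thesis. The function name `message_update`, the truncated-linear smoothness term, and the parameters `smooth_weight` and `truncation` are assumptions chosen for illustration. For each of the L output labels the update takes a minimum over L candidates, which is the operation a comparator tree performs in hardware and the reason the cost grows quickly with the disparity range.

```python
def message_update(data_cost, incoming, smooth_weight=1.0, truncation=2.0):
    """Textbook min-sum message update for one pixel and one direction.

    data_cost : list of L matching costs, one per disparity label.
    incoming  : list of messages (each a list of length L) from the
                neighbours other than the one this message is sent to.
    The truncated-linear smoothness term is an assumed, common choice.
    """
    num_labels = len(data_cost)

    # Aggregate the data cost and the incoming messages per candidate label.
    h = [data_cost[k] + sum(m[k] for m in incoming) for k in range(num_labels)]

    out = []
    for label in range(num_labels):
        # Minimum over all L candidates: an L-input comparator tree in hardware.
        best = min(
            h[cand] + smooth_weight * min(abs(label - cand), truncation)
            for cand in range(num_labels)
        )
        out.append(best)

    # Normalise so the smallest entry is zero, keeping values bounded.
    base = min(out)
    return [v - base for v in out]


example = message_update([3, 0, 4, 5], [[0, 0, 0, 0]])
```

With the inputs above, `example` is `[1.0, 0.0, 1.0, 2.0]`. This naive form does L comparisons for each of L labels, so the work per message grows quadratically with the disparity range.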

Parallel Abstract


Accurate 3D information is one of the most crucial techniques in computer vision (CV) applications such as autonomous driving, robotics, and AR. In previous decades, most researchers focused on algorithm and architecture design for different kinds of stereo matching methods under limited disparity ranges. Achieving higher video and disparity resolution with reasonable depth granularity is an emerging problem. Most methods are not suitable for large label counts, that is, large disparity ranges, since the hardware or memory complexity becomes unaffordable for VLSI implementation. In this work, we focus on a cost-effective stereo matching system for large label counts. The proposed system is composed of a hardware- and memory-efficient architecture for belief propagation (BP)-based disparity estimation and a constant-memory architecture for depth enhancement.

The first part proposes a hardware- and memory-efficient architecture for BP-based disparity estimation. BP-based stereo matching has become popular owing to its regularity and its ability to yield promising results. The commonly observed hardware implementation challenges for this algorithm are large memory requirements and the trade-off between speed and chip area, both of which worsen as the disparity range increases. This part presents a hardware- and memory-efficient architecture for building a BP-based disparity estimation system capable of overcoming the issues associated with large disparity ranges. The proposed architecture is memory-efficient because it exploits the regularity of the underlying algorithm. In addition, the improved hardware efficiency comes from modifying the processing elements so that they can be shared. The results show a 67.8% reduction in required memory and a time-area complexity of O(L (log L)^2), where L denotes the disparity range. This is in stark contrast to the O(L^2 log L) and O(L^2) complexities observed in existing studies. Compared to state-of-the-art implementations, the proposed architecture offers an 86.2% gate count reduction for message update units at a disparity range of 512. These results confirm the proposed architecture's suitability for large disparity scenarios.

The second part proposes a constant-memory hardware architecture, referred to as CMWMF, that supports weighted mode, weighted median, and joint bilateral filters. This part aims to meet the high memory and computation requirements of processing a depth map with a large number of depth candidates. In the proposed architecture, we leverage the geometric smoothness of natural images to reduce the static random access memory (SRAM) size for hardware implementation. The architecture stores a constant number of disparity values, rather than a number that depends on the label count and the size of the local support window. A novel weighted median search procedure is proposed, which assigns a computation to each input cycle, thereby making the process hardware friendly. An index-checking technique is proposed to process out-of-order joint histograms. With these techniques, the architecture uses a constant SRAM size and supports multiple types of filters. As a result, it reduces the SRAM size by 92.4% with a negligible decrease in performance. According to our analysis on the KITTI and Middlebury datasets and on data from actual depth cameras, the preserved information is sufficient. The proposed architecture is one of the most suitable depth refinement architectures for scenarios with a large number of depth candidates.
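As a reference point for the three filter types named above, here is a minimal Python sketch that computes a weighted mode, weighted median, and weighted mean from one weighted histogram of disparity values. It uses the standard textbook definitions and a full histogram whose size grows with the number of distinct disparities; it does not reproduce the constant-memory scheme, the median search procedure, or the index-checking technique of CMWMF. The function names and the sample window are hypothetical.

```python
from collections import defaultdict


def weighted_histogram(disparities, weights):
    """Accumulate the weight of every disparity value in a support window."""
    hist = defaultdict(float)
    for d, w in zip(disparities, weights):
        hist[d] += w
    return dict(hist)


def weighted_mode(hist):
    """Disparity with the largest accumulated weight (smallest wins ties)."""
    return max(sorted(hist), key=lambda d: hist[d])


def weighted_median(hist):
    """Smallest disparity whose cumulative weight reaches half the total."""
    half = sum(hist.values()) / 2.0
    acc = 0.0
    for d in sorted(hist):
        acc += hist[d]
        if acc >= half:
            return d
    raise ValueError("histogram is empty")


def weighted_mean(hist):
    """Weight-normalised average disparity."""
    total = sum(hist.values())
    return sum(d * w for d, w in hist.items()) / total


# Hypothetical window: mostly smooth depth plus one outlier at disparity 40.
hist = weighted_histogram([10, 10, 11, 40, 12], [0.5, 0.5, 1.5, 0.25, 0.25])
mode, median, mean = weighted_mode(hist), weighted_median(hist), weighted_mean(hist)
```

In this window the mode and the median are both 11, while the mean is pulled to about 13.17 by the outlier. All three filters read the same accumulated weights, which is consistent with the abstract's point that they can share a common data flow.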

