Design of Parallelized Particle Filters for the CUDA Computing Platform

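The particle filter is a filter based on the sequential Monte Carlo method. Compared with the traditional Kalman filter, it offers superior filtering performance in the nonlinear and non-Gaussian systems that are common in practice, and it is therefore widely used in applications such as target tracking, surveillance systems, robot vision, positioning, and navigation. Because of this wide range of applications, its design and implementation call for a high degree of reconfigurability, fast prototyping, and real-time parallel computation on the device. The emerging compute unified device architecture (CUDA) for graphics processing units (GPUs) is regarded as the computing platform that best meets these design requirements. How to make efficient use of the two main features of the CUDA platform, the single-instruction multiple-thread (SIMT) execution model and the hierarchical memory, to design a parallelized particle filter is therefore an important yet unsolved research problem. The main objective of this thesis is to provide an efficient parallelized particle filter design on the CUDA platform through qualitative and quantitative analysis. First, an analysis of the parallelization degree and data locality identifies the resampling step of the filter as the bottleneck for parallelization. This step involves a large amount of global data exchange, which is the slowest kind of operation on the CUDA platform, because the CUDA architecture favors fast parallel computation on local data over time-consuming global data exchange. To address this, the thesis proposes two techniques: (1) finite-redraw importance-maximizing prior editing and (2) localized resampling, which reduce the time-consuming global data exchange at the cost of a small amount of additional fast, local parallel computation. The implementation results on the CUDA platform not only validate the behavior predicted by the parallelization degree and data locality analysis, but also confirm the tradeoff between reducing global data exchange and paying additional local computation cost. With the proposed design techniques, parallelized particle filters can run faster on CUDA with fewer particles. On the low-end and mid-range CUDA platforms GeForce 9400M and GeForce GTS 250, the proposed techniques achieve speedups of up to 5.73 and 5.37 times, respectively, over particle filter designs implemented directly on these platforms.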
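Why resampling requires global data exchange can be seen in a minimal sketch of systematic resampling, one common resampling scheme. The abstract does not state which scheme the thesis uses, so the choice of systematic resampling and the name `systematic_resample` are assumptions made for illustration, and the sketch is written in plain Python rather than CUDA so that it can be run anywhere. Every output index depends on a sum and a prefix sum taken over all particle weights, which is the kind of cross-particle dependency that maps poorly onto independent GPU threads.

```python
from itertools import accumulate


def systematic_resample(weights, u0):
    """Return the indices of the particles that survive resampling.

    weights: unnormalized, non-negative importance weights (hypothetical input).
    u0: a single uniform random draw in [0, 1).
    """
    n = len(weights)
    total = sum(weights)             # global reduction over all particles
    cdf = list(accumulate(weights))  # global prefix sum over all particles
    indices = []
    j = 0
    for i in range(n):
        # i-th evenly spaced pointer, scaled to the unnormalized total weight
        pointer = (i + u0) * total / n
        # advance until the cumulative weight exceeds the pointer
        while j < n - 1 and cdf[j] <= pointer:
            j += 1
        indices.append(j)
    return indices


if __name__ == "__main__":
    # The low-weight particle 0 is dropped and the heaviest particle 3 is duplicated.
    print(systematic_resample([1, 2, 3, 4], 0.5))
```

In this sketch the reduction (`sum`) and the prefix sum (`accumulate`) both touch every weight before any index can be chosen, which is the global step that the two proposed techniques aim to reduce.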

Keywords

Particle filter; parallelized design; compute unified device architecture; general-purpose graphics processing unit

Parallel Abstract

Particle filtering is a sequential Monte Carlo (SMC) based method that outperforms traditional Kalman-based filters in a wide range of real-world applications involving nonlinear/non-Gaussian Bayesian estimation, such as target tracking in surveillance systems, recognition in robot vision, positioning, and navigation. Because these applications demand a high degree of reconfigurability, fast prototyping, and online parallel signal processing, the emerging GPU platform called compute unified device architecture (CUDA) may be regarded as the most appealing platform for implementation. Since CUDA-based platforms feature the single-instruction multiple-thread (SIMT) execution model and the hierarchical memory model for fine-grained scalability, how to implement an efficient parallelized particle filter design on CUDA becomes an essential yet unsolved problem. The objective of this thesis is to provide an efficient implementation method for parallelized particle filters on CUDA-based computing platforms, supported by conceptual and quantitative analysis. Based on the parallelization degree and data locality analysis, two design techniques, 1) finite-redraw importance-maximizing (FRIM) prior editing and 2) localized resampling, are proposed to overcome the bottleneck stage of particle filtering, i.e., the resampling stage, which involves data-dependent global operations. Since the characteristics of CUDA favor fast data-independent parallel computation over slow global operations, the proposed techniques aim to reduce the time-consuming global operations with little overhead of additional local computation. The implementation results not only validate the analysis of the parallelization degree and data locality of particle filters, but also verify the tradeoff between the reduction in global operations and the local computation overhead.
By using the proposed techniques, particle filters can be implemented on CUDA-based platforms with smaller sample sizes and shorter execution times. On the low-end and mid-range CUDA-enabled platforms NVIDIA GeForce 9400M and GeForce GTS 250, the speedup brought by the proposed techniques reaches up to 5.73 and 5.37 times, respectively, compared with direct implementations on these platforms.
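The idea of trading global operations for local computation can be pictured with the following minimal sketch, which resamples within fixed-size blocks instead of across the whole particle set. It is an illustration under stated assumptions and not the thesis's algorithm: the block is assumed to stand in for a group of particles handled by one CUDA thread block, each block uses systematic resampling, and each survivor is assigned its block's average weight so that the total weight is preserved. The names `systematic_indices`, `localized_resample`, and `block_size` are hypothetical, and the sketch is plain Python so that it runs without a GPU.

```python
from itertools import accumulate


def systematic_indices(weights, u0):
    """Systematic resampling over one list of non-negative weights."""
    n = len(weights)
    total = sum(weights)
    cdf = list(accumulate(weights))
    indices, j = [], 0
    for i in range(n):
        pointer = (i + u0) * total / n
        # advance until the cumulative weight exceeds the pointer
        while j < n - 1 and cdf[j] <= pointer:
            j += 1
        indices.append(j)
    return indices


def localized_resample(particles, weights, block_size, u0):
    """Resample each block independently (assumed scheme, for illustration).

    No sum or prefix sum ever spans more than one block, so blocks could be
    processed in parallel without exchanging data with each other.
    """
    new_particles, new_weights = [], []
    for start in range(0, len(particles), block_size):
        block_w = weights[start:start + block_size]
        m = len(block_w)
        block_total = sum(block_w)  # local reduction only
        if block_total > 0:
            local = systematic_indices(block_w, u0)
        else:
            local = list(range(m))  # nothing to favor; keep the block as is
        new_particles.extend(particles[start + k] for k in local)
        # survivors share the block's weight equally, preserving the total
        new_weights.extend([block_total / m] * m)
    return new_particles, new_weights


if __name__ == "__main__":
    print(localized_resample(["a", "b", "c", "d"], [1, 3, 4, 0], 2, 0.5))
```

Because the blocks no longer compete with each other, the output weights are generally unequal across blocks, which is one way the accuracy cost of localization shows up in this sketch.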

Parallel Keywords

Particle filter; Parallelized design; CUDA; GPGPU

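Cited By

唐德成 (2013). 以平行運算法進行火場模擬之初探 [A preliminary study of fire scene simulation using parallel computing] (Master's thesis, National Taipei University of Technology). Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0006-1508201312320400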
