GPUs and other SIMD stream architectures have been widely used to accelerate packet-processing applications. This thesis explores a parallel implementation of a sketch-based network traffic change detection application using the OpenCL parallel programming framework. The implementation runs on a GPU, a multi-core CPU, and the Cell Broadband Engine processor, and the thesis compares performance across the three. Sketch operations are naturally data-parallel, so the sketch computations map directly onto the OpenCL execution model on all three platforms. The sketch data structure is stored as a buffer object in the device's global memory, and work-items operate on the sketches in parallel. Experiments on an AMD Radeon HD 5870 GPU show that the parallel implementation substantially reduces computation time compared with a sequential CPU implementation. The hash computation (4-universal hash) achieves a 15.3X speedup, and the ESTIMATE operation achieves a 9.1X speedup. The kernels reach more than 50% of the GPU's peak memory bandwidth (78.64 GB/s). The results also show that the GPU is well suited to sketch computations for multiple monitors. The speedup over the sequential CPU becomes more pronounced as the number of monitors increases, and CPU-to-GPU data transfer becomes more efficient when more than one monitor is used. With 16 monitors, keys are transferred from host memory to a GPU buffer at 2.28 GB/s. The multi-core CPU and the Cell processor run the same kernels as the GPU, without any platform-specific optimization. On these platforms the ESTIMATE operation achieves speedups of 5.7X and 5.83X, respectively, over the sequential CPU implementation.
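The following minimal sequential C sketch makes the hash, update, and ESTIMATE operations concrete. It assumes a k-ary sketch with H hash rows of K counters each, which is the structure commonly used for sketch-based change detection. The row and bucket counts are illustrative assumptions, not the thesis's code or its OpenCL kernels. So are the helper names (`sketch_init`, `sketch_update`, `sketch_estimate`, `hash4`) and the polynomial construction of the 4-universal hash. A degree-3 polynomial modulo a prime is one standard way to obtain 4-universality; the thesis may use a different scheme, such as tabulation-based hashing. Every iteration of the per-row loops is independent of the others. That independence is what allows these loops to be mapped onto parallel OpenCL work-items.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define H 5               /* hash rows (illustrative assumption) */
#define K 1024            /* buckets per row (illustrative assumption) */
#define P 2147483647ULL   /* Mersenne prime 2^31 - 1 */

typedef struct {
    uint64_t coef[H][4];  /* coefficients a0..a3 of each row's cubic polynomial */
    double table[H][K];   /* sketch counters, one row per hash function */
    double sum;           /* total of all values added (identical for every row) */
} sketch_t;

/* splitmix64: small deterministic generator, used only to draw coefficients */
static uint64_t splitmix64(uint64_t *state) {
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Hypothetical 4-universal hash: cubic polynomial mod P evaluated with
   Horner's rule, then reduced to a bucket index with mod K.
   Strictly 4-universal only for keys below P; larger keys wrap. */
static uint32_t hash4(const uint64_t a[4], uint32_t key) {
    uint64_t x = key % P;
    uint64_t r = a[3];
    r = (r * x + a[2]) % P;   /* r, x < 2^31, so r * x < 2^62: no overflow */
    r = (r * x + a[1]) % P;
    r = (r * x + a[0]) % P;
    return (uint32_t)(r % K);
}

void sketch_init(sketch_t *s, uint64_t seed) {
    for (int i = 0; i < H; i++)
        for (int j = 0; j < 4; j++)
            s->coef[i][j] = splitmix64(&seed) % P;
    for (int i = 0; i < H; i++)
        for (int b = 0; b < K; b++)
            s->table[i][b] = 0.0;
    s->sum = 0.0;
}

/* UPDATE: add value to one bucket per row. Rows are independent, so on a
   GPU each row (or each key/row pair) could be a separate work-item. */
void sketch_update(sketch_t *s, uint32_t key, double value) {
    for (int i = 0; i < H; i++)
        s->table[i][hash4(s->coef[i], key)] += value;
    s->sum += value;
}

static int cmp_double(const void *pa, const void *pb) {
    double a = *(const double *)pa, b = *(const double *)pb;
    return (a > b) - (a < b);
}

static double median(double *v, int n) {
    assert(n > 0);
    qsort(v, (size_t)n, sizeof v[0], cmp_double);
    return (n % 2) ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

/* ESTIMATE (k-ary sketch): per-row estimate that removes the expected
   share of other keys in the bucket, then the median across rows. */
double sketch_estimate(const sketch_t *s, uint32_t key) {
    double est[H];
    for (int i = 0; i < H; i++) {
        double cell = s->table[i][hash4(s->coef[i], key)];
        est[i] = (cell - s->sum / K) / (1.0 - 1.0 / K);
    }
    return median(est, H);
}
```

To use it, call `sketch_init` once per measurement interval. Then call `sketch_update` for each (key, value) pair, for example a source IP address and a byte count. Finally, call `sketch_estimate` for any key of interest. A k-ary sketch is linear, so a change-detection step could subtract a forecast sketch from the observed sketch cell by cell. It could then run ESTIMATE on the resulting error sketch.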