As video technology advances, supported resolutions keep rising, which in turn increases the computational complexity of video compression. Most of this complexity comes from motion estimation, whose workload grows with resolution. Motion estimation involves many highly repetitive, computationally expensive operations; parallelizing them or distributing them across multiple computers can reduce the load that a conventional single-machine implementation must carry. This work proposes motion estimation algorithms along two directions: parallel computing and distributed computing. For the first, we use OpenMP to parallelize the existing full search algorithm and the multi-directional gradient descent search algorithm. In the full search algorithm, we not only divide the search points evenly among the threads but also adjust that allocation according to the characteristics of motion estimation, so that the threads carry more similar workloads; this reduces synchronization cost and speeds up encoding. For the second, we design a high-performance motion estimation cluster platform based on the Hadoop distributed computing architecture and implement three levels of parallelization on it: frame level, block level, and search point level. We evaluate the three levels under different search ranges, numbers of clusters, and test sequence characteristics, and analyze the results. The experimental results show that frame-level parallelization achieves better parallel performance.
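As an illustration of the search-point-level idea, the following is a minimal sketch of a full search whose search points are divided evenly among OpenMP threads. It is not the implementation evaluated in this work: the names `MotionVector`, `block_sad`, and `full_search`, the 16x16 block size, and the SAD matching cost are all assumptions made for the example, and it uses a plain static schedule rather than the motion-aware allocation described above.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define BLOCK 16 /* assumed block size, chosen for illustration */

/* Hypothetical result type: displacement and its matching cost. */
typedef struct { int dx, dy; unsigned sad; } MotionVector;

/* Sum of absolute differences between the current block at (bx, by)
 * and the reference block displaced by (dx, dy). */
static unsigned block_sad(const unsigned char *cur, const unsigned char *ref,
                          int width, int bx, int by, int dx, int dy)
{
    unsigned sad = 0;
    for (int y = 0; y < BLOCK; y++) {
        for (int x = 0; x < BLOCK; x++) {
            int c = cur[(by + y) * width + bx + x];
            int r = ref[(by + dy + y) * width + bx + dx + x];
            sad += (unsigned)abs(c - r);
        }
    }
    return sad;
}

/* Full search over a (2*range+1)^2 window around block (bx, by). */
MotionVector full_search(const unsigned char *cur, const unsigned char *ref,
                         int width, int height, int bx, int by, int range)
{
    assert(range >= 0);
    int side = 2 * range + 1;
    int points = side * side;
    MotionVector best = {0, 0, UINT_MAX};
    unsigned *cost = malloc((size_t)points * sizeof *cost);
    if (cost == NULL)
        return best;

    /* Each search point is independent, so the points are split evenly
     * among the threads. Without OpenMP the pragma is ignored and the
     * loop simply runs serially. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < points; i++) {
        int dx = i % side - range;
        int dy = i / side - range;
        int x0 = bx + dx, y0 = by + dy;
        if (x0 < 0 || y0 < 0 || x0 + BLOCK > width || y0 + BLOCK > height)
            cost[i] = UINT_MAX; /* candidate falls outside the frame */
        else
            cost[i] = block_sad(cur, ref, width, bx, by, dx, dy);
    }

    /* Serial pass: keep the first search point with the lowest cost. */
    for (int i = 0; i < points; i++) {
        if (cost[i] < best.sad) {
            best.dx = i % side - range;
            best.dy = i / side - range;
            best.sad = cost[i];
        }
    }
    free(cost);
    return best;
}
```

Writing each cost into its own array slot and selecting the minimum afterwards keeps the parallel loop free of shared state, at the price of one extra serial pass. An even static split treats every search point as equally expensive; when that does not hold (for example with early termination), threads finish at different times, which is the kind of imbalance the adjusted allocation in this work aims to reduce.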