軟式計算於視訊超解析度之應用

Soft Computing Methods for Video Super-resolution

Advisors: 黃國勝, 林迺衛

Abstract


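With the rapid development of multimedia and communication devices, higher display resolution has become an inevitable trend. Legacy low-resolution video played back on new high-resolution displays no longer meets viewers' visual expectations, and video super-resolution arose from this need. A video super-resolution method enlarges each low-resolution frame into a high-resolution frame by processing it together with several preceding and following frames. Because the enlargement draws on the spatial and temporal features shared by these temporally correlated low-resolution frames, the resulting frames have better visual quality.

Soft computing can be divided into two categories: optimization algorithms and machine learning. Common optimization algorithms include particle swarm optimization (PSO) and the genetic algorithm (GA); the most widely used machine learning methods include the artificial neural network (ANN) and the support vector machine (SVM). PSO is a population-based iterative algorithm that can obtain a reliable approximate solution within a short training time, which lets it solve highly complex problems at low computational cost. An ANN is a mathematical model that imitates the information processing and conduction structure of biological neurons. It can be interpreted as a statistical model, and it has therefore become a statistical learning method that can be applied to practical problems.

In this thesis, we first propose a video super-resolution method based on image fusion. The method uses motion compensation and image interpolation to produce four frames of higher resolution. It then classifies the frame being processed using temporal and spatial feature information and, according to the classification result, uses PSO to search for reliable and effective fusion parameters with which the four frames are fused into a super-resolved frame.

In the second part, building on the nonlocal-means (NLM) video super-resolution method, we propose a mobile motion search method that reduces the computational cost of motion search while preserving its effectiveness. In addition, an adaptive patch-size adjustment method is used to improve the visual quality of the super-resolved frames.

In the third part, we propose a video super-resolution method based on ANN learning. The method uses motion search to collect suitable training data from the video frames for training the ANN, and the parameters and weights obtained from training can effectively raise the resolution of the frames. On this basis we add a classification step that sorts the frames to be enhanced into classes, which improves the visual quality of object edges in the super-resolved frames. Furthermore, a bilateral-filter-based method is used to preprocess the collected training data, which further improves ANN learning and the quality of the super-resolved frames. Experimental results show that all three proposed methods effectively improve the quality of super-resolved video.

The main contribution of this thesis is to extract spatial and temporal features with the proposed methods and to combine them with soft computing for video super-resolution; the experimental results demonstrate the feasibility and effectiveness of the proposed concepts.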
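
The fusion step of the first method can be illustrated with a small sketch. The Python code below is not the thesis implementation. It assumes that a ground-truth high-resolution frame is available for training, it fits a single weight vector rather than one per spatio-temporal class, and it uses common textbook PSO settings (inertia 0.7, acceleration coefficients 1.5). The names `fuse`, `mse`, and `pso_fusion_weights` are hypothetical.

```python
import random


def fuse(frames, weights):
    """Per-pixel weighted sum of candidate frames (each a flat list of pixels).

    Weights are normalized to sum to 1 before fusing.
    """
    total = sum(weights)
    norm = [w / total for w in weights]
    return [sum(w * f[i] for w, f in zip(norm, frames))
            for i in range(len(frames[0]))]


def mse(a, b):
    """Mean squared error between two equally sized flat frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pso_fusion_weights(frames, target, n_particles=20, n_iters=100, seed=0):
    """Search for fusion weights that minimize MSE against `target` using PSO.

    Returns (normalized_weights, best_cost). All settings are assumptions.
    """
    rng = random.Random(seed)
    dim = len(frames)
    inertia, c1, c2 = 0.7, 1.5, 1.5  # assumed textbook values

    def cost(w):
        return mse(fuse(frames, w), target)

    pos = [[rng.uniform(0.05, 1.0) for _ in range(dim)]
           for _ in range(n_particles)]
    # Seed one particle at uniform weights so the result is never worse
    # than plain averaging of the candidate frames.
    pos[0] = [1.0] * dim
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda k: pbest_cost[k])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(n_iters):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[k][d] = (inertia * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                # Clamp so every weight stays strictly positive.
                pos[k][d] = min(1.0, max(1e-3, pos[k][d] + vel[k][d]))
            c = cost(pos[k])
            if c < pbest_cost[k]:
                pbest[k], pbest_cost[k] = pos[k][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[k][:], c

    total = sum(gbest)
    return [w / total for w in gbest], gbest_cost
```

Calling `pso_fusion_weights(frames, target)` with four candidate frames returns four positive weights that sum to 1, together with the fusion error they achieve. To follow the classification idea described above, one would presumably run the search once per class on the pixels assigned to that class.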

Parallel Abstract


Video super-resolution is an important problem in contemporary applications. Two factors drive the demand for it: low-resolution (LR) capture and low-bandwidth communication, both of which are paired with high-resolution (HR) display at the user end. In video super-resolution, an HR frame is constructed from a set of successive LR frames rather than from a single LR frame, which yields better reconstruction results.

Soft computing can be divided into two categories: optimization algorithms and machine learning. Optimization algorithms include particle swarm optimization (PSO) and the genetic algorithm (GA), among others. Machine learning methods include the artificial neural network (ANN) and the support vector machine (SVM), among others. PSO is a population-based algorithm that searches for near-optimal solutions and can solve complicated optimization problems at low cost. An ANN is a learning machine inspired by biological neurons and nervous systems, and it serves as a powerful computational tool for nonlinear prediction problems.

In this thesis, we first propose a super-resolution method that consists of three main modules: supersampling, spatio-temporal classification, and frame fusion using PSO. LR frames are super-resolved to HR frames through the fusion of four full-resolution frames. One of the four is obtained by direct spatial interpolation, and the other three are obtained by motion compensation with given reference frames. The essence of the method is the spatio-temporal classification mechanism, which exploits the temporal variation between frames and the spatial energy within a frame. Based on the classification results, PSO determines the optimal weights for frame fusion.

Second, we propose a video super-resolution approach that uses a mobile search strategy and an adaptive patch size. Building on a modified nonlocal-means (NLM) super-resolution algorithm, the mobile search strategy for motion estimation reduces the computational complexity of the approach, and the adaptive patch size improves the visual quality of the final super-resolution results.

Finally, we propose a classification-based video super-resolution method that uses an ANN to enhance LR frames to HR frames. The method consists of four main steps: classification, motion-trace volume collection, temporal adjustment, and ANN prediction. A classifier based on the edge properties of a pixel in the LR frame identifies the spatial information. To exploit the spatio-temporal information, a motion-trace volume is collected using motion estimation, which can eliminate hard-to-model object motion in the LR frames. In addition, a temporal lateral process is employed for volume adjustment to reduce unnecessary temporal features. In the final step, an ANN is applied to each class to learn the complicated spatio-temporal relationship between LR and HR frames.
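
The second method builds on NLM super-resolution, whose core is a patch-similarity weighted average over neighboring frames. A minimal sketch of that weighting follows, assuming the frames have already been interpolated to the target size. It does not reproduce the mobile search strategy or the adaptive patch size proposed in the thesis, and the function names and the default `patch_r`, `search_r`, and `sigma` values are illustrative assumptions.

```python
import math


def patch(frame, y, x, r):
    """Flat (2r+1)x(2r+1) patch around (y, x), clamping coordinates at borders."""
    h, w = len(frame), len(frame[0])
    return [frame[min(h - 1, max(0, y + dy))][min(w - 1, max(0, x + dx))]
            for dy in range(-r, r + 1)
            for dx in range(-r, r + 1)]


def nlm_pixel(frames, t, y, x, patch_r=1, search_r=2, sigma=10.0):
    """NLM estimate of pixel (y, x) in frame t.

    Every pixel inside a fixed search window, in every frame of `frames`,
    contributes with a weight that decays with the squared distance between
    its patch and the reference patch. All frames must have the same size.
    """
    ref = patch(frames[t], y, x, patch_r)
    h, w = len(frames[t]), len(frames[t][0])
    num = 0.0
    den = 0.0
    for frame in frames:
        for cy in range(max(0, y - search_r), min(h, y + search_r + 1)):
            for cx in range(max(0, x - search_r), min(w, x + search_r + 1)):
                cand = patch(frame, cy, cx, patch_r)
                dist2 = sum((a - b) ** 2 for a, b in zip(ref, cand))
                # Gaussian weight on the mean squared patch difference.
                wgt = math.exp(-dist2 / (2.0 * sigma ** 2 * len(ref)))
                num += wgt * frame[cy][cx]
                den += wgt
    # den >= 1 because the reference patch matches itself with weight 1.
    return num / den
```

In this fixed-window form the cost grows with the search window area times the number of frames, which is presumably the cost that a search strategy able to move the window is meant to reduce.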

