
Quality Enhancement and Assessment for Image and Video Resizing

Advisor: Prof. Chia-Wen Lin (林嘉文)

Abstract


This dissertation focuses on content enhancement and quality assessment techniques for image/video resizing. The first part addresses content enhancement, i.e., recovering a high-resolution image and its details from a low-resolution input, also known as super-resolution (SR). Learning-based SR usually achieves superior performance; in this dissertation we investigate dictionary classification and selection to improve the reconstruction quality of SR for various kinds of low-resolution multimedia inputs.

First, although image SR has been studied for a long time, prior work has not addressed low-resolution images corrupted by structured noise such as blocking artifacts. Directly upscaling a blocky low-resolution image amplifies the blocking artifacts. Based on the signal-decomposition concept of morphological component analysis (MCA), this dissertation partitions a learned dictionary into two sub-dictionaries, one for clean content and one for blocking artifacts, reconstructs the image with each sub-dictionary separately, and retains only the clean reconstruction, thereby achieving deblocking and super-resolution simultaneously.

Second, a human face is a highly structured signal with strong priors, so face images of very low resolution can still be restored. Although learning-based face hallucination can reconstruct extremely low-resolution faces, the coefficients representing such a low-resolution face are estimated inaccurately because too little information is observed. This dissertation proposes refining the initially estimated coefficients with a maximum-a-posteriori (MAP) model, which improves the quality of the reconstructed face. Dictionaries for local facial regions (i.e., eyes, nose, and mouth) are then trained via nonnegative matrix factorization (NMF), and useful bases are selected from them to further improve overall quality.

Finally, this dissertation extends learning-based SR to videos with dynamic-texture backgrounds, a setting rarely studied before. The key issue for such videos is temporal coherency. These videos can be upscaled with specially designed texture dictionaries and texture-synthesis techniques to obtain excellent visual quality; however, the synthesized frames suffer from temporal incoherence. This dissertation casts the synthesized (temporally incoherent) frame sequence into a linear dynamic texture synthesis (DTS) system and re-renders the super-resolved frames to maintain temporal coherence. It further introduces bi-directional overlapped block-wise motion compensation, so that SR is applied only to key-frames while non-key-frames are upscaled via motion compensation, effectively reducing the overall time complexity.

The second part of this dissertation concerns quality assessment for image/video retargeting. Although several outstanding retargeting algorithms exist, there is still no reliable mechanism for objectively measuring the quality of retargeted results. This dissertation analyzes the distortion types that retargeting may introduce, builds a dense correspondence map between the original and retargeted images via SIFT flow, analyzes the local variance of each patch to derive a local distortion score, and combines it with a saliency map that characterizes the importance of each region, as well as the degree of information loss, to compute the final score. Finally, the method is extended to video retargeting assessment: besides spatial distortion, temporal incoherence must also be considered. SIFT flow is therefore extended to consecutive frames along the temporal axis, and a temporal distortion score is computed by analyzing the temporal SIFT flow; combined with the spatial distortion score, this yields an effective quality assessment method for video retargeting. Experimental results verify the effectiveness of the proposed quality assessment methods for image/video retargeting.
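The sub-dictionary idea above can be sketched minimally as follows. This is an illustration only: it uses toy orthonormal identity atoms in place of dictionaries that the actual method learns from training patches via sparse coding and MCA-based classification, and a simple orthogonal matching pursuit in place of the dissertation's sparse-coding stage.

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal matching pursuit: greedily sparse-code y over dictionary D using k atoms."""
    residual, idx = y.astype(float).copy(), []
    coef = np.zeros(0)
    for _ in range(k):
        idx.append(int(np.argmax(np.abs(D.T @ residual))))   # most correlated atom
        coef, *_ = np.linalg.lstsq(D[:, idx], y, rcond=None)  # refit on chosen atoms
        residual = y - D[:, idx] @ coef
    x = np.zeros(D.shape[1])
    x[idx] = coef
    return x

# Toy sub-dictionaries (the real ones are learned from image patches).
D_clean = np.eye(16)[:, :8]   # atoms modeling clean image content
D_block = np.eye(16)[:, 8:]   # atoms modeling blocking artifacts
D = np.hstack([D_clean, D_block])

# A patch containing clean content plus a blocking-artifact component.
patch = 2.0 * D_clean[:, 0] + 0.5 * D_block[:, 3]
x = omp(D, patch, k=2)
clean_only = D_clean @ x[:8]   # keep only the clean part -> deblocked reconstruction
```

Decomposing over the concatenated dictionary and discarding the artifact sub-dictionary's contribution is what lets deblocking and SR happen in a single sparse-representation step.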

Parallel Abstract (English)


This dissertation studies quality enhancement and assessment for image/video resizing. To achieve high-quality reconstruction of high-resolution (HR) details for a low-resolution (LR) image/video, super-resolution (SR) has proven to be an effective approach. In particular, learning-based SR schemes usually show superior performance compared to conventional multi-frame SR approaches. In Part I, we address three issues in learning-based image and video SR. The first task for real-world SR applications is to achieve simultaneous SR and deblocking for a highly compressed image. We propose to learn sparse image representations that model the relationship between low- and high-resolution image patches in terms of dictionaries learned for patches with and without blocking artifacts, respectively. Through MCA (morphological component analysis)-based dictionary classification, the learned dictionary is separated into two sub-dictionaries, one for clean content and one for blocking artifacts, so that image SR and deblocking can be achieved simultaneously via sparse representation. Second, we propose a two-step face hallucination scheme. Since the coefficients representing an LR face image over the LR dictionary are unreliable due to insufficient observed information, we propose a maximum-a-posteriori (MAP) estimator to re-estimate the coefficients, which significantly improves the visual quality of the reconstructed face. In addition, the facial parts (i.e., eyes, nose, and mouth) are further refined using the proposed basis-selection method for an overcomplete nonnegative matrix factorization (ONMF) dictionary, which eliminates unnecessary information in the bases.
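The MAP coefficient refinement can be illustrated with a simplifying assumption: a Gaussian prior on the coefficients with mean `mu` and isotropic covariance (the dissertation's actual prior model may differ), under which the MAP estimate reduces to regularized least squares with a closed-form solution.

```python
import numpy as np

def map_refine(A, y, mu, lam):
    """MAP estimate assuming y ~ N(A x, I) and prior x ~ N(mu, (1/lam) I):
    argmin_x ||y - A x||^2 + lam * ||x - mu||^2, solved in closed form."""
    k = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(k), A.T @ y + lam * mu)

# Toy setup: fewer LR observations (rows of A) than representation coefficients,
# mimicking the unreliable, underdetermined coefficient estimation for an LR face.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 6))    # hypothetical LR observation operator
mu = np.zeros(6)                   # hypothetical prior mean of the coefficients
y = A @ rng.standard_normal(6)     # observed LR face vector
x_map = map_refine(A, y, mu, lam=0.1)
```

The prior term is what stabilizes the estimate when the observed LR data alone cannot pin down the coefficients; as `lam` grows, the solution is pulled toward the prior mean.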
Third, we propose a texture-synthesis-based video SR method, in which a novel dynamic texture synthesis (DTS) scheme renders the reconstructed HR details in a temporally coherent way, effectively addressing the temporal incoherence caused by traditional texture-synthesis-based image SR methods. To reduce the computational complexity, our method performs texture-synthesis-based SR only on a selected set of key-frames, while the HR details of the remaining non-key-frames are predicted using bi-directional overlapped block motion compensation. After all frames are upscaled, the proposed DTS-SR is applied to maintain temporal coherence in the HR video. The second part of this dissertation is quality assessment for image/video retargeting. Image/video retargeting algorithms have been comprehensively studied in the past decade; however, there is still no accurate objective quality assessment algorithm for image/video retargeting. We therefore propose a novel full-reference objective metric that automatically assesses the visual quality of a retargeted image based on perceptual geometric distortion and information loss. The proposed metric measures the geometric distortion of a retargeted image from the local variance of the SIFT-flow vector fields, and a visual saliency map is further derived to characterize human perception of the geometric distortion. The information loss in the retargeted image, estimated from the saliency map, is also taken into account. Furthermore, we extend SIFT-flow estimation to the temporal domain for video retargeting quality assessment, where local temporal distortion is measured by analyzing the local variance of the temporal SIFT-flow vector fields. Experimental results demonstrate that the proposed metrics for image and video retargeting significantly outperform existing state-of-the-art metrics.
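The spatial part of the metric can be sketched as below, assuming the dense SIFT-flow field and the saliency map have already been computed by separate modules (as in the actual metric); per-patch variance of the flow vectors serves as the local geometric-distortion score, pooled with saliency weights. Patch size and weighting are illustrative choices, not the dissertation's exact parameters.

```python
import numpy as np

def geometric_distortion(flow, saliency, patch=8):
    """Saliency-weighted average of per-patch variance of a dense flow field.
    flow: (H, W, 2) correspondence vectors; saliency: (H, W) importance in [0, 1]."""
    H, W, _ = flow.shape
    num, den = 0.0, 1e-8
    for i in range(0, H - patch + 1, patch):
        for j in range(0, W - patch + 1, patch):
            vecs = flow[i:i+patch, j:j+patch].reshape(-1, 2)
            local_var = vecs.var(axis=0).sum()           # deformation inside the patch
            w = saliency[i:i+patch, j:j+patch].mean()    # perceptual importance
            num += w * local_var
            den += w
    return num / den

sal = np.ones((16, 16))
uniform = np.ones((16, 16, 2))          # pure translation: no geometric distortion
print(geometric_distortion(uniform, sal))   # → 0.0
```

A uniform flow (the whole image shifted rigidly) scores zero, while non-rigid deformations raise the per-patch variance; weighting by saliency makes distortion in important regions count more, matching the perceptual intent of the metric.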


Cited By


徐豪斌 (2015). Compressed-domain video downscaling decoding techniques under camera motion [Master's thesis, National Chung Cheng University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0033-2110201614035322
