  • Thesis

智慧影像超解析和透明度估計與基於度量學習的圖像分類

Image Super Resolution, Image Alpha Estimation, and Metric Learning for Image Classification

Advisors: 莊仁輝, 王才沛

Abstract


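Computer vision uses image sensors to capture images of objects, converts them into digital images, and applies algorithms that mimic human judgment to understand and recognize those images, so that images can be analyzed and conclusions drawn. It is an interdisciplinary field that combines artificial intelligence, machine learning, image processing, computer science, and neurobiology. Computer vision technology uses digital cameras to simulate human vision and uses computer programs and algorithms to simulate human cognition and reasoning, so that assigned tasks can be carried out in place of people. In this thesis, we discuss three applications that combine machine learning and pattern recognition methods, operating on the pixels, patches, and features of an image, to achieve results better than those of traditional image processing algorithms.

The first topic is an intelligent image super-resolution algorithm. Image super-resolution is the process of generating a high-resolution image from one or more low-resolution input images. The algorithm addresses insufficient image resolution, that is, the loss of resolution or image quality that occurs when an image is enlarged. Commonly used interpolation methods tend to blur edges and fine details. Although many super-resolution methods have been proposed, generating the fine structure of a super-resolution image remains a challenging task. We therefore propose an algorithm that combines the advantages of internal and external super-resolution methods. In the preprocessing stage, we use information from the image itself to produce an initial high-resolution image. Next, we compensate for the missing high-frequency information to generate the final super-resolution image. Iterative back-projection is then applied to further improve visual quality. Experimental results show that the proposed method enhances image resolution satisfactorily.

The second topic, natural image matting and alpha (opacity) estimation, concerns extracting the foreground from an image. The sampling methods used by earlier algorithms usually collect samples only in the spatial neighborhood of an unknown pixel; if the true foreground and background samples are not found, the results can be poor. In this work, we propose a more robust method that extracts foreground and background elements from the image using pixel color and neighboring positions. In addition, we treat every pixel as an independent node in a graph in which pixels are connected to one another by edges, and we perform computation on every pixel. Because graph-based computation can recalibrate neighboring pixels, the proposed method further corrects the estimated alpha values of unknown pixels and yields results that better match human visual perception. This approach avoids the misjudgments that traditional local analysis can cause and makes the analysis of complex images more feasible. Experimental results show that we can produce visually plausible alpha estimates and accurately separate foreground objects from the background.

The third topic is image classification, a core task in many computer vision applications. Classifying weather images from large-scale data is a challenging problem. However, few studies have addressed weather-related image recognition from color images, especially for large image datasets. In this study, we propose a metric-learning-based method for a two-class image classification problem. Metric learning is an efficient classification approach that redefines the distance metric according to the properties of the data. Extracting discriminative features from images is a critical task. In this thesis, we define features based on outdoor images captured under different weather conditions, and we use metric learning during classification to improve accuracy. Experimental results show that the metric-learning-based classifier is effective for image classification and outperforms traditional classification methods. Finally, we also compare against deep learning methods, over which the metric learning approach has a speed advantage.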
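
The iterative back-projection step mentioned in the first topic can be illustrated with a minimal sketch. The abstract does not specify the imaging model, so the code below assumes a simple one (block averaging for downsampling, pixel replication for back-projection); the function names, the step size, and the synthetic data are all hypothetical and are not taken from the thesis.

```python
import numpy as np

def downsample(img, s):
    """Assumed imaging model: average each s-by-s block (box blur + decimation)."""
    h, w = img.shape
    return img.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def upsample(img, s):
    """Assumed back-projection operator: replicate each pixel into an s-by-s block."""
    return np.repeat(np.repeat(img, s, axis=0), s, axis=1)

def iterative_back_projection(lr, hr_init, s, n_iter=10, step=1.0):
    """Refine an HR estimate until its simulated LR version matches the observed LR image."""
    hr = hr_init.astype(float).copy()
    for _ in range(n_iter):
        residual = lr - downsample(hr, s)    # error measured in the LR domain
        hr += step * upsample(residual, s)   # project the error back onto the HR estimate
    return hr

# Synthetic example: a random LR image and a noisy initial HR guess (hypothetical data).
rng = np.random.default_rng(0)
lr = rng.random((4, 4))
hr_init = upsample(lr, 2) + 0.1 * rng.standard_normal((8, 8))
hr = iterative_back_projection(lr, hr_init, s=2)
```

With these particular operators, downsampling the refined estimate reproduces the LR input, which is the consistency constraint that back-projection enforces. A practical system would likely use a smoother blur kernel and interpolation in place of the box operators assumed here.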

Parallel Abstract


Computer vision technology uses digital cameras to simulate human vision and uses computer programs and algorithms to simulate how people understand and think about things. Computer vision algorithms draw on a wide range of disciplines, such as artificial intelligence, machine learning, image processing, and neurobiology. In this thesis, we discuss three applications that combine machine learning and pattern recognition algorithms, operating on the pixel, patch, and feature units of an image, to achieve better results than traditional image processing algorithms.

In the first part, we address image super-resolution (SR), the process of generating a high-resolution (HR) image from one or more low-resolution (LR) inputs. Many SR methods have been proposed, but generating the small-scale structure of an SR image remains a challenging task. We therefore propose a single-image SR algorithm that combines the benefits of both internal and external SR methods. First, we estimate the enhancement weight of each LR-HR image patch pair. Next, we multiply each patch by its estimated enhancement weight to generate an initial SR patch. We then employ a method that recovers the missing information from the high-resolution patches and uses it to generate the final SR image. Finally, we employ iterative back-projection to further enhance visual quality. The method is compared qualitatively and quantitatively with several state-of-the-art methods, and the experimental results indicate that the proposed framework provides high contrast and better visual quality, particularly for non-smooth texture areas.

The second part presents a new approach for extracting foreground elements from an image by means of color and opacity (alpha) estimation, which considers the available samples in a search window of variable size for each unknown pixel. Alpha matting is conventionally defined as the soft extraction of foreground objects from a single input image, and it plays a central role in image processing. In particular, the challenging case of natural image matting has received considerable research attention, since there are virtually no restrictions on the background regions. Many algorithms are available for estimating foreground and background samples, along with opacity values, for all unknown pixels of an image. Given a trimap that partitions the input image into background, foreground, and unknown regions, a straightforward approach to determining an alpha value is to sample (collect) known foreground and background colors for each unknown pixel defined in the trimap. The proposed sampling method is robust in that similar sampling results can be generated for input trimaps with different unknown regions. Moreover, after an initial estimation of the alpha matte, a fully connected conditional random field (CRF) can be adopted to correct the predicted matte at the pixel level.

In the third part, we develop effective weather features to solve the problem of weather recognition using a metric learning method. Recognizing weather conditions from a single image in large datasets is a challenging problem in computer vision. Although previous approaches classify weather conditions into classes such as sunny and cloudy, their performance is still far from satisfactory. Based on observations of outdoor images taken under different weather conditions, we define several categories of more robust weather features. We then improve the classification accuracy using metric learning approaches. The results indicate that our method performs much better than previous methods. The proposed method is also straightforward to implement and computationally inexpensive, demonstrating the effectiveness of metric learning methods for computer vision problems.
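
To make the metric learning idea in the third part concrete, here is a minimal sketch of classification under a learned distance. The abstract does not name the metric learning algorithm used, so this example substitutes a simple stand-in: the inverse of the pooled within-class covariance serves as the matrix M in the squared distance (x - q)^T M (x - q). The two-dimensional features, the labels (0 for sunny, 1 for cloudy), and all function names are hypothetical.

```python
import numpy as np

def fit_mahalanobis(X, y, ridge=1e-6):
    """Stand-in metric: inverse of the pooled within-class covariance (not the thesis method)."""
    d = X.shape[1]
    classes = np.unique(y)
    scatter = np.zeros((d, d))
    for c in classes:
        centered = X[y == c] - X[y == c].mean(axis=0)
        scatter += centered.T @ centered
    cov = scatter / (len(X) - len(classes))
    # A small ridge term keeps the inverse stable if a feature has near-zero variance.
    return np.linalg.inv(cov + ridge * np.eye(d))

def nearest_neighbor(query, X, y, M=None):
    """1-NN label under squared distance (x - q)^T M (x - q); M=None means Euclidean."""
    if M is None:
        M = np.eye(X.shape[1])
    diff = X - query
    dist2 = np.einsum("ij,jk,ik->i", diff, M, diff)
    return y[np.argmin(dist2)]

# Hypothetical features: column 0 separates the classes, column 1 is a high-variance nuisance.
X = np.array([[0.1, 10], [0.1, -10], [-0.1, 10], [-0.1, -10],
              [1.1, 15], [1.1, -5], [0.9, 15], [0.9, -5]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 0 = sunny, 1 = cloudy (illustrative labels)

M = fit_mahalanobis(X, y)
query = np.array([0.8, 9.0])
euclidean_label = nearest_neighbor(query, X, y)    # dominated by the nuisance feature
learned_label = nearest_neighbor(query, X, y, M)   # down-weights the nuisance feature
```

In this toy data the Euclidean nearest neighbor assigns the query to class 0 because the nuisance feature dominates the distance, while the learned metric assigns it to class 1 by weighting the discriminative feature more heavily. Once M is fitted, classification only needs distance computations, which is consistent with the abstract's description of the approach as computationally inexpensive.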

